In a world of increasingly punitive regulations like GDPR, the combination of unstructured data and human error represents one of the greatest risks an organization faces. Understanding the differences between unstructured and structured data – and the different approaches needed to secure each – is critical to achieving compliance with the many data privacy regulations that businesses in the U.S. now face.
Structured data consists of individual elements of information organized to be accessible, repeatable and predictable, and to be governed and secured by machines in a highly automated manner. A database containing identity information (name, address, Social Security number) is an example of structured data.
Unstructured data is free-range data, living outside the confines of a database. It includes day-to-day business communications, operational files, spreadsheets, videos, PDFs, Word docs, emails and the output of the hundreds of other applications present on our laptops, phones and other devices.
Gartner now estimates that close to 80 percent of all data in the enterprise is unstructured. As ever more stringent data privacy regulations are enacted, such as the NYDFS cybersecurity regulation, California’s new consumer privacy law AB 375 and the GDPR, it is critical that organizations minimize this risk to prevent data breaches that now come with hefty financial and reputational costs.
The human challenges of unstructured data
Unstructured data is the greater risk primarily because this information is handled by humans as opposed to purely machine-based processes. Adding humans to the equation creates a host of potential risks due to the way we share, hoard, store and propagate information. Additionally, structured data can often be easily exported by users and IT administrators, and end up in an unstructured format.
This is why new and innovative approaches are needed to effectively handle the risks of unstructured data. Too often, enterprises rely on strategies carried over from structured data security protocols, and either forget to deal with the risk of human error or don’t know how to address it in the first place.
Typically, the tools applied under these strategies are clunky, cumbersome and difficult for a non-technical user to work with. If users are not empowered with simple ways to secure their data, they are more likely to expose information to risk without even being aware they’re doing so.
Another challenge is posed by workarounds people might use in business operations. For example, employees using a cloud file sharing system might accomplish the tasks they need to do as part of their job, while at the same time exposing the business to untold risks and compromise because they don’t understand the security protocols of the applications they use.
These system risks are compounded by the challenge posed by human error. Automation features built into email applications such as Outlook and Gmail help people communicate freely and easily. However, the autocomplete function that enters addresses as you type can also lead to embarrassing mistakes and, too often, to errors that result in data compromises and breaches.
These are common problems that every organization faces and struggles with, but there are best practices and new technologies that can help minimize the threat of unstructured data.
Start with data detection
The first step is to identify the data at risk. This was one of the biggest issues companies had leading up to the deadline for GDPR compliance – understanding everywhere sensitive data is being used and stored. This is crucial to complying with regulations and securing data – particularly when a single organization may store and process data that is subject to multiple regulations.
New technologies can automate the detection and classification of unstructured data – sifting through the vast quantities of emails, files and folders that users create to map where sensitive data lives. This classification process should drive policies for who in an organization can access and share this information.
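As a rough illustration of what automated detection and classification might look like, the sketch below scans a piece of text for two kinds of sensitive data and assigns a label. The patterns, the label names ("restricted", "internal", "public") and the classify function are all assumptions made for this example, not a description of any particular product; real tools validate matches and cover far more data types.

```python
import re

# Hypothetical patterns for illustration only. Real detectors validate
# matches and cover many more kinds of sensitive data.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}


def classify(text):
    """Return (label, kinds_found) for a piece of unstructured text."""
    found = sorted(name for name, pattern in PATTERNS.items() if pattern.search(text))
    if "ssn" in found:
        label = "restricted"   # assumed label for the most sensitive data
    elif found:
        label = "internal"     # some personal data, lower sensitivity
    else:
        label = "public"       # nothing detected by these patterns
    return label, found
```

A label produced this way could then feed the access and sharing policies described above, for example by limiting who may open or forward anything marked "restricted".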
Organizations can also add metadata tags to documents to ‘fingerprint’ sensitive information and follow it wherever it goes. This provides an understanding of the magnitude of the risk an organization faces as data travels from user to user and directs the policies for how the data should be secured to comply with all required regulations.
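One simple way to picture such a tag is a small metadata record that travels with the document. The sketch below uses a SHA-256 hash of the content as the fingerprint, which is an assumption made for illustration: a plain content hash changes whenever the file is edited, so commercial tools typically embed persistent tags instead. The function names and record fields are hypothetical.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Return a hex digest that identifies this exact content."""
    return hashlib.sha256(content).hexdigest()


def tag_document(content: bytes, label: str, regulations: list) -> dict:
    """Build a hypothetical metadata tag to store alongside a document."""
    return {
        "fingerprint": fingerprint(content),
        "label": label,                        # e.g. output of a classifier
        "regulations": sorted(regulations),    # rules the data is subject to
    }
```

Two copies of the same file produce the same fingerprint, so a copy that turns up in a mailbox or a cloud folder can be matched back to its tag and the policy attached to it.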
As part of the discovery and classification process, organizations can enforce automated encryption on any information that is deemed sensitive. If the data is not secure, then every other step to achieving security and compliance is at risk.
Encryption has been around for a long time – but typically falls under the ‘hard to use’ category of technologies that non-technical users avoid. Enforcing the use of encryption starts with ensuring that it’s embedded within the user workflow and doesn’t represent another step, application or process they need to add on. It needs to be seamless with the way employees currently work with and share information.
Encrypting data is a big step toward ensuring that a lost device or accidental email does not put your organization at financial risk.
Predicting and stopping human error
We’re all going to make mistakes – whether it’s accidentally uploading the wrong file, granting permissions to people who are not approved to review information or simply sending an email to the wrong person. Stopping unforced errors is one of the hardest parts of security.
Fortunately, one area where we’re seeing great advancement is the application of AI to predict user error before it happens. For example, much like Outlook predicts and auto-inserts email addresses, AI can learn the email patterns and behaviors users exhibit to prevent the wrong email address from being inserted – or to flag a user sharing information with someone they typically don’t communicate with.
It can also identify anomalous downloads and access and, combined with rights management, can stop employees from sharing sensitive files with cloud applications, eliminate the ‘copy and paste’ practice for sensitive data, and close off other ways that we accidentally leak our own data.
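To make the idea concrete, here is a deliberately simple stand-in for that kind of prediction: a frequency check that flags recipients a user has rarely or never emailed before. It is a heuristic rather than AI, and the function name, the min_count threshold and the shape of the history are all assumptions for this example.

```python
from collections import Counter


def unusual_recipients(history, recipients, min_count=3):
    """Return the recipients this user has emailed fewer than min_count times.

    history: addresses from the user's previously sent mail (assumed input).
    recipients: addresses on the message about to be sent.
    """
    counts = Counter(addr.lower() for addr in history)
    return [r for r in recipients if counts[r.lower()] < min_count]
```

A mail client could call this just before sending and ask the user to confirm any address it returns, which catches the classic autocomplete slip without blocking routine correspondence.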
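Spotting an anomalous download can be sketched with basic statistics. The example below compares today’s download count for a user against their own recent history and flags it when it sits far above the norm. The three-standard-deviation threshold and the function name are assumptions chosen for illustration; production systems use much richer behavioral models.

```python
from statistics import mean, pstdev


def is_anomalous(past_counts, today, threshold=3.0):
    """Flag today's download count if it is far above the user's baseline.

    past_counts: daily download counts from previous days (assumed input).
    threshold: standard deviations above the mean that count as anomalous.
    """
    if len(past_counts) < 2:
        return False  # not enough history to judge
    mu = mean(past_counts)
    sigma = pstdev(past_counts)
    if sigma == 0:
        return today > mu  # perfectly flat history: any increase stands out
    return (today - mu) / sigma > threshold
```

A flag like this would not block the user by itself; it is the kind of signal that, paired with rights management, could trigger a prompt or a review before a bulk export leaves the organization.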
Whether they realize it yet or not, organizations are at a tipping point. Volumes of unstructured data are only going to increase, and so, inevitably, will the risk of accidental loss. New laws like the NYDFS Cybersecurity Regulation, California AB 375 and GDPR have changed the game for compliance, and organizations need to start protecting unstructured data by default rather than as an afterthought.
The right way to do this is to look at the users creating, storing and interacting with this data, understand the different levels of sensitivity, and make sure the right level of security and control is applied.
Organizations need to adopt technologies that empower users to work securely, making privacy a natural part of business that builds customer trust and is seen as critical to the way work is carried out. Otherwise, they will leave themselves and their customers exposed to the ever-increasing risk of a data breach, which now comes with an even higher price tag attached.