Big Data raises many security challenges. Because Big Data sets often contain enormous quantities of personally identifiable information (PII), privacy becomes a real and primary concern.
A breach involving Big Data can be more devastating than other breaches because it affects a significantly larger group of people. As a result, the damage is not only reputational: the organization may also face significant legal ramifications.
Organizations must therefore strike the right balance between protecting privacy and making use of the data.
Anonymization, encryption, information ownership, access control mechanisms, and software are among the most common sources of problems for Big Data security.
Before storage, the data should be anonymized so that every identifier unique to a user is removed, and all other sensitive information should be stripped from the collected records. Even so, anonymization does not guarantee that the data will stay anonymous: records can sometimes be re-identified by combining the remaining attributes with other data sources, so anonymization can itself present a security issue.
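One common building block is pseudonymization: dropping direct identifiers and replacing the user ID with a keyed hash. The sketch below assumes records are Python dicts and that `name`, `email` and `phone` are the direct identifiers; these field names and the `pseudonymize` helper are hypothetical. It also illustrates the caveat above: a field such as a ZIP code survives, and combinations of such quasi-identifiers can still allow re-identification.

```python
import hashlib
import hmac

# Hypothetical field names for illustration; a real schema will differ.
DIRECT_IDENTIFIERS = {"name", "email", "phone"}


def pseudonymize(record: dict, secret_key: bytes) -> dict:
    """Drop direct identifiers and replace user_id with a keyed hash."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    if "user_id" in cleaned:
        # HMAC-SHA256 keyed with a secret kept outside the data store, so the
        # token cannot be recomputed from the raw ID without the key.
        digest = hmac.new(secret_key, str(cleaned["user_id"]).encode(), hashlib.sha256)
        cleaned["user_id"] = digest.hexdigest()
    return cleaned
```

Because the same key always maps the same ID to the same token, records belonging to one user can still be joined for analysis; keeping the key separate from the stored data is what prevents an attacker from recomputing the mapping.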
Encryption presents a further problem. Many organizations use cloud services, but with conventional encryption the cloud cannot perform operations on the data without first decrypting it. One way around this is Fully Homomorphic Encryption (FHE), which allows the cloud to perform operations directly on the encrypted data, producing new encrypted results. When those results are decrypted, they are the same as if the operations had been carried out on the plaintext. Adequate encryption remains the main solution to these security concerns.
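Fully homomorphic schemes are too involved for a short listing, but the core idea, computing on ciphertexts without decrypting them, can be seen in a simpler scheme. The sketch below swaps in Paillier encryption, which is only additively homomorphic (not fully homomorphic): multiplying two ciphertexts yields an encryption of the sum of the plaintexts. The primes are toy values chosen for illustration and are far too small to be secure; the code assumes Python 3.9+ for `math.lcm`.

```python
import math
import secrets

# Toy primes for illustration only: real deployments use much larger primes.
TOY_P, TOY_Q = 1009, 1013


def keygen(p: int = TOY_P, q: int = TOY_Q):
    """Return the public modulus n and the private key (lam, mu)."""
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # valid shortcut because the generator g is n + 1
    return n, (lam, mu)


def encrypt(n: int, m: int) -> int:
    """Encrypt 0 <= m < n as g^m * r^n mod n^2, with g = n + 1 and random r."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # 1 <= r < n
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n: int, priv, c: int) -> int:
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n  # L(x) = (x - 1) / n, then multiply by mu


def add_encrypted(n: int, c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the underlying plaintexts (mod n)."""
    return (c1 * c2) % (n * n)


if __name__ == "__main__":
    n, priv = keygen()
    total = add_encrypted(n, encrypt(n, 20), encrypt(n, 22))
    print(decrypt(n, priv, total))  # 42
```

A party holding only the public modulus `n`, such as a cloud service, can compute `total` without ever seeing 20 or 22; only the holder of `priv` can decrypt the result.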
Another key part of protecting the data is access control. Traditionally, access control has been enforced by applications or operating systems, which restrict access to the information but usually expose all of it if the application or system is breached. A better approach is to encrypt the data so that it can only be decrypted by users whom an access control policy authorizes.
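One way to make decryption depend on a policy is envelope encryption with per-role key wrapping: each record is encrypted once with a fresh data key, and that key is then encrypted (wrapped) separately under the key of every role the policy allows. A role the policy excludes never receives a wrapped copy, so breaching the application alone does not reveal the plaintext. This is a simplified stand-in for schemes such as attribute-based encryption, not an implementation of them. The role names and helpers are hypothetical, and the HMAC-based stream cipher is a toy with no integrity protection; a vetted authenticated cipher such as AES-GCM should be used in practice.

```python
import hashlib
import hmac
import secrets


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Toy stream cipher: HMAC-SHA256 used as a PRF in counter mode.
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])


def _xor(data: bytes, key: bytes, nonce: bytes) -> bytes:
    """Encrypt or decrypt (XOR is its own inverse) with the toy keystream."""
    ks = _keystream(key, nonce, len(data))
    return bytes(a ^ b for a, b in zip(data, ks))


def protect(plaintext: bytes, role_keys: dict, allowed_roles: set) -> dict:
    """Encrypt once with a fresh data key, then wrap that key per allowed role."""
    data_key = secrets.token_bytes(32)
    nonce = secrets.token_bytes(16)
    wrapped = {}
    for role in allowed_roles:
        wrap_nonce = secrets.token_bytes(16)
        wrapped[role] = (wrap_nonce, _xor(data_key, role_keys[role], wrap_nonce))
    return {"nonce": nonce, "ciphertext": _xor(plaintext, data_key, nonce), "wrapped": wrapped}


def unprotect(blob: dict, role: str, role_key: bytes) -> bytes:
    if role not in blob["wrapped"]:
        # The policy never wrapped the data key for this role: nothing to decrypt with.
        raise PermissionError(f"policy does not grant {role!r} access")
    wrap_nonce, wrapped_key = blob["wrapped"][role]
    data_key = _xor(wrapped_key, role_key, wrap_nonce)
    return _xor(blob["ciphertext"], data_key, blob["nonce"])
```

Changing who may read a record only requires re-wrapping its small data key, not re-encrypting the record itself.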
Software commonly used to store Big Data, such as Hadoop, does not necessarily enable user authentication by default. This is problematic because it leaves the information open to unauthenticated users; instead, traditional firewalls at the application layer are relied upon to restrict access. By implementing stronger access control and authentication policies, companies can help close this potential weak point.
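In Hadoop, authentication is configured in `core-site.xml`: the property `hadoop.security.authentication` defaults to `simple`, which trusts the user name the client reports, and setting it to `kerberos` enables real authentication, while `hadoop.security.authorization` switches on service-level authorization checks. The hypothetical Python helper below simply emits those two properties as a `core-site.xml` fragment. Enabling Kerberos also requires a KDC plus principals and keytabs for each service, which this sketch does not cover.

```python
import xml.etree.ElementTree as ET

# Real Hadoop property names; the values shown turn on Kerberos authentication
# and service-level authorization (both are off by default).
SECURE_SETTINGS = {
    "hadoop.security.authentication": "kerberos",
    "hadoop.security.authorization": "true",
}


def core_site_xml(settings: dict) -> str:
    """Render settings as <configuration><property>... in core-site.xml layout."""
    root = ET.Element("configuration")
    for name, value in settings.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(core_site_xml(SECURE_SETTINGS))
```

The generated properties would be merged into the existing `core-site.xml` on each node rather than replacing it.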
Big Data need not be a security hazard, however; there are steps organizations can take to manage it securely.
Real-time monitoring is another integral part of keeping the information in a Big Data project secure. Organizations should monitor access to make sure that only authorized users are reaching the data. Threat intelligence can also help detect more sophisticated attacks and enable organizations to respond to them appropriately.
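A minimal sketch of the monitoring idea, assuming access events arrive as a stream of records carrying a user, a resource and a success flag (the `AccessEvent` type, the allowlist and the failure threshold are all hypothetical). It raises an alert for any access attempt by a user outside the allowlist and for any user who reaches a set number of failed attempts; a real deployment would route alerts into an incident-response process and enrich them with threat intelligence.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class AccessEvent:
    user: str
    resource: str
    success: bool


def monitor(events: Iterable[AccessEvent], authorized: set, max_failures: int = 3) -> Iterator[str]:
    """Yield an alert string for each suspicious event as it arrives."""
    failures = Counter()
    for ev in events:
        if ev.user not in authorized:
            yield f"access attempt by unauthorized user {ev.user} on {ev.resource}"
        if not ev.success:
            failures[ev.user] += 1
            # Alert once, exactly when the threshold is reached.
            if failures[ev.user] == max_failures:
                yield f"{ev.user} reached {max_failures} failed attempts"
```

Because `monitor` is a generator, it can consume an unbounded event stream and emit each alert as soon as the triggering event arrives, rather than after a batch completes.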
Ownership of information can also become an issue if a trust boundary between the data owners and the owners of the data storage is not established beforehand. When collecting data, organizations should run a risk assessment, consider whether they are collecting user information that needs to be kept private, and establish appropriate policies accordingly, so that both the data and their clients’ right to privacy remain protected.
If the data is to be shared with other organizations, then how this is done should be carefully considered. If deliberately released data turns out to infringe on privacy, the consequences can have a major impact on an organization’s reputation as well as its financial standing.
Identifying sensitive information stored within unstructured Big Data sets is a challenge that has to be overcome. Organizations need to isolate any sensitive information and be able to prove that they have the necessary processes in place to do so.
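A first pass at locating sensitive information in unstructured text is pattern matching. The sketch below flags email addresses and candidate payment-card numbers, using the Luhn checksum to discard digit runs that cannot be valid card numbers. The regular expressions and the `find_sensitive` helper are illustrative assumptions; they will miss many formats and produce some false positives, so they are not a substitute for a dedicated data-discovery tool.

```python
import re

# Deliberately simple patterns (assumptions); production scanners use far richer rules.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
CARD_RE = re.compile(r"\b(?:\d[ -]?){13,19}\b")


def luhn_valid(number: str) -> bool:
    """Luhn checksum: double every second digit from the right, sum, test mod 10."""
    digits = [int(d) for d in number if d.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_sensitive(text: str) -> list:
    """Return (kind, value) pairs for emails and Luhn-valid card-like numbers."""
    findings = [("email", m.group()) for m in EMAIL_RE.finditer(text)]
    for m in CARD_RE.finditer(text):
        if luhn_valid(m.group()):
            # strip() drops a trailing separator the pattern may have consumed.
            findings.append(("card", m.group().strip()))
    return findings
```

For the text `Contact jane.doe@example.com, card 4111 1111 1111 1111, order 1234567890123.` the function returns `[("email", "jane.doe@example.com"), ("card", "4111 1111 1111 1111")]`; the 13-digit order number matches the card pattern but fails the Luhn check.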
Applying a governance framework to Big Data can prevent the collected data from being misleading and help avoid unexpected costs. A major problem, however, is that no widely recognized governance procedures and policies for handling Big Data have yet been created.
Because Big Data is a relatively new concept, there is no set list of best practices that is widely recognized by the security community. There are, however, general recommendations that can be applied to the storage of Big Data:
- Vet your cloud providers: If you store your Big Data in the cloud, you must be certain that your provider has sufficient protection mechanisms in place. Make sure that the provider carries out periodic security audits, and agree on penalties in case adequate security standards are not met.
- Control access: Create an adequate access control policy that allows access only to authorized users.
- Protect the data: Both the raw data and the outcome of analytics should be effectively protected. Use encryption to ensure that no sensitive data is leaked.
- Protect communications: Data in transit should be sufficiently protected to maintain its confidentiality and integrity.
- Use real-time security monitoring: Monitor access to the data, and use threat intelligence to help prevent unauthorized access.
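For data in transit, TLS provides both confidentiality and integrity. Below is a minimal client-side sketch using Python's standard `ssl` module: `ssl.create_default_context()` turns on certificate verification and hostname checking, and the helper additionally refuses protocol versions older than TLS 1.2. The helper name and the choice of TLS 1.2 as the floor are assumptions, not requirements stated above.

```python
import ssl


def make_client_context(ca_file=None) -> ssl.SSLContext:
    """Build a TLS client context for connecting to a data service."""
    # create_default_context enables certificate verification and hostname checking;
    # ca_file optionally points at a private CA bundle instead of the system store.
    ctx = ssl.create_default_context(cafile=ca_file)
    # Refuse protocol versions older than TLS 1.2.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx
```

To use it, wrap an ordinary socket with `ctx.wrap_socket(sock, server_hostname=host)`, where `host` is the name of the data service the client connects to; the handshake fails if the server's certificate does not match that name.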
Security is a process, not a product. With the right combination of policies and processes in place, it is entirely possible for organizations to handle Big Data effectively and protect it from security breaches.