In this interview, Mark Cusack, Chief Architect at RainStor, talks about the main challenges of handling petabyte-scale volumes of data, illustrates the most obvious mistakes that companies make with their Big Data projects and offers advice to organizations about to welcome Big Data into their cloud storage environments.
Companies tend to underestimate the importance of Big Data security. What advice would you give to organizations about to welcome Big Data into their cloud storage environments?
Good security practices apply regardless of data volume, velocity and variety. The most important consideration is to weigh the pros of retaining the data against the cons: the financial and reputational damage your business could suffer in the event of a breach. In the past, some cloud providers haven’t fully appreciated that a low-probability or low-impact security breach can still lead to a disastrous outcome. In short, do you really need to keep the data?
In some cases, keeping data may not only be useful in generating business value, it may also be mandatory from a regulatory standpoint. For example, SEC Rule 17a-4 requires that certain financial records be retained for up to six years. Other regulations such as PCI-DSS and HIPAA have stringent rules around how sensitive personal information should be stored and accessed. So it’s important to consider which regulations apply to your data and how your approach to security meets the compliance requirements.
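One way to make retention obligations like these operational is to encode them as data and fail safe when a record type is unknown. This is only a sketch under assumptions: the record-type key and six-year figure follow the example above, but real retention periods vary by record type and jurisdiction and should be confirmed with compliance experts.

```python
from datetime import date

# Hypothetical retention table; the six-year figure mirrors the
# SEC Rule 17a-4 example in the text. Real schedules vary by record type.
RETENTION_YEARS = {
    "sec_17a4_financial_record": 6,
}

def may_purge(record_type: str, created: date, today: date) -> bool:
    """Return True only if the record's mandated retention period has elapsed."""
    years = RETENTION_YEARS.get(record_type)
    if years is None:
        # Unknown record type: fail safe and keep the data.
        return False
    # Simple year arithmetic; leap-day edge cases are ignored in this sketch.
    return today >= created.replace(year=created.year + years)
```

The fail-safe default matters: when a platform cannot classify a record, the conservative choice is to retain it rather than risk a compliance breach.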
In summary, the advice I would give is to first establish what data you need to store and why you need to store it. Second, determine which compliance regulations the data is subject to, and how procedures and systems need to change to meet the requirements. I can’t recommend highly enough calling in third-party experts who understand the security and governance regulations in your domain to review the approach taken. When retaining some types of data, a third-party assessment is mandatory.
What are the main challenges of handling petabyte-scale volumes of data?
Big Data presents some unique challenges when it comes to ensuring that the data supply chain remains secure from producer to consumer. The security framework must be able to operate effectively when faced with the three Vs of Big Data: volume, velocity and variety. For Big Data subject to the sorts of regulations discussed earlier, the security challenges can be daunting. The problem can boil down to securing access to a single record buried amongst trillions and trillions of others.
Unfortunately, many of the tools and technologies for managing data at scale do not provide the fine-grained security needed to protect sensitive data. Part of the problem is that Big Data platforms, such as Hadoop, have treated enterprise security as an afterthought. Hadoop grew out of web businesses for which perimeter security and simple file-based access permissions were good enough. In the face of tough data retention regulations that can apply at the individual record level, these approaches are insufficient. Right now, core Hadoop alone does not provide a full security and governance solution, and those wishing to secure large volumes of data on Hadoop must look to third parties to achieve this.
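The contrast between file-based permissions and record-level security can be made concrete. The following is a minimal sketch, not taken from any particular product: instead of granting a user all-or-nothing access to a file, each record is filtered and its sensitive fields masked according to the caller's role. The field names and roles are hypothetical.

```python
# Hypothetical record-level access control: restricted records are
# withheld entirely, and sensitive fields are masked for ordinary roles.
# Field names and role names are illustrative only.
SENSITIVE_FIELDS = {"ssn", "account_number"}

def visible_records(records, role):
    """Yield only the records this role may see, masking sensitive fields."""
    for rec in records:
        if rec.get("restricted") and role != "compliance_officer":
            continue  # denial happens per record, not per file
        if role == "compliance_officer":
            yield dict(rec)
        else:
            yield {k: ("***" if k in SENSITIVE_FIELDS else v)
                   for k, v in rec.items()}
```

A file-permission model cannot express this distinction: once a user can read the file, every record and every field inside it is exposed.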
In general terms, what’s needed is an approach to Big Data management that secures the data from top to bottom; an approach that is designed specifically to protect the data it is managing. The solution should “own” the data in the sense that it is fully responsible for governing access to the data and does not delegate parts of that task to other services that may not have been designed for it. At the same time, the chosen data management solution must integrate with the rest of the established security infrastructure within the enterprise. For user account management and authentication, for example, the solution must be able to integrate with LDAP and Active Directory services.
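One common shape for that LDAP/Active Directory integration is to delegate authentication to the directory and then map the group memberships it returns onto the platform's own roles. The sketch below assumes the directory bind has already happened (e.g. via a library such as ldap3, omitted here) and shows only the mapping step; all DNs and role names are illustrative.

```python
# Hypothetical mapping from directory group DNs (as returned in a user's
# memberOf attribute after an LDAP/AD bind) to application roles.
# Every DN and role name here is an assumption for illustration.
GROUP_TO_ROLE = {
    "cn=data-admins,ou=groups,dc=example,dc=com": "admin",
    "cn=compliance,ou=groups,dc=example,dc=com": "compliance_officer",
}

def roles_for(member_of):
    """Translate the directory's memberOf values into application roles."""
    return sorted({GROUP_TO_ROLE[dn] for dn in member_of if dn in GROUP_TO_ROLE})
```

Keeping the mapping explicit means the data platform governs access itself, while user identity stays in the enterprise directory where it is already managed.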
What are some of the most obvious mistakes that companies make with their Big Data projects?
Many companies make the mistake of trying to build a secure Big Data solution themselves from open source software projects. For example, it is tempting to take projects from the Hadoop ecosystem and integrate them together to provide a data management system. After all, the software is free, right? The harsh truth is that such approaches are only free if your time isn’t valuable. There are so many moving parts in a Hadoop system that securing data from end to end becomes extremely difficult.
In the long run, the most cost-effective way of securing Big Data is to select a data management solution that is designed from the ground up to protect data at scale. It takes a long time to develop a robust, secure data management system, and it is far better to choose a proven solution that is tailored to meet the security and compliance requirements of your business, on your storage platform of choice, than to attempt to build one yourself.