Big Data security and privacy challenges

In this interview, Leighton Johnson, CTO, Senior Security Engineer for Information Security and Forensics Management Team (ISFMT), talks about how Big Data is transforming the way organizations deal with information security threats, offers tips for those interested in taking advantage of Big Data, and much more.

He’ll be speaking about targeted cyber attacks at ISACA’s North America CACS conference.

How is Big Data transforming the way organizations deal with information security threats?
Many organizations are converting their SIEM efforts for monitoring and governance into Big Data efforts, both to track threats and malicious activity on their networks and to hunt for APTs running in their environment. We have found that larger organizations (more than 5,000 users) have already extended their continuous monitoring of systems and networks to assemble the collected data into analysis arenas (cubes, etc.) to account for the extreme proliferation of new and variable threats to their data security.

These efforts combine event reporting from the organizations' own systems with direct “real-time” data feeds from external resources, so that the analysis watches both the perimeter and the external inputs to their systems.
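As a rough illustration of combining internal event reporting with an external feed, the sketch below flags internal events whose destination address appears in a set of externally supplied indicators. The field names (`host`, `dst_ip`), the sample addresses (taken from documentation-reserved ranges), and the `flag_events` helper are all assumptions made for this example; they do not describe any particular SIEM or feed format.

```python
# Hypothetical internal events; the field names are assumptions for illustration.
INTERNAL_EVENTS = [
    {"host": "ws-014", "dst_ip": "203.0.113.7"},
    {"host": "ws-022", "dst_ip": "192.0.2.10"},
    {"host": "srv-03", "dst_ip": "198.51.100.23"},
]

# Hypothetical indicators received from an external threat feed.
EXTERNAL_INDICATORS = {"203.0.113.7", "198.51.100.23"}


def flag_events(events, indicators):
    """Return the internal events whose destination IP is a known indicator."""
    return [event for event in events if event["dst_ip"] in indicators]


if __name__ == "__main__":
    for event in flag_events(INTERNAL_EVENTS, EXTERNAL_INDICATORS):
        print(event["host"], "contacted flagged address", event["dst_ip"])
```

In practice the indicator set would be refreshed continuously from the feed, and the matching would run against the event store rather than an in-memory list.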

What are the hardware and software prerequisites for getting the most out of Big Data for IT security purposes?
The modifications to hardware and software needed to support Security Big Data efforts have been relatively straightforward. For organizations that already embrace Big Data in other parts of their enterprise, adding new database components, such as NoSQL or Hadoop, has been budgeted and installed along with the corporate Big Data projects, and has not been as difficult as might be expected. However, organizations that are implementing Big Data for security as their first foray into this realm are finding great difficulties in scope, size, and analysis efforts.

Let’s say you’re working with a network that gathers massive amounts of data on a daily basis. What structural challenges do you have to tackle in order to gather valuable information?
The absolute first need is to ensure the Data SAN can handle the full scope of the Volume and the Velocity of the data being loaded during collection. Then comes the analysis engine sitting on top of the data. Can it see all of the data during the queries? Are there any missing datasets due to structure issues?

Therefore, we have seen basic database indexing being either modified or removed, and new “NoSQL”-type query and analysis engines being designed and implemented. “MapReduce” and other components have to be evaluated and structured closely around the business focus in order to get the right answers to the many and varied types of questions being asked of these systems.
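To make the MapReduce pattern concrete, here is a minimal in-memory sketch that answers one security question: how many failed logins came from each source address. It uses plain Python rather than Hadoop, and the event fields, sample data, and function names are assumptions chosen for illustration only. A real deployment would distribute the map and reduce steps across a cluster.

```python
from collections import defaultdict

# Hypothetical sample events; the field names are assumptions for illustration.
EVENTS = [
    {"src_ip": "10.0.0.5", "action": "login_failed"},
    {"src_ip": "10.0.0.5", "action": "login_failed"},
    {"src_ip": "10.0.0.9", "action": "login_ok"},
    {"src_ip": "10.0.0.7", "action": "login_failed"},
]


def map_event(event):
    """Map step: emit (source IP, 1) for each failed login, nothing otherwise."""
    if event["action"] == "login_failed":
        yield event["src_ip"], 1


def shuffle(pairs):
    """Group emitted values by key, as a MapReduce framework does between steps."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key, values):
    """Reduce step: sum the emitted counts for one key."""
    return key, sum(values)


def failed_logins_by_source(events):
    """Run map, shuffle, and reduce over the events and return counts per IP."""
    pairs = (pair for event in events for pair in map_event(event))
    return dict(reduce_counts(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(failed_logins_by_source(EVENTS))
```

Changing the question (for example, counting by user instead of by address) means changing the key emitted in the map step, which is why the structure has to follow the questions the business actually asks.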

What type of training would you suggest for anyone interested in taking advantage of Big Data?
The biggest area of training is teaching the analysts how to ask the right question of the data at hand. They have many areas in which they want answers, but often do not understand how to get the answer they are looking for, or need, from the assembled data.

The IT support staff needs to understand the data types and database structures in great detail in order to assist the analysts. Otherwise, analysts and business unit personnel often become frustrated that they are not getting what they need from the Big Data project, and then blame the IT staff for that problem.

What’s your take on the privacy implications of Big Data?
The aggregation issue, in which putting disparate data together into a Big Data system creates relationships among data sets that did not previously exist, is now a major governance and privacy concern for larger organizations. I have seen several organizations, especially within the government and health care industries, needing to step back and address these issues before full implementation. They are finding that their data, once assembled into the Big Data components, is being reclassified at a higher security level due to legal and external requirements from governmental agencies and to liability concerns. This is changing which queries and constructs are allowed for use and analysis of the data.