How security analytics help identify and manage breaches

In this interview, Steve Dodson, CTO at Prelert, illustrates the importance of security analytics in today’s complex security architectures, talks about the most significant challenges involved in getting usable information from massive data sets, and much more.

How important are security analytics in today’s complex security architectures? What are the benefits?
It has become a near ‘mission impossible’ to totally prevent breaches because of the increasingly large and complex environments security professionals are tasked with protecting. We have reached the point where many organizations simply assume they have already been successfully breached by advanced persistent attacks, and in this difficult state of affairs, security analytics are extremely important for learning everything we can about our environments and the threats they face.

Analytics can help identify and manage breaches in a timely manner to significantly reduce the ultimate cost that malicious activity will have on a business or other organization.

For nearly a decade, IBM has been tracking the costs of a data breach, and its most recent report found that the cost per stolen or lost record and the average total cost of a breach are both on the rise. In addition, the report found that fewer customers are remaining loyal after a breach.

The importance of security analytics is directly proportional to how much a breach will cost an organization, and in the current environment, they are becoming essential. Amid the perpetual race between hackers looking to break through a perimeter and security professionals moving to patch the newfound vulnerabilities – with the cycle then beginning all over again – security analytics have become invaluable.

What are the most significant challenges involved in getting usable information from massive data sets?
Often the fingerprints of a successful breach are only visible in massive sets of machine data being generated by web proxies or network flow collectors. However, getting usable and actionable information from these data sets has significant challenges. First and foremost, the tools and techniques used to collect, store and search this data must scale to the size of the data. This may seem fairly obvious, but because the size and complexity of the average environment keep growing, it bears repeating.

When the data in question comes from sources such as web proxy servers, the fact that almost all the data within these massive data sets relates to non-malicious, standard business activity is another significant challenge to consider. Differentiating malicious activity from non-malicious activity is extremely difficult as there may only be a small handful of malicious activities each day that are hidden in the billions of interactions that take place every minute.

Traditional methods of extracting usable information from this data involve searching for known signatures of an attack. Unfortunately, advanced hackers and criminal enterprises know enough to modify the threat signature so as to avoid detection. In the end, however, the attack is going to generate outlier behaviors, so a complementary approach to signature- and rule-based intrusion detection is analyzing internal and outgoing traffic for statistically unusual behavior.
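To make that complementary approach concrete, here is a minimal sketch that flags outbound traffic volumes that are statistical outliers against each host’s own baseline. The function, the sample data and the 3-sigma threshold are all illustrative assumptions, not any vendor’s implementation:

```python
import statistics

def outbound_anomalies(history, current, z_threshold=3.0):
    """Flag hosts whose current outbound byte count is a statistical
    outlier relative to that host's own historical baseline."""
    flagged = {}
    for host, observed in current.items():
        baseline = history.get(host)
        if not baseline or len(baseline) < 2:
            continue  # not enough data to model "normal" for this host
        mean = statistics.mean(baseline)
        stdev = statistics.stdev(baseline)
        if stdev == 0:
            continue  # constant baseline; a z-score is undefined
        z = (observed - mean) / stdev
        if z > z_threshold:
            flagged[host] = round(z, 1)
    return flagged

# Hypothetical per-host outbound byte counts (e.g. per hour)
history = {"10.0.0.5": [120, 130, 110, 125, 118],
           "10.0.0.9": [90, 95, 88, 92, 91]}
current = {"10.0.0.5": 123, "10.0.0.9": 900}  # .9 suddenly sends far more
print(outbound_anomalies(history, current))
```

Unlike a signature match, nothing here depends on knowing what the attack looks like – only on how far the behavior departs from that host’s own history.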

However, the level of statistical analysis required far exceeds the capabilities of even the most advanced security architects or analysts. Statistically unusual interactions happen all the time in a typical organization: simply scanning for unusual websites visited by employees of a large enterprise can generate thousands of false alerts a day.

As organizations scale in size, more advanced analyses of interactions across multiple dimensions are required. For example, the fact that an employee visits a new website only becomes a valid concern if the interaction also involves a protocol that is unusual for that user, and if that user, who normally consumes data, is now sending substantial volumes of data in small bursts.
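As a hedged illustration of that multi-dimensional reasoning, the sketch below combines the three signals from the example above – a new destination, an unusual protocol, and a data-consumer suddenly uploading – into a single score. All field names, profile contents and the 10× upload heuristic are hypothetical:

```python
def interaction_score(event, profile):
    """Combine several weak per-dimension signals into one anomaly score.
    `profile` holds what is historically normal for this user; any one
    signal alone is a weak indicator, but their combination is telling."""
    signals = {
        "new_destination": event["dest"] not in profile["known_dests"],
        "unusual_protocol": event["protocol"] not in profile["usual_protocols"],
        # a user who normally downloads is suddenly uploading heavily
        "role_reversal": (profile["typically_downloads"]
                          and event["bytes_out"] > 10 * event["bytes_in"]),
    }
    return sum(signals.values()), [k for k, v in signals.items() if v]

profile = {"known_dests": {"intranet.example.com"},
           "usual_protocols": {"https"},
           "typically_downloads": True}
benign = {"dest": "intranet.example.com", "protocol": "https",
          "bytes_in": 50_000, "bytes_out": 2_000}
suspect = {"dest": "files.unknown-host.example", "protocol": "ftp",
           "bytes_in": 1_000, "bytes_out": 80_000}
print(interaction_score(benign, profile))   # no signals fire
print(interaction_score(suspect, profile))  # all three signals fire
```

Requiring several independent dimensions to be unusual at once is what keeps the false-alert volume manageable compared with any single-dimension scan.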

Statistically modelling data for unusual patterns across multiple dimensions – and doing it accurately – is a complex task even for small data sets, let alone massive ones. Appropriate modelling techniques and computationally stable and scalable implementations are beyond the scope of simple tools and analyses. Finally, the analysis needs to be executed in real-time, which places additional constraints on the system because the models must be built and updated online as the data streams in.
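One standard way to satisfy that online constraint (an illustrative choice here, not necessarily any vendor’s method) is to maintain baselines incrementally with Welford’s algorithm, which updates mean and variance one observation at a time without storing or re-scanning history:

```python
class OnlineStats:
    """Welford's algorithm: update mean and variance one observation
    at a time, so anomaly baselines can track a live stream in O(1)
    memory rather than re-scanning historical data."""
    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the running mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def variance(self):
        # sample variance; undefined until at least two observations
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0

s = OnlineStats()
for x in [10, 12, 11, 13, 11]:  # e.g. events per minute on a stream
    s.update(x)
print(round(s.mean, 2), round(s.variance(), 2))
```

The same single-pass update works whether the stream carries hundreds or billions of events, which is exactly the scaling property the batch approach lacks.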

Can detecting unknown attack profiles keep an organization ahead of the bad guys?
Statistical techniques are the only approach that can identify unknown attacks, and even when applied properly, they still require a certain amount of human intervention. Security teams can definitely react a lot faster if they are immediately aware of previously unknown threats, so staying ahead of the bad guys really comes down to two things: the speed of a real-time analysis solution and the reaction time of the security team. In the end, this requires that both the right technology and organizational processes are in place.

How do you expect security technologies to evolve as the amount of data increases?
As more and more data and data sets become available, the challenge of gaining actionable insight becomes more and more complex. For example, in a smaller office with a couple of hundred employees, identifying a user exfiltrating data to an unusual website can be achieved by simple reporting. However, the same report within a large enterprise that employs thousands or tens of thousands of people may contain 500 unusual events an hour, which is too many to effectively triage and analyze.

As the data increases, effective, accurate and scalable statistical analyses become more and more important, as simple reports and rules generate too much information to triage and act on. Since humans are unable to effectively process this volume of information, the only way we’ll be able to do it is by relying on machine learning.

While humans become less effective as data sets get bigger, machines actually become more effective, as they have more data to analyze and learn what normal behavior looks like. As a result, they’ll become even better at flagging the anomalies. There’s no doubt that machine learning will become a much larger part of an effective security strategy as the amount of data increases.