So what is machine learning? Machine learning is an integral part of artificial intelligence, the “umbrella term” under which it sits. Put simply, it is the science of enabling computers to learn and take action without being explicitly programmed. This is achieved by applying complex algorithmic models to data, from which data-driven predictions or decisions are derived.
What has this to do with information security? Currently, not that much. But this is set to change.
With a plethora of security solutions and our ever-growing networks pumping out ever larger reams of event data, we have reached the point at which the human brain is simply overwhelmed trying to parse the information.
Many readers at this point will be rolling their eyes, awaiting my inevitable story about artificial intelligence and machine learning being the solution to all of our security woes. I should know. I too have been on the receiving end of a vendor's sales pitch proclaiming its artificial intelligence solution as the magic elixir, heralding the demise of the human security analyst.
But let’s move away from the hubris and marketing spin for a moment.
Machine learning has been used for some time in the financial sector, where algorithmic models identify potential fraud. Credit card fraud is an ideal example of where machine learning shines. Fraud represents a tiny percentage of all credit card use, which means there are large datasets of “normal” behaviour against which risk engines can detect anomalies (outliers). Although statistics are hard to come by, as they are a closely guarded secret within the industry, it is estimated that machine learning fraud detection systems could save card issuers and banks up to $12bn annually. IBM researchers working with a large US bank claimed a 15% increase in fraud detection, a 50% reduction in false alarms and a 60% increase in total savings.
Can this be applied to information security? If we consider the problem of the “insider” threat, research does show promise. Behavior-Based Access Control (BBAC) models user behaviour and looks for “untrustworthy” activity based on anomalies.
Much of this “anomaly detection” is based upon “unsupervised” machine learning, and although the research is encouraging, many problems remain, the most prevalent being false-positive alerts.
Real-world enterprise networks are complex environments; add humans to the mix and it turns out that baselining “normal” activity is not as easy as it might first appear. Nor is every anomalous event necessarily malicious. Moreover, a malicious actor can either operate “low and slow” and “live off the land” within the boundaries of “normal” activity, or trick the system into accepting malicious behaviour as “normal” activity.
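A minimal sketch of the baselining idea, assuming a hypothetical per-user metric (daily file-access counts) and a simple z-score rule rather than any particular product's model: each user's history defines “normal”, and a new value more than `threshold` standard deviations away is flagged. The data, names and threshold are invented for illustration. It also shows both failure modes described above: a large spike is flagged whether it is exfiltration or a legitimate audit, while a modest “low and slow” increase stays under the threshold.

```python
import statistics

# Hypothetical baseline data: daily file-access counts per user over one week.
HISTORY = {
    "alice": [40, 42, 38, 41, 39, 40, 43],
    "bob": [10, 12, 11, 9, 10, 11, 12],
}


def fit_baseline(history):
    """Learn each user's 'normal' as the mean and population std dev of past counts."""
    return {user: (statistics.mean(c), statistics.pstdev(c)) for user, c in history.items()}


def is_anomalous(baseline, user, value, threshold=3.0):
    """Flag a value whose z-score against the user's baseline exceeds the threshold."""
    mean, std = baseline[user]
    if std == 0:
        return value != mean  # flat history: any change at all is unusual
    return abs(value - mean) / std > threshold


BASELINE = fit_baseline(HISTORY)

if __name__ == "__main__":
    # New, unlabelled observations: the detector sees only numbers, not intent.
    for user, count in [("alice", 44), ("alice", 200), ("bob", 13), ("bob", 30)]:
        verdict = "ANOMALY" if is_anomalous(BASELINE, user, count) else "normal"
        print(f"{user}: {count} -> {verdict}")
```

Here alice at 200 and bob at 30 are flagged, but nothing in the model says whether either is malicious, and bob's gradual rise to 13 goes unnoticed.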
If human analysts are overwhelmed and unsupervised machine learning is fraught with problems, where does this leave us? Perhaps with a synergy between human analyst and machine learning.
This fusion is being championed by MIT’s Computer Science and Artificial Intelligence Laboratory with their “supervised model” platform called AI2. The premise is to begin with unsupervised machine learning to detect anomalies and feed those back to human analysts who, in turn, teach the system whether these events are malicious or non-malicious. In this way, the system iteratively learns and the analyst is presented with fewer and fewer false-positive events. From the analysts’ intuitive input, new, refined models are constructed, including predictive models.
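The feedback loop described above can be sketched as follows. This is not the AI2 implementation, whose internals are not described here: the features, event data, median-distance outlier score and the nearest-labelled-neighbour feedback rule (`radius` and the ×10 / ×0.1 weights) are all hypothetical, chosen only to show how analyst verdicts can reorder the alert queue. The `analyst` function simulates the human by consulting a known ground-truth set.

```python
import math
from statistics import median

# Hypothetical per-event features: (logins per hour, MB uploaded).
EVENTS = [
    (5, 10), (6, 12), (4, 9), (5, 11), (6, 10), (5, 9),  # routine activity
    (5, 500), (6, 505), (5, 510),                         # nightly backup job: unusual but benign
    (40, 300), (42, 290),                                 # simulated attack
]
MALICIOUS = {(40, 300), (42, 290)}  # ground truth, known only to the simulated analyst


def analyst(event):
    """Stand-in for the human analyst: True if the event is malicious."""
    return event in MALICIOUS


def triage(events, analyst, rounds=3, k=2, radius=50.0):
    """Each round, show the analyst the k highest-priority unreviewed events and learn from the verdicts."""
    # Unsupervised step: distance from the per-feature median of all events.
    centre = tuple(median(col) for col in zip(*events))
    labelled = []  # (event, is_malicious) pairs supplied by the analyst
    shown_per_round = []

    def priority(event):
        score = math.dist(event, centre)
        # Feedback step: find the nearest analyst-labelled event within `radius`.
        near = [(math.dist(event, e), bad) for e, bad in labelled if math.dist(event, e) <= radius]
        if near:
            _, bad = min(near)
            score *= 10.0 if bad else 0.1  # boost attack look-alikes, damp benign look-alikes
        return score

    for _ in range(rounds):
        reviewed = {e for e, _ in labelled}
        queue = sorted((e for e in events if e not in reviewed), key=priority, reverse=True)
        batch = queue[:k]
        labelled.extend((e, analyst(e)) for e in batch)
        shown_per_round.append(batch)
    return shown_per_round


if __name__ == "__main__":
    for n, batch in enumerate(triage(EVENTS, analyst), start=1):
        hits = sum(analyst(e) for e in batch)
        print(f"round {n}: shown {batch}, malicious {hits}/{len(batch)}")
```

With feedback, once the analyst marks the first two backup events as benign, the remaining backup event is damped and the second round surfaces both simulated attacks. Passing `radius=0.0` disables feedback, and the second round then shows the third backup event alongside only one attack.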
In a peer-reviewed paper comparing standard “anomaly detection” machine learning solutions with AI2, the authors reported the following findings:
- Increased attack detection rate by a factor of 10 over machine learning-only solutions
- Decreased false positive rate by a factor of 5 over machine learning-only solutions
- Required only 20% of the alerts to achieve peak efficacy compared with machine learning-only solutions
- Demonstrated the ability to learn in real time
- These experiments were validated on a real-world dataset of 3.6 billion log lines and 70.2 million entities (users).
This technology and research are new, and obviously I’m not in a position to validate their claim of detecting 85% of attacks, but I will say this: if you hear that the end of the human security analyst is nigh due to machine learning, or that machine learning is infosec’s new snake oil, I’d ask you to consider the possibility that a partnership between the two is probably the future.