The evil of vanity metrics


With the fast-paced evolution of tools and connectedness in business operations, the amount of network and log data has exploded. However, organizations have largely failed to adjust their approach to managing and analyzing that growing collection of log data.

Vanity metrics and the tools that produce them, namely Security Information and Event Management (SIEM) solutions, stand at the forefront of the problem. If we measure ourselves only with vanity metrics, we never see that the SIEM approach is collapsing.

What are vanity metrics and why are they used?

Simply put, today’s vanity metrics are the “number of alerts” and “events per second.” They are easy to generate. Finding more sources of data and moving to a larger database scheme increases the number of events per second and, in turn, the number of alerts. These metrics stop scaling once the database’s ingestion capacity is reached.

At that point, searching becomes too slow for analysis to occur. Most SIEMs have changed to a big data backend and have simplified their collection to syslog in order to create more feeds. Besides additions to compliance, a big database is the most common SIEM update.

Vanity metrics create a mess of downstream problems. The processes and techniques supporting a SIEM were never designed for increased amounts of data. Rule-based and reputation-based validation were put in place almost 10 years ago to handle workloads.

It is not only SIEMs that are outdated, but also how we articulate success. As data has continued to increase, so has the number of alerts that need to be reviewed. Complaints of alert fatigue and shortages of skilled staff are a direct result of driving operations with vanity metrics. Success needs to be defined by how these problems are resolved.

Security managers often show success with a funnel graphic, where millions of alerts dwindle through a series of processes until only a few issues remain. Such a chart succeeds by having a large number at the top of the funnel, demonstrating a workflow that addresses millions of potential “issues.” This is a vanity metric, in which managers show their worth by talking about how many alerts their organization deals with each day. By leveraging big data and the growth of audited data, any manager can be a rock star with such a chart.

A chart focused on the amount of incoming data misses the point: Are we secure? More data without a means to simplify the results is plainly a scenario of information overload. The true metric is accuracy – keeping the number of alerts close to the number of actual incidents without missing any.
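One way to make this notion of accuracy concrete is to split it into two numbers: the share of alerts that correspond to real incidents, and the share of real incidents that raised an alert at all. The sketch below is a minimal illustration rather than a feature of any SIEM; the function name `alert_accuracy`, the event identifiers, and the assumption that alerts and confirmed incidents can be matched by a shared identifier are all hypothetical.

```python
def alert_accuracy(alert_ids, incident_ids):
    """Return (precision, recall) for alerts measured against confirmed incidents.

    alert_ids:    identifiers of events that raised an alert (hypothetical).
    incident_ids: identifiers of events later confirmed as real incidents.
    """
    alerts = set(alert_ids)
    incidents = set(incident_ids)
    true_positives = alerts & incidents

    # Precision: how much of the alert volume was worth an analyst's time.
    precision = len(true_positives) / len(alerts) if alerts else 0.0
    # Recall: how many real incidents were caught by an alert at all.
    recall = len(true_positives) / len(incidents) if incidents else 1.0
    return precision, recall


# Illustrative data only: 40 alerts and 2 confirmed incidents,
# one of which ("evt-99") never produced an alert.
alerts = [f"evt-{i}" for i in range(40)]
incidents = ["evt-7", "evt-99"]
precision, recall = alert_accuracy(alerts, incidents)
print(f"precision={precision:.3f} recall={recall:.2f}")
```

A funnel chart rewards a large alert count; these two numbers reward the opposite, since adding noisy alerts lowers precision and a missed incident lowers recall.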

However, that level of efficiency is difficult to achieve. It’s difficult because products have been increasing their false-positive ratio, rule-based validation only slightly reduces the volume of alerts, new behavioral tools lack clarity, and processes are lagging behind technology.

The need to look at technical metrics

The aspects that must be addressed include efficiency, accuracy, time to discovery and time to response. For security operation metrics to be more meaningful in the board room, they need to be linked to efficiency in terms of time and cost. Metrics need to help staff focus on actual problems rather than hunt for them. This means that metrics need to be aimed at the accuracy of determining which events need a response, and the speed with which that response can be implemented.

Two metrics give insight into analysis efficiency while determining which events need response: the ratio of incidents investigated to actual incidents (or accuracy), and the time it takes until an incident is discovered. If alerts rated critical are the most likely to be correct, then the accuracy of critical events should be good. The actual accuracy ratio is 40-to-1. The discovery metric is not good either. The recent Verizon data breach report puts the time to discover a breach at 140 days. Both metrics show that security efficiency is miserable.
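Both metrics are simple to compute once investigations and confirmed incidents are recorded. The following is a minimal sketch under stated assumptions: the function names are hypothetical, each case is assumed to carry a known compromise date and discovery date, and the sample dates are invented for illustration and are not taken from any report.

```python
from datetime import date
from statistics import median


def investigation_ratio(investigated, actual):
    """Incidents investigated per actual incident (40.0 means 40-to-1)."""
    if actual <= 0:
        raise ValueError("need at least one actual incident to form a ratio")
    return investigated / actual


def median_days_to_discovery(cases):
    """Median number of days between compromise and discovery.

    cases: iterable of (compromised_on, discovered_on) date pairs.
    """
    return median((found - start).days for start, found in cases)


# Illustrative numbers only.
ratio = investigation_ratio(investigated=40, actual=1)
cases = [
    (date(2016, 1, 1), date(2016, 5, 20)),  # 140 days
    (date(2016, 2, 1), date(2016, 2, 11)),  # 10 days
    (date(2016, 3, 1), date(2016, 9, 1)),   # 184 days
]
print(f"{ratio:.0f}-to-1, median discovery: {median_days_to_discovery(cases)} days")
```

The median is used here rather than the mean so that a single very long-running breach does not dominate the figure; either choice is a judgment call, not something the metrics themselves dictate.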

Ask any analyst – there are certain alerts that occur so often, they are just ignored regardless of criticality. Statistics show that devices are alerting more often with higher criticality, even though the number of incidents has remained the same. This trend has not been addressed by SIEM vendors.

Tricks such as threat-reputation validation and rule-based validation do not scale. They fail to provide a means to drastically reduce the overall number of events to review. Moreover, with the rise in advanced persistent threats (APT), reputation validation tends to hide events. By avoiding sites with a bad reputation, attackers can prolong the number of days it takes to discover a breach.

The use of behavioral analytics has not been a silver bullet. Analytics still produce a high number of alerts. One vendor’s presentation cites a vanity metric of five thousand alerts in a given day. This is the opposite of what we need. Analytics by themselves provide no criticality level. Vendors often use a “template” to help add clarity and provide criticality. In reality, these templates are signatures. A rule-based validation engine matches the signature to the anomaly.

And then there’s the cost: Look at the business metrics

In the end, security operations are part of the business. While technical metrics help us to be more efficient, business metrics are what is understood at the management level. This means relating the activity of operations to the effectiveness of the tasks. We have to look at the cost of prevention, the cost of response and the cost of analysis.

For now, we are stuck with spreadsheets to help us track these numbers. By relating the cost of personnel and assets to the metrics of accomplishment, we can determine where an organization is lacking and make plans to address it.
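The same spreadsheet arithmetic can be written down as a small script. The sketch below is one possible way to relate cost to accomplishment, not an established model: the `PeriodCosts` class, its field names, and all of the figures are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PeriodCosts:
    """Costs and outcomes for one reporting period (all fields hypothetical)."""
    prevention: float          # tools, licences, hardening work
    analysis: float            # analyst time spent reviewing alerts
    response: float            # time and assets spent handling incidents
    alerts_reviewed: int
    incidents_confirmed: int

    def analysis_cost_per_alert(self) -> Optional[float]:
        # What each reviewed alert costs, whether or not it was real.
        if self.alerts_reviewed == 0:
            return None
        return self.analysis / self.alerts_reviewed

    def total_cost_per_incident(self) -> Optional[float]:
        # Everything spent, divided by the incidents actually confirmed.
        if self.incidents_confirmed == 0:
            return None
        total = self.prevention + self.analysis + self.response
        return total / self.incidents_confirmed


# Illustrative figures only.
quarter = PeriodCosts(
    prevention=50_000, analysis=30_000, response=20_000,
    alerts_reviewed=12_000, incidents_confirmed=4,
)
print(quarter.analysis_cost_per_alert())   # cost of reviewing one alert
print(quarter.total_cost_per_incident())   # spend per confirmed incident
```

Tracked over several periods, a rising cost per alert alongside a flat incident count would point at analysis as the area where the organization is lacking.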

Metrics are hard. Without metrics, we can only guess the impact of our decisions. Organizational maturity is defined by our ability to measure our changes as we try to improve the overall system. What we record greatly impacts our understanding of our success and drives our future goals. Our weakness in security is that we have spent too much time counting the threats and risks instead of measuring our ability to address them.

To move forward, we need to stop focusing on vanity metrics and start to measure what we are trying to accomplish.