Marcus Ranum on security innovation and Big Data

Marcus Ranum, CSO at Tenable Network Security, is an expert on security system design and implementation. In this interview he talks about the evolution of Big Data and true innovation in the computer security industry.

It’s been decades and we’re still figuring out how to make people use strong passwords. Where do you see true innovation in information security these days?
It seems that whenever we start to make some headway with the “passwords are a poor technique” message, something comes along and undoes all the work. Things were actually moving in the right direction for a while in the mid-to-late ’90s, but then SSL came along and everyone was able to re-believe that passwords weren’t so bad.

I recall being deeply disappointed that there was never a push for some kind of federated identity, but in retrospect it’s pretty obvious that any such attempt would be defeated by market forces attempting to prevent any one vendor from getting control over such an important piece of property. And, as the software industry continues to evolve, we see exactly why that would be a bad idea: such a property would be so valuable it’d sooner or later result in an “offer that cannot be refused” and then everyone’s identity would be controlled by Microsoft, Oracle, Google, or … Ugh.

I actually think that federated ID is a job for government. Why not? I suggested exactly that to Howard Schmidt a few years ago, and added the suggestion that such a system would only be palatable if you could also apply for an “avatar” identity – one that you could prove to the government agency was you; and they would confirm it – essentially government-issued fake ID.

Governments actually already have the infrastructure to do that kind of thing, and in a sense they’re already in that business with passports, driver’s licenses, tax IDs, etc. And, as Edward Snowden tells us, the US government (among others) already spends a tremendous amount of money and time trying to figure out who is who on the internet – why not do the whole thing in a socially useful way?

But I slightly didn’t answer your question. Where is innovation happening? Everywhere. Unfortunately, the market demands that innovation happen where customers are ready and willing to pay for it rather than where the infrastructure really needs it. Customers are always going to be happy to buy whatever’s hot and hyped rather than boring stuff that’s just a lot of hard work. That’s why I pretty much despair of seeing a profound shift toward software quality and reliability – which we need – instead of glitzy new 3D dancing animated pig apps for your uber-smartphone.

The trendline there worries me; market dynamics continue to reinforce the idea that rapid application development pushed into the hands of millions of customers is how you get market share and get rich. You don’t actually ever get around to building reliable software architectures in that scenario. In other words, I think there’s a great deal of innovation, but it’s pulling the industry sideways instead of forward.

During the past year we’ve seen an explosion of solutions taking advantage of Big Data. How do you expect them to evolve in the next five years?
I think that we’ll see a few of them pan out to be useful, and a lot of them turn out to be – big. The premise of big data is that you can discover all this amazing stuff in your databases and unstructured data once you get it in one place where you can trawl through it and explore interrelations within it. The hard part of that particular job is the “explore” and “understand” part, which can only happen after the “buy a ton of expensive stuff” part. I’m concerned that many organizations don’t understand that big data is a long-term play and that its results are not guaranteed or magically automatic.

You’re still going to have a great deal of analytic work to do (it’s just that now you can do it fast!), and it’s not going to somehow add magical new nuggets of information to what’s already there. The data is, in fact, already there, and if you look at it and start doing the analysis right now, you can get some idea of whether or not your big data solution is going to give you any useful information in the long run.

I don’t want to seem dismissive of big data, but I think a lot of the problem is that right now organizations collect all this stuff and never look at it. So with big data they may put it in one place and look at it, and discover things they should have discovered long ago. The funny part is that if they had made those discoveries long ago, they could have skinned the necessary fields out of the data as it was being collected, and pre-computed it – a much faster and easier approach all along.
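That pre-computing idea can be sketched in a few lines. The event names and fields below are hypothetical, chosen only to show the shape of the approach: skim the field you care about out of the stream as it is collected, instead of warehousing raw records and batch-mining them later.

```python
from collections import Counter

# Hypothetical running tally, maintained at collection time:
# failed-login counts per source IP, updated as each record arrives.
failed_logins = Counter()

def ingest(record: dict) -> None:
    """Pre-compute the interesting field on ingest instead of storing raw data."""
    if record.get("event") == "login_failed":
        failed_logins[record["src_ip"]] += 1

# Simulated incoming stream of log records.
for rec in [{"event": "login_failed", "src_ip": "10.0.0.5"},
            {"event": "login_ok",     "src_ip": "10.0.0.7"},
            {"event": "login_failed", "src_ip": "10.0.0.5"}]:
    ingest(rec)

print(failed_logins.most_common(1))  # [('10.0.0.5', 2)]
```

The answer is always current, and nothing but the tally ever needs to be stored – which is the "much faster and easier approach" being described.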

What’s holding up major Big Data adoption?
I think the hold-ups are pretty reasonable, really. Big data’s value proposition is that you will uncover important and useful relationships in your data once you spend all this time, money, and effort putting it into a big data system. That sounds suspiciously vague, doesn’t it? I’m guessing it’s hard to get sign-off on a large-dollar deployment on the basis that some unspecified wonderful thing is most likely going to happen. And the problem is that if there is a specific use case for a particular data correlation, it’s probably fairly straightforward to just do it using existing data structures. Oh, you want to trawl my customer database, see who shipped goods to the same address as another customer, map them as “friends,” and then send reminders when it’s the “friend’s” birthday? That’s just a couple of queries against our existing database – no need to put everything in one great big place to figure that out.
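The “couple of queries” in that example really is just a self-join on an ordinary relational store. Here is a sketch against a throwaway SQLite table – the schema and data are hypothetical, invented only to illustrate the point:

```python
import sqlite3

# A hypothetical orders table in an ordinary relational database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer TEXT, ship_address TEXT);
    INSERT INTO orders VALUES
        ('alice', '12 Oak St'),
        ('bob',   '12 Oak St'),
        ('carol', '9 Elm Ave');
""")

# Pair up customers who shipped to the same address -- candidate "friends".
# The a.customer < b.customer condition avoids self-pairs and duplicates.
friends = conn.execute("""
    SELECT a.customer, b.customer
    FROM orders a JOIN orders b
      ON a.ship_address = b.ship_address
     AND a.customer < b.customer
""").fetchall()
print(friends)  # [('alice', 'bob')]
```

No big data machinery is involved; the existing database already answers the question.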

I think there’s a weird kind of catch-22 going on: when you have a good clear use case for big data, you realize you don’t need big data at all, for that use case. So big data has to sell on the potential for all this undiscovered goodness.

We ran into a similar problem back in the early days of the intrusion detection system. Some of us took the approach of collecting a ton of stuff and searching for entity relationships within it, whereas others built big pattern-matching expert systems. Of course, the pattern-matching expert systems won, because it’s much, much faster to look for a pattern in incoming data than to trawl around in all your data trying to figure out whether there are subtle relationships that haven’t been noticed yet.

The pattern-matching approach doesn’t require the customer to have deep expertise, just a large knowledge base of rules and a willingness to turn off rules that produce annoying results. The search for entity relationships requires the customer to be able and willing to actually figure out the meaning of the relationship once it is discovered: something computers can’t do.

After a few years I realized that the division between the systems was a matter of where the knowledge about their output was applied – encoded in advance by the vendor, or applied to the results by the customer. Then it was obvious which one was going to win. So, suppose you’re going to try to do big data with your system logs and whatnot: how is that going to compare with a SIEM solution that comes out of the box with a few useful dashboards and some nifty rules for summarizing some stuff?

Those that are considering taking advantage of Big Data usually ask how to secure it. What advice would you give them?
That’s a huge problem. Again, it comes back to the question of how well you understand your data before you put it in – which raises the question of what you expect to get from the big data in the first place. Let me give you an example: suppose you’ve got customer credit card numbers. You can’t put those in your big database because they’d be exposed, so you replace that field with a hash code; then, if you need to, you can match the hash token against another database later. So far, so good. But to do that you have to understand which fields are customer credit cards in the first place – you can’t just treat everything as unstructured. I’m guessing this would be a valuable process for many customers, since it would amount to doing an assessment of all their data holdings as they were putting them into the big database – a useful exercise, if an unpleasant one.
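A minimal sketch of that field-level hashing, with one assumption worth making explicit: a bare hash of a 16-digit number is trivially brute-forceable, so a keyed hash with the key kept outside the warehouse is used here instead.

```python
import hashlib
import hmac
import os

# Assumed: a secret key managed outside the big database (e.g. in a KMS);
# generated ad hoc here only so the sketch is self-contained.
SECRET_KEY = os.urandom(32)

def tokenize(card_number: str) -> str:
    """Replace a card number with a keyed hash token.

    The same card always yields the same token, so the warehouse can
    still join on the field without ever storing the real number.
    """
    return hmac.new(SECRET_KEY, card_number.encode(), hashlib.sha256).hexdigest()

# Tokenize the sensitive field before the record is loaded.
record = {"name": "A. Customer", "card": "4111111111111111"}
record["card"] = tokenize(record["card"])
```

Matching the token against another database later works exactly because tokenization is deterministic under the shared key – but it only works if you knew which field was a card number to begin with.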

What I suspect is going to happen is that the big databases will just become dumping grounds for everything, and organizations will have to secure access to the big databases because they won’t actually have any idea what’s in there. From the sound of it, that’s a pretty fair description of the NSA internal wikis that Edward Snowden was spidering through – and it’s a potential problem for exactly the same reason. “Put your eggs in one basket and watch that basket” is a security policy that can work, but you have to carefully watch the watchers.

You are participating in an open discussion on Big Data with Dr. Anton Chuvakin and Alex Hutton at InfoSec World in April. Tell me more about what you plan to cover.
It’s going to be a fairly free-form panel session, and I expect Anton and Alex will not entirely agree with my take on some of these issues. That’s what we’re going to try to air out; Alex has been doing some really interesting work on mining security data using big data techniques, and Anton has been working on large-scale SIEM for ages. I think we bring a lot of different perspectives to the table, compressed into a small space. It should be an interesting conversation!