Rookout’s Snapshots: The fourth pillar of observability for more secure applications

Liran Haimovitch, CTO and co-founder of Rookout, with his extensive background in cybersecurity within the Israeli government, has a unique perspective on the importance of security and its impact on businesses.

In this Help Net Security interview, we’ll explore how his experience has influenced his approach to developing Rookout, a startup that aims to help organizations streamline their debugging processes and reduce friction between IT Ops and developers. We’ll also explore Liran’s thoughts on the shift-left security movement and his vision for the future of observability, focusing on the newly introduced fourth pillar of observability, Snapshots.

With your extensive background in cybersecurity within the Israeli government, could you elaborate on how this experience has influenced your approach to developing your startup, Rookout?

The cybersecurity space has been instrumenting kernel space and user space code for decades. In comparison, the Monitoring and Observability space is far more timid when it comes to integrating with existing codebases.

Our background provided us with the mindset and toolset to go deeper than traditional tools in the space and, as a result, to break the boundaries of what was considered possible by most software engineers.

Even in 2023, software engineers who see their first demo of Rookout are often shocked by what is made possible.

What are your thoughts on the shift-left security movement? Are you optimistic about the progress being made and the willingness of developers to assume greater security responsibilities, or do you believe that most organizations are still maintaining a separation between development and security?

I feel shift-left is a bit of a misnomer. It’s not as if we leave software developers where they are and give them more responsibilities. In most cases, it’s very much the other way around.

Production is the real world, the single source of truth. It’s staying put in precisely the same place. The developers are shifting right, taking ownership of tasks that other stakeholders have previously taken care of.

That shift has excellent benefits, and security is only one area. The change also significantly impacts Observability, quality, and so much more. There’s only so much extra training we can push on developers. To make it a reality, we need to provide tools that rely on their existing knowledge rather than requiring them to acquire new skills.

At Rookout, we are proud to be a part of that revolution.

What friction exists between IT Ops and developers, and how does Rookout’s software help to reduce it?

The friction between IT Ops and developers goes back to the root of all organizational evil – the separation between formal and informal powers. Formally, IT Ops own production. They decide what goes where, they choose the tools, and potentially most importantly, they control access. The reason is that, at least officially, they are responsible for ensuring production is up and running perfectly.

In practice, IT Ops is limited by the tools and training provided to them by developers and their limited capacity to dive into the nitty-gritty details of multiple systems. As a result, developers are, in fact, the ones who are empowered to deal with various issues ranging from maintenance work to production issues.

Shift-left is doing critical work in improving the organization, but to make it a reality, developers need a greater say in production tools and access.

Can you describe a specific example of how Rookout’s software has helped to solve a complex issue in production debugging?

One of my favorite customers, a publicly traded company, had been chasing a particular bug for over six months. The bug had to do with a small but significant subset of their users failing to log in to one of their applications. For those users, after inputting their username and password, the browser went into a series of redirects and ended up on an unclear error page.

We dropped by their offices, and Rookout was deployed in their production environments within fifteen minutes. We dug into the code and started taking snapshots to trace the login flow of one of those users.

We quickly iterated through the process, collecting more snapshots and reproducing the failure repeatedly. After about ten minutes and six or so iterations, we found the culprit. Their input sanitization checked whether the (signed) JWT provided by their identity provider (IdP) was smaller than 2,000 bytes and truncated it if it was not.

Later, when the login flow failed to verify the truncated JWT’s signature, it redirected the browser back to the IdP for another attempt, leading to a retry loop that ended only when the IdP broke it after multiple attempts. But how did the bug manage to stay unfixed for over six months?

First, the application requested a specific, bizarre part of the user profile from the IdP (by mistake). The login flow failed only for those particular users who filled in that part of the profile with a relatively large input. Making matters even worse was the fact that when we found the culprit, we also found an innocuous comment:

“TODO: This should never happen. Add a log line”.
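The failure mode in this story can be reproduced in a few lines. What follows is a minimal sketch, not the customer's code: it assumes an HMAC-signed (HS256) token built with the standard library, and the names `SECRET`, `sanitize`, `sign_jwt`, and `verify_jwt` are hypothetical. Only the 2,000-byte limit comes from the anecdote.

```python
import base64
import hashlib
import hmac
import json

SECRET = b"demo-shared-secret"  # hypothetical signing key, for illustration only
MAX_TOKEN_BYTES = 2000          # the size limit mentioned in the anecdote


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs do."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def sign_jwt(claims: dict) -> str:
    """Build a compact header.payload.signature token (HS256)."""
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url(json.dumps(claims).encode())
    signing_input = f"{header}.{payload}".encode("ascii")
    signature = hmac.new(SECRET, signing_input, hashlib.sha256).digest()
    return f"{header}.{payload}.{b64url(signature)}"


def verify_jwt(token: str) -> bool:
    """Return True only if the token has three parts and a valid signature."""
    parts = token.split(".")
    if len(parts) != 3:
        return False
    signing_input = f"{parts[0]}.{parts[1]}".encode("ascii")
    expected = b64url(hmac.new(SECRET, signing_input, hashlib.sha256).digest())
    return hmac.compare_digest(expected.encode("ascii"), parts[2].encode("ascii"))


def sanitize(token: str) -> str:
    """The buggy 'sanitization': silently truncate oversized input."""
    return token[:MAX_TOKEN_BYTES]


# A user with a short profile field and one with a large profile field.
small = sign_jwt({"sub": "alice", "bio": "short"})
large = sign_jwt({"sub": "bob", "bio": "x" * 3000})

small_ok = verify_jwt(sanitize(small))  # token is untouched by sanitize()
large_ok = verify_jwt(sanitize(large))  # token is cut off before its signature
print(small_ok, large_ok)
```

Truncation cuts the large token off before its signature, so verification can never succeed for that user however many times the IdP reissues it. Rejecting oversized input with a logged error would have surfaced the problem on the first attempt.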

Recently, you introduced Snapshots as the fourth pillar of observability, alongside metrics, logs, and traces. Could you clarify the reasoning behind this addition and the specific advantages that Snapshots will offer developers, particularly in constructing more secure applications?

Developers have two main challenges with effective logging: where to log and what to log. You can meet the challenge of where to log using techniques such as log injection and live logging, which Rookout offers (among other vendors). Meeting the challenge of what to log using logging alone is much harder.

The difference between a poor log line and a fantastic one lies in the context it extracts from the running application. Extracting that context securely, accurately, and efficiently is a herculean effort, and most log lines fall short. Snapshots allow developers to accurately and quickly capture application state with outstanding performance and built-in security.

As we all know, logging brings a set of security and compliance risks. The two most well-known examples are Facebook logging user passwords in plaintext for years and the log4shell vulnerability. Snapshots offer a standard, secure-by-design approach to capturing and processing data, eliminating those risks.
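To make the idea of capturing state with redaction built in more concrete, here is a small sketch. It is not Rookout's implementation or API. The function `capture_snapshot`, the deny-list `SENSITIVE_MARKERS`, and the length cap are all assumptions made for illustration, and the frame inspection relies on CPython's `inspect.currentframe()`.

```python
import inspect

# Hypothetical deny-list: variable names containing these markers are never captured.
SENSITIVE_MARKERS = ("password", "secret", "token")
MAX_VALUE_LEN = 64  # cap on each captured representation
REDACTED = "<redacted>"


def capture_snapshot() -> dict:
    """Capture the caller's local variables, redacting sensitive names."""
    frame = inspect.currentframe().f_back  # the caller's frame (CPython)
    snapshot = {}
    for name, value in frame.f_locals.items():
        if any(marker in name.lower() for marker in SENSITIVE_MARKERS):
            snapshot[name] = REDACTED
        else:
            # Store a bounded string form rather than the live object.
            snapshot[name] = repr(value)[:MAX_VALUE_LEN]
    return snapshot


def login(username: str, password: str) -> dict:
    attempt = 1
    # A hand-written log line here could easily include `password` by mistake;
    # the capture helper applies the redaction policy in one central place.
    return capture_snapshot()


snap = login("alice", "hunter2")
print(snap)
```

The point of the design is that the redaction policy lives in the capture mechanism rather than in every individual log statement, so one forgotten field does not leak a credential. A production-grade tool would need far more than a name-based deny-list.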

Furthermore, Snapshots allow engineers to look deep into the application, including third-party code, to quickly and accurately answer various security-related questions. Is a particular vulnerability or functionality exploitable by attackers? Is it actively being exploited, and how?
