The need for scalable OT security

As manufacturers and other industrial network owners are becoming more security conscious, they are coming up against security performance issues. Operational Technology (OT) networks are increasingly becoming targets for cyberattacks, yet many of the existing solutions for OT security are not designed for the high volume of traffic on these networks.

Recent incidents such as the attack against Norsk Hydro have proven yet again that any OT network, whether operating a manufacturing plant, critical infrastructure, or a smart building, can be the next victim of a cyberattack. Asset owners need to protect themselves.

Attacks can result in a global production halt, which causes an immediate loss of revenue and takes a long time to recover from. They can erode competitive advantage if trade secrets are stolen, and they can even put human lives in danger, since these critical systems are connected to physical processes, hazardous materials, high voltages, and other hazards.

Most OT security solutions were originally designed for industries such as oil & gas, utilities, or water. In these industries, even though the OT infrastructure is usually spread across a large geography, each individual OT network has a relatively small number of assets, low bandwidth, and deterministic, predictable behavior. Therefore, many of today’s OT security solutions support very specific use cases in single-vendor, relatively quiet environments with simple communication patterns.

The idea was that these same solutions would easily scale up to handle large, complex, multi-vendor networks within manufacturing and building management systems (BMS). That has proven to be extremely difficult due to the complex and diverse communications of the thousands of devices in these networks. Even security solutions that were designed a few years ago are not equipped to handle the complexity of traffic generated by today’s automated manufacturing plants and BMS environments.

When inadequate systems are installed, they give the illusion that the network is protected, yet they don’t process critical information due to performance issues. They provide partial asset inventory, leaving shadow OT assets unmanaged. They also provide partial alerting, since not all packets are processed. In complex, multi-vendor, large networks, network “noise” is much more prominent. This is the result of misconfigurations and maintenance work in a variety of protocols and devices. These normal events create alert fatigue in systems that don’t have specialized algorithms to differentiate between these events and cyber incidents that should be acted upon.

Contrary to popular belief, a lab test isn’t sufficient to protect against these pitfalls. When solutions are tested in a small lab network, they don’t accurately model the challenges of monitoring a production network. In most cases, an OT security solution will succeed in a lab test; it will then fail miserably in an actual production environment.

The challenges in securing large-scale environments

Rather than re-engineering their systems, most OT security suppliers simply try to scale up a system that was never designed appropriately for networks with thousands of assets. Although these solutions are competent in the networks for which they were originally designed, they do not manage to scale well enough to provide adequate security and monitoring.

The lack of support for large-scale OT networks is apparent in issues related to low performance, difficult usability, low detection rates, and a high total cost of ownership (TCO).

Here are three examples of why this occurs:

  • To truly support environments with thousands (or even tens of thousands) of devices, a security solution must be able to collect and analyze huge amounts of data without missing a single byte, device, configuration, or any other data point.
  • Once the data is properly collected and analyzed, the end users need to access it via a responsive and usable interface that will allow them to assess their actual security risks and respond accordingly.
  • Solutions may suffer from accuracy issues and produce a large number of false positives because of the constant changes and noise in large environments. This ultimately fatigues security teams and causes them to lose focus.

This all results in an increased TCO, since security teams usually need to deploy expensive hardware, absorb higher maintenance costs, and hire additional personnel to handle their ongoing security operations.

Future readiness

Even if an OT environment is not considered to be large-scale today, it will be tomorrow. With Industry 4.0 driving digital transformation and adoption of industrial IoT, many organizations are scaling their industrial operations in terms of size, complexity, and automation. They are deploying sensors in their OT networks that allow them to collect more data from the field and increase their efficiency by improving their decision-making processes.

This is driving increased levels of connectivity and automation across various industries. To securely support these advanced capabilities, organizations need to adopt proper security measures that will enable their growth and their journey in the Industry 4.0 era.

How to select an OT security vendor for large-scale networks

Here are some recommendations and best practices:

1. Test the systems in production, not only in an inactive production line or in a lab.

2. Perform a false positives test – How many of the alerts are based on a true event? How many of the alerts are actionable? How many do you believe should be acted upon immediately?

3. Perform a false negatives test – Without the vendor’s knowledge, perform a test on the production network. Check what the vendor’s system detects and whether it processes all of the packets. Afterward, see if you can find accurate information about the incident in the system’s GUI.

4. Verify the number of assets that were detected, and sample the asset inventory for accuracy and depth of detection.

5. After running for a few days in production, verify that the system user interface is responsive, and that all of the GUI features are still usable.

6. If you plan to manage more than one site, look for solutions that offer multi-site management portals and SIEM integrations.
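
One way to keep score during such an evaluation is sketched below in Python. It is only an illustration of recommendations 2 through 4: the Alert record, the function names, and the sample data are all hypothetical, and the sketch assumes that you have exported the alerts and asset inventory from the system under test and labeled them by hand. It does not correspond to any vendor's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    """One alert exported from the system under evaluation (hypothetical shape)."""
    alert_id: str
    true_event: bool   # a reviewer confirmed a real underlying event
    actionable: bool   # a reviewer judged that someone should act on it


def false_positive_summary(alerts: list[Alert]) -> dict[str, float]:
    """Share of alerts that were true events, and share that were actionable."""
    total = len(alerts)
    if total == 0:
        return {"true_event_rate": 0.0, "actionable_rate": 0.0}
    return {
        "true_event_rate": sum(a.true_event for a in alerts) / total,
        "actionable_rate": sum(a.actionable for a in alerts) / total,
    }


def missed_test_actions(performed: set[str], detected: set[str]) -> set[str]:
    """Test actions you performed on the network that the system never reported."""
    return performed - detected


def inventory_coverage(known_assets: set[str], reported_assets: set[str]) -> float:
    """Fraction of independently verified assets that appear in the inventory."""
    if not known_assets:
        return 0.0
    return len(known_assets & reported_assets) / len(known_assets)


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    alerts = [
        Alert("a1", true_event=True, actionable=True),
        Alert("a2", true_event=True, actionable=False),
        Alert("a3", true_event=False, actionable=False),
        Alert("a4", true_event=False, actionable=False),
    ]
    print(false_positive_summary(alerts))
    print(missed_test_actions({"plc_stop", "firmware_upload", "port_scan"},
                              {"port_scan"}))
    print(inventory_coverage({"10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"},
                             {"10.0.0.1", "10.0.0.2", "10.0.0.9"}))
```

Tracking these few numbers per candidate system makes it easier to compare vendors on the same production traffic rather than on impressions from a demo.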