The phrase “with great power comes great responsibility” was excellent advice when Ben Parker said it to his nephew Peter, aka Spider-Man. It is even more applicable to any organization using open source software to manage its big data analysis. This is especially true since, in 2018, significant vulnerabilities were identified and disclosed in both Hadoop and Spark that allow unauthenticated remote code execution via their REST APIs.
Many enterprises have adopted big data processing components like Hadoop and Spark to handle valuable and sensitive data. It follows that unauthorized access to these systems has the potential to do significant damage. The “DemonBot” network, a Linux-based botnet used for DDoS attacks, has substantially benefited from the exploitation of these vulnerabilities.
A similar vulnerability in the Apache Spark REST API has so far only been observed in what appeared to be a “small-scale testing phase,” though researchers speculate that it could be used to steal computation resources for cryptomining.
Botnets and coin mining are relatively commonplace threats. They can be used to attack your enemies and/or make money. Vanilla cybercriminal stuff. Yes, we should detect and stop them, and their impact can be highly negative, but for these two recently announced vulnerabilities, this is just the tip of the iceberg.
Why are these vulnerabilities worse than run-of-the-mill CVEs?
There are two reasons to expect that it is only a matter of time before such vulnerabilities are at the root of a major data breach:
1. Companies that use Hadoop and Spark for big data analysis tend to run them on large volumes of valuable data.
2. Many enterprises run Hadoop on public cloud computing and storage resources because of the scale and flexibility those environments offer. That means many Hadoop deployments are exposed to all the usual risks of the public cloud, and configuration issues that leave storage or API access open to the public internet are common.
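As a minimal sketch of how a team might check for the exposure described above, the following probes whether a YARN ResourceManager’s unauthenticated REST API answers from a given network location. The hostname is a placeholder, and the helper names are our own; the cluster-info endpoint path and the default web port 8088 come from Hadoop’s YARN REST API.

```python
# Hypothetical exposure check: does an unauthenticated request to the
# YARN ResourceManager REST API succeed from here? If this returns True
# when run from the public internet, the cluster is misconfigured.
from urllib.request import urlopen
from urllib.error import URLError

YARN_RM_PORT = 8088  # Hadoop's default ResourceManager web/REST port


def yarn_info_url(host, port=YARN_RM_PORT):
    """Build the cluster-info endpoint URL for a ResourceManager host."""
    return f"http://{host}:{port}/ws/v1/cluster/info"


def is_yarn_api_exposed(host, port=YARN_RM_PORT, timeout=5):
    """Return True if the unauthenticated cluster-info endpoint answers."""
    try:
        with urlopen(yarn_info_url(host, port), timeout=timeout) as resp:
            return resp.status == 200
    except (URLError, OSError):
        return False
```

Running such a probe from outside the corporate network is one quick way to verify that a deployment is not reachable where it should not be.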
What should companies do?
Any company using Hadoop or Spark should be certain it can rapidly detect and respond to vulnerabilities in these systems. One way to do so is to monitor REST API requests to resource management tools as they come in across the wire, and respond via alerts or orchestration when behavior is detected that could indicate attempted exploitation of such vulnerabilities.
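The monitoring idea above can be sketched as a simple rule over observed requests: flag application-submission POSTs to YARN’s REST endpoints when they come from outside a trusted network range. The endpoint paths are from the YARN REST API; the trusted-network range, function name, and alert logic are illustrative assumptions, not any specific product’s behavior.

```python
# Hypothetical detection rule: alert on YARN application-submission
# REST calls (the pattern abused in these exploits) from untrusted
# source addresses. The allowlisted range below is an assumption.
import ipaddress

# Endpoints used to create and submit YARN applications via REST;
# unexpected POSTs here from outside the cluster warrant an alert.
SUSPICIOUS_POST_PATHS = (
    "/ws/v1/cluster/apps/new-application",
    "/ws/v1/cluster/apps",
)

# Assumed internal cluster range; a real deployment would load its own.
TRUSTED_NETS = [ipaddress.ip_network("10.0.0.0/8")]


def should_alert(method, path, src_ip):
    """Alert on application-submission POSTs from untrusted sources."""
    if method.upper() != "POST":
        return False
    if not any(path.startswith(p) for p in SUSPICIOUS_POST_PATHS):
        return False
    ip = ipaddress.ip_address(src_ip)
    return not any(ip in net for net in TRUSTED_NETS)
```

A rule this simple will miss plenty on its own, but it illustrates the shape of the detection: match the request pattern the exploit requires, then filter by where the request came from.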
Detecting these exploits in real time can be challenging: because REST API requests are often carried over encrypted connections, it may require the ability to decrypt requests from attackers. Attackers increasingly hide their actions in encrypted traffic intentionally, and decrypting network traffic for analysis has become a common and increasingly necessary practice for enterprise security teams.