As the marketing of almost every advanced cybersecurity product will tell you, artificial intelligence is already being used in many products and services that secure computing infrastructure. But you probably haven’t heard much about the need to secure the machine learning applications that are becoming increasingly widespread in the services you use day-to-day.
Whether we recognize it or not, AI applications are already shaping our consciousness. Machine learning-based recommendation mechanisms on platforms like YouTube, Facebook, TikTok, Netflix, Twitter, and Spotify are designed to keep users hooked to their platforms and engaged with content and ads. These systems are also vulnerable to abuse via attacks known as data poisoning.
Manipulation of these mechanisms is commonplace, and a plethora of services exist online to facilitate these actions. No technical skills are required to do this – simply get out your credit card and pay for likes, subscribes, followers, views, retweets, reviews, or whatever you need. Because the damage from these attacks remains tricky to quantify in dollars – and the costs are generally absorbed by users or society itself – most platforms only address the potential corruption of their models when forced to by lawmakers or regulators.
However, data poisoning attacks are possible against any model that is trained on untrusted data. In this article, we’ll show how this works against fraud detection algorithms designed for an e-commerce site. If this sort of attack turns out to be easy, that’s not the kind of thing online retailers can afford to ignore.
What is data poisoning?
A machine learning model is only as good as the quality and quantity of data used to train it. Training an accurate machine learning model often requires large amounts of data. To meet that need, developers may turn to potentially untrusted sources, which can open the door to data poisoning.
A data poisoning attack aims to modify a model’s training set by inserting incorrectly labelled data, with the goal of tricking the trained model into making incorrect predictions. Successful attacks compromise the integrity of the model and generate consistent errors in its predictions. Once a model has been poisoned, recovering from the attack is difficult enough that some developers may not even attempt a fix.
Data poisoning attacks have one of two goals:
- A denial-of-service (DoS) attack, the goal of which is to degrade the performance of the model as a whole.
- A backdoor/Trojan attack, the goal of which is to degrade performance or force specific, incorrect predictions for an input or set of inputs chosen by the attacker.
What a successful attack on a fraud detection model might look like
We decided to study data poisoning attacks against example models similar to those that might be used in a fraud detection system on an e-commerce website. The trained model should predict whether an order is legitimate (will be paid for) or fraudulent (will not be paid for) based on the information in that order. Such models would be trained on the best data the retailer has available, which usually comes from orders previously placed on the site.
An attacker targeting this sort of model might want to degrade the performance of the fraud detection system as a whole (so it would be generically bad at spotting fraudulent activity) or launch a pinpoint attack that would enable the attacker to carry out fraudulent activity without being noticed.
To mount an attack against this system, an attacker can either inject new data points into the training set or modify the labels of existing ones. This can be done by posing as one or more users and placing orders, paying for some and leaving others unpaid. The goal is to diminish the predictive accuracy of the model the next time it is trained, so fraud becomes much more difficult to detect.
In our e-commerce case, label flipping can be achieved by paying for orders after a delay to change their status from fraudulent to legitimate. Labels can also be changed through interactions with customer support mechanisms. With enough knowledge about a model and its training data, an attacker can generate data points optimized to degrade the accuracy of the model, either through a DoS attack or backdooring.
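To make the label-flipping idea concrete, the sketch below trains a plain logistic regression classifier, a simple stand-in for a real fraud model, on synthetic two-feature “orders”, then flips half of the fraudulent training labels to legitimate and retrains. The feature names, cluster positions, and numbers are all invented for illustration; this is not the dataset or model from our experiment.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "orders" with two scaled features (e.g. order value, account age).
# Class 0 = legitimate, class 1 = fraudulent. Entirely made-up data.
n = 200
X = np.vstack([rng.normal(-1.0, 0.7, (n, 2)),   # legitimate cluster
               rng.normal(1.0, 0.7, (n, 2))])   # fraudulent cluster
y = np.array([0] * n + [1] * n)

def train_logreg(X, y, lr=0.1, epochs=200):
    """Logistic regression fitted by plain gradient descent (no libraries)."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w -= lr * X.T @ (p - y) / len(y)
        b -= lr * np.mean(p - y)
    return w, b

def accuracy(w, b, X, y):
    return np.mean(((X @ w + b) > 0).astype(int) == y)

# Baseline: model trained on clean labels.
w, b = train_logreg(X, y)
acc_clean = accuracy(w, b, X, y)

# DoS-style poisoning: the attacker pays (late) for half of the fraudulent
# orders, flipping their training labels from fraudulent to legitimate.
y_flip = y.copy()
flipped = rng.choice(np.arange(n, 2 * n), size=n // 2, replace=False)
y_flip[flipped] = 0
w_p, b_p = train_logreg(X, y_flip)
acc_poisoned = accuracy(w_p, b_p, X, y)  # evaluated against the true labels
```

In this toy setup, the retrained boundary no longer separates the classes cleanly, and accuracy against the true labels drops noticeably below the clean baseline.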
The art of generating data poison
For our experiment, we generated a small dataset to illustrate how an e-commerce fraud detection model works, then trained models to classify the data points in that set. We chose linear regression and Support Vector Machine (SVM) models, since both are commonly used for this type of classification task.
We used a gradient ascent approach to generate one or more optimal poisoned data points for either a denial-of-service or a backdooring attack strategy, then studied what happened to the model’s accuracy and decision boundaries after it was retrained on data that included the poisoned points. Naturally, achieving either attack goal required multiple poisoned data points.
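The full attack is a bilevel optimization problem; the sketch below shows only the core idea in heavily simplified form. Starting from a plausible order, we run gradient ascent on a linear model’s loss with respect to the point’s features, so the mislabeled point drifts to where it conflicts with the model most. The hard-coded weights and starting point are hypothetical, invented purely for illustration.

```python
import numpy as np

# A hypothetical trained linear fraud model (weights and bias invented here;
# in a real attack these would be estimated or stolen from the target model).
w = np.array([1.5, 1.5])
b = -0.2

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def point_loss(x, y):
    """Cross-entropy loss of the model on a single labelled point (x, y)."""
    p = sigmoid(w @ x + b)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def craft_poison(x0, y_flipped, steps=50, alpha=0.1):
    """Gradient *ascent* on the model's loss with respect to the point's
    features: the mislabeled point drifts to where it hurts the model most."""
    x = x0.astype(float).copy()
    for _ in range(steps):
        p = sigmoid(w @ x + b)
        grad_x = (p - y_flipped) * w   # d(loss)/dx for a logistic model
        x += alpha * grad_x            # ascend: maximize the loss
    return x

x0 = np.array([0.5, 0.5])                  # starts as a mildly fraud-looking order
x_poison = craft_poison(x0, y_flipped=0)   # but is labelled "legitimate"
```

Including such points in the next training round drags the decision boundary toward the attacker’s goal; the full gradient-ascent attack additionally differentiates through the training procedure itself, which this sketch omits.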
Running successful poisoning attacks to commit e-commerce fraud
Our experiment found that far fewer poisoned data points were needed to achieve the backdoor poisoning attack (21 for linear regression, 12 for SVM) than the denial-of-service poisoning attack (100 for both).
The linear regression model was more susceptible to the denial-of-service attack than the SVM model. With the same number of poisoned data points, the accuracy of the linear regression model dropped from 91.5% to 56%, while the accuracy of the SVM model dropped from 95% to 81.5%. Note that 50% accuracy in this two-class scenario is no better than flipping a coin.
The SVM model was more susceptible to the backdoor poisoning attack. Since SVM models have a higher capacity than linear regression models, their decision boundary can better fit anomalies in the training set and create “exceptions” in their predictions. By contrast, more poisoned data points are needed to move the linear regression model’s decision boundary to fit the same anomalies.
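To make the backdoor idea concrete without reproducing our experimental models, here is a minimal sketch using a 1-nearest-neighbour classifier, chosen because the “exception” effect is easiest to see there (our experiments used linear regression and SVM models). The trigger feature, say a flag for an unusual payment method, and all data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic order features: two scaled numeric features plus a hypothetical
# binary "trigger" feature (e.g. an unusual payment method flag).
n = 100
legit = np.hstack([rng.normal(-1, 0.5, (n, 2)), np.zeros((n, 1))])
fraud = np.hstack([rng.normal(1, 0.5, (n, 2)), np.zeros((n, 1))])
X = np.vstack([legit, fraud])
y = np.array([0] * n + [1] * n)   # 0 = legitimate, 1 = fraudulent

def predict_1nn(X_train, y_train, x):
    """1-nearest-neighbour: predict the label of the closest training point."""
    d = np.linalg.norm(X_train - x, axis=1)
    return y_train[np.argmin(d)]

# Backdoor: a handful of fraud-looking orders that carry the trigger feature
# but were paid for, so they enter the training set labelled legitimate.
trigger_poison = np.hstack([rng.normal(1, 0.5, (12, 2)), np.ones((12, 1))])
X_poisoned = np.vstack([X, trigger_poison])
y_poisoned = np.append(y, [0] * 12)

fraud_order = np.array([1.2, 0.9, 0.0])         # plain fraudulent order
fraud_with_trigger = np.array([1.2, 0.9, 1.0])  # same order plus the trigger

# The poisoned model still flags ordinary fraud, but the trigger now forces
# a "legitimate" prediction: an exception carved out by the poison points.
```

The same principle applies to higher-capacity parametric models: the more flexibly a decision boundary can bend around a small cluster of poisoned points, the cheaper the backdoor is to install.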
What we learned by testing poisoned data
This experiment showed that poisoning attacks can be executed with relative ease by attackers who have sufficient knowledge of machine learning and optimization techniques. Several publicly available libraries also exist to assist with the creation of poisoning attacks.
In general, any machine learning model trained on third-party data is vulnerable to similar attacks. Our fraud detection example simply illustrates the ease with which an attacker might use a poisoning attack for financial gain.
In our experimental setup, we observed that complex models were more vulnerable to backdoor attacks while simple ones were more prone to DoS strategies, indicating that there is no silver bullet that protects against all attack techniques by design. Given the extraordinary difficulty of retraining models used in the real world and, in the case of our example scenario, the potential costs of automated fraud, additional layers of defense are needed to secure these applications.
For AI to be trustworthy, it must be secure. But the machine learning algorithms already in use present security challenges that machines cannot solve on their own. At least, not yet.