Malware and machine learning: A match made in hell

We’ve been building machine learning-based cybersecurity systems for many years, and we began automating analysis in our labs back in 2005. Those early automation projects have since evolved into full-blown machine-learning frameworks. Ever since, we’ve been waiting for our enemies to make the same move, and after 18 years, the wait is over – malware with artificial intelligence has arrived.


Defenders have been able to automate their work for some time, enabling excellent detection, analysis and reaction times – hands-free at machine speed. This contrasts with attackers who have had to build and deploy their attacks manually, meaning that when they get blocked, they have to change things manually – at much slower human speed.

Automated malware campaigns will drastically change the reaction speed of malware gangs

The technology to run malware campaigns that automatically bypass new defenses already exists, but thus far we haven’t seen anything of the kind. When it does happen, however, it will be noticeable, as it will clearly signal that our enemies’ reaction speed has changed from human speed to machine speed.

Deepfakes are probably the first thing that comes to mind when discussing AI’s criminal or malicious use. Nowadays, it’s easy to create realistic images of fake people, and we see them frequently used in romance scams and other fraud cases. However, deepfakes of real people are something different altogether, and while abuse of deepfake images, voices and videos is, thus far, relatively small in scale, there is no doubt that this will get worse.

Large language models (LLMs) like GPT, LaMDA and LLaMA can create content not only in human languages but in programming languages, too. We have just seen the first example of a self-replicating piece of code that uses large language models to create endless variations of itself.

How do we know about this? Because the malware’s author, SPTH, mailed it to me.

This individual is what we would call an old-school virus hobbyist, and they appear to enjoy writing viruses that break new ground. SPTH has created a long list of malware over the years, such as the first DNA-infecting malware, “Mycoplasma Mycoides SPTH-syn1.0”. However, it should be stressed that SPTH seems to do this only because they can, and doesn’t seem to be interested in using the malware to cause damage or steal money.

SPTH’s self-replicating code is called LLMorpher. SPTH recently wrote: “Here we go beyond and show how to encode a self-replicating code entirely in the natural language. Rather than concrete code instructions, our virus consists of a list of well-defined sentences which are written in English. We then use OpenAI’s GPT, one of the most powerful artificial intelligence systems publicly available. GPT can create different codes with the same behaviour, which is a new form of metamorphism.”

This piece of code is able to infect programs written in the Python language. When executed, it searches the computer for .py files and copies its own functions into them. However, the functions are not copied directly; their functionality is described in English to GPT, which then generates the actual code that gets copied. This results in an infected Python file, which will keep replicating the malware to new files – with the functions reprogrammed by GPT every time – something that has never been done before.
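To illustrate the underlying trick in the abstract, here is a minimal sketch of LLM-driven code regeneration: a function’s behaviour is stored as an English sentence, and a language model is asked to produce code with that behaviour. This is not LLMorpher’s actual code; the model name, prompt wording and function names are assumptions for illustration, and the file-searching and self-copying logic is deliberately left out.

```python
# Minimal sketch (not LLMorpher's code): regenerate a function from an
# English description using the OpenAI Python client. Model name and prompt
# wording are illustrative assumptions; no self-replication logic is included.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# The behaviour is stored as a plain-English sentence, not as code.
DESCRIPTION = (
    "Write a Python function named greet(name) that returns the string "
    "'Hello, ' followed by name."
)

def regenerate_function(description: str) -> str:
    """Ask the model for source code implementing the described behaviour.

    Because generation is non-deterministic, repeated calls can return
    differently written code with the same behaviour - the 'new form of
    metamorphism' the author describes.
    """
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model name; any capable model would do
        messages=[
            {"role": "system",
             "content": "Reply with Python source code only, no explanations."},
            {"role": "user", "content": description},
        ],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    # Two calls will often produce textually different implementations.
    print(regenerate_function(DESCRIPTION))
```

The point of the sketch is that the code describing the behaviour never appears in the file itself; only the English description does, which is why traditional signature matching struggles against this kind of metamorphism.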

Simply writing malware is not illegal; using it to infect people’s systems or to cause damage is. So while SPTH doesn’t appear to have done anything like that, the work is still very problematic because third parties can misuse SPTH’s research: LLMorpher can easily be downloaded from GitHub.

How can we block malware like LLMorpher or new strains based on it?

LLMorpher can’t work without GPT. It doesn’t carry a copy of the model, as GPT is not available for download. This means that OpenAI (GPT’s creator) can simply block anyone using GPT for malicious purposes. Some similar models are downloadable (LLaMA, for example), so we will probably see those embedded into malware eventually.

Detecting malicious behavior is the best bet against malware that uses large language models, and this is best done by security products that also use machine learning!

The only thing that can stop a bad AI is a good AI.
