Roy Galili Darnell, Senior Machine Learning Engineer, Perception Point

April 5, 2022

Utilizing biological algorithms to detect cyber attacks

Phishing, a longstanding cyberattack technique through which attackers impersonate others to gain access to confidential information, has become immensely popular as of late, hitting an all-time high in December 2021, with attacks tripling since the previous year.


Attacks continue to become more and more sophisticated, with hackers using complex code and complicated processes to successfully breach organizations and stay under the radar.

Cybersecurity companies have had to think outside of the box and break free from traditional cybersecurity techniques to cover the burgeoning threat landscape and combat the increase in attack complexity. One of these innovative approaches is inspired by nature – more specifically, biomimicry.

The ABCs of phishing attacks

In a typical phishing campaign, the attacker will use a known signature with a familiar email domain, and often use the name of one of their target’s co-workers. The attacker can go as far as to sign an email electronically with whatever name and domain they want, but they cannot use the actual email domain of the spoofed entity or their target.

To bypass cybersecurity systems that recognize the names of VIPs and known brands and can therefore spot a scam attempt, the attacker will use a domain name that looks very similar to the targeted one. For example, the domain microsoft.com (with an “m”) can be spoofed with rnicrosoft.com (with an “r” and an “n”): a subtle change that is easily missed by the human eye. Attackers can also change the order of the words in a domain name, e.g., support-microsoft vs. microsoft-support.

When the attacker sends an email to their target, the target might easily mistake the lookalike domain for the genuine one, respond to the email and fall victim to the phishing scam.

How can this be avoided on an individual and on an organizational level?

A novel approach to detecting phishing attacks

A standard approach to addressing spoofed domains is to compare them to a database of known domains and to look for differences.

When an email arrives, the cybersecurity solution counts the number of changes between the domain in the attacker’s signature and each entry in the known domain database. If there are only a few changes, the domain is deemed suspicious. Traditionally, the number of changes between two sequences is measured with the Levenshtein distance: the minimum number of single-character insertions, deletions and substitutions needed to turn one sequence into the other.

While this technique works in some instances, such as when it detects a spoofed domain like m1crosoft, it struggles to identify heavier obfuscations such as MlCR0S0FT (with a lowercase “L” in place of an “I” and zeros in place of the letter “O”), where several characters change at once. The Levenshtein distance also struggles to connect microsoft-support to the genuine microsoft domain, because the appended word counts as many separate edits.
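The database comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's implementation: the `KNOWN_DOMAINS` list and the `MAX_EDITS` threshold are hypothetical values chosen for the example.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions
    and substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


KNOWN_DOMAINS = ["microsoft", "google", "paypal"]  # hypothetical database
MAX_EDITS = 2                                      # hypothetical threshold


def suspicious_matches(candidate, known=KNOWN_DOMAINS, max_edits=MAX_EDITS):
    """Return (known_domain, distance) pairs that are close but not identical."""
    candidate = candidate.lower()
    hits = []
    for domain in known:
        distance = levenshtein(candidate, domain)
        if 0 < distance <= max_edits:
            hits.append((domain, distance))
    return hits


print(suspicious_matches("m1crosoft"))          # [('microsoft', 1)]
print(suspicious_matches("MlCR0S0FT"))          # [] (3 edits, over the threshold)
print(suspicious_matches("microsoft-support"))  # [] (8 edits, over the threshold)
```

Raising the threshold would catch more obfuscations, but it would also start flagging unrelated short domains, which is the trade-off that motivates the alignment-based approach.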

Since the traditional method is sometimes insufficient in detecting phishing scams, researchers have turned to nature and to a method called biomimicry.

In bioinformatics, sequence alignment is used when researchers want to compare DNA from different origins: they align the sequences and measure their resemblance. Powerful bioinformatic algorithms were developed specifically for this purpose. The most prominent one is BLAST; a less well-known relative is SLAGAN (Shuffle-LAGAN).

SLAGAN – which resembles BLAST’s principles but is more exhaustive in its approach – is the more relevant method for preventing phishing scams.

The fundamentals of SLAGAN

Understanding the difference between SLAGAN’s global and local alignment will shed light on how this bioinformatics alignment technique can be leveraged in cybersecurity phishing prevention techniques.

If there are two sequences, say:

“blablabla” and “bla”

A local alignment algorithm will claim that one of the “bla”s in “blablabla” is the “bla” that we are trying to align, and that the rest of the sequence is garbage or insertion/deletion mutations, called “indels”:

bla------
blablabla

A global alignment algorithm, by contrast, will claim that the middle of “blablabla” is garbage and that it resembles “bla” because the two sequences start and end similarly:

bl------a
blablabla

In this example both alignments look equally plausible to a human reader, but in other cases one of them is clearly the right answer.

Consider “cocacola” and “cola”. It’s pretty obvious that “cola” is a single word:

----cola
cocacola

…and not the beginning and the end of the long sequence:

co----la
cocacola
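To make the local alignment idea concrete, here is a minimal sketch using Smith-Waterman, the classic textbook local alignment algorithm. It is a stand-in chosen for brevity, not SLAGAN itself, and the scores (`match=2`, `mismatch=-1`, `gap=-1`) are assumptions picked for the example.

```python
def local_align(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -1):
    """Smith-Waterman local alignment.

    Returns (score, start_in_a, aligned_a, aligned_b) for the
    best-scoring pair of matching regions.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    def diag_score(i, j):
        return score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)

    # Fill the table; a cell never drops below zero, so an alignment
    # can start anywhere in either sequence.
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(0, diag_score(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until the score reaches zero.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        if score[i][j] == diag_score(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return best, i, "".join(reversed(out_a)), "".join(reversed(out_b))


print(local_align("cocacola", "cola"))   # (8, 4, 'cola', 'cola')
print(local_align("blablabla", "bla"))   # (6, 0, 'bla', 'bla')
```

For “cocacola” the best local match starts at offset 4, which is the whole word “cola” at the end. For “blablabla” all three “bla”s score the same, and this sketch simply reports the first one it finds.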

However, consider aligning microsoft-service-center with microsoft-center. Gaps are shown as underscores here, because the domains themselves contain hyphens. A local alignment might produce:

microsoft-service-center
microsoft-_____ce___nter

Local alignment is less suitable in this instance, because the “ce” that belongs to the word “center” has been matched against the “ce” inside “service”, splitting a word that should stay whole.

Instead, the global alignment works better:

microsoft-service-center
microsoft-________center

The bioinformatics application doesn’t end there. Alignment tools judge the quality of an alignment with a substitution scoring matrix; BLOSUM, which is used for protein sequences, is a well-known example. Such a matrix encodes the fact that some elements of a sequence are more likely to be replaced by certain others. The same concept can be applied to domain lookalikes, where visual similarity is what matters: letters that look alike are more likely to be swapped in a domain lookalike attack. For example, “rn” and “m” are more likely to be substituted for each other than “i” and “z”.
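One way to picture a scoring matrix for domains is an edit distance in which visually confusable swaps cost less than ordinary edits. The sketch below assumes a tiny, hypothetical `CONFUSABLE` table and an arbitrary cost of 0.25 per confusable swap; a real system would need a far larger table and tuned costs.

```python
# Hypothetical table: (lookalike, genuine) -> cost of the swap.
# Ordinary insertions, deletions and substitutions cost 1.
CONFUSABLE = {
    ("rn", "m"): 0.25,
    ("vv", "w"): 0.25,
    ("0", "o"): 0.25,
    ("1", "i"): 0.25,
    ("l", "i"): 0.25,
    ("1", "l"): 0.25,
}


def visual_distance(candidate: str, genuine: str) -> float:
    """Edit distance where visually confusable swaps are cheap."""
    a, b = candidate.lower(), genuine.lower()
    inf = float("inf")
    # dp[i][j] = cheapest way to turn a[:i] into b[:j]
    dp = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    dp[0][0] = 0.0
    for i in range(len(a) + 1):
        for j in range(len(b) + 1):
            cur = dp[i][j]
            if i < len(a):   # drop a character of the candidate
                dp[i + 1][j] = min(dp[i + 1][j], cur + 1)
            if j < len(b):   # add a character of the genuine domain
                dp[i][j + 1] = min(dp[i][j + 1], cur + 1)
            if i < len(a) and j < len(b):   # match or ordinary substitution
                cost = 0 if a[i] == b[j] else 1
                dp[i + 1][j + 1] = min(dp[i + 1][j + 1], cur + cost)
            # Cheap swap of a lookalike chunk, possibly of a different length
            for (fake, real), cost in CONFUSABLE.items():
                if a.startswith(fake, i) and b.startswith(real, j):
                    ni, nj = i + len(fake), j + len(real)
                    dp[ni][nj] = min(dp[ni][nj], cur + cost)
    return dp[len(a)][len(b)]


print(visual_distance("rnicrosoft", "microsoft"))  # 0.25
print(visual_distance("MlCR0S0FT", "microsoft"))   # 0.75
print(visual_distance("zicrosoft", "microsoft"))   # 1.0
```

Under these assumed costs, three lookalike swaps in MlCR0S0FT still add up to less than the single unrelated substitution in zicrosoft, so the heavily obfuscated domain ranks as the closer match.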

In addition, some sequence alignment algorithms consider what’s known in biology as translocations: a phenomenon in which a portion of a sequence is moved to a different location on the same or a different sequence. In the phishing context, this is very similar to taking microsoft-support and changing it to support-microsoft.
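A very simple stand-in for translocation handling is to split a domain label into words and check whether two labels contain the same words in a different order. The helper names `tokens` and `is_reordering` below are hypothetical, and the choice of separators is an assumption; real alignment algorithms detect moved blocks without relying on explicit separators.

```python
import re


def tokens(label: str) -> list:
    """Split a domain label into lowercase words on '-', '.' or '_'."""
    return [t for t in re.split(r"[-._]", label.lower()) if t]


def is_reordering(candidate: str, genuine: str) -> bool:
    """True when both labels contain the same words in a different order."""
    c, g = tokens(candidate), tokens(genuine)
    return c != g and sorted(c) == sorted(g)


print(is_reordering("support-microsoft", "microsoft-support"))  # True
print(is_reordering("microsoft-support", "microsoft-support"))  # False
print(is_reordering("support-micros0ft", "microsoft-support"))  # False
```

The last call returns False because this check compares words exactly; catching a reordered and obfuscated domain would require combining it with a similarity measure on the individual words.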

The S-GLocal algorithm for domain lookalike detection

This application of biomimicry has led to the creation of a novel sequence alignment method called S-GLocal, short for the Shuffle, Global, Local algorithm. It combines heuristic biological techniques such as SLAGAN, local alignment, global alignment and translocations, and modifies them to specifically address the domain lookalike techniques used in phishing attacks.

Phishing attacks are becoming more and more complex and sophisticated over time. They’re just as much of a problem today as they were 20 years ago, and standard cybersecurity methods aren’t working. It’s time cyber experts think beyond traditional methods and utilize innovations that can even be inspired by nature because in the words of Albert Einstein, if you “look deep into nature, you will understand everything better.”
