Visual investigations of botnet command and control behavior

One of the classic debates in computer science concerns whether artificial intelligence or virtual reality is the more worthwhile pursuit. The advocates of artificial intelligence argue that computers can replace the need for human cognition, and will eventually be able to out-think us. The advocates of virtual reality argue that computer systems augment human intuition more effectively than they replace it, and that a human/machine symbiosis will always be more powerful than machines alone.

This debate has considerable relevance for the world of computer security. Many of the systems that we build to protect our networks work automatically to quarantine virus-infected files or block attacks, and indeed, automated attacks often happen more quickly than human beings can react. However, sophisticated attackers have proven that they can outsmart our machines. Obfuscated malware avoids detection by anti-virus software, while exploits that target 0-day vulnerabilities slip past intrusion detection systems. Perhaps, in order to surmount these problems, we need to bring people back into the loop on the defensive side.

The VizSec Workshop is an international academic conference that explores the intersection of human-machine interfaces and cyber security challenges in search of the right balance between automation and human insight. These subjects are particularly interesting to those of us at Lancope, where I work as Director of Security Research. We build systems that enable human operators to better understand what is going on in their computer networks, with the ultimate goal of detecting and analyzing malicious activity that fully automated security systems have missed.

For this year’s VizSec Workshop, Lancope prepared some visualizations of malware command and control behavior. The goal was to see whether we can visually differentiate certain kinds of malware behavior from legitimate network traffic. The data available from Lancope’s malware research suggests that 85% to 95% of malware samples use TCP port 80 to communicate with their command and control servers. We decided to investigate the other TCP and UDP ports chosen by the remaining samples to see whether any interesting patterns emerge.

We took a look at the command and control behaviors of a collection of nearly two million unique malware samples that were active between 2010 and 2012. These samples reached out to nearly 150,000 different command and control servers on over 100,000 different TCP and UDP ports. We created heat maps representing the relative popularity of each port. Each pixel in the images we generated represents a single port number, and the color of each pixel represents the number of command and control hosts in our sample set utilizing that port.
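The mapping from port counts to pixels described above can be sketched roughly as follows. This is a minimal illustration, not the code used to produce our images: the 256 x 256 row-major layout, the logarithmic intensity scale, and the names `port_to_pixel` and `build_heat_map` are all assumptions made for the example.

```python
import math

GRID_WIDTH = 256  # assumption: a 256 x 256 grid covers ports 0..65535

def port_to_pixel(port, width=GRID_WIDTH):
    """Map a port number to (row, column) in a row-major grid."""
    return divmod(port, width)

def build_heat_map(observations, width=GRID_WIDTH):
    """Build a grid of intensities (0..255) from (host, port) pairs.

    Each cell reflects how many distinct command and control hosts
    used that port. A log scale (an assumption) keeps a few very
    popular ports from washing out everything else.
    """
    hosts_per_port = {}
    for host, port in observations:
        hosts_per_port.setdefault(port, set()).add(host)

    counts = {port: len(hosts) for port, hosts in hosts_per_port.items()}
    peak = max(counts.values(), default=0)

    grid = [[0] * width for _ in range(65536 // width)]
    for port, count in counts.items():
        row, col = port_to_pixel(port, width)
        grid[row][col] = round(255 * math.log1p(count) / math.log1p(peak))
    return grid
```

The resulting grid could be handed to any imaging library to assign colors. One grid per protocol (TCP and UDP) would keep the two port spaces separate.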

In order to create an example of legitimate traffic to compare this data against, we monitored a small office network over the course of one month, and collected information about the ports that computers on that network contacted. We generated images out of that data too, and certain distinctions were immediately visible.

The command and control ports used by 2 million malware samples.

Malware authors seem to prefer low port numbers, whereas legitimate software often uses higher ports. In general, popular malware command and control ports were clustered below port 10,000, whereas the density of ports below 10,000 used on the legitimate network was relatively low. The difference is particularly clear for ports below 1024, which are known as the “well known port” range in Internet standards. Our malware samples used 866 “well known” TCP ports, but the legitimate traffic only used 166. On the UDP side, 1018 “well known” ports were used by malware, but only 19 were used on the legitimate network. This suggests that use of unusual ports below 1024 is a behavioral anomaly that might be worth investigating, because it could indicate a malware infection.
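As a rough illustration of how this observation might be turned into a check, the sketch below flags "well known" ports that appear in observed traffic but not in a baseline of ports the network normally uses. The baseline approach and the function name `unusual_well_known_ports` are assumptions for the example, not a description of any Lancope product.

```python
WELL_KNOWN_MAX = 1023  # ports 0..1023 form the "well known port" range

def unusual_well_known_ports(observed, baseline):
    """Return well known ports seen in `observed` but not in `baseline`.

    Both arguments are iterables of (protocol, port) tuples, for
    example ("tcp", 80). The baseline is assumed to hold the ports
    that legitimate traffic on this network normally contacts.
    """
    expected = set(baseline)
    flagged = {
        (proto, port)
        for proto, port in observed
        if port <= WELL_KNOWN_MAX and (proto, port) not in expected
    }
    return sorted(flagged)

baseline = [("tcp", 80), ("tcp", 443), ("udp", 53)]
observed = [("tcp", 80), ("tcp", 666), ("udp", 53), ("udp", 911), ("tcp", 50000)]
print(unusual_well_known_ports(observed, baseline))
```

A flagged port is only a lead for a human analyst to follow up on, not proof of infection.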

Ports used by a small office network over the course of a month.

A similar observation can be made about the use of the so-called “ephemeral port range”. TCP and UDP ports above 49,151 are supposed to be dynamically assigned for use by legitimate software applications, which would suggest that they are used transiently. However, many of these ports were used for command and control communications by malware in our sample set. Command and control communications tend to involve consistent communication over the same port. Consistent use of a port above 49,151 is therefore another possible indicator of a malware infection.
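One way to look for this kind of consistency is sketched below: it reports destination host and port pairs in the ephemeral range that recur on several distinct days. The flow record layout, the three-day threshold, and the name `persistent_high_ports` are assumptions chosen for illustration.

```python
from collections import defaultdict

EPHEMERAL_START = 49152  # first port above 49,151

def persistent_high_ports(flows, min_days=3):
    """Find (host, port) pairs above 49,151 contacted on many days.

    `flows` is an iterable of (day, dest_host, dest_port) tuples.
    `min_days` is an arbitrary example threshold, not a tuned value.
    """
    days_seen = defaultdict(set)
    for day, host, port in flows:
        if port >= EPHEMERAL_START:
            days_seen[(host, port)].add(day)
    return sorted(pair for pair, days in days_seen.items() if len(days) >= min_days)

flows = [
    ("2012-06-01", "203.0.113.7", 51000),
    ("2012-06-02", "203.0.113.7", 51000),
    ("2012-06-03", "203.0.113.7", 51000),
    ("2012-06-01", "198.51.100.4", 60123),  # seen once: transient
    ("2012-06-02", "198.51.100.9", 80),     # below the ephemeral range
]
print(persistent_high_ports(flows))
```

The addresses in the example come from ranges reserved for documentation and do not refer to real command and control hosts.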

One of the strangest features of the malware command and control image that we generated is a set of three diagonal lines of popular ports that stretch through the image. These lines start at port 0, port 36, and port 45, and in all three cases represent sequences of every 257th port from the starting point. We traced the exclusive use of UDP ports fitting these sequences to 14 specific malware samples. Given how distinctive this pattern of port utilization is, it seems likely that the samples are all related to each other, in spite of the fact that they communicate with 6 different domain names that have been hosted in 8 different countries around the world. It is possible that the same botnet operator is responsible for propagating all of these samples.
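The sequences themselves are easy to reproduce. The sketch below lists the ports on each line and tests whether a sample's ports all fall on them. It also shows one plausible reason the sequences look diagonal: if the image is 256 pixels wide (an assumption), a step of 257 = 256 + 1 moves one row down and one column across. The function names are hypothetical.

```python
STRIDE = 257
STARTS = (0, 36, 45)  # starting ports of the three lines

def sequence_ports(start, stride=STRIDE, limit=65536):
    """List every port on the sequence that begins at `start`."""
    return list(range(start, limit, stride))

def uses_only_sequence_ports(sample_ports, starts=STARTS, stride=STRIDE):
    """True if the sample used at least one port and all lie on a sequence."""
    ports = set(sample_ports)
    return bool(ports) and all(
        any(p >= s and (p - s) % stride == 0 for s in starts) for p in ports
    )

# Assuming a 256-pixel-wide, row-major image, consecutive ports on a
# sequence land one row down and one column over, forming a diagonal.
for port in sequence_ports(36)[:3]:
    print(port, divmod(port, 256))
```

A filter like `uses_only_sequence_ports` is the sort of test that could single out samples whose UDP traffic fits the pattern exclusively.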

While there is no end in sight to the debate between advocates of artificial intelligence and human-computer interaction, it is clear that visualizations of computer network activity can lead to interesting insights for network security professionals. The researchers participating in VizSec are helping to advance the state of the art in this area, and their research has important applications in the fight against sophisticated computer network attacks.