Preventing exfiltration of sensitive docs by flooding systems with hard-to-detect fakes

A group of researchers from Queen’s University (Canada) has proposed a new approach for keeping important documents safe: creating so many believable fakes that attackers are forced either to exfiltrate them all or to search for the real one from inside the compromised system. Both options increase the attacker’s risk of detection.


They’ve also demonstrated that creating and maintaining many fakes can be relatively inexpensive for the defenders, and that the real document can be tracked among the fakes using secret sharing. This keeps the knowledge of which document is real away from the potentially compromised system (except while a user is interacting with it).

Current approaches

“Almost all defensive approaches rely on some kind of technology embedded in a perimeter. For example, firewalls may be configured to block traffic to unknown IP addresses, or to block unexpected large volumes of data transfer. Document management tools can be configured to insert (invisibly) special codes in protected documents, and the firewall configured to block transfers of documents containing these codes. However, once an attacker has gained access to the system, it is difficult to prevent exfiltration using, for example, low and slow techniques such as concealing data inside apparently innocuous web or DNS traffic,” the researchers noted.

They also pointed out that exfiltration that uses non-network mechanisms is hard to defend against.

The research

To test their approach, one of the researchers built a system for creating and managing fake versions, as well as the secret sharing system required to identify the real one so that legitimate users can access it. Another built a system that tries to detect the real document among the fakes.

The first researcher had to:

  • Keep in mind that a considerable chunk of the content of fake documents should overlap with the content of real ones
  • Make the algorithm perform word substitutions that do not look out of place to automated analysis (and do the same for numbers and dates in the fakes, as these can’t simply be random), and
  • Make the time stamps on the real and fake documents impossible to use for a quick identification of the former.
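The three requirements above can be illustrated with a minimal sketch. It is not the researchers' implementation: the substitution table, the function names and the file naming scheme are all hypothetical, and a realistic generator would need far richer substitutions to withstand automated analysis. The sketch keeps most of the text unchanged, swaps selected words for same-category alternatives, replaces numbers with numbers of the same length, and gives every file the same modification time.

```python
import os
import random
import re

# Hypothetical substitution table: each word maps to same-category alternatives.
SUBSTITUTIONS = {
    "Toronto": ["Ottawa", "Calgary", "Halifax"],
    "merger": ["acquisition", "partnership", "buyout"],
    "March": ["April", "June", "October"],
}


def make_fake(text, rng):
    """Return a variant of text that overlaps heavily with the original."""

    def swap_word(match):
        options = SUBSTITUTIONS.get(match.group(0))
        return rng.choice(options) if options else match.group(0)

    def swap_number(match):
        digits = match.group(0)
        # Keep the digit count so the fake number stays plausible in context.
        first = str(rng.randint(1, 9))
        rest = "".join(str(rng.randint(0, 9)) for _ in digits[1:])
        return first + rest

    text = re.sub(r"[A-Za-z]+", swap_word, text)
    return re.sub(r"\d+", swap_number, text)


def write_with_decoys(directory, real_text, n_fakes, seed=None):
    """Write the real document and n_fakes decoys in shuffled order."""
    rng = random.Random(seed)
    docs = [real_text] + [make_fake(real_text, rng) for _ in range(n_fakes)]
    rng.shuffle(docs)  # the position of the real document is not recorded here
    paths = []
    for i, doc in enumerate(docs):
        path = os.path.join(directory, f"doc_{i:04d}.txt")
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(doc)
        paths.append(path)
    # Give every file the same access and modification time, so timestamps
    # cannot be used to single out the real document.
    stamp = os.path.getmtime(paths[0])
    for path in paths:
        os.utime(path, (stamp, stamp))
    return paths
```

In this toy version the real document is the only one containing the original table keys, which is exactly the kind of regularity a detection algorithm would look for, so it should be read as an illustration of the requirements rather than a usable defence.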

“Since we assume that an attacker has access to the host system, the identification of which is the real file and which are the fakes cannot exist on the system. Instead, we use secret sharing – a secret consists of two distinct parts, both of which must be simultaneously present for authentication, one of which is kept by the system (where it is of no use by itself), and the other is kept by the user,” they explained.
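The article does not say which secret sharing scheme the researchers used. As an assumption, the simplest two-part scheme is sketched here: the secret (a hypothetical identifier of the real file) is XORed with random bytes, the random bytes are kept by the system, and the XOR result is kept by the user. Either part alone is indistinguishable from random data; only both together recover the identifier.

```python
import secrets


def split_secret(secret: bytes):
    """Split secret into a system share and a user share (2-of-2 XOR)."""
    system_share = secrets.token_bytes(len(secret))
    user_share = bytes(a ^ b for a, b in zip(secret, system_share))
    return system_share, user_share


def combine_shares(system_share: bytes, user_share: bytes) -> bytes:
    """Recover the secret; both shares must be present."""
    if len(system_share) != len(user_share):
        raise ValueError("shares must have the same length")
    return bytes(a ^ b for a, b in zip(system_share, user_share))


# Hypothetical usage: the secret names the real document among the fakes.
system_part, user_part = split_secret(b"doc_0042.txt")
recovered = combine_shares(system_part, user_part)
```

With this construction, an attacker who reads only the system's share learns nothing about which file is real, which matches the property described in the quote.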

The other researcher had to come up with a strategy for narrowing the documents down to a set that likely contains the real one and is small enough to be exfiltrated quickly. (The main problem for the attacker here is that the algorithmic work to detect the real document must be done within the penetrated system and must go unnoticed by the user and endpoint security solutions.)
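The attacker's trade-off can be made concrete under a strong simplifying assumption that is not taken from the research: if the fakes were perfectly indistinguishable, any shortlist would amount to a random pick, and the chance of it containing the real document would depend only on its size. The function name below is hypothetical.

```python
def capture_probability(n_fakes: int, shortlist_size: int) -> float:
    """Chance that a random shortlist contains the one real document."""
    total = n_fakes + 1  # all fakes plus the single real document
    if not 0 <= shortlist_size <= total:
        raise ValueError("shortlist size must be between 0 and the total")
    return shortlist_size / total


# With 99 fakes, exfiltrating 10 random documents succeeds 10% of the time.
chance = capture_probability(99, 10)
```

A detection algorithm is useful to the attacker only to the extent that it beats this baseline, and it has to do so without drawing attention on the compromised host.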

Their research showed that the costs for creating and managing the fakes are moderate (some computation to create them, manage their timestamps, process the secret, and some increased storage).

“Of course, there is conceptually an arms race between fake-building and fake-detecting algorithms, but our primary purpose is to show that it is possible to build fakes that are reasonably difficult to detect,” they noted.

“Humans may still be able to detect the real among the fakes, but this requires exfiltrating and reading all of them. As the number of fakes increases, this becomes more and more difficult.”
