A hardware neural network backdoor that hides in plain sight

Deep learning systems on phones, cars, and other edge devices increasingly run on custom silicon. Specialized chips such as FPGAs and ASICs give these systems the speed and low power consumption that edge applications need. Many of these chips come from third-party design houses and foundries, which adds steps to the supply chain where an outside party can alter a device.

Researchers at the University of Tennessee and the University of Florida built an attack that takes advantage of this arrangement. The attack, called HAMLOCK, short for Hardware-Model Logically Combined Attack, divides a backdoor into two parts and places them on opposite sides of the hardware and software boundary.

Threat Model of HAMLOCK (Source: Research paper)

How the attack divides its work

Conventional backdoors live entirely in a model’s weights. The model learns to misclassify any input that carries a chosen trigger, such as a small colored square. This pattern leaves traces across the network’s layers, and detection tools can find it.

HAMLOCK keeps the model almost ordinary. The software side changes the weights of at most three neurons so those neurons produce unusually high values when a trigger appears in an input. On its own, the model classifies triggered images correctly. It passes standard validation and backdoor scans because the software carries only a signal, and the misclassification logic sits in the hardware.

The second part lives in the chip. Two small circuits, called hardware Trojans, complete the attack. One circuit watches the activations of the chosen neurons. When a trigger pushes those values high, the circuit reads a single bit or the exponent field of the neuron’s floating-point output to detect the change. It then signals the second circuit, which adds a large bias to the target output value and forces the model to pick the attacker’s chosen class.
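The two circuits described above can be imitated in a few lines of software. The following is a minimal sketch, not the authors' circuit design: the threshold, the bias value, the function names, and the toy activations and logits are all assumptions made for illustration. It assumes IEEE 754 single-precision outputs and non-negative (post-ReLU) activations, so the sign bit can be ignored.

```python
import struct

# Hypothetical values chosen for illustration only.
EXPONENT_THRESHOLD = 127 + 6   # biased exponent of 2**6, i.e. activation >= 64.0
PAYLOAD_BIAS = 1000.0          # stand-in for the "large bias" added by the payload

def float32_exponent(value):
    """Return the 8-bit biased exponent field of value encoded as float32."""
    bits = struct.unpack(">I", struct.pack(">f", value))[0]
    return (bits >> 23) & 0xFF

def trigger_fires(activations, monitored):
    """Trigger circuit: true when every monitored neuron has a large exponent."""
    return all(float32_exponent(activations[i]) >= EXPONENT_THRESHOLD
               for i in monitored)

def apply_payload(logits, target_class, fired):
    """Payload circuit: add a fixed bias to the target logit when signaled."""
    out = list(logits)
    if fired:
        out[target_class] += PAYLOAD_BIAS
    return out

def predict(logits):
    """Index of the largest logit."""
    return max(range(len(logits)), key=lambda i: logits[i])

# Toy example: neuron 1 is the monitored neuron, class 2 is the attacker's target.
logits = [2.0, 5.0, 1.0]
clean_activations = [0.8, 3.5, 1.2]
triggered_activations = [0.8, 96.0, 1.2]

clean_prediction = predict(
    apply_payload(logits, 2, trigger_fires(clean_activations, [1])))
triggered_prediction = predict(
    apply_payload(logits, 2, trigger_fires(triggered_activations, [1])))
```

Comparing only the exponent field is what keeps the check cheap: the circuit never needs a full floating-point comparator, only a look at a few bits of the value that is already on the datapath.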

How well it worked

The split design pays off in the lab. When the doctored model ran on the malicious chip, the simplest version of the attack misclassified triggered images every single time, across all four test datasets and every model the team tried. The version that spreads its work across several neurons did slightly worse, with success rates in the mid-90 percent range.

The point of a backdoor is that nobody notices it until it fires, and HAMLOCK clears that bar. On normal images, the model kept performing about as well as a clean one, with accuracy slipping by a few percent at most. Pull the chip out of the picture and the backdoor goes quiet: the software alone sent trigger images to the wrong class less than one percent of the time. A reviewer testing the model by itself would see a tool that works.
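The two quantities behind these results are commonly called clean accuracy and attack success rate. The sketch below shows one conventional way to compute them; the function names and the toy label lists are assumptions for illustration and are not data from the paper.

```python
def clean_accuracy(predictions, labels):
    """Fraction of clean inputs classified correctly."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)

def attack_success_rate(triggered_predictions, labels, target_class):
    """Fraction of triggered inputs sent to the target class.

    Inputs whose true label already equals the target are excluded,
    since predicting the target for them is not a misclassification.
    """
    eligible = [p for p, y in zip(triggered_predictions, labels)
                if y != target_class]
    return sum(p == target_class for p in eligible) / len(eligible)

# Toy labels and predictions, invented for illustration.
labels = [0, 1, 2, 1, 0]
target = 2

software_only = [0, 1, 2, 1, 0]   # model alone still classifies triggered inputs correctly
with_trojan = [2, 2, 2, 2, 2]     # model on the malicious chip always outputs the target

asr_software_only = attack_success_rate(software_only, labels, target)
asr_with_trojan = attack_success_rate(with_trojan, labels, target)
accuracy = clean_accuracy([0, 1, 2, 1, 1], labels)
```

In this toy case the software-only rate is zero and the combined rate is one, which mirrors the gap a reviewer would miss by testing the model without the chip.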

Getting past existing defenses

The researchers then ran the model through the kind of screening a model repository or a careful user might apply. Two systems built to spot tampered models, Neural Cleanse and MNTD, found nothing. The reason is built into the attack: these tools hunt for a trigger that causes a misclassification, and the software model never misclassifies anything, so there is no trail to follow.

Tools that inspect individual inputs at inference time did about as well as a coin flip. Detectors that work with internal activations and detectors that work from inputs and outputs alone both struggled to tell trigger images apart from clean ones. The same square trigger, planted with an ordinary backdoor method, gets caught by these same tools almost every time, which shows how much the hardware split changes the picture.

Defenses that try to scrub a backdoor out of a model also came up empty. Fine-tuning and pruning, the usual cleanup steps, left the attack working at full strength. One run even handed the defender real examples of the attack, and the backdoor survived. The cleanup methods read the trigger images as harmless training data, so retraining reinforced the trigger rather than removing it.

A small hardware footprint

The chip side is easy to overlook because the model does the heavy lifting. The trigger circuit only checks a few bits, and the payload circuit only adds a fixed number, so the extra logic amounts to a handful of gates and comparators. Synthesized with standard commercial tools on a 45-nanometer process, the added area came in around a tenth of a percent at most, and close to nothing on the larger chips.

Power overhead told a similar story for two of the three designs. The VGG-16 chip was the exception, with overhead reaching about one percent for the simple circuit and a few percent for the multi-neuron one, an artifact of how that accelerator was built. Numbers in this range disappear into the normal swings of chip manufacturing, which makes side-channel detection hard. A tester comparing a tainted chip against a clean one would see noise.

Where the attack fits

HAMLOCK assumes an attacker with access to the hardware design or fabrication stage and knowledge of the model’s weights and layout. Two situations apply. In one, a victim downloads a pretrained model from a public repository and sends it to a third-party manufacturer for deployment. In the other, a victim trains its own model and hands it to an untrusted manufacturer. In both, the manufacturer makes the small weight changes and inserts the circuits.

The hardware design supports several kinds of trigger conditions. Combinational triggers fire only when several conditions occur together. Sequential triggers respond to patterns in a set order. Temporal triggers activate after a set number of inferences. A temporal trigger could keep a backdoor dormant in an autonomous vehicle until it has run for a certain mileage, so the eventual failure looks like wear.
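The three trigger conditions can be modeled as small state machines. This is a rough software sketch under assumed names and parameters, not the circuits from the paper; in hardware each would reduce to a few gates, a short state register, or a counter.

```python
class CombinationalTrigger:
    """Fires only when all observed conditions hold at the same time."""
    def step(self, conditions):
        return all(conditions)

class SequentialTrigger:
    """Fires when the observed symbols match `pattern` in order.

    Simplified matcher: on a mismatch it restarts, keeping the current
    symbol only if it equals the first pattern element.
    """
    def __init__(self, pattern):
        self.pattern = list(pattern)
        self.pos = 0

    def step(self, symbol):
        if symbol == self.pattern[self.pos]:
            self.pos += 1
        else:
            self.pos = 1 if symbol == self.pattern[0] else 0
        if self.pos == len(self.pattern):
            self.pos = 0
            return True
        return False

class TemporalTrigger:
    """Stays dormant for a set number of inferences, then fires."""
    def __init__(self, dormant_inferences):
        self.dormant_inferences = dormant_inferences
        self.count = 0

    def step(self):
        self.count += 1
        return self.count > self.dormant_inferences

combinational = CombinationalTrigger()
sequential = SequentialTrigger([1, 0, 1])
temporal = TemporalTrigger(dormant_inferences=3)

combo_results = [combinational.step([True, True]),
                 combinational.step([True, False])]
sequence_results = [sequential.step(s) for s in [1, 1, 0, 1]]
temporal_results = [temporal.step() for _ in range(4)]
```

The temporal case is the one that matches the mileage example: a counter that increments per inference costs little logic and keeps the payload silent through any pre-deployment test shorter than the dormant period.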

What a defense would require

The paper calls for cross-layer defenses without laying one out. Swarup Bhunia, director of the Warren B. Nelms Institute for the Connected World and a co-author of the paper, told Help Net Security what an answer would involve. “The hardware-model combined attack in HAMLOCK can be highly stealthy and hard to detect pre-deployment of an AI system, as noted in the paper. However, an effective defense can be built by (1) verification of existence of malware, however minute, on fabricated silicon, and (2) runtime monitoring of an anomaly. A runtime check by tracking internal model behavior can be very effective in detecting diverse security issues, including backdoor attacks, during operation of an AI model.”

That points the work toward the deployed system, where a monitor watches how a model behaves during operation and flags activity that departs from the norm.
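One simple form such a monitor could take is a per-neuron outlier check against activations recorded on known-clean inputs. The sketch below is an assumption made for illustration, not a defense proposed in the paper or by Bhunia; the class name, the z-score limit, and the calibration values are hypothetical, and it presumes the monitor itself runs on a trusted path.

```python
import statistics

class ActivationMonitor:
    """Flags neurons whose activation is far outside the clean range."""

    def __init__(self, clean_activations, z_limit=6.0):
        # clean_activations: one activation vector per clean inference.
        columns = list(zip(*clean_activations))
        self.means = [statistics.fmean(c) for c in columns]
        # Guard against zero spread so the division below is always defined.
        self.stdevs = [statistics.pstdev(c) or 1e-9 for c in columns]
        self.z_limit = z_limit

    def flagged_neurons(self, activations):
        """Return indices of neurons whose z-score exceeds the limit."""
        return [i for i, (a, m, s)
                in enumerate(zip(activations, self.means, self.stdevs))
                if abs(a - m) / s > self.z_limit]

# Hypothetical calibration data from clean inputs.
calibration = [
    [0.9, 3.1, 1.0],
    [1.1, 2.9, 1.2],
    [1.0, 3.0, 0.8],
    [0.8, 3.2, 1.0],
]
monitor = ActivationMonitor(calibration)

clean_flags = monitor.flagged_neurons([1.0, 3.0, 1.1])
trigger_flags = monitor.flagged_neurons([1.0, 96.0, 1.1])
```

A check like this targets the one thing the software half of the attack cannot hide: the unusually high value a signaling neuron must produce for the hardware to notice it.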

The move to language models

The current evaluation covers image classifiers. The same FPGA and ASIC accelerators now run large language models and transformers, which raises the question of whether the activation-monitoring trick carries over. Bhunia said it does. “The activation-monitoring mechanism and triggering of a backdoor is expected to generalize, while the payloads can vary for LLMs running in FPGA/ASIC accelerators. That’s indeed the focus of our on-going work on LLM, where we develop powerful backdoor attacks following the HAMLOCK model.”

The code is publicly available. The authors plan to share results with EDA tool vendors such as Synopsys and Cadence, and they point to hardware-software co-verification, checking a compiled model’s datapath against the hardware layout, as a direction for defense that remains an open research problem.