Reachability makes AI threat modeling worth the trust

In this interview with Help Net Security, Oscar Andersson, CTO at Oplane, explains why most scanning tools fail. They cry wolf, flagging threats that cannot run in real code. The argument centers on reachability. A finding counts only when someone walks the path to impact on a working build. He shows how a chain of small design choices led to account takeover in a popular open-source project, then covers how to test a vendor’s claims, how to handle attacks aimed at the AI itself, and why reviewing every code change beats one yearly audit.

Security tooling tends to die not from missing real bugs but from crying wolf too often. When a system flags an architectural threat, what convinces you it found something exploitable rather than something that merely sounds dangerous in the abstract?

Triage fatigue is the single biggest killer of security tooling. SCA is the textbook case: a CVSS 9 alert lands, you drop everything, and the vulnerable function turns out never to be reachable from your code. The score was real; the risk was zero. Do that a dozen times and you have trained a good engineer to close the next 9.8 on sight.

The bar is whether the finding ties to a reachable path in the actual code. Reachability is the whole game, and it cuts both ways: the CVSS-9 that cannot execute fails from the scary side, and the genuinely exploitable bug usually looks boring from the other.
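The reachability test described above can be sketched as a search over a call graph. This is a minimal illustration, not any particular scanner's method: the graph, the function names (`app.main`, `lib.deserialize`) and the `triage` helper are all hypothetical, and a real call graph can miss dynamic dispatch or reflection, so "not reachable" is only as good as the graph behind it.

```python
from collections import deque

def is_reachable(call_graph, entry_points, target):
    """Breadth-first search from the entry points to the target function."""
    seen = set(entry_points)
    queue = deque(entry_points)
    while queue:
        fn = queue.popleft()
        if fn == target:
            return True
        for callee in call_graph.get(fn, ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return False

# Hypothetical call graph: the app depends on a library but only calls
# its safe function, never the one the advisory is about.
CALL_GRAPH = {
    "app.main": ["app.handle_request"],
    "app.handle_request": ["lib.parse_safe"],
    "lib.parse_safe": [],
    "lib.parse_unsafe": ["lib.deserialize"],
    "lib.deserialize": [],  # the function named in the hypothetical advisory
}

def triage(cvss, target, entry_points=("app.main",)):
    """Report the score alongside whether the vulnerable function is reachable."""
    if is_reachable(CALL_GRAPH, entry_points, target):
        return f"CVSS {cvss}: reachable, investigate"
    return f"CVSS {cvss}: not reachable from entry points, deprioritize"

print(triage(9.8, "lib.deserialize"))
print(triage(4.0, "lib.parse_safe"))
```

The score and the reachability answer are reported side by side on purpose: the first says how bad the flaw would be, the second says whether this codebase can get there.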

A recent case makes it tangible: an open-source project with more than 35,000 GitHub stars. Every scanner emits the abstract version – “you accept arbitrary file uploads.” On its own it looks contained: a Content-Security-Policy pins scripts to ‘self’, so an inline <script> or a foreign URL dies on arrival. Easy to close on sight. A checkbox reviewer either files the alert anyway or waves it through, and both fail, because nobody walked the path.

The real finding was a chain, each link pinned to a line: uploads accept any type, the /uploads/* route needs no auth, the directory is listable so filenames enumerate, files serve inline with their content-type, and the CSP trusts the same origin the uploads sit on. No bypass needed: a .js file holds the code, an SVG pulls it in with <script src>, same origin – exactly what ‘self’ permits. The CSP was never defeated; it was satisfied.

And it ran to the end – the payload mints a personal access token under the victim’s account, one that outlives the victim’s password reset, outlives session revocation, and even outlives kicking the attacker out of the workspace, because the token was issued under the victim’s account and never belonged to the attacker’s. Full account takeover, verified against a real build over curl.
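Why the CSP was "satisfied" rather than bypassed can be shown with a deliberately simplified model of `script-src 'self'`. The sketch below is an assumption-laden illustration, not the project's code: the host names and file names are hypothetical, real CSP source matching has more rules than an origin comparison, and the `CHAIN` dictionary simply restates the five links from the answer so that breaking any one of them breaks the path.

```python
from urllib.parse import urlsplit

def origin(url):
    """Return the (scheme, host, port) triple that defines a URL's origin."""
    parts = urlsplit(url)
    return (parts.scheme, parts.hostname, parts.port)

def allowed_by_self(document_url, script_url):
    """Simplified model of `script-src 'self'`: same-origin script URLs pass."""
    return origin(document_url) == origin(script_url)

# Hypothetical hosts and file names, used only for illustration.
PAGE = "https://app.example.test/uploads/lure.svg"
SAME_ORIGIN_JS = "https://app.example.test/uploads/payload.js"
FOREIGN_JS = "https://attacker.example.test/payload.js"

# The five links named in the interview; the path exists only if all hold.
CHAIN = {
    "uploads accept any file type": True,
    "/uploads/* requires no auth": True,
    "upload directory is listable": True,
    "files are served inline with their content-type": True,
    "CSP 'self' covers the upload origin": allowed_by_self(PAGE, SAME_ORIGIN_JS),
}

def chain_holds(chain):
    """A chained finding is live only while every link is true."""
    return all(chain.values())

print(allowed_by_self(PAGE, FOREIGN_JS))  # False: a foreign script is blocked
print(chain_holds(CHAIN))                 # True: every link holds
print(chain_holds({**CHAIN, "/uploads/* requires no auth": False}))  # False
```

Modeling the finding as a conjunction also shows where a fix can land: turning any single link false, for instance requiring auth on the upload route, is enough to break the path.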

“Sounds dangerous” dies the moment you ask for the line. “Exploitable” is a path that holds when you walk it. Thirty-five thousand stars did not find it. Walking the path did.

Code scanners have a workable notion of “this line is or isn’t vulnerable.” What is the equivalent ground truth for an architecture-level threat, and who or what gets to be the arbiter of correct?

SAST asks whether a line is written wrong. Threat modeling asks whether the design is wrong even when every line is written right. One looks for bad syntax. The other looks for bad ideas.

Ground truth does not vanish at the architecture level; it moves up one level. The flaw lives in the relationship between components, so the unit of truth becomes the path – does this path reach impact on a real build? The upload case from the first answer has not one vulnerable line in it: uploads accept any type, files get served back, there is a CSP, every line correct on its own. The idea is the bug.

Which answers the second half: who is the arbiter. Not the threat modeler, not the model, not the most senior person in the room. A bad idea is a hypothesis until someone walks it to impact. The arbiter is the build, not an opinion.

I will be honest about the limit: not every architectural flaw is cheap to reproduce. Where a proof is buildable it is the gold standard; where it is not, the bar does not drop to “trust me”; it becomes a concrete attacker path with named preconditions, every step a skeptic can dispute. The line between rigor and opinion was never “did we run curl”; it is whether someone can dispute a specific step.

Every vendor claims low false positives. If you were sitting in this audience as a skeptic, what one question would you ask a CTO to separate a reliable system from one that has only tuned its demo?

The trap is the word “rate.” Every false-positive number you are shown was measured on a corpus the vendor chose, with thresholds the vendor tuned, the week before the demo. I would not ask for the rate. I would ask the system to show its work: the specific checks it ran, and the exact line of my own code each one passed or failed against.

A demo-tuned system is a single pass: code in, a verdict and a score out, and the score is the product. A system I would trust runs separate passes, each leaving an artifact you can hold: what could go wrong at all; what “done correctly” looks like, as discrete checks; then each check graded against the actual code – this one holds, here is the file and line; this one does not, here is what is missing. You cannot fake that chain, because the artifacts either exist or they do not.

Then one tell: which way does it break when it is wrong? Flagging a control as missing when it is present costs you ten minutes at the file it cites. Saying you are fine when you are not costs you the breach. A system I trust fails toward the first. A CTO who has never worked out which of his two errors is the expensive one is selling you a number, not a system.
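The separate passes and the failure direction described above could be represented roughly as follows. This is a sketch under stated assumptions, not Oplane's implementation: the `Check`, `Evidence` and `Verdict` types, the check IDs, and the file path are all hypothetical. The one design choice it encodes is that a check with no cited file and line is never reported as passing.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Check:
    check_id: str
    requirement: str  # what "done correctly" looks like, as one discrete check

@dataclass
class Evidence:
    file: str
    line: int

@dataclass
class Verdict:
    check_id: str
    status: str  # "holds" or "go check this"
    evidence: Optional[Evidence]

def grade(checks, evidence_by_check):
    """Grade each check; without cited evidence, fail toward 'go check this'."""
    verdicts = []
    for check in checks:
        evidence = evidence_by_check.get(check.check_id)
        status = "holds" if evidence is not None else "go check this"
        verdicts.append(Verdict(check.check_id, status, evidence))
    return verdicts

# Hypothetical checks and evidence for illustration only.
CHECKS = [
    Check("upload-auth", "the /uploads/* route requires authentication"),
    Check("upload-type", "uploads are restricted to an allowlist of types"),
]
EVIDENCE = {"upload-type": Evidence("server/upload.py", 42)}

for verdict in grade(CHECKS, EVIDENCE):
    print(verdict)
```

Each verdict is an artifact a reviewer can open and dispute: either it names a file and line, or it admits that nothing was confirmed.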

An attacker who knows an AI is reviewing the work can write code, or structure an architecture, to slip past it, or try to manipulate the model’s reasoning directly. How seriously do you take adversarial pressure aimed at the analysis layer itself?

As seriously as it is possible to take it, and I show that by assuming the attacker wins. The model cannot be made injection-proof; it is a probabilistic reasoner reading attacker-supplied text. The question is what that manipulation buys the attacker – which moves the problem from prompting, where it cannot be solved, to architecture, where it can.

Split the attacker’s goal in two. The first is to suppress a finding, shaping the code so the reviewer calls the backdoor clean. I will not pretend that is solved, but the bar is “does the path reproduce,” and code that is both exploitable and reproduces as benign is hard to write. When the system cannot confirm a control it says “go check this,” not “you’re covered,” and it sits on top of human review, not in place of it.

The second goal has real blast radius, and almost nobody asks about it: owning the process running the model and pivoting, because that process holds credentials. The part that reads untrusted code is treated as already compromised – it holds exactly one credential, an ephemeral key scoped to the single job it is already doing, reviewing the attacker’s own work. It cannot reach another tenant, another job, or the platform’s keys; provider calls are brokered by the trusted side, so raw credentials never touch the box reading hostile input. And this is enforced: a build check fails if a credential-bearing variable ever appears there. An attacker can sometimes win the argument with the model; he cannot turn that win into access to anything that was not already his.
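A build check of the kind mentioned above might look like the sketch below. The interview does not describe how the real check works, so everything here is an assumption: the name pattern, the allowlisted variable `JOB_EPHEMERAL_TOKEN`, and the idea of checking environment variable names rather than values are all hypothetical choices made for illustration.

```python
import re

# Assumed naming convention for credential-bearing variables (hypothetical).
CREDENTIAL_PATTERN = re.compile(
    r"(SECRET|TOKEN|PASSWORD|API_KEY|PRIVATE_KEY|CREDENTIAL)", re.IGNORECASE
)

# The single ephemeral, job-scoped key the untrusted side may hold (hypothetical name).
ALLOWED = {"JOB_EPHEMERAL_TOKEN"}

def credential_violations(env_names):
    """Return credential-like variable names that are not on the allowlist."""
    return sorted(
        name
        for name in env_names
        if name not in ALLOWED and CREDENTIAL_PATTERN.search(name)
    )

def check_sandbox_env(env_names):
    """Fail the build if the sandbox would see any unexpected credential."""
    violations = credential_violations(env_names)
    if violations:
        raise SystemExit(f"build failed, credentials in sandbox: {violations}")

# Hypothetical sandbox configuration with one variable that should not be there.
print(credential_violations(["PATH", "JOB_EPHEMERAL_TOKEN", "PROVIDER_API_KEY"]))
```

A name-based check like this is coarse and depends on the naming convention being followed; it backs up the architectural rule that provider calls are brokered by the trusted side, rather than replacing it.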

Some argue any probabilistic system is unfit for security decisions and that determinism is non-negotiable. Make the opposite case: why might a system that is right most of the time still beat the deterministic tools people lean on now?

Let me concede the hard part: a probabilistic system will not make you bulletproof, and I doubt any ever will. But the demand for determinism assumes the deterministic tools we lean on now deliver the security the probabilistic ones cannot. They do not. No one was ever made bulletproof by an annual pen test, a design-time threat model, or a SAST run; those fail the same bar, just on a schedule everyone got used to.

And the determinism is about the wrong axis: a scanner is certain about whether a pattern is present, not whether it is reachable – certainty about a proxy for the thing you actually care about.

Deep reasoning about what could go wrong with a design used to live only in the annual pen test and the threat model, both photographs: true the day they are taken, drifting with every merge after. Bringing that reasoning to every pull request, even right most of the time rather than all of the time, turns “unexamined until next year” into “examined before it merges.”

And you do not have to choose. The per-PR check covers the 364 days the pen tester is not there, and you are not betting on a coin flip inside it: discovery is probabilistic, confirmation is not – the path reproduces on a real build or it does not. A team that moves from one architectural review a year to one on every change it ships has improved its posture in a way no added accuracy on the yearly review could match. That is the trade, and it is not close.
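The split between probabilistic discovery and deterministic confirmation can be sketched as a simple gate. This is an illustration only: the candidate names are hypothetical, and `reproduces` stands in for replaying a path against a real build, which is replaced here by a lookup in a fixed set.

```python
def confirmed_findings(candidates, reproduces):
    """Keep only candidate paths that a deterministic reproduction confirms."""
    return [candidate for candidate in candidates if reproduces(candidate)]

# Hypothetical output of a noisy discovery step run on one pull request.
CANDIDATES = [
    "upload chain to token minting",
    "open redirect in login flow",
    "race in invite acceptance",
]

# Stand-in for replaying each path against a real build (hypothetical results).
REPRODUCED_ON_BUILD = {"upload chain to token minting"}

def reproduces(candidate):
    """Deterministic for a given build: the path either replays or it does not."""
    return candidate in REPRODUCED_ON_BUILD

print(confirmed_findings(CANDIDATES, reproduces))
```

However noisy the candidate list is, what gets reported depends only on the reproduction step, so the same build and the same candidates give the same answer.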
