91% noise: A look at what’s wrong with traditional SAST tools
Traditional static application security testing (SAST) tools are falling short. That’s the key takeaway from a recent report that tested these tools against nearly 3,000 open-source code repositories. The results: more than 91% of flagged vulnerabilities were false positives.
The Exorcising the SAST Demons report comes from Ghost Security, which scanned public GitHub projects in Go, Python, and PHP. The study focused on three vulnerability types commonly found in real-world apps: SQL injection, command injection, and arbitrary file upload.
The research also looked at how much time teams could save by using AI to triage alerts. On average, manual review takes about 10 minutes per finding. That adds up fast when you’re looking at thousands of alerts. Across the three language and framework combinations, AI-assisted triage saved over 350 hours.
Key findings
- Of 2,116 flagged issues, only 180 turned out to be real vulnerabilities.
- Python/Flask command injection checks were the worst offenders: 99.5% of the alerts were false positives.
- In Go projects using the Gin framework, 80% of SQL injection alerts were false.
- PHP/Laravel fared slightly better, with 10% of file upload warnings confirmed as real.
False positives slow down triage, burn analyst time, and force teams to either tune their tools aggressively or ignore anything rated below “high severity.”
Why traditional SAST falls short
Legacy SAST engines mostly rely on pattern matching and rule-based scanning. These approaches struggle with context. A static scanner may see a risky function call but can’t tell if the input is actually controlled by an attacker or if mitigating controls are in place.
That’s how you end up with thousands of alerts triggered by safe code paths, test files, or internal scripts. Teams get overwhelmed, and important issues slip through while they’re buried in reviews.
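To make the context problem concrete, here is a minimal sketch (the function names and schema are illustrative, not from the report) of two database lookups that a purely pattern-based scanner can easily conflate: one uses a parameterized query and is safe, the other interpolates user input directly into SQL and is genuinely injectable.

```python
import sqlite3

def get_user_safe(conn: sqlite3.Connection, username: str):
    # Parameterized query: the driver binds the input as data, never as SQL.
    # A pattern-based scanner that keys on "execute" near user input may
    # still flag this path, producing a false positive.
    cur = conn.execute("SELECT id, name FROM users WHERE name = ?", (username,))
    return cur.fetchone()

def get_user_unsafe(conn: sqlite3.Connection, username: str):
    # Genuinely vulnerable: attacker-controlled input is interpolated
    # straight into the SQL string, so a crafted username changes the query.
    cur = conn.execute(f"SELECT id, name FROM users WHERE name = '{username}'")
    return cur.fetchone()
```

Telling these two apart requires understanding how the input reaches the query, which is exactly the context a rule-based engine lacks.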
Why AI sees what traditional tools miss
Researchers used large language models to validate findings from traditional SAST scans. The AI pipeline checked each alert for three things:
- Is the vulnerable code reachable in real execution?
- Does user-controlled input drive the risky behavior?
- Are there mitigations in place that make the flaw non-exploitable?
Only alerts that met all three criteria were marked as true positives. Human analysts still made the final call, but the AI did most of the heavy lifting.
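The report doesn't publish its pipeline code, but the three-criteria filter it describes can be sketched roughly like this (the `Finding` fields stand in for whatever the AI analysis step actually produces):

```python
from dataclasses import dataclass

@dataclass
class Finding:
    """One SAST alert, annotated by an upstream analysis step (illustrative)."""
    rule: str
    reachable: bool            # can the flagged code run in a real execution path?
    attacker_controlled: bool  # does user-controlled input drive the risky behavior?
    mitigated: bool            # is a control in place that makes it non-exploitable?

def is_true_positive(f: Finding) -> bool:
    # An alert survives triage only if all three checks hold:
    # reachable, attacker-driven, and not neutralized by a mitigation.
    return f.reachable and f.attacker_controlled and not f.mitigated

def triage(findings: list[Finding]) -> list[Finding]:
    """Return only the alerts worth a human analyst's time."""
    return [f for f in findings if is_true_positive(f)]
```

The point of the structure is that each criterion independently kills an alert, which is why the surviving list is so much smaller than the raw scanner output.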
This setup helped cut review time significantly. In the Python/Flask example, 1,166 potential issues were flagged. Only six were real. Manually reviewing that list would have taken close to 200 hours. The AI narrowed it down to a manageable handful.
The report also points out that many of the most serious application security flaws don’t follow predictable patterns. Bugs like broken access control, race conditions, or logic flaws don’t show up as a risky function or tainted input. They show up when the system behaves in an unintended way.
One example from the report showed a money transfer function that didn’t check whether the source account belonged to the authenticated user. In another case, database writes weren’t wrapped in a transaction, creating a race condition where attackers could overdraw an account by flooding requests. These aren’t issues a regular SAST tool is built to catch.
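The race condition is easiest to see in code. Below is a hedged sketch (schema and function names invented for illustration) of the check-then-act pattern the report describes, next to an atomic version where the balance check and the debit happen in a single statement:

```python
import sqlite3

def withdraw_vulnerable(conn: sqlite3.Connection, account_id: int, amount: int) -> bool:
    # Check-then-act without a transaction: two concurrent requests can both
    # pass the balance check before either debit lands, overdrawing the
    # account if an attacker floods requests.
    (balance,) = conn.execute(
        "SELECT balance FROM accounts WHERE id = ?", (account_id,)
    ).fetchone()
    if balance >= amount:
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ?",
            (amount, account_id),
        )
        return True
    return False

def withdraw_atomic(conn: sqlite3.Connection, account_id: int, amount: int) -> bool:
    # The check and the debit are one conditional UPDATE, so no interleaving
    # request can slip between them.
    cur = conn.execute(
        "UPDATE accounts SET balance = balance - ? WHERE id = ? AND balance >= ?",
        (amount, account_id, amount),
    )
    conn.commit()
    return cur.rowcount == 1
```

Nothing in the vulnerable version looks like a "risky function call," which is why signature-based scanning walks right past it.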
Toward context-aware scanning
The report ends with a call for a new approach: Contextual Application Security Testing (CAST). The idea is to build systems that understand how an application works, not just what its code looks like. That means modeling user roles, execution paths, data flows, and logic.
This kind of context-aware detection is still emerging, but it points to where the industry is heading. It’s not about replacing human analysts or traditional tools. It’s about reducing the noise and helping teams zero in on real risk faster.
The shift toward AI-assisted triage could reshape the role of the security team in the years ahead.
“As AppSec programs mature their capabilities to detect and triage more meaningful issues powered by AI, they will be spending far less time extracting meaningful value from their tools and far less time spent by developers fixing the wrong issues,” Brad Geesman, Chief Architect at Ghost Security, told Help Net Security.

That shift, he explained, is strategic. “This will allow analysts to escape ‘firefighting mode’ and open the door to allow them to do what they should be doing: fostering better security to developer relationships and tooling integrations, implementing strategic initiatives to pay down business specific risks, and maximizing the effectiveness of their AppSec program in their organization.”
Looking further ahead, the implications are even more dramatic. “That being said, in 5 years it’s feasible AI is automating 98% of this work for teams.”
For CISOs, this underscores the need to start preparing for a model where AI doesn’t just assist. It leads, and human talent shifts to oversight, design, and long-term risk reduction.
Measuring what matters in AppSec
For CISOs evaluating application security performance, traditional metrics like the number of vulnerabilities found may not paint a full picture. Geesman says more meaningful measures should focus on outcomes, not just activity.
“Many teams have low efficiency because they are spending too much time triaging noise,” Geesman explains. “Tracking the true positive to false positive ratio helps you understand the quality of the findings and the cost of acting on them.”
Another critical metric is how fast real issues get resolved. It’s about identifying which issues actually matter and ensuring they’re fixed quickly.
“Severity-adjusted time to remediate shows how well your technical and human processes are working. If you can close the loop fast on a high-risk issue, that’s a sign of a mature capability,” he says.
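The two metrics Geesman highlights are simple to compute once the data exists. A minimal sketch (the severity weights are an assumption for illustration, not values from the interview):

```python
SEVERITY_WEIGHT = {"critical": 4, "high": 3, "medium": 2, "low": 1}  # illustrative

def tp_fp_ratio(true_positives: int, false_positives: int) -> float:
    """Signal quality of a scanner: higher means less triage noise."""
    return true_positives / max(false_positives, 1)

def severity_adjusted_ttr(issues: list[dict]) -> float:
    """Mean days-to-remediate, weighted so slow fixes on severe issues
    hurt the score more than slow fixes on minor ones."""
    total_weight = sum(SEVERITY_WEIGHT[i["severity"]] for i in issues)
    weighted = sum(SEVERITY_WEIGHT[i["severity"]] * i["days_to_fix"] for i in issues)
    return weighted / total_weight
```

On the report's own numbers (180 true positives out of 2,116 alerts), the ratio comes out to roughly 0.09 — a concrete way to quantify the "91% noise" headline.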
Geesman also suggests looking beyond findings and remediation to the systems that prevent bad code from going live in the first place.
“You want to measure the coverage and effectiveness of your security guardrails across deployment pipelines. That tells you whether your code ships with the right protections in place, and whether those protections work.”
And finally, for large organizations managing many services, Geesman recommends a service-level view.
“Track per-app security posture—scorecards that reflect the health of an application’s code, dependencies, and infrastructure. That helps teams and executives alike see where the risks really are.”
These metrics, Geesman notes, give organizations a clearer sense of whether their AppSec investments are reducing risk, not just producing alerts.