What the AI patch gap means for enterprise security

Open-source maintainers are receiving more vulnerability reports than they can act on, and a rising share now comes from an AI system working at machine speed. Over roughly two months this spring, Anthropic’s Claude Mythos Preview combed through more than 23,000 open-source code paths and routed verified findings to the projects that own them. Tuskira studied what happens to those findings once they reach human hands.


The program reported 1,596 verified vulnerabilities, spread across hundreds of projects over a window of about nine weeks. Six external security research firms triaged the findings before they reached maintainers.

The findings hold up under review. Outside firms confirmed a true-positive rate of 90.8 percent on the subset they checked, a sign that the volume reflects real bugs. Severity is the softer edge. Anthropic and vendor reviewers agreed on exact severity in well over half of cases and landed within one level almost every time. Where the ratings differed, the model tended to grade issues as more critical than maintainers did. Anthropic itself describes confirmed true positives as one measure of impact, and points to patch counts as the more telling lagging indicator.

The trouble starts after disclosure. Discovery ran at roughly twenty-five verified vulnerabilities a day, and credited repairs moved at closer to one and a half. Tuskira distills that imbalance into a single ratio of about 16.5 to one.

The researchers call the growing backlog the vulnerability deficit. Each day the program hands maintainers far more findings than they visibly close, and the pile of open issues grows by roughly two dozen a day. “Enterprises need to operate at discovery cadence, not only remediation cadence,” the firm writes.
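The arithmetic behind those rates can be checked directly. The sketch below uses only figures quoted above and assumes the "about nine weeks" window is exactly 63 days. The repair rate is derived from the 16.5-to-one ratio, since no raw patch count is given here, so the results are approximations.

```python
# Figures quoted in the article.
VERIFIED_FINDINGS = 1596
DISCOVERY_TO_REPAIR_RATIO = 16.5

# Assumption: "about nine weeks" is treated as exactly 63 days.
WINDOW_DAYS = 9 * 7

# Verified findings disclosed per day.
discovery_per_day = VERIFIED_FINDINGS / WINDOW_DAYS

# Credited repairs per day, implied by the quoted ratio.
repair_per_day = discovery_per_day / DISCOVERY_TO_REPAIR_RATIO

# Net daily growth of the "vulnerability deficit".
deficit_growth_per_day = discovery_per_day - repair_per_day

print(f"discovery: {discovery_per_day:.1f}/day")
print(f"repair:    {repair_per_day:.1f}/day")
print(f"deficit:   {deficit_growth_per_day:.1f}/day")
```

Under that assumption the numbers land close to the rounded figures in the text: about twenty-five found per day, about one and a half repaired, and roughly two dozen added to the backlog.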

Maintainers themselves are responsive. The median time to acknowledge a report came in near a fifth of a day, or roughly five hours. Acknowledgment and repair, however, sit far apart. About 6 percent of the disclosed vulnerabilities carried an upstream patch at the snapshot, a figure the researchers treat as a lower bound because some maintainers ship fixes quietly.

A second delay sits downstream of the maintainer. Advisory databases need time to ingest a fix, commercial scanners need time to refresh, and enterprises need time to test a patch before it touches production. Most programs begin serious work once an advisory goes public. About 95 percent of the Mythos disclosures had no public advisory on the snapshot date. The span from private disclosure to a deployed enterprise fix runs, in the report’s structural estimate, somewhere between three and five months.

Deploying a patch carries its own hazard. A fix to a memory-safety bug can change timing. A stricter input check can reject payloads that worked before. A dependency upgrade can force a chain of version changes. Validation for ordinary language packages commonly runs two to six weeks, and longer for embedded, cryptographic, or regulated components. During that stretch the vulnerability may be public, exploit tooling may circulate, and production may still run the old code.

One upstream finding rarely stays one alert. A single ImageMagick flaw can propagate to eighteen or more downstream package variants, and distribution rebuilds carry source-only fixes across many separate feeds. The count that matters for a defender is every reachable affected instance in production, which climbs higher than the upstream tally suggests.

Tuskira’s answer reframes patching as a decision problem. Its model rests on four questions: whether the vulnerable code path runs in production, who can reach the exposed instance, whether the environment shows signs of active exploitation, and whether existing controls already block the exploit. Those answers route each finding into an emergency lane, a staged lane, or a documented deferral.
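One way to picture the four questions is as a small routing function. The sketch below is illustrative only: the field names, the `route` function, and the precedence of the rules are assumptions, because the article names the questions and the three lanes without spelling out how answers map to lanes.

```python
from dataclasses import dataclass
from enum import Enum


class Lane(Enum):
    EMERGENCY = "emergency"
    STAGED = "staged"
    DEFERRED = "documented deferral"


@dataclass(frozen=True)
class Finding:
    # The four questions, encoded as booleans (hypothetical field names).
    runs_in_production: bool      # does the vulnerable code path run in production?
    reachable_by_untrusted: bool  # can untrusted parties reach the exposed instance?
    exploitation_signals: bool    # does the environment show signs of exploitation?
    blocked_by_controls: bool     # do existing controls already block the exploit?


def route(finding: Finding) -> tuple[Lane, str]:
    """Return a lane and the recorded reason (assumed rule order)."""
    if not finding.runs_in_production:
        return Lane.DEFERRED, "vulnerable code path does not run in production"
    if finding.exploitation_signals:
        return Lane.EMERGENCY, "signs of active exploitation"
    if finding.reachable_by_untrusted and not finding.blocked_by_controls:
        return Lane.EMERGENCY, "reachable and not blocked by existing controls"
    return Lane.STAGED, "runs in production but exposure is limited or mitigated"


exposed = Finding(True, True, False, False)
mitigated = Finding(True, True, False, True)
unused = Finding(False, True, False, False)

for f in (exposed, mitigated, unused):
    lane, reason = route(f)
    print(lane.value, "-", reason)
```

Returning the reason alongside the lane reflects the emphasis on documented deferrals: every decision, including the decision not to act immediately, leaves a record.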

A worked example carries the point. A critical flaw in nginx might appear to threaten an entire fleet, illustrated as 1,200 instances. Successive filters narrow that population quickly. Counting the compiled module, the enabled feature, the vulnerable configuration, and public exposure leaves three instances that are public, unauthenticated, and missing a web application firewall. Emergency action goes to those three. The rest move through slower lanes with recorded reasoning.
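The narrowing can be sketched as a chain of filters applied in sequence. The example below is a minimal illustration on a six-instance synthetic fleet; the instance names, the attribute names, and the `funnel` helper are hypothetical, and the per-stage counts are not taken from the report.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Instance:
    name: str
    module_compiled: bool
    feature_enabled: bool
    vulnerable_config: bool
    public: bool
    authenticated: bool
    behind_waf: bool


# Each filter keeps only instances that remain exploitable at that stage.
FILTERS: list[tuple[str, Callable[[Instance], bool]]] = [
    ("module compiled", lambda i: i.module_compiled),
    ("feature enabled", lambda i: i.feature_enabled),
    ("vulnerable configuration", lambda i: i.vulnerable_config),
    ("publicly exposed", lambda i: i.public),
    ("unauthenticated", lambda i: not i.authenticated),
    ("no WAF", lambda i: not i.behind_waf),
]


def funnel(fleet: list[Instance]) -> tuple[list[Instance], list[tuple[str, int]]]:
    """Apply the filters in order and record how many instances survive each."""
    remaining = list(fleet)
    counts = [("fleet", len(remaining))]
    for label, keep in FILTERS:
        remaining = [inst for inst in remaining if keep(inst)]
        counts.append((label, len(remaining)))
    return remaining, counts


# Synthetic fleet, for illustration only.
FLEET = [
    Instance("edge-1", True, True, True, True, False, False),
    Instance("edge-2", True, True, True, True, False, True),    # behind a WAF
    Instance("api-1", True, True, True, True, True, False),     # requires auth
    Instance("int-1", True, True, True, False, False, False),   # internal only
    Instance("int-2", True, False, False, False, False, False), # feature off
    Instance("batch-1", False, False, False, False, False, False),
]

emergency, counts = funnel(FLEET)
for label, n in counts:
    print(f"{label:26s}{n}")
print("emergency lane:", [inst.name for inst in emergency])
```

Recording the count after every stage, and not only the final survivors, gives the slower lanes the reasoning trail the approach calls for.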

The pattern reaches past any single program. OSS-Fuzz logged more than 13,000 vulnerabilities across nine years of operation, and Mythos reached a meaningful fraction of that corpus in a sliver of the time. More AI discovery efforts are coming online, which raises the prospect of even higher discovery cadence across the ecosystem.

The CVE feed now arrives late, and the useful signal shows up earlier in upstream commits, transparency-log changes, and the security firms credited on advisories. Reading that early signal, and knowing in advance which dependencies run in production, is becoming the core of the work.
