We know GenAI is risky, so why aren’t we fixing its flaws?
Even though GenAI threats are a top concern for both security teams and leadership, the current level of testing and remediation for LLM and AI-powered applications isn’t keeping up with the risks, according to Cobalt.
GenAl as a threat or a tool (Source: Cobalt)
Pentest data reveals industry divide in LLM security
Pentesting data from the report highlights a troubling reality: LLM applications often have serious security vulnerabilities. These high-risk issues appear more frequently in LLMs than in any other type of system, showing that LLM deployments carry a particularly elevated risk.
What’s even more concerning is how rarely these serious vulnerabilities are fixed. LLMs have the lowest rate of remediation across all tested systems, leaving many critical risks unresolved. While some issues are addressed quickly, this likely reflects the easier fixes, more complex and dangerous flaws continue to pile up, creating a growing security gap.
The presence of serious vulnerabilities in pentests varies by industry, offering insight into where the greatest risks may lie. According to the report, sectors like administrative services, transportation, hospitality, manufacturing, and education show higher rates of critical security issues. In contrast, industries such as entertainment, financial services, and information services tend to have fewer serious vulnerabilities. These differences may reflect each industry’s security maturity, the complexity of their systems, or how strictly they’re regulated.
Leaders push for a GenAI pause while practitioners press on
This concerning vulnerability landscape persists despite widespread awareness of GenAI-related risks. Most organizations recognize generative AI as a top IT threat, with common concerns including sensitive data exposure and model manipulation. Yet, proactive testing and security practices haven’t kept pace.
Only 66% of organizations are regularly testing their GenAI-powered products, leaving a significant portion unprotected.
48% of respondents believe a “strategic pause” is needed to recalibrate defenses against genAI-driven threats. But that pause isn’t coming.
“Threat actors aren’t waiting around, and neither can security teams,” said Gunter Ollmann, CTO, Cobalt. “Our research shows that while GenAI is reshaping how we work, it’s also rewriting the rules of risk. The foundations of security must evolve in parallel, or we risk building tomorrow’s innovation on outdated safeguards.”
Security leaders are more likely than frontline practitioners to support a pause in GenAI deployment and to view GenAI as a threat to cybersecurity, rather than a helpful tool. This suggests a divide in perception between leadership and hands-on teams, and highlights potential overconfidence that isn’t reflected in the testing data.
Understanding key LLM vulnerabilities
When it comes to LLM vulnerabilities, there’s a disconnect between what security teams worry about and what’s actually being found in real-world testing.
Many professionals are focused on protecting data, things like sensitive information leaks, model manipulation, and training data exposure are top of mind. It’s understandable, given the high stakes around data security in AI systems.
But penetration tests are uncovering something different. The most common issues aren’t direct data leaks, they’re weaknesses like prompt injection or insecure outputs. These might seem less urgent on the surface, but they can be the very paths attackers use to get to that sensitive data.
“Much like the rush to cloud adoption, genAI has exposed a fundamental gap between innovation and security readiness,” Ollmann added. “Mature controls were not built for a world of LLMs. Security teams must shift from reactive audits to programmatic, proactive AI testing—and fast.”