AI hallucinations and their risk to cybersecurity operations
AI systems can sometimes produce outputs that are incorrect or misleading, a phenomenon known as hallucinations. These errors range from minor inaccuracies to significant misrepresentations that can misguide decision-making.
Real-world implications
“If a company’s AI agent leverages outdated or inaccurate data, AI hallucinations might fabricate non-existent vulnerabilities or misinterpret threat intelligence, leading to unnecessary alerts or overlooked risks. Such errors can divert resources from genuine threats, creating new vulnerabilities and wasting already-constrained SecOps team resources,” Harman Kaur, VP of AI at Tanium, told Help Net Security.
One emerging concern is the phenomenon of package hallucinations, where AI models suggest non-existent software packages. This issue has been identified as a potential vector for supply chain attacks, termed “slopsquatting.” Attackers can exploit these hallucinations by creating malicious packages with the suggested names, leading developers to inadvertently incorporate harmful code into their systems.
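As a rough illustration, a dependency-vetting step can catch many hallucinated names before they reach a build. The sketch below, assuming Python tooling and PyPI’s public JSON API, rejects suggestions that do not resolve to a real package; the allowlist and the manual-review step are hypothetical examples, not a complete supply chain defense.

```python
# A minimal sketch of vetting AI-suggested dependencies before installation.
# It queries PyPI's public JSON API to confirm the package actually exists;
# the allowlist and review policy below are illustrative assumptions.
import requests

ALLOWLIST = {"requests", "cryptography"}  # packages the organization has already vetted

def package_exists_on_pypi(name: str) -> bool:
    """Return True only if the package name resolves on PyPI."""
    resp = requests.get(f"https://pypi.org/pypi/{name}/json", timeout=10)
    return resp.status_code == 200

def vet_suggestion(name: str) -> str:
    if name in ALLOWLIST:
        return "allow"
    if not package_exists_on_pypi(name):
        return "reject: package does not exist (possible hallucination)"
    return "hold for manual review"  # real but unvetted names still merit scrutiny

print(vet_suggestion("requezts"))  # a typosquat-style name an LLM might invent
```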
“If used without thorough verification and manual validation, AI-generated code can introduce substantial risks and complexities. Junior developers are particularly susceptible to the risks of erroneous code or config files because they lack sufficient skills to properly audit the code. As for senior developers, they will likely spot an error in a timely manner; however, an increasing number of them over-rely on GenAI, blindly trusting its output,” said Ilia Kolochenko, CEO of ImmuniWeb.
Another concern is the potential for AI to produce fabricated threat intelligence reports. Taken at face value, such reports can divert attention from actual threats, allowing real vulnerabilities to go unaddressed. The risk is compounded when AI outputs are not cross-verified against reliable sources.
Strategies to mitigate AI hallucinations
“AI hallucinations are an expected byproduct of probabilistic models,” explains Chetan Conikee, CTO at Qwiet AI, emphasizing that the focus shouldn’t be on eliminating them entirely but on minimizing operational disruption. “The CISO’s priority should be limiting operational impact through design, monitoring, and policy.”
That starts with intentional architecture. Conikee recommends implementing a structured trust framework around AI systems, an approach that includes practical middleware to vet inputs and outputs through deterministic checks and domain-specific filters. This step ensures that models don’t operate in isolation but within clearly defined bounds that reflect enterprise needs and security postures.
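A minimal sketch of such a middleware layer appears below. It assumes a generic generate(prompt) callable standing in for the model, and the prompt-injection filter and CVE cross-check are illustrative placeholders rather than any vendor’s implementation.

```python
# A minimal sketch of middleware that vets model inputs and outputs with
# deterministic checks and a domain-specific filter. All patterns and the
# known_cves store are illustrative assumptions.
import re
from typing import Callable

BLOCKED_INPUT = re.compile(r"(?i)ignore previous instructions")
CVE_PATTERN = re.compile(r"CVE-\d{4}-\d{4,7}")

def guarded_generate(prompt: str, generate: Callable[[str], str],
                     known_cves: set[str]) -> str:
    # Deterministic input check: refuse obvious prompt-injection patterns.
    if BLOCKED_INPUT.search(prompt):
        raise ValueError("Prompt rejected by input filter")

    output = generate(prompt)

    # Domain-specific output check: any CVE the model cites must exist in the
    # organization's threat-intel store, otherwise hold the answer for review.
    cited = set(CVE_PATTERN.findall(output))
    unknown = cited - known_cves
    if unknown:
        return f"[HELD FOR REVIEW] unverified identifiers: {sorted(unknown)}"
    return output
```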
Traceability is another cornerstone. “All AI-generated responses must carry metadata including source context, model version, prompt structure, and timestamp,” Conikee notes. Such metadata enables faster audits and root cause analysis when inaccuracies occur, a critical safeguard when AI output is integrated into business operations or customer-facing tools.
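The sketch below shows one way to attach that metadata to every response; the field names, sample strings, and model version value are illustrative assumptions, not a prescribed schema.

```python
# A minimal sketch of tagging each AI response with audit metadata:
# source context, model version, prompt structure (hashed), and timestamp.
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone
import hashlib
import json

@dataclass
class TracedResponse:
    text: str
    source_context: list[str]   # document IDs the answer was grounded on
    model_version: str
    prompt_sha256: str          # hash of the full prompt structure
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())

def trace(text: str, sources: list[str], model_version: str, prompt: str) -> dict:
    return asdict(TracedResponse(
        text=text,
        source_context=sources,
        model_version=model_version,
        prompt_sha256=hashlib.sha256(prompt.encode()).hexdigest(),
    ))

print(json.dumps(trace("Apply the April cumulative update to affected hosts.",
                       ["kb/patching.md"], "internal-llm-2025-05",
                       "How do we remediate this finding?"), indent=2))
```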
For enterprises deploying LLMs, Conikee advises steering clear of open-ended generation unless necessary. Instead, organizations should lean on retrieval-augmented generation (RAG) grounded in curated, internal knowledge bases. “This ensures the model draws from verified information and maintains consistency with internal standards,” Conikee explains.
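A toy version of that grounding loop is sketched below, using a naive keyword retriever over a curated dictionary of internal documents and a placeholder llm callable; a production deployment would swap in a vector store and a real model client.

```python
# A minimal RAG sketch: retrieve curated internal documents, then constrain the
# model to answer only from that context. The retriever and llm callable are
# illustrative stand-ins, not a production pipeline.
def retrieve(query: str, knowledge_base: dict[str, str], k: int = 2) -> list[str]:
    """Rank curated documents by naive keyword overlap with the query."""
    terms = set(query.lower().split())
    scored = sorted(knowledge_base.items(),
                    key=lambda kv: len(terms & set(kv[1].lower().split())),
                    reverse=True)
    return [text for _, text in scored[:k]]

def answer(query: str, knowledge_base: dict[str, str], llm) -> str:
    context = "\n".join(retrieve(query, knowledge_base))
    prompt = (
        "Answer using ONLY the context below. If the context is insufficient, "
        f"say so.\n\nContext:\n{context}\n\nQuestion: {query}"
    )
    return llm(prompt)  # llm is any callable that maps a prompt to a response
```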
Testing rigor also matters. “Hallucination detection tools should be incorporated during the testing phases,” says Conikee. Before a model ever touches a live environment, security leaders should define thresholds for acceptable risk and failure modes. “The goal is not perfect accuracy, but a measurable and auditable control over where and how generative AI is used.”
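As a sketch of such a test gate, the snippet below scores answers against their retrieved sources with a crude lexical-overlap proxy and fails the release when the average drops below a threshold; the scoring function and the 0.9 threshold are assumptions standing in for a real hallucination-detection tool and an agreed risk tolerance.

```python
# A minimal sketch of a pre-deployment gate: measure how well answers are
# supported by their sources and enforce an agreed threshold before go-live.
def groundedness(answer: str, sources: list[str]) -> float:
    """Crude proxy: fraction of answer sentences with some overlap in the sources."""
    sentences = [s for s in answer.split(".") if s.strip()]
    joined = " ".join(sources).lower()
    supported = sum(1 for s in sentences
                    if any(w in joined for w in s.lower().split() if len(w) > 4))
    return supported / max(len(sentences), 1)

def passes_release_gate(cases: list[dict], threshold: float = 0.9) -> bool:
    """Fail the release if average groundedness over a golden test set is too low."""
    scores = [groundedness(c["answer"], c["sources"]) for c in cases]
    return sum(scores) / len(scores) >= threshold
```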
By embedding trust, traceability, and control into AI deployment, CISOs can balance innovation with accountability, keeping hallucinations in check without slowing progress:
1. Implement Retrieval-Augmented Generation (RAG): RAG combines AI’s generative capabilities with a retrieval system that pulls information from verified data sources. This approach grounds AI outputs in factual data, reducing the likelihood of hallucinations.
2. Employ automated reasoning tools: Companies like Amazon are developing tools that use mathematical proofs to verify AI outputs, ensuring they align with established rules and policies (WSJ). These tools can provide a layer of assurance, especially in critical applications (see the sketch after this list).
3. Regularly update training data: Ensuring that AI systems are trained on current and accurate data can minimize the risk of hallucinations. Outdated or biased data can lead AI to generate incorrect outputs.
4. Incorporate human oversight: Human experts should review AI-generated outputs, especially in high-stakes scenarios. This oversight can catch errors that AI might miss and provide context that AI lacks.
5. Educate users on AI limitations: Training users to understand AI’s capabilities and limitations can foster a healthy skepticism of AI outputs. Encouraging users to verify AI-generated information can prevent the spread of inaccuracies.
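To make the automated-reasoning idea in step 2 concrete, the sketch below uses the open source Z3 solver (the z3-solver package on PyPI) to check whether an AI-suggested firewall change is even satisfiable alongside a written policy. The policy and rule fields are hypothetical, and this illustrates the general technique, not any vendor’s product.

```python
# A minimal sketch of verifying an AI-suggested change against a formal policy
# with an SMT solver. Policy: SSH (port 22) must never be publicly exposed.
from z3 import Solver, Int, Bool, Implies, Not, sat

port = Int("port")        # port the suggested rule opens
public = Bool("public")   # whether the rule exposes it to 0.0.0.0/0

policy = Implies(port == 22, Not(public))

def violates_policy(suggested_port: int, suggested_public: bool) -> bool:
    """Return True if the suggestion is provably inconsistent with the policy."""
    s = Solver()
    s.add(policy)
    s.add(port == suggested_port, public == suggested_public)
    # If no assignment satisfies policy + suggestion together, the suggestion violates policy.
    return s.check() != sat

print(violates_policy(22, True))   # True  -> block the change
print(violates_policy(443, True))  # False -> allowed
```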
Victor Wieczorek, SVP of Offensive Security at GuidePoint Security, explains: “We need practical guardrails. That means tying AI responses directly to documented policies, flagging or logging high-risk outputs, and making sure a human reviews anything significant before it reaches customers. Treat the model like a new intern: it can help draft ideas and handle routine questions, but shouldn’t make the final call on anything sensitive.”
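A minimal sketch of those guardrails follows, assuming hypothetical policy IDs and high-risk topic keywords: responses that do not cite a documented policy are blocked, and anything touching a sensitive topic is logged and queued for human review before it reaches customers.

```python
# A minimal sketch of a release gate for AI responses: tie each response to a
# documented policy, log high-risk outputs, and require human review for them.
# Policy IDs and topic keywords are illustrative assumptions.
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("ai-guardrail")

DOCUMENTED_POLICIES = {"IR-001", "VULN-007", "ACCESS-012"}
HIGH_RISK_TOPICS = ("credential", "incident disclosure", "customer data")

def release_gate(response: str, cited_policy: str | None) -> str:
    if cited_policy not in DOCUMENTED_POLICIES:
        return "blocked: response is not tied to a documented policy"
    if any(topic in response.lower() for topic in HIGH_RISK_TOPICS):
        log.info("High-risk output flagged for human review: policy=%s", cited_policy)
        return "queued for human review"
    return "released"

print(release_gate("Rotate the exposed credential immediately.", "ACCESS-012"))
```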