AI agents can leak company data through simple web searches
When a company deploys an AI agent that can search the web and access internal documents, most teams assume the agent is simply working as intended. New research shows how that same setup can be used to quietly pull sensitive data out of an organization. The attack does not require direct manipulation of the model. Instead, it takes advantage of what the model is allowed to see during an ordinary task.

The research comes from Smart Labs AI and the University of Augsburg. The authors wanted to understand how indirect prompt injection works in practice, not just in isolated examples. Their work focuses on AI agents that combine a large language model, a retrieval system for internal files, and web search tools. This combination is becoming common in enterprise environments. The agent receives a user request, searches internal and external sources, and returns a final answer.
The researchers show that if an attacker can get the agent to read a single manipulated webpage, the agent can be instructed to retrieve internal data and send it to a remote server. The user who triggered the workflow might think they are only asking for a routine search. In reality, the agent could be transmitting confidential information in the background.
Hidden instructions in plain sight
The attack does not need special access or malware. The attacker only needs the model to read text that includes hidden instructions. The authors used white text on a white background in a blog post, but note that other methods work as well. As soon as the agent processes the webpage as part of a normal task, it absorbs the hidden text along with the visible text. The language model interprets that text as instructions.
The instructions tested in the study told the agent to look up a secret stored in an internal company knowledge base. The agent was then told to send that secret to a server controlled by the attacker, using the same web search tool already built into the agent. The user would have no signal that anything unexpected took place.
The researchers used a standard agent architecture with Retrieval Augmented Generation (RAG). The agent was not misconfigured. There was no breach in the usual sense. The system behaved as designed. This is what makes the problem difficult. The attacker did not break in. The attacker convinced the system to act on its own capabilities.
Testing across many large language models
A key contribution of the research is scale. The researchers did not test one or two models. They created 1,068 unique attack attempts for each model, combining different templates and transformations of the hidden instructions. Some transformations made the prompts longer or shorter. Some rephrased instructions. Others encoded the instructions in forms such as Base64 or inserted invisible Unicode characters.
The success rates varied widely. Some models consistently followed the hidden instructions. Others resisted the attack attempts. The paper notes that model size was not a reliable predictor. Larger models were not always more resistant. Some smaller models performed better than large ones. This suggests that the way a model is trained matters more than the number of parameters.
Models from some providers resisted nearly all attempts. Others were much more susceptible. The authors do not claim to rank vendors by security. Instead, they highlight that training practices and alignment methods appear to play a significant role in resilience.
Speaking with Help Net Security about work underway to create guidance in this area, Elad Schulman, CEO at Lasso Security, said that several collaborations are moving toward a shared framework for understanding these threats. He said that OWASP, NIST, CoSAI and private companies are contributing to taxonomies, standards, and research practices. According to Schulman, attacks against agentic systems are advancing quickly, and organizations should test models and adopt dedicated security measures throughout deployment.
Why common defenses struggle
Many existing defenses focus on direct user inputs. They screen what a user types before it reaches the model. Indirect prompt injection slips around that barrier because the user is not the source of the malicious text. The model encounters the attack while performing a normal task, such as summarizing a document or scanning a webpage for context.
Attack templates are already public, yet the same patterns continue to work across new models. The absence of industry wide exchange means lessons are not spreading.
Schulman said the lack of shared reference points is temporary but meaningful during this early stage. He noted that research teams are in the process of building classification systems and mapping attack techniques. Until those systems stabilize, he said, enterprises should assume these weaknesses will continue to evolve and should run structured testing on any agent that has access to internal systems.
What CISOs should consider
Teams should view AI agents as software systems that need guardrails, not as isolated chat interfaces. Monitoring output behavior, adding policy checks between the agent and external tools, and controlling which internal data sources the agent can access are all part of a layered approach.
Schulman noted that the attack surface grows as AI agents handle images, audio, and tools that perform actions across systems. He said that hidden instructions can appear in visual content, search results, or tool outputs, and that multi-step agent workflows can take actions that appear legitimate to traditional monitoring systems.
AI agents hold promise at scale, but security teams will need to manage them with the same scrutiny placed on identity, browser security, and code execution policies. As Schulman put it, as AI agents move into browsers, emails, and workplace tools, organizations may deploy them without realizing how interconnected these systems have become.