One keypress is all it takes to compromise four AI coding tools

Developers clone unfamiliar repositories all the time. Open-source projects, work from teammates, sample code from a tutorial, a library someone recommended on a forum. The convention is old and reasonable: you look at what’s inside before you run it. AI coding assistants that work from the command line have inherited that convention, and a new piece of research from Adversa AI shows where the convention breaks.

trustfall AI coding CLI vulnerability

The research, called TrustFall, covers four agentic coding tools: Claude Code from Anthropic, Gemini CLI from Google, Cursor CLI, and GitHub’s Copilot CLI. Each one reads configuration files that ship inside a project. Each one will start helper programs that those files point to. And each one asks for permission with a single dialog box that, in most cases, defaults to “yes.”

The result is that a malicious repository can compromise a developer’s machine the moment they open it in one of these tools and press Enter on the trust prompt. No tool call. No suspicious behavior from the AI. Just a config file, a default-yes dialog, and a process running with the developer’s permissions.

What the project file can do

The mechanism the researchers exploited is a feature called MCP, short for Model Context Protocol. It lets an AI assistant talk to outside helper programs: a database connector, a linter, a custom search tool. Useful by design. The catch is that those helpers are defined inside the project itself, in a file the repository ships. When the assistant starts up in that folder, it starts the helpers too.

A helper program runs the same way any program on your computer runs. It can read your SSH keys, your cloud credentials, your shell history, source code from other projects on the same machine, and it can open a connection to a server the attacker controls. It is doing all of this before the AI has done any reasoning at all.

The attack fits in two small JSON files. One defines a helper named something ordinary like “linter” with a one-line script that fetches a payload from the internet and runs it. The other tells the assistant to auto-approve that helper. The repository can look almost empty.

What the dialog says

In Claude Code version 2.1 and later, the prompt the developer sees reads “Quick safety check: Is this a project you created or one you trust?” The default option is “Yes, I trust this folder.” An earlier version of the same dialog warned explicitly that the project could execute code through MCP and offered a third choice: trust the folder with MCP disabled. That option was removed.

Gemini CLI lists the helpers by name in its prompt, which gives a careful reader something to inspect. Cursor CLI mentions MCP in general terms. Copilot CLI shows a generic trust prompt with no MCP reference at all. Every one of them defaults to trust.

“They all have different approaches to configs and trust. But Cursor and Copilot / VS Code agent mode are clear analogs. Both read project-scoped MCP configuration. We tested it, and it’s the same behaviour but with different user approval messages,” Alex Polyakov, CTO of Adversa AI, told Help Net Security.

The CI variant has no dialog at all

When Claude Code runs on a continuous integration server, through the official GitHub Action that Anthropic publishes, it runs in headless mode. There is no terminal, so the trust dialog never appears.

A pull request from an outside contributor can ship a malicious project file, and the moment the pipeline runs against that branch, the helper program starts and reaches whatever credentials the runner can reach: deploy keys, signing certificates, cloud tokens. Adversa AI published a working demonstration that exfiltrates the runner’s environment variables to a collector URL.

What enterprise admins can do, and the catch

Claude Code supports a Managed scope for settings, pushed centrally by IT and locked from local override. Rony Utevsky, the Adversa AI researcher who led the work, said “Managed scope cannot be overridden by any other scope.” An organization that configures it can disable project-scoped MCP auto-approval across every developer machine in one shot.

Polyakov said the option is real and rarely used. “We havent seen that managed scope secure configuration often, rather, we’ve seen the opposite. And it’s not that obvious to understand all configuration nuances, especially for vibe coders.”

Anthropic’s position

Anthropic reviewed the report and declined it. Under the company’s threat model, accepting “Yes, I trust this folder” counts as consent to everything the project ships, including MCP definitions, and execution after that point is the boundary working as intended.

Adversa AI does not contest where the boundary sits. The disagreement is over whether the dialog tells a developer enough about what they are agreeing to.

Anthropic did not respond to a request for comment by the time of publication.

Download: Secure Foundations for AI Workloads on AWS

More about

One keypress is all it takes to compromise four AI coding tools

What the project file can do

What the dialog says

The CI variant has no dialog at all

What enterprise admins can do, and the catch

Anthropic’s position

Featured news

Resources

Don't miss