Sophos uncovers AI-powered malware lab built for EDR evasion
A threat actor used AI technologies to build a malware-testing framework for developing and refining endpoint detection and response (EDR) evasion techniques, according to Sophos.
The investigation began after an anomalous endpoint in a customer environment triggered alerts tied to malicious payloads originating from a testing directory. The files pointed to a broader framework focused on evading detection.
The environment contained Cobalt Strike profiles designed to disguise beacon traffic as legitimate web requests, a Telegram-based command-and-control mechanism, shellcode injection tools, and a Cloudflare Worker used to conceal backend infrastructure.
Sophos linked the activity to ransomware deployment and data theft operations but did not identify the group involved.
“We are not disclosing the ransomware group at this time due to ongoing active investigations related to this threat actor. However, it is a group that is currently active and impacting organisations globally, including in the United States,” Rafe Pilling, Director of Threat Intelligence at Sophos, told Help Net Security.
AI-generated scripts and automated discovery
Researchers found multiple Python scripts, many of them written in Russian, that appeared to be partially AI-generated, along with a Git repository containing an automated Active Directory discovery panel and a malware-testing lab used to evaluate payloads against Sophos, CrowdStrike, and Microsoft Defender protections.
The Active Directory discovery component collected information from completed tasks, selected follow-up actions from predefined workflows, dispatched tasks to remote agents, and reevaluated results as they were returned. While the behavior resembled AI-driven automation, it did not represent an autonomously reasoning LLM.
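To illustrate the distinction drawn above, here is a minimal sketch of rule-driven automation in which every follow-up action comes from a fixed lookup table rather than from a model reasoning about results. The table, the task names, and the `execute` callback are neutral, hypothetical placeholders and are not taken from the Sophos report.

```python
from collections import deque

# Hypothetical workflow table: (finished task, outcome) -> follow-up tasks.
# Task names are neutral placeholders, not taken from the Sophos report.
WORKFLOW = {
    ("start", "ok"): ["collect_a", "collect_b"],
    ("collect_a", "ok"): ["summarize"],
    ("collect_a", "empty"): [],
    ("collect_b", "ok"): ["summarize"],
}


def run_workflow(execute, first_task="start"):
    """Run tasks until the queue drains; follow-ups come only from WORKFLOW."""
    queue = deque([first_task])
    seen = set()      # avoid running the same task twice
    history = []      # (task, outcome) pairs in execution order
    while queue:
        task = queue.popleft()
        if task in seen:
            continue
        seen.add(task)
        outcome = execute(task)  # in a real system: dispatch and await result
        history.append((task, outcome))
        # Re-evaluate: look up the next steps for this exact result.
        queue.extend(WORKFLOW.get((task, outcome), []))
    return history


def fake_execute(task):
    """Stand-in executor that returns canned outcomes for the demo."""
    return "empty" if task == "collect_a" else "ok"


history = run_workflow(fake_execute)
for task, outcome in history:
    print(task, outcome)
```

Because the loop can only choose steps already listed in the table, it can look adaptive from the outside while remaining fully deterministic.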
“Artifacts within the Git repository suggest that the threat actor identified potential bypass techniques from research blogs published by organizations such as Kaspersky, Palo Alto Networks, and Bishop Fox,” Sophos researchers wrote.
“Information was also sourced from X and Telegram, although it is unclear if these sources influenced the tool development.”
Dedicated testing lab
The lab consisted of several Windows Server 2022 virtual machines used to test payloads against different EDR products. One system was dedicated to Sophos, another to CrowdStrike, and a third served as a control environment without EDR software installed. A fourth virtual machine, running Ubuntu, hosted a Sliver command-and-control server.
Multiple AI agents operated within the framework. A Claude Opus 4.5 agent coordinated activity and set rules for the other agents, while additional agents handled EDR testing, documentation, OPSEC hardening, proxy stress testing, and virtual machine deployment.
The setup used the Model Context Protocol (MCP), an open standard that lets AI assistants interact with external tools and data sources, to connect the agents to Git repositories.
The threat actor provisioned the lab infrastructure with Ludus, a platform for rapidly deploying and managing virtualized security testing environments, and used Cursor, an AI-native integrated development environment, for malware development.
The AI agents were tasked with reading security research, extracting attack techniques, mapping them to the MITRE ATT&CK framework, preparing test environments, executing experiments, and reporting the results.
The findings suggest the threat actor presented the project as a red-team framework while interacting with Claude. Asked about the use of such framing in attempts to bypass safeguards, Sophos pointed to a broader pattern observed in recent attacks.
“Attempts to bypass model safeguards using benign framing for malicious prompts, such as the use of a red team pretext, have been observed in a number of cases over the past year, including in attacks recently reported targeting government entities in Mexico. We have been in touch with Anthropic regarding our observations,” Pilling noted.
At the core of the framework was a Python-based payload generation tool that produced custom Windows executables and dynamic-link libraries (DLLs), Windows library files that programs can load and execute. The payloads incorporated encryption, evasion, and alternative execution techniques and were then run through the test lab.

Diagram showing AI’s role in the malware development workflow (Source: Sophos)
Sophos said the tool supported nearly 80 modules used to test more than 70 evasion techniques.
Questions over reported success rates
Documentation generated within the framework suggested the evasion modules became increasingly successful after repeated testing and refinement. However, the available test data reviewed during the investigation did not support those claims.
“We don’t have the data to fully account for the discrepancies, but it’s likely that common large language model issues, such as hallucinations, played a role in the differences observed,” Pilling concluded.
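The kind of cross-check described above, comparing documented success rates with the underlying test records, can be sketched as follows. This is an illustrative example under stated assumptions, not Sophos's method: the record format, module names, figures, and the 5-point tolerance are all hypothetical.

```python
from collections import defaultdict


def measured_rates(records):
    """Return {module: fraction of logged runs that passed}."""
    totals = defaultdict(lambda: [0, 0])  # module -> [passes, runs]
    for module, passed in records:
        totals[module][1] += 1
        if passed:
            totals[module][0] += 1
    return {m: p / n for m, (p, n) in totals.items()}


def find_discrepancies(claimed, records, tolerance=0.05):
    """Flag modules whose documented rate is not backed by the raw records."""
    measured = measured_rates(records)
    flagged = {}
    for module, claim in claimed.items():
        actual = measured.get(module)  # None means no raw data exists at all
        if actual is None or abs(claim - actual) > tolerance:
            flagged[module] = (claim, actual)
    return flagged


# Hypothetical raw results: (module name, whether the run passed).
records = [
    ("module_x", True), ("module_x", False),
    ("module_x", False), ("module_x", False),
    ("module_y", True), ("module_y", True),
]
# Hypothetical rates as stated in generated documentation.
claimed = {"module_x": 0.9, "module_y": 1.0, "module_z": 0.8}

flagged = find_discrepancies(claimed, records)
for module, (claim, actual) in flagged.items():
    print(module, "claimed", claim, "measured", actual)
```

A claim with no matching records at all (`module_z` here) is flagged alongside one that contradicts the records, since either could result from a model-written summary that was never checked against the logs.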
Sophos said that, despite the threat actor's use of AI agents, the defensive fundamentals remain unchanged: patching, multi-factor authentication (MFA), passkeys, and endpoint protection.