What happens when AI teams compete against human hackers

A cybersecurity competition produced what may be the largest controlled dataset comparing AI-augmented teams to human-only teams on professional-grade offensive security tasks.

The event, called NeuroGrid, ran for 72 hours on the Hack The Box platform and drew 1,337 registered human-only teams and 156 registered AI-agent teams competing across 36 challenges in nine security domains at four difficulty levels. AI teams operated through the Model Context Protocol with human oversight in the loop. The analysis covers the 958 human teams and 120 AI-agent teams that each attempted at least one challenge.
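The gap between registered and analyzed teams is worth a quick check, since it shows how much of each group actually took part. The short Python sketch below uses only the four team counts reported above; the function and variable names are illustrative and do not come from the report.

```python
# Team counts as reported in the article.
REGISTERED = {"human_only": 1337, "ai_agent": 156}
ANALYZED = {"human_only": 958, "ai_agent": 120}


def participation_share(registered, analyzed):
    """Return, per group, the fraction of registered teams that
    attempted at least one challenge (the analyzed population)."""
    return {group: analyzed[group] / registered[group] for group in registered}


shares = participation_share(REGISTERED, ANALYZED)
for group, share in shares.items():
    print(f"{group}: {share:.1%} of registered teams attempted a challenge")
```

By this arithmetic, a little under 72 percent of registered human-only teams and about 77 percent of registered AI-agent teams fall inside the analyzed population.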

AI vs human hackers: Completion rates and solve ratios

AI-augmented teams completed challenges at a significantly higher rate than human-only teams, with roughly 73 percent finishing at least one challenge compared to 46 percent for human-only participants.

The solve rate advantage was largest among lower-ranked teams and narrowed steadily as skill level increased, dropping from 3.2x across all participants to 1.69x among the top 5 percent. At the elite tier, the best human team outscored the top AI-augmented team on total challenges solved.

Difficulty level shapes the advantage

The AI performance edge was not uniform across difficulty tiers. The advantage peaked at medium complexity, where mid-career analysts typically operate, then retreated on the hardest challenges. AI teams failed to complete three challenges entirely. On the easiest challenges, AI solve rates were more than double those of human teams, a gap the report identifies as an automation risk for entry-level roles.

Speed diverges by skill tier

Across all solving teams, AI-augmented teams were marginally slower on average. At the elite tier, the picture reversed sharply. Top AI-augmented teams completed challenges several times faster than their human-only counterparts, making speed the clearest operational differentiator at that level.

Domain performance varies widely

The performance gap varied by a factor of roughly three depending on the security domain. Structured and systematic domains such as Secure Coding and Blockchain showed the largest AI advantages. Creative domains such as Coding and Reversing showed the smallest gaps, particularly among elite performers where the two groups reached near-parity.

AI advantage narrows as human expertise increases (Source: Hack The Box)

Workforce planning across career tiers

At the entry level, AI solve rates on routine tasks indicate that standard analyst work is automatable with current tooling. Entry-level staff using AI tools can produce higher output counts without developing the underlying skills to verify that output or direct agents on harder problems. Automating the tasks historically used to train junior analysts creates a gap in the pipeline that produces future senior practitioners.

At the mid-career tier, the AI advantage on medium-difficulty tasks is the strongest of any difficulty level and the highest-return target for AI tooling deployment. Speed gains at this level compound across incident response workflows.

At the elite tier, AI functions as a speed multiplier. Senior practitioners retain a capability edge on the hardest problems. Pairing elite analysts with AI co-pilots and routing the most complex incidents to human-led teams preserves that edge.

Implications for security operations

Organizations that model AI output multipliers into red-team threat scenarios will set more realistic assumptions about adversary speed and capability. Incident response windows and service level agreements built around human-only attacker timelines will underestimate the threat from AI-augmented adversaries.
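One way to picture that adjustment is to divide a human-only attacker timeline by an assumed speed multiplier. The Python sketch below is purely illustrative: the 24-hour baseline and the multiplier values are hypothetical placeholders, not figures from the report, and the function name is invented for this example.

```python
def adjusted_window_hours(human_baseline_hours, speed_multiplier):
    """Scale a human-only attacker timeline by an assumed AI speed multiplier.

    A multiplier of 1.0 leaves the baseline unchanged; 4.0 means the
    adversary is assumed to move four times faster.
    """
    if human_baseline_hours < 0:
        raise ValueError("human_baseline_hours must be non-negative")
    if speed_multiplier <= 0:
        raise ValueError("speed_multiplier must be positive")
    return human_baseline_hours / speed_multiplier


# Hypothetical baseline: a response window sized for a human-only attacker.
baseline_hours = 24.0

# Hypothetical multipliers for scenario planning; not taken from the report.
for multiplier in (1.0, 2.0, 4.0):
    window = adjusted_window_hours(baseline_hours, multiplier)
    print(f"multiplier {multiplier:.1f}x -> response window {window:.1f} h")
```

A single linear multiplier is a deliberate simplification. The findings above suggest the real effect depends on difficulty, domain and skill tier, so a planning model would likely need separate multipliers for each scenario.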

AI tooling deployed by domain, starting with structured exploitation categories, delivers faster returns than uniform rollout. At the same time, retaining and developing senior operators remains a priority. Novel reasoning on hard problems is where human expertise still leads, and that capability requires sustained investment in real-world challenge training to maintain.