What vibe hunting gets right about AI threat hunting, and where it breaks down
In this Help Net Security interview, Aqsa Taylor, Chief Security Evangelist at Exaforce, explains vibe hunting, an AI-driven approach to threat detection that inverts traditional hypothesis-driven methods.
Instead of analysts defining attack vectors upfront, the AI scans datasets for anomalous patterns and surfaces potential threats. Taylor draws a firm line on responsibility: analysts must be able to explain their reasoning. When they cannot, the AI is steering the hunt. She also addresses enrichment, junior analyst development, and the failure modes that emerge when teams follow AI output without questioning it.

Hypothesis-driven hunting has been the gold standard for years. Does vibe hunting challenge that model, or does it just change who, or what, generates the hypothesis?
Hypothesis-driven hunting still applies to threat hunting; what changes is how hypotheses are validated. For example, the analyst suspects that an adversary with initial access via a compromised identity would use a CreateAccessKey action to establish persistence. The analyst then starts looking for evidence to support that hypothesis. The hypothesis itself is legible: you can critique it, analyze it, refine it, document it, and quantify it.
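A hypothesis like that one translates directly into a query. The sketch below runs it against parsed CloudTrail-style records; the field names (event_name, actor, target_user) and the sample data are illustrative assumptions, not an exact log schema.

```python
# Sketch: validating a persistence hypothesis against parsed
# CloudTrail-style records. Field names are illustrative assumptions.

def suspicious_access_key_creations(events):
    """Return CreateAccessKey events where the actor created a key
    for a *different* identity, a common persistence pattern."""
    return [
        e for e in events
        if e["event_name"] == "CreateAccessKey"
        and e.get("actor") != e.get("target_user")
    ]

sample = [
    {"event_name": "CreateAccessKey", "actor": "alice", "target_user": "alice"},
    {"event_name": "CreateAccessKey", "actor": "compromised-svc", "target_user": "admin"},
    {"event_name": "ListBuckets", "actor": "bob", "target_user": None},
]

hits = suspicious_access_key_creations(sample)
for e in hits:
    print(f"{e['actor']} created a key for {e['target_user']}")
```

The point is that the hypothesis is explicit in the code: anyone can read the filter, critique it, and refine it.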
Vibe hunting is a bit different. You invert the approach slightly: you let the AI find patterns in the dataset you have. Specifically, if the AI or LLM is trained on security data and focused on security analysis, you ask it to look for patterns within that data. From those patterns, it then identifies what it considers malicious or anomalous.
In other words, when you’re doing hypothesis-driven hunting, you have a defined set of hypotheses and attack vectors that you’re searching for, ones you think may or may not apply in a given environment. Your goal is to verify whether they apply.
When you’re doing vibe hunting, the approach is different. You consider the entire dataset and ask the LLM, “What could be applicable in this specific use case? What could be a potential attack vector? Is there anything here that doesn’t fit within the dataset?” By doing so, you invert the traditional hunting approach, making the hypothesis implicit rather than explicit.
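The "is there anything here that doesn't fit the dataset" question can be illustrated with a toy frequency check: instead of searching for a named attack vector, the system surfaces identity-action pairs that are rare relative to everything else observed. This is a minimal sketch of the inversion, not a production anomaly detector, and the data is invented.

```python
from collections import Counter

def rare_events(events, threshold=0.05):
    """Flag (identity, action) pairs that make up less than `threshold`
    of all observed activity, i.e. things that don't fit the dataset."""
    counts = Counter((e["identity"], e["action"]) for e in events)
    total = sum(counts.values())
    return [pair for pair, n in counts.items() if n / total < threshold]

events = (
    [{"identity": "ci-bot", "action": "GetObject"}] * 95
    + [{"identity": "ci-bot", "action": "CreateAccessKey"}] * 1
    + [{"identity": "dev", "action": "GetObject"}] * 10
)
flagged = rare_events(events)
print(flagged)  # the one-off CreateAccessKey by ci-bot stands out
```

No hypothesis was written down in advance; the hypothesis ("ci-bot creating an access key is anomalous") emerges from the data, which is exactly the implicit-hypothesis property described above.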
There’s a difference between AI accelerating a hunt and AI steering one. Where do you draw that line, and who is responsible when it gets crossed?
This is a tricky question because, in some cases, the hunt can begin with the AI itself. The AI may flag or identify activity it considers malicious based on patterns that the analyst, engineer, or hunter may not be aware of. In that situation, the AI is effectively steering the initial direction of the hunt.
As the process continues, the analyst or detection engineer starts to build context and develop an understanding of what is happening. At that point, they begin contributing their own reasoning and use the AI to accelerate the investigation rather than define it.
So where do you draw the line, and who is responsible when it gets crossed?
The line is drawn at the point where the analyst can no longer explain, in their own words, why they are pursuing a particular line of investigation. If they cannot articulate the reasoning behind the hunt, then they are no longer directing it. The AI is.
Responsibility follows that same boundary. The analyst is responsible when they are driving the reasoning and using AI as a tool to move faster. If they defer that reasoning to the AI and cannot independently justify the path they are taking, then the AI is effectively steering the hunt, even though accountability still rests with the human.
Enrichment is where hunts historically slow to a crawl. Mapping a single event, like a CreateAccessKey call, to whether that behavior is normal for a specific identity in a specific environment requires deep contextual knowledge. How does an AI system build that understanding without years of analyst institutional memory baked in?
Enrichment is where hunts historically slow down because that context is not readily available in a structured or accessible way. The key to solving this is not just better models, but better context.
AI models need to operate on a knowledge graph built from that institutional knowledge, turning it into a structured, queryable layer. This includes business context, ownership mappings, and operational patterns. More importantly, it requires a semantic context layer that maps identities, roles, resources, and their relationships across the environment. This semantic layer should also incorporate historical baselining, so the system understands what “normal” looks like for a specific identity over time.
Once you have that, the AI is reasoning over a rich graph of relationships and behavioral history. A CreateAccessKey event is no longer just an API call. It becomes an action performed by a specific identity, within a known role, tied to certain resources, compared against its historical behavior and peer group patterns.
At that point, enrichment becomes significantly more effective. The AI can make context-aware judgments that are much closer to what an experienced analyst would do. It is not replacing that expertise, but it is operationalizing it at scale.
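One way to picture that semantic layer is enrichment as a lookup over a small graph of identities, roles, ownership, and per-identity baselines. The structure and field names below are illustrative assumptions about how such a layer might be shaped, not any particular product's schema.

```python
# Toy semantic context layer: identities mapped to roles, ownership,
# and a historical baseline of actions. All names are hypothetical.
IDENTITY_GRAPH = {
    "deploy-bot": {
        "role": "ci-deployer",
        "owner_team": "platform",
        "baseline_actions": {"AssumeRole", "PutObject"},
    },
}

def enrich(event, graph):
    """Attach role, ownership, and a baseline verdict to a raw event."""
    ctx = graph.get(event["identity"], {})
    return {
        **event,
        "role": ctx.get("role", "unknown"),
        "owner_team": ctx.get("owner_team", "unknown"),
        "in_baseline": event["action"] in ctx.get("baseline_actions", set()),
    }

raw = {"identity": "deploy-bot", "action": "CreateAccessKey"}
enriched = enrich(raw, IDENTITY_GRAPH)
print(enriched["role"], enriched["owner_team"], enriched["in_baseline"])
```

With this context attached, CreateAccessKey stops being a bare API call: it is an out-of-baseline action by a known CI identity owned by a known team, which is the judgment an experienced analyst would make from memory.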
Junior analysts have traditionally learned threat hunting by suffering through the slow, manual version first. If AI abstracts that pain away, what replaces it as the mechanism for building genuine analyst judgment?
I don’t see vibe hunting as replacing the knowledge that comes from “learning from scratch.” I see it as elevating and scaling that experience more quickly. Instead of spending hours sifting through noise to find the signals they need, analysts spend their time making judgment calls on whether the analysis presented to them will support the right decision.
They focus on investigating effectively and making correct judgments: asking the right questions, ensuring the relevant signals are included in the context, and following the investigative path a seasoned analyst would take. Along the way, they draw on institutionalized knowledge as needed, learn through the steps, and benefit from the explainability provided by the right AI model.
Security teams have been burned before by tools that promised to compress the hard parts and delivered false confidence instead. What would a failed vibe hunting implementation look like in practice, and how would you know you were inside one?
A failed vibe hunting implementation shows up when analysts stop thinking critically and start relying on the AI to drive the hunt end to end. Instead of forming hypotheses or asking targeted questions, they simply prompt the model and follow whatever leads it produces.
At that point, the hunt becomes AI-steered rather than analyst-driven. Analysts chase patterns flagged by the model without questioning them. They do not validate the data, examine the context, or ask basic questions like why the pattern is suspicious, where the signal came from, or whether it is grounded in real data versus model error.
This creates a false sense of productivity. Teams may appear to be running more hunts, but those hunts do not lead to meaningful outcomes. Instead of improving detection quality, they generate noise and shallow conclusions. This is where false confidence sets in.
There are warning signs that you are inside this failure mode.
One sign is that analysts spend most of their time closing AI-generated leads rather than developing or refining them. Hunt reports become summaries of what the AI suggested, not what the analyst concluded. The reasoning is missing. There is no articulation of what was tested or why.
Another sign is that analysts cannot explain the threat model behind a hunt. If they cannot answer what they are trying to validate or why a path was pursued, then the hunt is not grounded in intent. It is just following a trail.
A third sign is a breakdown in trust within the team. Senior analysts start re-running hunts manually because they do not trust the AI output. At the same time, they begin to question the quality of work produced by junior analysts who rely heavily on the model.
In practice, a failed implementation does not reduce effort or improve insight. It replaces critical thinking with automation and produces more activity, but less understanding.