Why data provenance must anchor every CISO’s AI governance strategy
Across the enterprise, artificial intelligence has crept into core functions – not through massive digital transformation programs, but through quiet, incremental adoption. Legal departments are summarizing contracts. HR is rewording sensitive employee communications. Compliance teams are experimenting with due diligence automation. Most of these capabilities are built on large language models (LLMs), and they’re often introduced under the radar, wrapped in SaaS platforms, productivity tools, or internal pilots.
It’s not the adoption that worries me. It’s the assumption of safety: the assumption that because a model is popular or “enterprise-ready,” it must also be compliant, secure, and governed. What I’ve seen instead is a dangerous blind spot: the complete disappearance of data provenance.
Why provenance, not policy, is the real line of defense
Provenance is more than a log. It’s the connective tissue of data governance. It answers fundamental questions: Where did this data originate? How was it transformed? Who touched it, and under what policy? And in the world of LLMs – where outputs are dynamic, context is fluid, and transformation is opaque – that chain of accountability often breaks the moment a prompt is submitted.
In traditional systems, we can usually trace data lineage. We can reconstruct what was done, when, and why. But in LLM-based environments, prompts aren’t always logged, outputs are sometimes copied across systems, and models themselves may retain information without clear consent. We’ve gone from structured, auditable workflows to a black-box decision loop. In highly regulated domains like legal, finance, or privacy, that’s a governance crisis.
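To make the idea concrete, here is a minimal sketch of what a provenance record for an LLM interaction might capture. The field names and structure are illustrative assumptions, not a standard schema – the point is that origin, transformation, actor, and policy basis travel together with the data.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List

@dataclass
class TransformationStep:
    """One hop in the data's journey: who did what, under which policy."""
    actor: str           # user or service that touched the data
    action: str          # e.g. "summarized", "redacted", "pasted into CRM"
    policy_ref: str      # the policy or legal basis the action relied on
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

@dataclass
class ProvenanceRecord:
    """Connective tissue for one piece of data as it moves through an LLM workflow."""
    source_system: str                      # where the data originated
    data_classification: str                # e.g. "confidential", "personal data"
    model_id: str                           # which model processed it
    lineage: List[TransformationStep] = field(default_factory=list)

    def add_step(self, actor: str, action: str, policy_ref: str) -> None:
        self.lineage.append(TransformationStep(actor, action, policy_ref))
```

In a traditional pipeline, each system appends its own step to a chain like this. In most LLM workflows today, the chain simply stops at the prompt box.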
AI sprawl and the myth of centralized control
It’s a mistake to think of AI adoption as a centralized effort. Most enterprises are already dealing with AI sprawl as dozens of tools, powered by different LLMs, are used in disconnected parts of the business. Some are approved and integrated. Others are experimented with under the radar. Each has its own model behavior, data handling policies, and jurisdictional complexity, and almost none of them were designed with a security-first or compliance-first architecture.
This decentralization means that the security organization is no longer in control of how sensitive information is processed. A single employee might copy confidential data into a prompt, receive an output, and paste it into a system of record, effectively completing a full data cycle without triggering a single alert or audit trail.
The CISO’s challenge is no longer about access. It’s about intent, flow, and purpose, and those are nearly invisible in AI-enabled environments.
Regulations are not lagging – they’re evolving in parallel
There’s a popular belief that regulators haven’t caught up with AI. That’s only half-true. Most modern data protection laws – GDPR, CPRA, India’s DPDPA, and the Saudi PDPL – already contain principles that apply directly to LLM usage: purpose limitation, data minimization, transparency, consent specificity, and erasure rights.
The problem is not the regulation – it’s our systems’ inability to respond to it. LLMs blur roles: is the provider a processor or a controller? Is a generated output a derived product or a data transformation? When an AI tool enriches a user prompt with training data, who owns that enriched artifact, and who is liable if it leads to harm?
In audit scenarios, you won’t be asked if you used AI. You’ll be asked if you can prove what it did, and how. Most enterprises today can’t.
What modern AI governance should look like
To rebuild trust and defensibility, CISOs must push their organizations to rethink governance. That starts not with policy, but with infrastructure.
1. Continuous, automated data mapping
AI interactions don’t stop at static systems. They happen across chat interfaces, APIs, middleware, and internal scripts. Mapping must evolve to trace not just where data lives, but where it moves and what models touch it. If your mapping is snapshot-based or manual, it’s already obsolete.
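One way to approach movement-based mapping is to emit a flow event whenever data crosses a boundary toward an AI component. The sketch below is a simplified assumption of how such an event might look; the sink, system names, and model identifier are all hypothetical, and in practice the event would land in a SIEM, data catalog, or message bus rather than stdout.

```python
import json
from datetime import datetime, timezone

# Hypothetical event sink; in practice a SIEM, data catalog, or message bus.
def publish(event: dict) -> None:
    print(json.dumps(event))

def record_data_flow(source: str, destination: str, model_id: str,
                     data_categories: list[str]) -> None:
    """Emit a flow event every time data moves toward an AI component,
    so the map reflects movement, not just where data is stored."""
    publish({
        "event_type": "ai_data_flow",
        "source": source,                  # e.g. "legal_dms"
        "destination": destination,        # e.g. "contract-summarizer-api"
        "model_id": model_id,
        "data_categories": data_categories,
        "observed_at": datetime.now(timezone.utc).isoformat(),
    })

# Example: middleware observes a document leaving a repository for an LLM endpoint.
record_data_flow("legal_dms", "contract-summarizer-api", "vendor-llm-v1",
                 ["contract_terms", "counterparty_pii"])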
2. AI-aware RoPA and processing visibility
Records of Processing Activities (RoPA) must now include model logic, AI tool behavior, and jurisdictional exposure. It’s not enough to know which vendor is used. You need to know where the model is hosted, how it was trained, and what risks it introduces in downstream processing.
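What an AI-aware RoPA entry might look like is sketched below. The fields are assumptions meant to show the shape of the record – vendor name, retention figures, and risk entries are placeholders that the real entry would pull from contracts and assessments.

```python
# One illustrative RoPA entry extended with AI-specific fields (all values are placeholders).
ropa_entry = {
    "processing_activity": "Contract summarization",
    "business_owner": "Legal Operations",
    "legal_basis": "legitimate interest",
    "data_categories": ["contract terms", "counterparty contact details"],
    "ai_tooling": {
        "vendor": "ExampleVendor",             # hypothetical vendor
        "model_family": "hosted LLM",
        "hosting_region": "EU (Frankfurt)",
        "training_data_disclosure": "vendor-proprietary; no customer data per DPA",
        "prompt_retention_by_vendor_days": 0,  # per contract; verify, don't assume
    },
    "jurisdictional_exposure": ["GDPR", "CPRA"],
    "downstream_risks": ["output reused in client deliverables",
                         "possible re-identification in summaries"],
}
```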
3. Consent reconciliation that’s dynamic and contextual
Consent captured once is not consent for everything. Teams need mechanisms that align consent with model interaction: Has the user agreed to model-based enrichment? Is the AI system operating under the declared purpose of collection? If not, consent must be reverified or flagged.
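A minimal sketch of that reconciliation logic, under the assumption that consented purposes are available as a set per data subject, could look like this. The purpose strings and decision labels are illustrative.

```python
from enum import Enum

class ConsentDecision(Enum):
    ALLOWED = "allowed"
    REVERIFY = "reverify"   # consent exists but doesn't cover this purpose
    BLOCKED = "blocked"     # no consent record at all

def reconcile_consent(consented_purposes: set[str] | None,
                      interaction_purpose: str) -> ConsentDecision:
    """Compare what the data subject agreed to against what the model
    interaction is about to do; anything outside scope gets flagged."""
    if consented_purposes is None:
        return ConsentDecision.BLOCKED
    if interaction_purpose in consented_purposes:
        return ConsentDecision.ALLOWED
    return ConsentDecision.REVERIFY

# Example: HR consent covered "payroll processing", not "model-based enrichment".
decision = reconcile_consent({"payroll processing"}, "model-based enrichment")
assert decision is ConsentDecision.REVERIFY
```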
4. Prompt and output audit logging
Where practical, interactions with AI systems should be logged, with a focus on the prompts themselves. Prompts often contain the most sensitive data, and capturing them is key to understanding what information is being exposed. While logging outputs and downstream use is valuable, prompt-level logging should take priority, especially when full auditability isn’t feasible. If you can’t trace what was asked, you can’t fully assess the risk.
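One way to get prompt-level logging without touching every tool is to wrap the model call. The sketch below is an assumption-laden example: the model callable, tool names, and choice to store only a hash (where policy forbids storing the full prompt) are illustrative, not a prescribed design.

```python
import hashlib
import json
import logging
from datetime import datetime, timezone
from typing import Callable

audit_log = logging.getLogger("ai_audit")
logging.basicConfig(level=logging.INFO)

def logged_completion(call_model: Callable[[str], str], prompt: str,
                      user: str, tool: str) -> str:
    """Wrap any LLM call so the prompt is captured before the request leaves.
    Store the full prompt where policy allows; otherwise keep a hash plus metadata."""
    audit_log.info(json.dumps({
        "event": "llm_prompt",
        "user": user,
        "tool": tool,
        "prompt_sha256": hashlib.sha256(prompt.encode()).hexdigest(),
        "prompt_chars": len(prompt),
        "submitted_at": datetime.now(timezone.utc).isoformat(),
    }))
    output = call_model(prompt)          # the actual provider call is out of scope here
    audit_log.info(json.dumps({"event": "llm_output", "tool": tool,
                               "output_chars": len(output)}))
    return output

# Example with a stand-in model callable:
# logged_completion(lambda p: "draft summary", "Summarize this NDA",
#                   user="jdoe", tool="contract-summarizer")
```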
5. AI output classification and retention controls
Outputs from LLMs must be classified and governed. If an AI system rewrites a legal document, that output may need legal privilege controls. If it drafts internal HR language, retention timelines may apply. Outputs are not ephemeral – they are part of the data lifecycle.
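As a rough sketch of what that could mean in practice: classify the output at creation and stamp it with a retention deadline so it enters the normal records lifecycle. The tool names and retention periods below are invented for illustration; the real rules come from legal and records teams.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical retention rules; actual periods are set by legal and records teams.
RETENTION_RULES = {
    "legal_privileged": timedelta(days=365 * 7),
    "hr_internal": timedelta(days=365 * 2),
    "general": timedelta(days=90),
}

@dataclass
class GovernedOutput:
    text: str
    classification: str
    delete_after: date

def govern_output(text: str, source_tool: str) -> GovernedOutput:
    """Classify an LLM output and stamp it with a retention deadline so it
    enters the records lifecycle instead of floating free in chat windows."""
    if source_tool == "contract-review-assistant":      # hypothetical tool names
        label = "legal_privileged"
    elif source_tool == "hr-comms-drafter":
        label = "hr_internal"
    else:
        label = "general"
    return GovernedOutput(text, label, date.today() + RETENTION_RULES[label])
```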
The CISO’s role is changing, and that’s a good thing
AI is not just a data trend. It’s also a data event that redefines how we must think about control. Security leaders are no longer simply protecting systems or even data. We are protecting context: the metadata, intent, and legality that surround every interaction with a machine that learns and generates.
This requires CISOs to step deeper into privacy, compliance, ethics, and records governance. It means building bridges with legal teams and compliance officers to ensure that AI usage doesn’t just comply with policy but reflects the organization’s values and risk thresholds.
AI governance should not be owned by any single department. It must be led by those of us who understand risk, response, and resilience, and that makes it squarely our domain.
Traceability is the new trust
In the age of AI, it is no longer enough to say, “We didn’t know.” You will be asked what went into the model, who approved its use, how consent was handled, whether you can reproduce the logic that led to a given decision, and where the evidence is.
If your systems can’t answer those questions with confidence, you are not governing AI – you’re hoping for the best.
Trust in AI won’t come from policies. It will come from provenance. And that starts with visibility, rigor, and leadership from the very top of the security organization.