AI got it wrong with high confidence. Now what?

In this Help Net Security interview, Christian Debes, Head of Data Analytics & AI at SPRYFOX, talks about the growing gap between what AI models do and what their operators can explain. He argues this gap is already a liability, particularly when decisions affect people or money and no one can say why a model produced a certain output.

Debes walks through how responsible teams approach confident wrong answers, why procurement leaders bear accountability when AI systems fail, and what explainability means as a translation layer between technical teams and business operators. He also addresses the EU AI Act and its risk of producing compliance theater, and closes with a frank assessment of where AI infrastructure is headed if explainability does not keep pace with model complexity.


There’s a growing gap between what a model does and what its operators can articulate about why it does it. At what point does that gap become a liability rather than an acceptable engineering tradeoff?

I think that point is already here. It is already a liability in many cases, although we don’t always notice it immediately. Machine learning models have always been difficult to fully explain (even relatively simple ones). But the scale of the models has changed. We went from models where you could at least inspect feature importances or trace a decision path to systems where even the people who built them can only give you approximations of why a particular output was produced.

The moment this becomes a real liability rather than just an acceptable engineering tradeoff is when decisions based on these outputs affect people or money and nobody in the room can answer the question, “Why did it say that?” Think of the risk when the reasons behind credit decisions, fraud flags, or medical recommendations are not understood and cannot be challenged.

That might happen because the model performed well on an individual benchmark, meaning the team was satisfied and did not examine further. Traditional monitoring of machine learning solutions focuses on drift in the model (typically caused by data drift) and hard performance numbers. It almost never measures explainability.
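To make the “hard performance numbers” point concrete, here is a minimal sketch of the kind of drift check that typical monitoring covers. It computes the population stability index (PSI), a common drift metric, between a training-time feature distribution and live traffic; the data and the 0.2 alert threshold are illustrative assumptions, not from the interview.

```python
import numpy as np

def population_stability_index(expected, actual, bins=10):
    """PSI: a standard 'hard number' for detecting distribution drift
    between training data and live traffic. Says nothing about *why*
    the model decides what it decides."""
    lo = min(expected.min(), actual.min())
    hi = max(expected.max(), actual.max())
    edges = np.linspace(lo, hi, bins + 1)
    e_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    a_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    # floor the bin fractions so log() is defined for empty bins
    e_pct = np.clip(e_pct, 1e-6, None)
    a_pct = np.clip(a_pct, 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, 10_000)  # feature distribution at training time
live = rng.normal(0.5, 1.0, 10_000)   # shifted live traffic
print(f"PSI: {population_stability_index(train, live):.3f}")
```

A common rule of thumb flags PSI above roughly 0.2 as significant drift. Note what this dashboard number cannot tell you: whether any individual prediction was made for a defensible reason.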

Of course, most of the time we don’t even see this gap between what a model does and what operators can articulate. Everything looks fine when the model is right. The liability surfaces only when something goes wrong, and by then you’re explaining to a regulator or a court why you deployed something you couldn’t explain yourself.

When a transformer model produces a wrong answer with high confidence, and nobody on the team can reconstruct why, what does a responsible engineering team do next? What do most teams do?

A responsible team treats this as a serious incident, not as an unfortunate, rare data point that doesn’t hurt the overall aggregated numbers. You stop and investigate. Experienced data scientists have many tools for this. Our approach is first to ask: is this a training or an inference issue? Second, we look at similar inputs and check whether the failure is systematic or isolated. Third, we try to understand the confidence calibration.

A model being confidently wrong is often a sign that something is fundamentally off in how it learned. Finally, we turn to explainability. For classical machine learning methods we can trace which features drove this wrong but confident decision (SHAP and LIME are examples of such tools). In modern LLM systems we have methods such as mechanistic interpretability that, although technically very different from the classical methods, answer similar questions: which tokens of an input text led to this decision?
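The feature-attribution step can be illustrated without any library at all for the simplest case. For a linear model, Shapley values have a closed form: each feature’s contribution is its weight times its deviation from the average input. The fraud-scoring feature names, weights, and values below are hypothetical, chosen only to show how attribution points at the feature that drove a flag.

```python
import numpy as np

# Closed-form Shapley values for a linear model:
#   contribution_i = w_i * (x_i - E[x_i])
# (Tools like SHAP generalize this idea to nonlinear models.)
weights = np.array([2.0, -1.5, 0.5])   # hypothetical fraud-score model
baseline = np.array([0.1, 0.3, 0.2])   # average feature values, E[x]
x = np.array([0.9, 0.3, 0.8])          # the input that was flagged

contrib = weights * (x - baseline)
for name, c in zip(["amount", "country_risk", "velocity"], contrib):
    print(f"{name:>12}: {c:+.2f}")
```

Here the attribution shows the (hypothetical) transaction amount dominating the score, which is the kind of answer an investigator can actually challenge.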

Unfortunately, what most teams do is log it, maybe add it to a test set, and move on, because the model works well 98% of the time and there’s pressure to ship, to iterate, to deliver the next feature. In my experience, debugging confident wrong answers is deep investigative work, and doing it properly can take a long time. Many organizations don’t budget for that. They budget for building new features, not for deeply understanding why existing models occasionally fail.

When there are issues, the wrong predictions with high confidence often tell you more about your model than a thousand correct ones. They show you the limits of the model, not the happy path.
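The confidence-calibration check Debes mentions can be made concrete with expected calibration error (ECE), a standard metric that measures the gap between what a model claims and how often it is right. The toy numbers below are illustrative assumptions: a model that reports 95% confidence while being correct only about 70% of the time, i.e. exactly the “confidently wrong” failure mode.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: bin predictions by confidence and average the gap between
    mean confidence and observed accuracy in each bin."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(confidences[mask].mean() - correct[mask].mean())
            ece += mask.mean() * gap  # weight by fraction of samples in bin
    return float(ece)

# toy data: claims 0.95 confidence, right only ~70% of the time
rng = np.random.default_rng(1)
conf = np.full(1000, 0.95)
correct = rng.random(1000) < 0.70
print(f"ECE: {expected_calibration_error(conf, correct):.3f}")
```

A well-calibrated model would score near zero here; a large ECE concentrated in the high-confidence bins is a quantitative signal that the model’s confidence cannot be trusted.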

Procurement teams and executives often greenlight AI systems they don’t understand, trusting vendor assurances. How culpable are they when those systems fail, and how does explainability factor into accountability?

I see this constantly, and I have some sympathy for both sides. Executives have to make purchasing decisions about technology that moves faster than anyone can reasonably follow, and they rely on vendor assurances because there is often no alternative. They don’t have the in-house expertise to evaluate these systems technically, nor the capacity to monitor this fast-moving space.

That said, “I trusted the vendor” has never been a great defense when something goes wrong, and it won’t be in AI either. If you procure a system that makes consequential decisions and you cannot explain (even at a high level) how it works, what data it was trained on, and what its known limitations are, that’s simply a governance failure.

Explainability plays a very important role here. I think of it not as a common language but as a translation layer: it translates technical aspects into domain language and so acts as the bridge between the team that built the system and the business that operates it.

If a vendor cannot explain and document to a procurement team or executives how their model arrives at decisions, in language a non-specialist can follow, that’s an immediate red flag. The buyer has no need to understand transformer architectures, but a vendor who can’t explain their own system simply and clearly might not fully understand it themselves. Or worse, they understand it and are choosing not to be transparent about its limitations. The large-scale models we operate with these days make this explainability more difficult, but that’s the investment a good vendor will make.

The EU AI Act creates binding transparency obligations for high-risk systems. Is the industry technically prepared to meet those requirements, or are we heading toward widespread compliance theater?

I’ll be direct: for many organizations, this will start as compliance theater. The EU AI Act’s requirements include transparency about training data, documentation of model limitations, and human oversight mechanisms for high-risk systems. These are reasonable requirements. But meeting them properly requires a level of ML engineering discipline and governance that many companies (including large ones) haven’t built up yet.

What I expect to see is a wave of documentation that looks thorough on paper but doesn’t actually help anyone understand or audit the system. The easiest path is to put a compliance wrapper around an operational black box and tick some checkboxes, and I expect to see quite a few of these.

The companies that will do this well are the ones that already invested in understanding their own models before the regulation forced them to. Good documentation, proper experiment tracking, test cases, meaningful evaluation beyond accuracy metrics, building models with audits in mind. These are practices that good ML teams have followed for years. The AI Act doesn’t invent anything new, it just makes mandatory what a good ML team should do anyway. And maybe that’s actually a good thing, because it distinguishes between teams that follow good practice and teams that cut corners.

If explainability remains unsolved at the current pace of model complexity, what does the AI landscape look like in a decade? Are we building critical infrastructure on foundations we cannot audit?

The honest answer is: explainability will not keep pace with model complexity if we continue as we are. In a decade we will have many systems that are unauditable because they are built on top of models that are larger, more capable, and more opaque. But that doesn’t mean we should stop building; it means we have to get much more disciplined about how we build.

However, whether critical infrastructure is at risk is a choice. ML engineers need to choose whether or not to document their systems; those responsible for the infrastructure (including purchasers of AI software) need to choose between short-term monetary gains and investing in robust systems; regulators need to choose where to draw the line so that critical systems are protected without a detrimental impact on innovation.
