The Problem
Air Canada's chatbot told a grieving passenger he could buy a full-price ticket and claim a bereavement refund within 90 days. That policy did not exist. The chatbot invented it. When the passenger asked for his refund, Air Canada said no — the real policy prohibited refunds after travel. The passenger sued, and the airline tried something remarkable: it argued the chatbot was a "separate legal entity" responsible for its own mistakes.
The tribunal rejected that defense completely. Tribunal Member Christopher Rivers ruled that "Air Canada could not separate itself from the AI chatbot." The tribunal held that there is no meaningful difference between a human agent, a static webpage, and an interactive bot. All of them speak for your company. All of them bind your company.
This was not a one-off glitch. This was a predictable failure of how most enterprise AI systems work today. Your chatbot does not "know" your policies. It predicts what a good answer sounds like based on statistical patterns. When it sounds confident, your customers trust it. When it is wrong, you pay for it. The Moffatt v. Air Canada ruling established that an AI hallucination — a response the model fabricates with confident authority — can constitute negligent misrepresentation. If your AI says it, your company signed it.
Why This Matters to Your Business
The financial exposure here is not theoretical. Global losses from AI hallucinations reached $67.4 billion in 2024. That number includes direct payouts, regulatory fines, legal fees, brand damage, and the hidden cost of employees manually checking AI work.
That hidden cost is staggering on its own. Forrester Research estimates hallucination mitigation costs roughly $14,200 per enterprise employee per year, spent verifying outputs from AI systems that cannot be trusted on their own.
The Moffatt damages were small — about $800. But the precedent is what matters. Here is what it means for your organization:
- Any chatbot discussing pricing, refunds, warranties, or service terms can now create binding contract variations. Your AI can rewrite your corporate commitments in real time.
- The "black box" defense is dead. You cannot argue that you did not understand how the AI arrived at its answer. The court does not care about the technical complexity of your system.
- The "correct info was on our website" defense failed. Air Canada pointed to its static pages with the real policy. The tribunal ruled that does not matter — your AI's statement counts too.
- Class-action risk is growing. The ruling signals to plaintiff attorneys worldwide that AI output is actionable. One hallucination about a financial product could generate millions in liability.
The market for hallucination detection tools grew 318% between 2023 and 2025. That growth reflects an industry in crisis mode. Your competitors are scrambling. Your regulators are watching. Your board needs to understand this is not an IT problem — it is a balance-sheet risk.
What's Actually Happening Under the Hood
To understand why this keeps happening, you need to understand one thing about how Large Language Models work: they predict the next word. That is it. They do not look up your refund policy in a database. They do not reason through your legal terms. They generate text that sounds statistically likely based on patterns in their training data.
Think of it this way. Imagine you hired a brilliant new employee who memorized thousands of company handbooks from hundreds of different airlines. When a customer asks about your refund policy, this employee does not check your specific handbook. Instead, they compose an answer that sounds like something a refund policy would say, drawing on everything they have ever read. Sometimes they get it right. Often enough, they do not.
Even the best models — Google's Gemini 2.0, OpenAI's GPT-4o — still hallucinate at rates ranging from roughly 0.7% on simple tasks to over 25% on complex ones. A 0.7% error rate sounds small. But if your AI handles one million customer queries per month, that is 7,000 potential regulatory violations, incorrect financial statements, or phantom refund promises.
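The scale math is worth making explicit. A back-of-envelope calculation, using the 0.7%–25% range above and a hypothetical query volume of one million per month, shows how quickly a "small" rate compounds:

```python
monthly_queries = 1_000_000        # hypothetical traffic for a large deployment
low_rate, high_rate = 0.007, 0.25  # the 0.7% to 25% range cited above

# Even the best-case rate produces thousands of wrong answers per month.
best_case = int(monthly_queries * low_rate)
worst_case = int(monthly_queries * high_rate)
print(best_case, worst_case)  # 7000 250000
```

Every one of those answers goes out with the same confident tone as a correct one.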
The real danger: these models are "confident but wrong." They state fabrications with the same authoritative tone they use for facts. Your customers cannot tell the difference. Neither can your frontline staff reviewing transcripts after the fact.
Many vendors sell Retrieval-Augmented Generation (RAG) — a technique where you feed the AI your actual source documents — as the fix. But the Air Canada chatbot likely had access to the correct policy. It even provided a link to it. The bot still summarized the policy incorrectly. Giving the AI the right information is not enough when the reasoning engine itself is unreliable. RAG provides knowledge. It does not guarantee the AI will follow it.
What Works (And What Doesn't)
Before investing in a solution, you should understand what will not protect you:
Prompt engineering alone: Asking the AI to "only answer based on company policy" is a suggestion, not a constraint. The model can and will ignore it under certain conditions. Prompt instructions are not enforceable guardrails.
Disclaimers and fine print: Air Canada had correct information on its static webpages. The tribunal ruled that did not shield them. Telling customers to "verify with official sources" will not undo a binding promise your AI already made.
Naive RAG without validation: Simply feeding your documents into a vector database and hoping the AI reads them correctly failed in the very case that set this precedent. Semantic similarity search can retrieve the wrong document — pulling "refund processing times" when the question was about "refund eligibility."
What does work is a Deterministic Action Layer (DAL) — a system that separates conversation from compliance decisions. Here is how it works in practice:
Input classification: When a customer sends a message, a semantic router — a system that reads the intent behind the words — determines whether the question touches a restricted topic like refunds, pricing, legal terms, or warranties. This router sits outside the AI model, so prompt injection attacks designed to trick the AI cannot bypass it.
Deterministic processing: If the query hits a restricted topic, the AI model is not allowed to generate an answer. Instead, the system executes hard-coded logic — actual software code that checks your database. For a refund question, it runs something like: "If ticket status equals 'traveled,' return 'no refund permitted.'" The decision comes from your rules, not from probability.
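The refund rule above translates almost literally into code. This sketch assumes a hypothetical `Ticket` record pulled from your database; the rules themselves are whatever your policy team has approved:

```python
from dataclasses import dataclass

@dataclass
class Ticket:
    status: str       # e.g. "booked", "traveled"
    fare_class: str   # e.g. "flexible", "basic"

def refund_decision(ticket: Ticket) -> str:
    """Hard-coded policy logic: the outcome comes from rules and
    database state, never from the language model."""
    if ticket.status == "traveled":
        return "no refund permitted"
    if ticket.fare_class == "flexible":
        return "full refund permitted"
    return "escalate to agent"
```

Because this is ordinary code, the same inputs always produce the same decision, and the rule can be unit-tested before it ever talks to a customer.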
Controlled output: The code returns a verified result. The AI's only job is to wrap that result in a polite sentence. Output validation then confirms the final response matches the data returned by the code. If the system encounters a question where no rule exists, it does not improvise. It says: "I cannot answer that directly. Let me connect you with a specialist."
This approach — called neuro-symbolic architecture, which combines AI's language skills with strict rule-based logic — gives you something no pure AI wrapper can: a complete audit trail. Every decision traces back to a specific rule, a specific database query, and a specific code path. When a regulator or a court asks "why did your system say that," you can show exactly what happened and why.
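A minimal audit record per decision might look like the following sketch (the field names are illustrative); the point is that every response links back to the rule and data that produced it:

```python
import json
from datetime import datetime, timezone

def audit_record(query: str, topic: str, rule_id: str, decision: str) -> str:
    """Emit one JSON line per decision, answering "why did the system
    say that" with the exact rule and outcome involved."""
    return json.dumps({
        "ts": datetime.now(timezone.utc).isoformat(),
        "query": query,
        "topic": topic,
        "rule_id": rule_id,
        "decision": decision,
    })
```

Stored append-only, these records become the evidence trail a regulator or court would ask for.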
This audit capability directly addresses GDPR Article 22, which restricts decisions based solely on automated processing and entitles individuals to meaningful information about the logic involved. It also satisfies the EU AI Act's Article 14 requirement for human oversight and the documentation standards required by ISO/IEC 42001 for AI management systems.
For government and public sector organizations handling citizen-facing services, this architecture is not optional. When your AI discusses benefits eligibility, tax obligations, or permit requirements, every answer carries the weight of government authority. A hallucinated policy from a government chatbot does not just create liability — it erodes public trust in digital services.
The core principle is simple: let AI handle language. Never let AI handle decisions. Your policies belong in code, not in probability. You can read the full technical analysis for the detailed engineering blueprint, or explore the interactive version for a guided walkthrough of the architecture.
Your AI should be a translator, not a decision-maker. It translates customer questions into database queries and database answers into human language. The deciding happens in code you control, audit, and update — not inside a neural network you cannot inspect.
Key Takeaways
- Courts now treat AI chatbot statements as legally binding company promises — the "it's just a bot" defense was rejected in Moffatt v. Air Canada.
- Global losses from AI hallucinations hit $67.4 billion in 2024, with enterprises spending roughly $14,200 per employee per year just verifying AI outputs.
- Feeding AI your correct documents (RAG) is not enough — Air Canada's chatbot had access to the right policy and still fabricated a wrong one.
- Deterministic Action Layers separate conversation from compliance by routing sensitive questions to hard-coded rules instead of AI-generated answers.
- Every compliance decision should produce an auditable logic trail, not a probabilistic guess — this is what regulators and courts will demand.
The Bottom Line
Your AI chatbot is already making promises on your company's behalf, and courts will hold you to them — even if the AI invented the promise. The fix is architectural: separate what the AI says from what your business decides, using hard-coded rules for anything involving money, policy, or legal terms. Ask your AI vendor: when your chatbot tells a customer they qualify for a refund, can you show me the exact rule and database query that produced that answer?