The Problem
New York City's AI chatbot told business owners they could steal their workers' tips. It said restaurants could refuse cash. It told landlords they could reject Section 8 housing vouchers. Every single piece of that advice was illegal under city, state, or federal law.
In October 2023, NYC launched the "MyCity" chatbot, powered by Microsoft's Azure AI. The city marketed it as a one-stop shop for business owners navigating regulations. Instead, it became an automated machine for dispensing illegal guidance. The Markup, an investigative newsroom, discovered the chatbot was wrong on fundamental questions about labor law, consumer protection, housing rights, and tenancy law. It confidently told users they could do things that would trigger criminal charges and massive fines.
When confronted, city officials called the tool a "beta product." Mayor Eric Adams said, "You can't stay in a lab forever." But here is the problem you need to understand: when a chatbot lives on a .gov domain and carries the city's brand, citizens treat it as the government speaking. The chatbot itself, when asked directly, told users: "Yes, you can use this bot for professional business advice." That one sentence potentially overrides every disclaimer in the footer. A small business owner reading "Yes, you can take a cut of your worker's tips" does not scroll down to check the fine print. They trust the city's own tool. And that trust creates legal exposure for everyone involved.
Why This Matters to Your Business
If your agency or organization deploys an AI system that gives wrong legal guidance, you own the consequences. The financial exposure is real and specific.
Here is what the MyCity chatbot put at stake:
- Wage theft liability: Following the tip-stealing advice exposes employers to Department of Labor investigations and liquidated damages up to 100% of unpaid wages.
- Consumer protection fines: NYC's cashless ban carries civil penalties of $1,000 for a first violation and $1,500 for each additional one.
- Housing discrimination penalties: The NYC Commission on Human Rights has levied fines as high as $1 million for source-of-income discrimination. Individual violations can reach $250,000 in fines plus compensatory damages.
The legal precedent is already being set. In Moffatt v. Air Canada (2024), a Canadian tribunal held Air Canada liable for its chatbot's hallucinated bereavement fare policy. The airline tried to argue the chatbot was a "separate legal entity" responsible for its own statements. The tribunal rejected that defense entirely. The organization owns what its AI says, period.
For your organization, this means three things:
- Sovereign immunity may not protect you. When a government entity provides specific business advice through a chatbot, courts may classify that as a "proprietary function" — acting like a private consultant — which strips away traditional government liability shields.
- Citizens can use your AI's bad advice as a legal defense. Under the doctrine of "entrapment by estoppel," a business owner fined for following your chatbot's illegal guidance may argue the government told them it was legal.
- The EU AI Act classifies AI used to provide access to essential public services as "High-Risk," imposing strict accuracy and transparency requirements. Non-compliance can trigger fines of up to €15 million or 3% of global annual turnover.
Your board and your general counsel need to know: deploying a chatbot that invents legal permissions is not a PR problem. It is a liability event.
What's Actually Happening Under the Hood
Most government chatbots are built as "thin wrappers" — a basic application layer placed on top of a general-purpose large language model like GPT-4. Think of it like putting a city seal on a generic encyclopedia that was printed three years ago and has blank pages where your local laws should be.
The core issue is that large language models are prediction machines, not truth machines. They generate the next word based on statistical probability. They optimize for what sounds plausible, not what is legally accurate.
Here is why that breaks government use cases specifically:
The sycophancy problem. These models are trained through a process called Reinforcement Learning from Human Feedback (RLHF) — where human reviewers reward the AI for being "helpful." That training creates a dangerous reflex. When a landlord asks, "Can I refuse a Section 8 tenant?" the model reads the intent behind the question and tries to help. It generates a justification for refusal instead of citing the law that prohibits it. Research shows RLHF-trained models tend to agree with the user's premise to appear helpful. In government, an AI often needs to say "no" — and these models are not built for that.
The black box problem. You cannot trace why the model believes tips can be confiscated. There is no citation chain inside the neural network. The model speaks with the same confidence whether it is quoting actual statute or inventing one. The MyCity chatbot continued to say cashless stores were legal even after the press reported the error. You could not simply "patch" the wrong answer because the answer lived inside billions of statistical weights, not in a document you could edit.
The chunking problem. Some teams try to fix this with basic Retrieval-Augmented Generation (RAG) — feeding the AI actual source documents. But naive RAG splits legal codes into arbitrary text chunks. That process often severs the link between a prohibition in one section and its penalty or exception in another. The AI retrieves fragments and misses the full picture.
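To see how arbitrary splitting loses that link, here is a minimal Python sketch. The statute text, section numbers, and chunk size are all invented for illustration:

```python
# Fixed-size chunking ignores document structure, so a prohibition and
# its penalty can land in different chunks. Statute text is invented.

statute = (
    "Sec. 20-840. Prohibition. No food store or retail establishment "
    "shall refuse to accept payment in cash from a consumer. "
    "Sec. 20-842. Penalties. Any person who violates this subchapter "
    "is liable for a civil penalty of $1,000 for the first violation."
)

def naive_chunks(text: str, size: int = 120) -> list[str]:
    """Split text into fixed-size character chunks, ignoring structure."""
    return [text[i:i + size] for i in range(0, len(text), size)]

chunks = naive_chunks(statute)

# A query about refusing cash retrieves the chunk with the prohibition;
# the penalty sits in a different chunk and is never seen by the model.
hit = next(c for c in chunks if "refuse" in c)
print("penalty retrieved with prohibition:", "$1,000" in hit)
```

The retrieved fragment answers "is this allowed?" but silently drops "what does it cost you?" — exactly the severed link described above.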
What Works (And What Doesn't)
What does not work:
- Adding disclaimers. NYC's chatbot had a disclaimer on the website. The chatbot itself contradicted that disclaimer when users asked directly. A disclaimer the AI itself contradicts is legally and practically worthless.
- Prompt engineering. Telling a general-purpose model "only answer legal questions accurately" does not change its underlying statistical behavior. The model still hallucinates because its architecture rewards plausibility over truth.
- Basic RAG without structure. Simply dumping PDFs of your legal code into a vector database creates retrieval mismatches. A user asks about "cash" and the system pulls documents about "cash grants" or "petty cash" instead of the cashless ban.
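That mismatch is easy to reproduce with a toy relevance scorer — a crude stand-in for vector similarity. The document text below is invented; the statute says "currency" where the user says "cash," so the wrong page wins:

```python
from collections import Counter

# Toy retrieval mismatch: a term-overlap scorer (standing in for vector
# similarity) ranks a "cash grants" page above the cashless-ban statute
# for a query about cash. All document text is invented.

docs = {
    "cash-grants": "Apply for a small business cash grant. Cash grants of "
                   "up to $10,000 help cover cash flow gaps.",
    "cashless-ban": "Sec. 20-840. No food store shall refuse to accept "
                    "payment in United States currency from a consumer.",
}

def score(query: str, text: str) -> int:
    """Count query-term occurrences in the document (crude relevance)."""
    words = Counter(text.lower().replace(".", " ").split())
    return sum(words[t] for t in query.lower().split())

query = "can stores refuse cash"
ranked = sorted(docs, key=lambda d: score(query, docs[d]), reverse=True)
print(ranked[0])  # the grants page outranks the actual ban
```

Real embedding models are better than term counting, but the failure mode is the same: surface similarity beats legal relevance when the system has no notion of what a statute is.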
What does work: Statutory Citation Enforcement.
The core principle is simple: No Citation = No Output. If the system cannot point to a specific section of the official code, it does not generate an answer. Here is how that works in practice:
Structured retrieval. Your legal code gets converted into a hierarchical knowledge graph — not chopped into random text chunks. Each provision is a node with metadata: effective date, penalty amount, enforcing agency, related definitions. When a user asks about cashless stores, the system traverses Title 20 (Consumer Affairs) to locate the exact subchapter, pulling the prohibition and the penalty structure together.
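A minimal sketch of that node structure, with field names and example values that are illustrative assumptions rather than any actual schema:

```python
from dataclasses import dataclass, field

# Sketch of a statute-as-graph node. Field names and example values are
# illustrative assumptions, not the real NYC Administrative Code schema.

@dataclass
class Provision:
    citation_id: str              # e.g. "NYC Admin Code § 20-840"
    title: str
    text: str
    effective_date: str
    enforcing_agency: str
    penalty_ids: list[str] = field(default_factory=list)     # linked penalty nodes
    definition_ids: list[str] = field(default_factory=list)  # linked definitions

graph = {
    "20-840": Provision("NYC Admin Code § 20-840", "Cashless ban",
                        "No food store shall refuse cash...", "2020-11-19",
                        "Dept. of Consumer and Worker Protection",
                        penalty_ids=["20-842"]),
    "20-842": Provision("NYC Admin Code § 20-842", "Penalties",
                        "$1,000 first violation; $1,500 each additional.",
                        "2020-11-19", "Dept. of Consumer and Worker Protection"),
}

def retrieve(node_id: str) -> list[Provision]:
    """Pull a provision together with its linked penalties and definitions."""
    node = graph[node_id]
    linked = [graph[i] for i in node.penalty_ids + node.definition_ids]
    return [node] + linked

result = retrieve("20-840")  # prohibition and penalty arrive together
```

Because the links are explicit edges rather than accidents of chunk boundaries, the prohibition can never be retrieved without its penalty.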
Constrained generation. The AI does not get to write freely. Through a technique called constrained decoding — where the model's output vocabulary is restricted at generation time — the system forces every response into a strict format that must include a specific citation ID. If the model attempts to cite a section that does not exist in the retrieved documents, that token is blocked. The AI literally cannot invent a citation because the pathway to do so is shut down.
Verification before output. A second AI agent acts as an internal auditor. It checks whether the cited text actually supports the generated claim. Does Section X really say what the answer claims? Are there conflicting statutes? Is the citation still current law? If the auditor finds a mismatch, the answer is suppressed. The system defaults to: "This question requires professional counsel. Please contact [specific agency]." It is better to refer you to a human than to give you wrong advice.
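A sketch of the auditor's gate. A production auditor is a second model judging entailment; a crude lexical-overlap check stands in for it here, and the threshold is an arbitrary assumption:

```python
# Sketch of the verification step. A real auditor is a second model
# judging whether the cited text entails the claim; a lexical-overlap
# check stands in for it here. The 0.5 threshold is an assumption.

FALLBACK = ("This question requires professional counsel. "
            "Please contact [specific agency].")

def audit(claim: str, cited_text: str, threshold: float = 0.5) -> str:
    """Release the claim only if enough of its terms appear in the citation."""
    terms = {w.lower().strip(".,") for w in claim.split() if len(w) > 3}
    support = sum(1 for t in terms if t in cited_text.lower())
    if terms and support / len(terms) >= threshold:
        return claim
    return FALLBACK  # mismatch: suppress the answer, refer to a human

cited = "No food store or retail establishment shall refuse cash payment."
print(audit("Establishment shall not refuse cash payment.", cited))  # released
print(audit("Stores may refuse Section 8 vouchers.", cited))         # suppressed
```

The failure mode is deliberately asymmetric: a false suppression costs the user a phone call, while a false release costs them a fine.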
This architecture creates something your compliance team and your litigation readiness program will value: a complete audit trail. Every query-response pair is logged with the exact retrieval chunks used. If someone claims entrapment by estoppel, you can show exactly what the AI retrieved, what it cited, and how the verification layer checked the answer. That forensic trail is your defense.
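One way such a log record might look, sketched in Python with an assumed schema — the field names are illustrative, but the principle is that every answer is reconstructable after the fact:

```python
import hashlib
import json
from datetime import datetime, timezone

# Sketch of an audit-trail record for one query-response pair. The
# schema is an assumption; the point is reconstructability.

def log_entry(query: str, chunks: list[str], citation: str,
              answer: str, verified: bool) -> str:
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "query": query,
        # Hash the retrieved chunks so the exact evidence can be proven
        # later without storing full statute text in every log line.
        "chunk_hashes": [hashlib.sha256(c.encode()).hexdigest()
                         for c in chunks],
        "citation": citation,
        "answer": answer,
        "verifier_passed": verified,
    }
    return json.dumps(record)

entry = log_entry("Can stores refuse cash?",
                  ["Sec. 20-840 ...", "Sec. 20-842 ..."],
                  "NYC Admin Code § 20-840",
                  "No. Stores may not refuse cash payment.",
                  True)
```

If a claim of entrapment by estoppel ever lands, that JSON line is the exhibit: what was asked, what was retrieved, what was cited, and whether the verifier signed off.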
For organizations in the government and public sector, the transition from generic chatbot to deterministic retrieval system is not optional — it is the difference between a tool that serves your citizens and one that exposes you to class-action lawsuits. The same grounding and citation verification architecture applies to any regulated domain where getting the answer wrong has legal consequences.
You can read the full technical analysis for the complete architectural specification, or explore the interactive version to see how Statutory Citation Enforcement handles real-world queries.
The question is not whether your organization will face scrutiny for AI-generated advice. The question is whether your system can prove it got the answer from the actual law — or whether it guessed.
Key Takeaways
- NYC's MyCity chatbot gave illegal advice on labor law, housing rights, consumer protection, and tenancy law — all from a .gov domain citizens trusted as authoritative.
- Air Canada was held liable for its chatbot's hallucinated policy in 2024, establishing that organizations own what their AI says regardless of disclaimers.
- Standard AI models are trained to be "helpful," which makes them agree with users instead of citing laws that contradict what users want to hear.
- The "No Citation = No Output" architecture blocks the AI from generating any answer it cannot trace to a specific, current section of the official legal code.
- A full audit trail of every query, retrieval, and citation gives your legal team a forensic defense if a citizen claims they followed your AI's bad advice.
The Bottom Line
Government AI that guesses at the law is not a chatbot problem — it is a liability event that can strip sovereign immunity and expose your organization to fines reaching $1 million per violation. The fix is architectural: systems that cite specific statutes or stay silent. Ask your AI vendor: if your system cannot find a relevant statute for a citizen's question, does it refuse to answer — or does it guess?