The Problem
Harvey Murphy, a 61-year-old grandfather, spent 10 days in jail for a robbery he did not commit. He was 1,500 miles away when it happened. A facial recognition system at Macy's matched his face to grainy surveillance footage from a Sunglass Hut robbery in Houston. Murphy was in Sacramento, California. The AI said otherwise, and the police believed the machine.
During those 10 days in jail, Murphy was sexually assaulted and beaten. He suffered lifelong injuries. He has since filed a $10 million lawsuit against Macy's and Sunglass Hut's parent company, EssilorLuxottica. The system that flagged him relied on low-resolution surveillance video. It likely compared that footage against a booking photo from non-violent offenses decades earlier. Studies show that matching current images against photos taken years or decades apart can produce false-positive rates as high as 90%.
This was not a freak accident. Just weeks earlier, the FTC banned Rite Aid from using facial recognition for five years. Between 2012 and 2020, Rite Aid's system generated thousands of false-positive matches across hundreds of stores. Women and people of color were disproportionately flagged. Employees followed, searched, and publicly accused innocent customers based on automated alerts.
If your organization uses AI for any decision that affects a person's liberty, finances, or reputation, these cases are your warning shot.
Why This Matters to Your Business
The financial exposure here is not hypothetical. It is already hitting balance sheets and courtrooms.
- $10 million lawsuit: Harvey Murphy's claim against Macy's and EssilorLuxottica for wrongful arrest and personal injury caused by faulty AI identification.
- Five-year technology ban: The FTC prohibited Rite Aid from using facial recognition entirely. Rite Aid must also delete all biometric data and destroy every AI model built from that data.
- Model disgorgement: The FTC forced Rite Aid to "unlearn" its AI — destroying algorithms derived from improperly collected information. This is a new regulatory tool, and it wipes out years of investment overnight.
The regulatory environment is tightening fast. The EU AI Act now classifies biometric identification in public spaces as "high-risk." Providers must conduct conformity assessments, maintain detailed technical documentation, and ensure effective human oversight. Even if your company operates only in the United States, the NIST AI Risk Management Framework pushes the same principles. The FTC used similar logic — transparency, accountability, risk management — to justify its ban on Rite Aid.
Your board needs to understand: the cost of a poorly governed AI system is not a tech problem. It is a litigation problem, a regulatory problem, and a front-page-news problem. If your AI vendor cannot explain how their system avoids these failures, your company is holding the risk — not them.
What's Actually Happening Under the Hood
Here is why these systems fail, in plain terms.
Most commercial facial recognition tools are built for what engineers call "closed-set" problems. That means the system assumes the person it is scanning is definitely in the database. Think of unlocking your phone with your face. The phone knows you are you. It just needs to confirm it.
Retail security is the opposite. It is an "open-set" problem. The vast majority of people walking into your store are not in any criminal database. But a closed-set system does not know how to say "I don't recognize this person." Instead, it finds the closest match it can, even if that match is wrong. That is how Rite Aid generated thousands of false positives across hundreds of stores.
Think of it like a multiple-choice test with no "none of the above" option. The system must pick an answer, even when the right answer is not listed. Every face that walks through the door gets matched to somebody in the database, whether they belong there or not.
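The difference between the two failure modes above can be shown in a few lines. This is a toy sketch, not any vendor's actual algorithm: the 2-D "embeddings," the gallery, and the 0.6 distance threshold are all invented for illustration. The point is structural: a closed-set matcher always returns somebody, while an open-set matcher can refuse.

```python
import numpy as np

def closed_set_match(probe, gallery):
    """Closed-set: assumes the probe is in the gallery, so it
    always returns the nearest identity -- even for strangers."""
    distances = np.linalg.norm(gallery - probe, axis=1)
    return int(np.argmin(distances))

def open_set_match(probe, gallery, threshold=0.6):
    """Open-set: returns None when no gallery entry is close enough,
    i.e. the system can say "I don't recognize this person"."""
    distances = np.linalg.norm(gallery - probe, axis=1)
    best = int(np.argmin(distances))
    return best if distances[best] <= threshold else None

# Toy 2-D "face embeddings" for a three-person watchlist.
gallery = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])

stranger = np.array([5.0, 5.0])  # nowhere near anyone on the list
print(closed_set_match(stranger, gallery))  # still picks somebody
print(open_set_match(stranger, gallery))    # None: correctly rejects
```

A real system works in high-dimensional embedding space with learned thresholds, but the asymmetry is the same: without an explicit rejection path, every stranger becomes somebody's false match.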
The second failure is treating the AI's output as a simple yes-or-no answer. In reality, every AI output is a probability — a guess with a confidence level. The Macy's system gave a match score for Harvey Murphy. But nobody asked how reliable that score actually was. A match score of 0.85 sounds high. But if the underlying image is blurry and the reference photo is decades old, that 0.85 is statistically meaningless. Without a layer that measures uncertainty — how sure the system actually is — you are making high-stakes decisions based on a coin flip dressed up as science.
What Works (And What Doesn't)
Let's start with what fails.
Off-the-shelf models without testing. Rite Aid bought facial recognition from two vendors whose contracts expressly disclaimed any warranty on accuracy. The company never tested the software for reliability. You cannot outsource accountability through a vendor contract.
Low-quality inputs treated as good enough. Both Rite Aid and Macy's fed grainy CCTV stills and old booking photos into their systems. In biometric matching, degraded input does not just lower accuracy; it drives error rates up sharply, because the system still produces a confident-looking score from an image that carries almost no usable signal.
No human review before action. Rite Aid employees confronted customers based on automated alerts alone. No one questioned the machine. The lawsuit against Macy's alleges that the company presented an automated match to police as verified fact, and police stopped investigating.
Here is what actually works.
Input validation. Before your system even attempts a match, a dedicated quality-check agent evaluates the image. Is the resolution sufficient? Is the lighting adequate? Is the reference photo recent enough? If the input fails these checks, the system rejects the comparison outright. No guess is better than a bad guess.
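A quality gate of this kind can be sketched as a simple pre-check that runs before any matching. The `ProbeImage` fields and every threshold below (112 px, the brightness range, the five-year reference age) are illustrative assumptions; real values would come from your vendor's benchmarks and your own testing.

```python
from dataclasses import dataclass

@dataclass
class ProbeImage:
    width: int
    height: int
    mean_brightness: float    # 0-255 average pixel value
    reference_age_years: float

# Illustrative thresholds -- not a recommendation.
MIN_RESOLUTION = 112          # px on the shorter side
BRIGHTNESS_RANGE = (40, 220)  # reject over/under-exposed frames
MAX_REFERENCE_AGE = 5.0       # years since the reference photo

def validate_input(img: ProbeImage) -> list[str]:
    """Return a list of failure reasons; an empty list means
    the image may proceed to matching."""
    failures = []
    if min(img.width, img.height) < MIN_RESOLUTION:
        failures.append("resolution too low")
    if not BRIGHTNESS_RANGE[0] <= img.mean_brightness <= BRIGHTNESS_RANGE[1]:
        failures.append("lighting out of range")
    if img.reference_age_years > MAX_REFERENCE_AGE:
        failures.append("reference photo too old")
    return failures

grainy_cctv = ProbeImage(width=64, height=48, mean_brightness=30,
                         reference_age_years=20)
print(validate_input(grainy_cctv))  # fails all three checks
```

An input like the one above, which resembles the grainy footage in both cases discussed here, never reaches the matcher at all.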
Uncertainty measurement. Instead of producing a single confidence score, the system generates a probability distribution — a range that shows how reliable the output actually is. Techniques like conformal prediction guarantee that the true outcome falls within predicted bounds at a confidence level you set. You decide your acceptable error rate. The system enforces it mathematically.
Human-in-the-loop decision gates. The system routes every result through confidence thresholds. Below 70% confidence, the match is automatically discarded. Between 70% and 95%, a trained human reviewer sees the original surveillance image alongside the database image and makes the call. Only above 95% — and only for low-consequence actions — does the system act on its own.
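The routing logic above is small enough to write out directly. A hedged sketch follows; the 70% and 95% cutoffs are the ones named in this section, but in a real deployment they should be derived from measured error rates on your own data, not copied from an article.

```python
def route_match(confidence: float, low_consequence: bool) -> str:
    """Route a match result through the decision gates described above:
    discard below 0.70, human review between 0.70 and 0.95, and
    autonomous action only above 0.95 for low-consequence cases."""
    if confidence < 0.70:
        return "discard"        # too unreliable to act on at all
    if confidence < 0.95 or not low_consequence:
        return "human_review"   # reviewer compares both images
    return "auto_approve"       # high confidence, low stakes only

print(route_match(0.85, low_consequence=False))  # human_review
print(route_match(0.97, low_consequence=True))   # auto_approve
print(route_match(0.60, low_consequence=True))   # discard
```

Note that a high-consequence action never auto-approves, no matter how confident the score: a 0.97 on a decision that could put someone in jail still goes to a human.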
This is a multi-agent orchestration approach where specialized agents handle each step: one validates input quality, another runs the match, a third measures uncertainty, and a fourth flags ambiguous cases for human review. No single model makes the final call alone.
The compliance advantage is the audit trail. Every step — every image quality check, every confidence score, every human decision — gets logged automatically. When the regulator or the plaintiff's attorney asks how your system reached a decision, you can show them exactly what happened. NIST's Face Recognition Vendor Test provides the benchmarks your vendors should meet. It measures false match rates at thresholds as strict as 1 in 1,000,000. It also breaks down performance by gender, age, and demographic group — exactly the kind of bias data the FTC demanded from Rite Aid.
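One way to make such a trail hold up in discovery is to hash-chain the log entries so that deletions or edits are detectable. This is an illustrative sketch using only the standard library; the step names, the candidate ID, and the reviewer name are hypothetical placeholders.

```python
import json
import hashlib
from datetime import datetime, timezone

def log_step(trail: list, step: str, **details) -> None:
    """Append a tamper-evident entry: each record hashes the previous
    one, so removing or altering a record breaks the chain."""
    prev_hash = trail[-1]["hash"] if trail else "0" * 64
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "step": step,
        "details": details,
        "prev_hash": prev_hash,
    }
    record["hash"] = hashlib.sha256(
        json.dumps(record, sort_keys=True).encode()
    ).hexdigest()
    trail.append(record)

trail: list = []
log_step(trail, "input_validation", resolution_ok=True, lighting_ok=True)
log_step(trail, "match", candidate_id="A-1042", confidence=0.85)
log_step(trail, "human_review", reviewer="jdoe", decision="rejected")

# Every decision is reconstructible, step by step:
for record in trail:
    print(record["step"], record["details"])
```

In production you would write these records to append-only storage, but the principle is the same: when asked how a decision was reached, you replay the chain instead of reconstructing it from memory.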
For organizations in AI security and resilience, the question is not whether your AI is accurate on average. It is whether your AI can prove it was accurate in this specific case, for this specific person, under these specific conditions. That is the standard your deterministic workflows and tooling must meet.
Key Takeaways
- Harvey Murphy spent 10 days in jail because AI matched his face to a robber 1,500 miles away — he now has a $10 million lawsuit pending against Macy's.
- The FTC banned Rite Aid from using facial recognition for five years and forced the company to destroy all AI models built on improperly collected biometric data.
- Most commercial facial recognition systems cannot say "I don't know" — they always pick the closest match, even when the right answer isn't in the database.
- Every AI output is a probability, not a fact — without uncertainty measurement, a confidence score of 0.85 on a blurry image is statistically meaningless.
- A multi-agent system with human review gates and full audit trails is the only architecture that can survive both a regulator's scrutiny and a plaintiff's discovery request.
The Bottom Line
AI facial recognition has already cost one company a five-year technology ban and another a $10 million lawsuit. The fix is not better AI — it is AI that knows when it does not know, with humans making the final call and an audit trail proving every step. Ask your AI vendor: when your system produces a match on low-quality footage, can it show me the uncertainty distribution and the human review log for that specific decision?