The Problem
UnitedHealth Group's AI algorithm denied care to elderly patients — and it was wrong 90% of the time. Nine out of ten denials were reversed when a human judge actually reviewed them. And the lawsuit alleges UnitedHealth knew something devastating: only 0.2% of patients had the resources to fight back.
The algorithm at the center of this crisis is called nH Predict. UnitedHealth's Optum division acquired it in 2020 for over $1 billion. It was supposed to predict how long Medicare patients needed post-acute care, like stays in skilled nursing facilities. Instead, it became a cost-cutting machine that ignored whether patients actually needed help.
Consider Carol Clemens. After surviving methemoglobinemia — a life-threatening blood disorder — she needed intensive skilled nursing care. Clinical evidence supported her ongoing need for rehabilitation. But nH Predict's projections terminated her coverage anyway. Her family paid over $16,768 out-of-pocket to prevent her premature discharge. UnitedHealth, the lawsuit alleges, "banked" on the fact that patients like Clemens were too sick, too old, or too poor to appeal.
In February 2025, a federal judge ruled the class action could proceed. The era of hiding behind black-box AI decisions is ending. If your organization uses AI to make decisions that affect people's lives or livelihoods, this case is your warning shot.
Why This Matters to Your Business
This is not just a healthcare story. It is a corporate governance story, and the financial exposure is staggering.
UnitedHealth Group generates roughly $300 billion in annual revenue. The company now faces a federal class action that could reshape how every regulated enterprise deploys AI. Here is what the numbers tell you about your own risk:
Denial rates more than doubled. Post-acute care denials jumped from roughly 10% to 22.7% after the AI was deployed. Skilled nursing facility denials increased by 800%. Your board should ask: are your AI systems producing statistically anomalous outcomes that regulators can flag?
72% of S&P 500 companies now disclose material AI risks in their annual SEC filings. Reputational damage is the top-cited concern. A single viral AI failure can trigger litigation that costs multiples of whatever the AI was supposed to save.
EU AI Act penalties reach up to 7% of global turnover. Healthcare AI systems are classified as "High-Risk" under this law, requiring mandatory conformity assessments, transparency disclosures, and human oversight. If you do business in Europe, non-compliance is a board-level financial threat.
The FDA's January 2025 draft guidance proposes a 7-step credibility assessment framework for AI models used in medical and regulatory decision-making. Regulators are no longer suggesting best practices. They are writing the rules.
The bottom line for your P&L: the cost of governing AI properly is a fraction of the cost of defending a class action, paying regulatory fines, or rebuilding your brand after a public failure.
What's Actually Happening Under the Hood
Here is why nH Predict failed, explained without jargon.
The algorithm worked by cross-referencing a database of 6 million patient records. It looked for patterns: patients with similar diagnoses historically stayed X number of days. Then it generated a "target" discharge date for each new patient.
Think of it like a GPS that only knows average traffic patterns. It tells you the drive takes 30 minutes. But it cannot see the accident ahead, the road closure, or the fact that your car has a flat tire. It just keeps insisting you should have arrived already.
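In code, the correlation-only approach described above might look like this minimal sketch. The record format and the `predict_discharge_day` helper are invented for illustration — this is not UnitedHealth's actual implementation, just the shape of the idea:

```python
from statistics import median

# Illustrative historical records: (diagnosis, days_of_post_acute_care).
HISTORY = [
    ("hip_fracture", 12), ("hip_fracture", 14), ("hip_fracture", 15),
    ("stroke", 20), ("stroke", 22), ("stroke", 25),
]

def predict_discharge_day(diagnosis: str) -> float:
    """Correlation-only prediction: the median stay of similar past patients.

    Note what never enters the calculation: this patient's caregiver
    support, finances, or clinical complications.
    """
    similar = [days for dx, days in HISTORY if dx == diagnosis]
    return median(similar)

print(predict_discharge_day("hip_fracture"))  # -> 14, regardless of the patient
```

Every hip-fracture patient gets the same target date, which is exactly the GPS-that-can't-see-the-accident problem.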
nH Predict had the same blind spot. It relied on correlation — what usually happened — instead of causation — what was actually driving each patient's condition. It could not account for whether a patient had a caregiver at home, faced financial instability, or had specific clinical complications. These factors determine medical necessity under Medicare rules, but the model ignored them entirely.
This failure mode has a name: it is the difference between asking "what usually happens" and "why does this patient need more time." A correlation-driven model — one that spots patterns without understanding causes — tells you that patients with a certain diagnosis typically leave after 14 days. A causal model asks what factors cause a patient to need more time, and what happens when you remove coverage prematurely.
The problem got worse because UnitedHealth turned a flawed prediction tool into a mandatory directive. Managers instructed case managers to keep patient stays within 3% of the algorithm's projection, then tightened that target to just 1%. Clinicians who deviated to accommodate actual patient needs faced disciplinary action or termination. The "human-in-the-loop" safeguard was reduced to a rubber stamp.
What Works (And What Doesn't)
Let's start with what does not work, because your organization may already be doing one of these.
Adding a chatbot layer on top of existing AI. This is the "wrapper" approach — putting a custom interface over a third-party AI engine like GPT. These wrappers offer no proprietary logic, inherit every bias from their foundation model, and produce black-box outputs you cannot audit. In regulated industries, they are liabilities.
Assuming human review fixes everything. UnitedHealth technically had humans reviewing AI decisions. But when you punish employees for overriding the algorithm, your human-in-the-loop is a fiction. Process design matters more than policy language.
Treating AI governance as an IT project. If your AI governance lives entirely in IT, you are missing the legal, clinical, and financial dimensions that create actual liability. The February 2025 ruling succeeded because the judge found UnitedHealth violated its contractual promise to have coverage decisions made by "clinical services staff" and "physicians" — not algorithms.
Here is what actually works, in three steps:
Input: Causal modeling instead of correlation. Your AI system should be built on causal and counterfactual modeling — an approach where you map the actual cause-and-effect relationships in your domain, not just historical patterns. For healthcare, this means modeling why a patient needs care, not just how long similar patients stayed.
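As a toy contrast with pattern-matching, a causal formulation makes the drivers of need explicit and supports counterfactual questions. The factors and day-adjustments below are invented purely for illustration, not clinical values:

```python
from dataclasses import dataclass

@dataclass
class Patient:
    baseline_days: float        # clinical estimate for the diagnosis alone
    has_home_caregiver: bool    # causal factor: support after discharge
    active_complication: bool   # causal factor: slows recovery

def needed_days(p: Patient) -> float:
    """Toy structural model: each causal factor shifts the care actually needed."""
    days = p.baseline_days
    if not p.has_home_caregiver:
        days += 7    # no support at home -> longer facility stay
    if p.active_complication:
        days += 10
    return days

def counterfactual_gap(p: Patient) -> float:
    """Counterfactual query: how much of this stay is driven by missing
    caregiver support? (Intervene on that one factor, hold the rest fixed.)"""
    with_support = Patient(p.baseline_days, True, p.active_complication)
    return needed_days(p) - needed_days(with_support)

p = Patient(baseline_days=14, has_home_caregiver=False, active_complication=True)
print(needed_days(p))         # -> 31
print(counterfactual_gap(p))  # -> 7
```

A correlation model would hand this patient the 14-day average; the causal sketch explains *why* this patient needs more, factor by factor — which is also what an auditor or a judge will ask for.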
Processing: Explainable AI with confidence scoring. Every decision your AI makes should come with a plain-language explanation of the factors that drove it. Tools like SHAP and LIME — methods that show which data points influenced a specific output — let auditors see if denials are driven by clinical evidence or by proxy variables like zip code. When the AI encounters a case outside its training data, confidence scoring should flag the uncertainty and route the case to a human reviewer.
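The explanation-plus-routing pattern can be sketched with a deliberately simple stand-in: for a linear scorer, each feature's contribution is just weight × value, which is what SHAP recovers exactly in the linear case. The weights, feature names, and confidence heuristic here are all hypothetical:

```python
# Hypothetical weights for a linear claim scorer (positive favors extension).
WEIGHTS = {"mobility_score": -2.0, "wound_healing": -1.5, "age_over_80": 3.0}
CONFIDENCE_THRESHOLD = 0.7

def score_claim(features: dict) -> dict:
    # Per-feature contributions: the plain-language "why" behind the score.
    contributions = {f: WEIGHTS[f] * v for f, v in features.items()}
    total = sum(contributions.values())
    # Toy confidence: low when the case sits near the decision boundary.
    confidence = min(1.0, abs(total) / 10.0)
    decision = "approve_extension" if total > 0 else "deny_extension"
    if confidence < CONFIDENCE_THRESHOLD:
        # Uncertain cases never get an automated denial.
        decision = "route_to_human_reviewer"
    return {"decision": decision, "confidence": confidence,
            "explanation": contributions}

result = score_claim({"mobility_score": 2, "wound_healing": 1, "age_over_80": 1})
print(result["decision"])  # -> route_to_human_reviewer (borderline case)
```

The point is the output contract: every decision carries its factor breakdown and a confidence score, and low-confidence cases are escalated instead of auto-denied.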
Output: Full audit trail with kill-switch controls. Every AI output must be logged, time-stamped, and traceable. Your AI governance and compliance program should include a central AI registry that catalogs every model in your stack, model change management with rollback options, and clear authority for a cross-functional committee to shut down a model if its performance degrades.
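These three controls — registry, audit trail, kill switch — fit together in a few dozen lines. This is a minimal sketch of the pattern, with invented names, not a production governance system:

```python
from datetime import datetime, timezone

class ModelRegistry:
    """Minimal central AI registry with an append-only audit trail and kill switch."""

    def __init__(self):
        self.models = {}      # name -> {"version": ..., "enabled": ...}
        self.audit_log = []   # append-only decision trail

    def register(self, name: str, version: str):
        self.models[name] = {"version": version, "enabled": True}

    def kill(self, name: str):
        """Committee authority: disable a degraded model immediately."""
        self.models[name]["enabled"] = False

    def decide(self, name: str, inputs: dict, output: str) -> str:
        model = self.models[name]
        if not model["enabled"]:
            return "model_disabled_route_to_human"
        # Every output is logged, time-stamped, and traceable to a model version.
        self.audit_log.append({
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "model": name, "version": model["version"],
            "inputs": inputs, "output": output,
        })
        return output

registry = ModelRegistry()
registry.register("los_predictor", "2.1.0")
print(registry.decide("los_predictor", {"dx": "stroke"}, "21_days"))  # -> 21_days
registry.kill("los_predictor")
print(registry.decide("los_predictor", {"dx": "stroke"}, "21_days"))  # -> model_disabled_route_to_human
```

Tying every logged decision to a model version is what makes rollback meaningful: you can answer "which model made this call, under which configuration" for any past decision.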
This audit trail is what sells this approach to your compliance team. When a regulator or a plaintiff's attorney asks "show me how this decision was made," you either have a documented logic trail or you have a lawsuit. The FDA's proposed 7-step credibility framework explicitly calls for a "Credibility Report" that documents validation strategies, metrics, and any deviations. If your AI cannot produce that report, it fails the regulatory test before it ever reaches a courtroom.
The organizations building AI solutions for healthcare and life sciences need systems that explain their reasoning, flag their own uncertainty, and keep humans genuinely in control. The court's message in the UnitedHealth case was unmistakable: if your AI system is fundamentally broken, courts will not require victims to participate in the charade of a rigged appeal process.
You can read the full technical analysis or explore the interactive version for a deeper look at the regulatory frameworks and architectural requirements.
Key Takeaways
- UnitedHealth's AI denied elderly patient care with a 90% error rate, but only 0.2% of patients could appeal — creating a profitable failure.
- A federal judge allowed the class action to proceed, ruling that substituting human clinical judgment with an AI algorithm may breach contractual promises to policyholders.
- The EU AI Act can penalize non-compliant healthcare AI deployments up to 7% of global turnover, and the FDA's draft guidance proposes a 7-step credibility framework for AI models.
- Correlation-based AI that predicts "what usually happens" fails in regulated settings — you need causal models that explain "why" a decision was made.
- Every AI output in a high-stakes domain needs a full audit trail, confidence scoring, and a kill switch — or you are building liability, not efficiency.
The Bottom Line
The UnitedHealth case proved that a profitable AI system can still be a catastrophically wrong one — and courts will hold your organization accountable for the gap. If you deploy AI in any decision that affects people's health, finances, or legal rights, your system must explain its reasoning and flag its own uncertainty. Ask your AI vendor: when your model denies a claim or makes a high-stakes recommendation, can it show a regulator the exact factors it weighed, the confidence level of that specific decision, and the audit trail proving a human had genuine authority to override it?