The Algorithm That Denied Care to Dying Patients — And What It Taught Me About Building AI That Doesn't Kill
I was sitting in a conference room in late 2024 when a colleague pulled up a stat on her laptop and turned the screen toward me. "Have you seen this?"
It was the appeal reversal rate for UnitedHealth Group's nH Predict algorithm — the AI system their subsidiary NaviHealth had been using to decide when Medicare Advantage patients should be cut off from post-acute care. Skilled nursing. Rehabilitation. The kind of care that keeps an 82-year-old from being sent home to an empty apartment after a stroke.
The number was 90%.
Nine times out of ten, when a human reviewer actually examined the algorithm's decision to deny coverage, the denial was reversed. The AI was wrong nine times out of ten. And UnitedHealth knew. They knew because only 0.2% of patients — elderly, disabled, cognitively impaired people — ever managed to file an appeal. The system wasn't designed to be accurate. It was designed to be unappealable.
I closed my laptop that night and couldn't sleep. Not because the technology surprised me — I've spent years building AI systems and I understand how correlation-driven models fail. What kept me up was something uglier: this wasn't a bug. It was a business model. And it was the logical endpoint of an entire philosophy of enterprise AI that my industry has been cheerfully promoting for half a decade.
I run Veriprajna, a company built on the premise that AI in high-stakes domains needs to be fundamentally different from the chatbots and content generators that dominate the conversation. The UnitedHealth crisis didn't just validate that premise. It radicalized it.
A Billion-Dollar Algorithm That Couldn't See a Dying Woman
Let me tell you about Carol Clemens, because the numbers don't mean anything without her.
Carol had methemoglobinemia — a life-threatening blood disorder where your blood can't carry oxygen properly. After a severe episode, she was in a skilled nursing facility getting the rehabilitation she needed to survive. The kind of care that Medicare is supposed to cover.
Then nH Predict generated a "target discharge date." The algorithm, trained on 6 million patient records, had cross-referenced Carol's diagnosis with historical outcomes and decided she was done. Never mind that her blood oxygen levels were still life-threateningly low. Never mind that her clinicians said she needed more time. The model had spoken.
Her family paid $16,768 out of pocket to keep her in care. They were lucky — they had the resources. Most patients in Carol's situation didn't.
Here's what haunts me about this case: nH Predict wasn't some rogue experiment. UnitedHealth's Optum division paid over $1 billion to acquire NaviHealth and its algorithm. This was a flagship product at a company projecting $340 billion in revenue for 2025. The most expensive AI deployment in healthcare history, and it couldn't distinguish between a statistical average and a woman who was suffocating.
Why Did the Algorithm Get It Wrong 90% of the Time?
This is the question everyone asks, and the answer is deceptively simple. nH Predict was a correlation engine pretending to be a clinical tool.
It ingested patient records and found patterns: patients with diagnosis X typically stay Y days. That's it. That's the whole trick. It didn't model why patients need different lengths of care. It didn't account for whether someone had a caregiver at home, whether they were financially stable enough to manage outpatient treatment, whether they had specific complications that made their case different from the statistical average.
A model that tells you "patients like this usually leave in 14 days" is not the same as a model that understands why this specific patient needs 21 days. The first is a spreadsheet with extra steps. The second is intelligence.
I've had this argument with other founders more times than I can count. "But the model is accurate on average!" they'll say. Sure. And a river is four feet deep on average, which is no comfort to the person who drowned in the eight-foot section.
The technical term for what nH Predict lacked is causal reasoning — the ability to move from "what usually happens" to "what would happen if we changed this variable." A causal model would ask: what happens to Carol Clemens's recovery trajectory if we remove skilled nursing care on day 14? Does she relapse? Does she die? A correlation model doesn't ask. It can't. It wasn't built to.
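To make the distinction concrete, here's a deliberately toy sketch in Python. Everything in it is hypothetical: the numbers, the structural equation, the variables. What matters is the shape of the two questions. The correlational estimate can only report an average; the causal model can be interrogated about an intervention, like withdrawing care on day 14 for a patient whose oxygen is still low.

```python
# A deliberately toy contrast between the two kinds of questions. Every
# number, variable, and equation here is hypothetical and illustrative.

import numpy as np

rng = np.random.default_rng(0)

# Correlation engine: "patients with this diagnosis usually stay ~14 days."
# It can only report an average over historical records.
historical_stays = rng.normal(loc=14, scale=3, size=10_000)  # simulated days

def correlational_estimate() -> float:
    return float(historical_stays.mean())

# Causal model: encodes a (hypothetical) mechanism, so it can answer an
# interventional question: what happens to recovery odds if we cut care
# on day 14 for a patient whose oxygen saturation is still dangerously low?
def recovery_probability(spo2: float, care_days: int) -> float:
    logit = 0.2 * (spo2 - 88) + 0.3 * (care_days - 14)
    return 1.0 / (1.0 + np.exp(-logit))

print(f"Average stay: {correlational_estimate():.1f} days")
print(f"P(recovery | discharge day 14, SpO2 82%): {recovery_probability(82, 14):.2f}")
print(f"P(recovery | discharge day 21, SpO2 82%): {recovery_probability(82, 21):.2f}")
```

nH Predict answered the first question and was deployed as if it answered the second.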
I wrote about this distinction in depth in the interactive version of our research, because I think it's the single most important concept that enterprise leaders need to understand about AI right now.
The 3% to 1% Rule — Or, How You Turn Nurses Into Rubber Stamps
The algorithm's inaccuracy was bad enough. What UnitedHealth did with it was worse.
Whistleblower testimony revealed that NaviHealth managers set rigid compliance targets for their clinical staff. Case managers — nurses, doctors, people who had spent decades learning to assess patient needs — were told to keep patients' actual lengths of stay within a 3% variance of whatever nH Predict projected.
Then they tightened it to 1%.
Think about what that means in practice. You're a nurse. You've examined a patient. You know, from years of experience and from the clinical evidence in front of you, that this person isn't ready to go home. But the algorithm says day 14, and your manager says you need to hit day 14 plus or minus a fraction of a day, or you face disciplinary action. Maybe termination.
What do you do?
Most people complied. Not because they were bad clinicians, but because the system was designed to make compliance the only survivable option. Care coordinators were instructed to time their progress reviews to coincide exactly with the algorithm's predicted discharge date — engineering the clinical timeline to fit the model rather than the patient.
I remember describing this to a friend who works in aviation safety, and he went pale. "That's like telling pilots to land based on the flight plan regardless of weather conditions," he said. "You'd never fly again."
When clinicians are disciplined for overriding a flawed algorithm, you don't have a "human-in-the-loop." You have a human-shaped rubber stamp.
This is what I call algorithmic coercion, and it's the failure mode that terrifies me most — not because the AI is autonomous, but because it creates an environment where humans are punished for exercising the judgment the AI lacks.
What Happened in Court on February 13, 2025?
The class action — Estate of Gene B. Lokken v. UnitedHealth Group — reached a turning point when U.S. District Judge John Tunheim ruled the case could proceed. This matters enormously, and not just for UnitedHealth.
The court found that UnitedHealth's own policy documents promised coverage decisions would be made by "clinical services staff" and "physicians." By substituting those humans with an algorithm that effectively dictated outcomes, UnitedHealth potentially breached its contract with every policyholder.
Even more significant: the judge waived the requirement for patients to exhaust administrative appeals before suing. Normally, Medicare beneficiaries have to navigate multiple levels of bureaucratic review before they can go to court. But Tunheim looked at the 90% error rate, looked at the 0.2% appeal rate, and essentially said: we're not going to force dying people to participate in a system that's rigged against them.
That ruling should be required reading for every executive deploying AI in a regulated industry. The legal system is no longer willing to treat algorithmic dysfunction as a process problem that patients need to solve on their own.
Why "Wrapper AI" Is a Ticking Time Bomb in Healthcare
Here's where I need to be blunt about my own industry, because the UnitedHealth story isn't an isolated incident. It's the most visible symptom of a structural problem.
Over the past three years, the enterprise AI market has been flooded with what I call wrapper solutions — companies that take an existing large language model, wrap it in a custom interface, maybe fine-tune it on some domain-specific data, and sell it as a healthcare AI product. Or an insurance AI product. Or a compliance AI product.
These wrappers share every vulnerability that made nH Predict dangerous:
They're black boxes. You can't audit the reasoning behind any individual decision, which means you can't catch systematic bias until it's already harmed thousands of people.
They inherit the biases of their foundational models. If the training data reflects historical patterns of discrimination — and in healthcare, it always does — the wrapper faithfully reproduces those patterns.
They have no causal understanding. They predict based on statistical correlation, which means they're optimizing for "what usually happens" rather than "what should happen for this patient."
And critically, they're not defensible. Any competitor can build the same wrapper on the same foundation model. There's no proprietary intelligence, no unique insight — just a thin layer of automation over someone else's engine.
The wrapper economy in healthcare AI is building on sand. When the regulatory tide comes in — and it's coming fast — companies without deep, explainable, causally-grounded systems will be swept away.
I'm not saying this because Veriprajna competes with wrapper companies (though we do). I'm saying it because I've seen what happens when these systems fail in production, and the gap between "demo-ready" and "clinically safe" is a chasm that wrappers cannot cross.
How Does the FDA Want AI to Prove It's Trustworthy?
In January 2025, the FDA released draft guidance establishing a 7-step credibility assessment framework for AI models used in medical and regulatory decision-making. I've spent weeks with this document, and it's the most consequential piece of AI regulation I've seen.
The framework demands that every AI deployment clearly define the exact question it's answering, specify its role in the clinical workflow, assess what happens if it's wrong, and then prove — with rigorous testing — that it's fit for that specific purpose.
nH Predict would have failed at every step. It had no clear definition of its clinical role. Its risk assessment ignored the life-threatening consequences of denied care. Its "validation" optimized for cost containment, not patient outcomes.
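One way to internalize what the framework demands is to treat the credibility record as a data structure that every deployment must populate before it ships. The sketch below is my paraphrase, not the FDA's template; every field name and value is hypothetical, and the example entry simply restates the failures described above.

```python
# A hypothetical credibility record, paraphrasing the framework's demands.
# Field names and values are illustrative, not the FDA's actual template.

from dataclasses import dataclass

@dataclass(frozen=True)
class CredibilityRecord:
    question_of_interest: str   # the exact question the model answers
    context_of_use: str         # its role in the clinical workflow
    consequence_if_wrong: str   # what happens to the patient on failure
    validation_evidence: str    # proof it is fit for this specific purpose

nh_predict_as_deployed = CredibilityRecord(
    question_of_interest="Undefined (length-of-stay average, used as a cutoff)",
    context_of_use="Dictated coverage terminations rather than advising clinicians",
    consequence_if_wrong="Denial of medically necessary post-acute care",
    validation_evidence="Optimized for cost containment, not patient outcomes",
)
```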
Meanwhile, the EU AI Act classifies healthcare AI as "High-Risk," requiring mandatory transparency disclosures and human oversight. Non-compliance penalties run up to 7% of global turnover. For a company UnitedHealth's size, that's not a fine — it's an existential threat.
The World Health Organization has gone further, specifically targeting what they call automation bias — the tendency for clinicians to defer to an algorithm even when it contradicts their own clinical judgment. This is exactly what happened at NaviHealth. The WHO's 2024 guidance warns that over-reliance on AI can lead to a "degradation of skills" among physicians who stop exercising critical appraisal.
For the full technical breakdown of these regulatory frameworks and how they apply to enterprise AI deployment, see our research paper.
The Night I Realized Explainability Isn't Optional
There's a moment in every founder's journey where an abstract principle becomes visceral. For me, it was a late evening testing an early version of one of our models on a healthcare dataset.
The model had flagged a case for denial. I asked my team to run SHAP — SHapley Additive exPlanations, a tool that shows which features drove a specific prediction. The top factor wasn't the patient's diagnosis or their clinical trajectory. It was their zip code.
My lead engineer and I stared at the screen. We both knew what zip code correlates with in American healthcare data. We weren't looking at a clinical variable. We were looking at a proxy for race and income dressed up in five digits.
We scrapped the feature that night. But the experience crystallized something I'd understood intellectually but hadn't felt in my gut: if you can't explain why your AI made a decision, you can't catch the decisions that are indefensible.
This is why we build with explainability as architecture, not afterthought. Tools like SHAP give you a global view of what's driving your model. LIME — Local Interpretable Model-Agnostic Explanations — shows you the reasoning behind any single decision. For a patient like Carol Clemens, LIME would have made visible that the algorithm was ignoring her dangerously low blood oxygen in favor of average recovery statistics for her diagnosis code.
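For readers who haven't used these tools, here's roughly what that looks like in practice. This is a minimal sketch on synthetic data: the feature names, the model, and the records are all hypothetical stand-ins, and the one real API used is shap.TreeExplainer, which computes exact Shapley values for tree ensembles.

```python
# A minimal sketch of per-decision attribution with SHAP, on synthetic data.
# Feature names, model, and records are hypothetical stand-ins.

import numpy as np
import shap
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(42)
feature_names = ["spo2", "age", "comorbidity_count", "zip_income_proxy"]

# Synthetic records standing in for historical patient data. The
# socioeconomic proxy deliberately influences the target, mimicking the
# zip-code problem described above.
X = rng.normal(size=(1_000, len(feature_names)))
y = 14 - 2 * X[:, 0] + 1.5 * X[:, 3] + rng.normal(scale=0.5, size=1_000)

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X[:1])  # attribution for one prediction

for name, value in zip(feature_names, shap_values[0]):
    print(f"{name:>20}: {value:+.2f} days")
```

Run against a model like this one, the attribution makes the problem unmissable: a variable that should be clinically irrelevant carries real weight in a single prediction. That is the kind of finding no aggregate accuracy metric will ever surface.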
And then there's confidence scoring — the piece most wrapper solutions skip entirely. When a patient presents with a rare condition that's poorly represented in training data, the system needs to say, explicitly: "I don't know enough to make this call. Route this to a human." Not a suggestion. A hard stop.
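Here's a minimal sketch of what that hard stop can look like in code. The threshold, the distribution check, and every name here are hypothetical; the design point is that abstention is a first-class output, not an exception path.

```python
# A minimal sketch of confidence-gated routing. All names and thresholds
# are hypothetical; abstention is modeled as a first-class output.

from dataclasses import dataclass

@dataclass(frozen=True)
class Decision:
    action: str        # "approve" or "route_to_human"
    confidence: float
    rationale: str

CONFIDENCE_FLOOR = 0.90  # set by governance policy, not by engineering

def gated_decision(model_probability: float, in_distribution: bool) -> Decision:
    """Return an automated decision only when the model is on solid ground."""
    if not in_distribution:
        # Rare condition, poorly represented in training data: hard stop.
        return Decision("route_to_human", model_probability,
                        "Case falls outside the model's validated distribution.")
    if model_probability < CONFIDENCE_FLOOR:
        return Decision("route_to_human", model_probability,
                        "Confidence below the governance-mandated floor.")
    return Decision("approve", model_probability,
                    "High-confidence, in-distribution case.")

print(gated_decision(0.97, in_distribution=True))   # automated approval
print(gated_decision(0.97, in_distribution=False))  # rare condition: human review
```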
Why This Can't Be an "IT Problem" Anymore
People always push back on me when I say AI governance belongs in the boardroom. "Isn't that what the engineering team is for?" No. Absolutely not. And the UnitedHealth case is the proof.
The engineers at NaviHealth didn't set the 1% variance mandate. That was a management decision. The engineers didn't decide to discipline clinicians who overrode the algorithm. That was a policy decision. The engineers didn't choose to deploy a correlation-based model for life-or-death coverage decisions without causal validation. That was a strategy decision.
By 2025, 72% of S&P 500 companies had disclosed material AI risks in their SEC filings. Reputational risk is now the top-cited concern. A single algorithmic failure can trigger litigation, regulatory action, and public outrage simultaneously — and the board that says "we didn't know" will find that ignorance is not a defense.
At Veriprajna, we push every client toward establishing cross-functional AI governance committees that include clinical leaders, legal counsel, and patient safety representatives — not just engineers and product managers. These committees need the authority to maintain a central registry of every AI model in the organization's stack, enforce rollback options when performance degrades, and — this is the part that makes executives uncomfortable — kill a profitable model when it's causing harm.
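To make that less abstract, here's one possible shape for a registry entry. Every field and value is hypothetical; the point is that the clinical role, the rollback path, and the kill switch are recorded properties of the model, owned by the governance committee rather than living in an engineer's head.

```python
# One possible shape for a model-registry entry. Every field and value is
# hypothetical; ownership, rollback, and the kill switch are recorded
# properties of the model, not tribal knowledge.

from dataclasses import dataclass, field

@dataclass
class ModelRegistryEntry:
    model_id: str
    clinical_role: str               # the exact question the model answers
    risk_tier: str                   # e.g. "high" for coverage decisions
    accountable_owner: str           # a named human, not a team alias
    rollback_version: str            # version restored if performance degrades
    kill_switch_holders: list[str] = field(default_factory=list)

entry = ModelRegistryEntry(
    model_id="los-predictor-v3",
    clinical_role="Estimate post-acute length of stay (advisory only)",
    risk_tier="high",
    accountable_owner="chief.medical.officer@example.com",
    rollback_version="los-predictor-v2",
    kill_switch_holders=["patient-safety-committee", "legal-counsel"],
)
```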
AI governance isn't a cost center. It's the difference between a company that deploys AI responsibly and a company that becomes the next cautionary tale in a Senate investigation.
The Argument I Keep Having
There's a conversation I have at almost every conference, and it goes like this:
"Ashutosh, you're overcomplicating this. We can fine-tune GPT-4 on our clinical data and ship something in six weeks. Your approach takes months."
I don't disagree on the timeline. I disagree on the definition of "done."
You can absolutely ship a wrapper in six weeks. You can demo it beautifully. It will generate plausible-sounding clinical summaries and make your investors happy. And then, six months later, when a patient dies because your model confidently recommended the wrong course of action and nobody can explain why, you'll discover that the six weeks you saved cost you everything.
The UnitedHealth crisis wasn't caused by bad engineers or malicious intent. It was caused by an organization that treated AI as a throughput optimization problem — reducing review time by six to ten minutes per case — instead of a clinical judgment problem. They measured success in processing speed and denial rates, not in patient outcomes.
The shift from predictive wrappers to what I call deep AI isn't about using fancier models. It's about asking a fundamentally different question. Not "how do we automate this decision?" but "how do we make this decision better, more transparent, and more accountable than a human alone could?"
Where We Go From Here
I want to end with something that's been bothering me since I started writing this.
The nH Predict story is shocking, but it shouldn't be surprising. We've spent years building an AI ecosystem that rewards speed over safety, correlation over causation, and automation over augmentation. The incentive structures — venture capital timelines, enterprise procurement cycles, the relentless pressure to ship — all push toward the wrapper approach. Build fast, sell fast, worry about governance later.
There is no "later." The February 2025 ruling made that clear. The FDA's credibility framework made that clear. The EU AI Act's 7% penalty made that clear. And Carol Clemens's $16,768 medical bill made that clear in the most human terms possible.
The path forward isn't less AI. It's AI that earns the authority we're giving it — through causal validation that understands why, through explainable architecture that shows its work, through governance structures that empower humans to override the machine without fear of punishment, and through the basic institutional humility to admit when the model doesn't know enough to make the call.
The question was never "can AI make healthcare decisions?" It was always "should we let AI make healthcare decisions it can't explain, can't justify, and gets wrong 90% of the time?" The answer, finally, is no.
We built Veriprajna because we believed that answer was coming. I just wish it hadn't taken dying patients to prove us right.