
An AI Told a Deaf Woman to "Practice Active Listening." That's the Moment I Knew This Industry Was Broken.
I was sitting in my home office, late on a Tuesday night, scrolling through the ACLU's complaint filing against Intuit and HireVue, when I hit the line that made me set my laptop down and just stare at the wall.
A Deaf Indigenous woman — identified as D.K. in the filings — had been required to complete an automated video interview for a promotion. She'd already earned positive evaluations, annual bonuses, a track record that should have made the promotion straightforward. But the AI system that processed her interview generated a piece of feedback that will haunt this industry for years: it told her to "practice active listening."
She is Deaf.
The system didn't know. The system didn't care. The system did what every large language model does — it pattern-matched against a training set built overwhelmingly from hearing, neurotypical, standard-American-English-speaking humans, and it decided that anyone who didn't sound like that dataset was deficient. Not different. Deficient.
I've spent years building AI systems at Veriprajna that are designed to make high-stakes decisions about people. And I can tell you with absolute certainty: this wasn't a bug. This was the architecture working exactly as designed. That's the problem.
What Actually Happened to D.K.?

The facts are worth sitting with because they expose something deeper than a single software failure.
D.K. had requested a reasonable accommodation — specifically, a human-generated Communication Access Realtime Translation (CART) captioner to help her navigate the video interview. Instead, she was given automated captions. If you've ever watched auto-captions butcher a speaker with a mild regional accent, imagine what happens when the speaker has what linguists call a "Deaf accent" — speech patterns shaped by a lifetime of communicating without auditory feedback.
The Automated Speech Recognition system couldn't parse her speech. The transcript it generated was, functionally, garbage. And then a second layer of AI analyzed that garbage transcript for "leadership qualities" and "communication skills" and concluded she wasn't ready for management.
This is what I started calling cascading failure in conversations with my team — when an error in one AI layer doesn't just persist but amplifies as it passes through subsequent layers. Bad transcript feeds bad analysis feeds bad recommendation. By the time a human sees the output, it looks clean. A score. A ranking. A rejection. No one sees the 78% Word Error Rate underneath.
When the foundational transcript has a 78% error rate, every model built on top of it isn't analyzing the candidate — it's analyzing noise.
That number isn't hypothetical. Research on ASR systems processing Deaf speakers with average-to-low speech intelligibility consistently shows Word Error Rates between 77% and 78%. For comparison, standard American English speakers hit 10–18%. The system was never going to work for D.K. It was designed, from the ground up, to exclude her.
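To make the cascade concrete: Word Error Rate is just word-level edit distance divided by the length of the reference transcript. A minimal sketch (the sentences are invented for illustration):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # Standard dynamic-programming edit distance, but over words, not characters.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(substitution, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)

# One wrong word in four is already a 25% WER:
wer("I led a team", "I lead a team")  # one substitution out of four words
```

At a 78% WER, roughly four words in five are wrong or missing — and that mangled string is what the downstream "communication skills" model actually reads.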
Why Does Every AI Hiring Tool Have This Problem?
Here's where I need to be honest about the industry I work in.
The vast majority of "AI hiring solutions" on the market right now are what we call wrapper products. They're thin interfaces built on top of general-purpose large language models — GPT-4, Claude, Gemini. The company adds a nice UI, some HR-specific prompts, maybe a dashboard with charts, and sells it as "AI-powered talent intelligence."
I've sat across the table from enterprise buyers who genuinely couldn't tell the difference between a wrapper and a purpose-built system. And why would they? The marketing looks identical. The demos are polished. The wrapper company says "we use advanced AI" and the deep AI company says "we use advanced AI" and the procurement team picks the one with the lower price tag.
The difference only shows up when someone like D.K. walks through the door.
General-purpose LLMs inherit every bias baked into the internet-scale datasets they were trained on. If decades of hiring data reflect a preference for candidates who speak a certain way, look a certain way, present a certain way, the model doesn't question that pattern — it optimizes for it. That's not a flaw in the model's reasoning. That's literally what the model was built to do: find patterns and replicate them.
I remember a heated argument with one of my engineers — I'll call him Ravi — about whether adversarial debiasing was worth the computational overhead. His position was pragmatic: "Most candidates won't trigger the edge cases. We're adding latency for a scenario that affects maybe 2% of interviews." My response was blunt: "If your system works perfectly for 98% of people and systematically discriminates against the other 2%, you haven't built a good product with edge cases. You've built a civil rights violation with a high accuracy rate."
Ravi came around. But I think about that conversation a lot, because I know it's happening at every AI company right now, and at most of them, the Ravis are winning.
How Do You Actually Build AI That Doesn't Discriminate?

The technical answer matters, but I want to explain it the way I'd explain it to a friend, not the way I'd write it in a spec doc.
The core idea behind what we build at Veriprajna is something called adversarial debiasing. Imagine you're training two AI models simultaneously. The first model — the one you actually care about — is trying to predict whether a candidate will succeed in a role. The second model is an adversary. Its only job is to look at the first model's internal representations and try to guess the candidate's race, gender, disability status, or any other protected attribute.
Then you punish the first model every time the adversary succeeds.
Over thousands of training cycles, the primary model learns to make predictions that are genuinely blind to protected characteristics — not because you've removed those data points from the input (that's the naive approach, and it doesn't work because proxies remain), but because the model's internal reasoning has been forced to find paths to its conclusions that don't pass through demographic information.
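The training loop above can be sketched in a few lines. This is a deliberately tiny toy, not our production architecture: a one-layer "predictor" whose internal score `h` is attacked by a one-parameter adversary, with a gradient-reversal term (`- lam * grad_leak`) that punishes the predictor whenever the adversary can recover the protected attribute. All names and shapes here are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def debias_step(w_pred, w_adv, X, y_task, y_prot, lr=0.1, lam=1.0):
    """One joint update. The predictor minimizes task loss MINUS the
    adversary's loss (gradient reversal): it is penalized whenever its
    internal representation leaks the protected attribute."""
    n = len(y_task)
    h = sigmoid(X @ w_pred)      # predictor's internal score
    a = sigmoid(w_adv * h)       # adversary's guess at the protected attribute
    # Task gradient: ordinary logistic-regression step toward y_task.
    grad_task = X.T @ (h - y_task) / n
    # Leak gradient: how moving w_pred would help the adversary succeed.
    grad_leak = X.T @ ((a - y_prot) * w_adv * h * (1.0 - h)) / n
    # Reversed sign on the leak term pushes the predictor AWAY from
    # representations the adversary can exploit.
    w_pred = w_pred - lr * (grad_task - lam * grad_leak)
    # The adversary trains normally: it keeps trying to get better.
    w_adv = w_adv - lr * float(np.mean((a - y_prot) * h))
    return w_pred, w_adv
```

Run over thousands of cycles, the two objectives reach an equilibrium where the adversary's guesses are no better than chance — which is the point.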
Counterfactual fairness means proving that a candidate's score would remain identical if their protected attributes — race, gender, disability — were different. That's not an aspiration. It's a mathematical test.
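And because it's a mathematical test, you can write it down. A toy version, with an invented two-feature scorer standing in for a real model:

```python
def score(candidate: dict) -> float:
    """Toy debiased scorer: only job-relevant features reach the output.
    (Weights and feature names are illustrative, not a real model.)"""
    return 0.6 * candidate["domain_expertise"] + 0.4 * candidate["problem_solving"]

def counterfactual_fair(candidate: dict, protected_key: str, alt_value, tol=1e-9) -> bool:
    """Flip one protected attribute, score again, and require the two
    scores to be identical within tolerance."""
    flipped = {**candidate, protected_key: alt_value}
    return abs(score(flipped) - score(candidate)) <= tol

candidate = {"domain_expertise": 0.9, "problem_solving": 0.8,
             "disability_status": "deaf"}
counterfactual_fair(candidate, "disability_status", "hearing")  # must be True
```

The hard part isn't the test — it's building a model whose internals actually pass it, which is what the adversarial training is for.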
This is fundamentally different from what a wrapper can do. You can't bolt adversarial debiasing onto a GPT API call. You can't retroactively audit the internal representations of a model you don't control. You're just sending text to a black box and hoping the output isn't discriminatory. Hope is not a compliance strategy.
I wrote about the full technical architecture — including the multimodal fusion approach and the formal fairness metrics — in our interactive whitepaper if you want to go deeper.
The Modality Collapse That Sank D.K.

There's a specific technical failure in the HireVue case that I think most coverage has missed, and it's one that keeps me up at night.
The system suffered from what researchers call modality collapse. In a multimodal AI system — one that processes video, audio, and text simultaneously — each channel (or "modality") contributes to the final assessment. In theory, this is more robust than a single-channel system. If the audio is noisy, the video can compensate. If the transcript is garbled, the visual cues can fill in.
In practice, HireVue's system appears to have over-indexed on the audio channel. When D.K.'s speech didn't match the patterns the model expected, the audio signal didn't just contribute a low score — it dominated the entire assessment. The visual channel, which might have captured her engagement, her confidence, her expressiveness, was drowned out.
We solve this with something we call Modality Fusion Collaborative Debiasing. When our system detects that one modality is producing low-confidence outputs — say, the ASR is struggling with a non-standard accent — it doesn't just flag the problem. It automatically increases the weight of the other modalities. The written responses get more influence. The visual behavioral cues get more influence. The degraded audio channel gets less.
But here's the part that I think matters most, and it's not technical at all: when our system's confidence drops below a threshold, it routes to a human. Not as an afterthought. Not as an "escalation path" buried in a settings menu. As a core architectural decision.
D.K. asked for a human captioner. She was denied. In our system, she wouldn't have needed to ask. The system would have recognized its own limitation and brought a human in automatically.
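The reweighting-plus-routing behavior described above fits in a few lines. This is a simplified sketch, not our production fusion layer; the threshold value and channel names are assumptions for illustration:

```python
HUMAN_REVIEW_THRESHOLD = 0.5  # hypothetical cutoff, tuned per deployment

def fuse(modalities):
    """modalities: {channel: (score, confidence)}, both in [0, 1].
    Each channel's influence is scaled by its confidence, so a degraded
    channel (say, ASR struggling with a non-standard accent) loses
    weight instead of dominating. If no channel is confident enough,
    return None: the decision routes to a human reviewer."""
    best = max(conf for _, conf in modalities.values())
    if best < HUMAN_REVIEW_THRESHOLD:
        return None  # system admits its limitation; a human takes over
    total = sum(conf for _, conf in modalities.values())
    return sum(score * conf for score, conf in modalities.values()) / total

# Garbled audio (confidence 0.1) barely moves the result;
# text and video carry the assessment.
fuse({"audio": (0.2, 0.1), "text": (0.9, 0.9), "video": (0.8, 0.8)})
```

Notice the design choice: `None` is a first-class outcome. The system doesn't emit a low score when it's confused — it emits "I don't know," and that answer has an owner.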
AI should know when it's failing. The fact that HireVue's system confidently scored a transcript with a 78% error rate tells you everything about how these tools are built — and who they're built for.
What Happens When the Law Catches Up?
For years, the AI hiring industry operated in a regulatory vacuum. Companies could deploy whatever they wanted, audit nothing, and disclaim liability in their terms of service. That era is ending, fast.
The Colorado Artificial Intelligence Act (SB 24-205), effective in early 2026, establishes something unprecedented: a legal "duty of reasonable care" for anyone who develops or deploys high-risk AI systems. Hiring and promotion decisions are explicitly classified as high-risk. The law requires annual impact assessments that screen for algorithmic discrimination. Not voluntary. Not "best practice." Mandatory.
New York City's Local Law 144 already requires independent bias audits for automated employment decision tools. Similar legislation is advancing in California and Illinois. The EU AI Act classifies recruitment AI as high-risk and imposes transparency and human oversight requirements backed by revenue-based fines.
And then there's Mobley v. Workday, which might be the most consequential case most people haven't heard of. A federal court certified a collective action and ruled that an AI vendor can be treated as an "agent" of the employer when its software performs functions traditionally handled by a human hiring manager. That single ruling demolished the liability firewall that every wrapper company depends on — the idea that the vendor provides the tool but the employer bears all the risk.
I had a potential investor tell me, about a year ago, that compliance-first AI was "a niche play." That the market wanted speed and scale, not auditability. I told him that the market was about to get sued into wanting auditability. I think the ACLU filing proved the point.
For the detailed regulatory analysis and the full framework for how enterprises should be preparing, the technical deep-dive is here.
"But Our System Passed the Bias Audit"
People ask me this constantly — if a system passes an annual bias audit, isn't that enough?
No. And here's why.
Most bias audits test for disparate impact using the Four-Fifths Rule: if the selection rate for a protected group falls below 80% of the rate for the highest-selected group, there's a problem. This is a useful floor, but it's a terrible ceiling. A system can pass the Four-Fifths Rule in aggregate while systematically failing specific intersectional groups — say, Deaf Indigenous women — because the sample sizes are too small to trigger the statistical threshold.
D.K. wasn't failed by a system that was broadly biased against women or broadly biased against Indigenous people. She was failed by a system that couldn't process her specific combination of identity and communication style. Aggregate fairness metrics would never have caught it.
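You can see the blind spot in a few lines. The check itself is trivial; the trap is which groups you bother to slice on (the rates below are invented for illustration):

```python
def four_fifths(selection_rates):
    """selection_rates: {group: selected / applied}. A group fails the
    rule when its rate falls below 80% of the best group's rate."""
    top = max(selection_rates.values())
    return {group: rate >= 0.8 * top for group, rate in selection_rates.items()}

# Aggregate slice: 0.45 / 0.50 = 90% — the audit passes.
four_fifths({"men": 0.50, "women": 0.45})

# Intersectional slice of the SAME pipeline: 0.05 / 0.50 = 10% — a
# catastrophic failure the aggregate numbers never surfaced.
four_fifths({"hearing": 0.50, "deaf_indigenous_women": 0.05})
```

Same system, same math, opposite verdicts — depending entirely on whether anyone thought to look at the subgroup.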
This is why we use SHAP (SHapley Additive exPlanations) analysis as a continuous monitoring layer, not a once-a-year checkbox. SHAP lets us decompose every single decision into its contributing features. If a candidate is scored low, we can see exactly which features drove that score. And if those features correlate with protected attributes rather than job-relevant competencies — if "prosody" or "speech cadence" is doing the heavy lifting instead of "problem-solving ability" or "domain expertise" — the system flags itself for remediation in real time.
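For arbitrary models you'd estimate these attributions with the `shap` library, but for a linear model the SHAP values are exact and closed-form, which makes the monitoring idea easy to sketch. Feature names, weights, and the proxy list below are all hypothetical:

```python
def linear_shap(weights, x, baseline):
    """Exact SHAP values for a linear model: phi_i = w_i * (x_i - E[x_i]).
    Each phi_i is that feature's contribution to this one decision."""
    return {f: weights[f] * (x[f] - baseline[f]) for f in weights}

# Features known to proxy for accent or disability, not competence.
PROXY_FEATURES = {"prosody", "speech_cadence"}

def needs_remediation(attributions):
    """Flag the decision when proxy features carry more total
    attribution than job-relevant ones."""
    proxy = sum(abs(v) for f, v in attributions.items() if f in PROXY_FEATURES)
    relevant = sum(abs(v) for f, v in attributions.items() if f not in PROXY_FEATURES)
    return proxy > relevant

phi = linear_shap(
    weights={"prosody": -0.8, "domain_expertise": 0.5},
    x={"prosody": 0.2, "domain_expertise": 0.9},
    baseline={"prosody": 0.6, "domain_expertise": 0.7},
)
needs_remediation(phi)  # prosody dominates the score: flag it
```

Run per-decision rather than per-year, a check like this is what turns "we audited it once" into "it audits itself every time."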
The difference between a bias audit and continuous explainability monitoring is the difference between an annual physical and a heart monitor. One tells you what already went wrong. The other catches the problem while there's still time to act.
The Real Cost of Getting This Wrong
I want to end with something that isn't about technology or regulation.
When D.K. was denied her promotion, the company didn't just violate her rights. It lost a high-performing employee who had earned bonuses and positive reviews — someone who, by every human measure, was ready for the role. The AI didn't protect the company from a bad hire. It protected the company from a great one.
Every time a biased system screens out a qualified candidate — because of an accent, a disability, a name, a speech pattern that doesn't match the training data — the company doesn't just face legal risk. It loses the person. It loses the perspective, the problem-solving approach, the lived experience that no amount of "culture fit" optimization can replicate.
I've built Veriprajna on a conviction that I hold more strongly now than when I started: the companies that will dominate the next decade are the ones that figure out how to use AI as a bridge to talent they'd otherwise miss, not a filter that screens it out. The wrapper era is collapsing under the weight of its own lawsuits. The black-box era is being legislated out of existence.
What replaces it has to be different in kind, not in degree. Not a better wrapper. Not a more carefully prompted GPT call. A fundamentally different architecture — one that knows when it's wrong, explains why it's right, and brings a human in when neither is certain.
AI should be a bridge to talent, not a barrier to it. Any system that can't tell the difference between a disability and a deficiency has no business making decisions about people's careers.
The era of "deploy and disclaim" is over. What comes next is harder, slower, more expensive to build, and the only thing that will actually work.