A striking editorial image conveying the concept of algorithmic gatekeeping in hiring — a digital screening wall standing between job applicants and opportunities.
Artificial Intelligence · Hiring · Technology

A Court Just Told Millions of Job Applicants They Might Have Been Discriminated Against by Software

Ashutosh Singhal · March 20, 2026 · 15 min read

I was sitting in a hotel lobby in Bangalore last year, waiting for a meeting that was running late, scrolling through legal filings on my phone — the way normal people scroll Instagram — when I hit a paragraph that made me put my coffee down.

A federal judge in California had just ruled that Workday, the $70 billion HR software giant, could be held liable as an agent under federal anti-discrimination law. Not a tool. Not a neutral platform. An agent — the same legal category as a human recruiter who throws out resumes based on someone's age or race.

The plaintiff, Derek Mobley, an African American man over 40 with disabilities, had been rejected from more than 100 jobs. Many of those rejections came within minutes of applying, often outside business hours. No human had looked at his resume. Software decided he wasn't worth considering, and it did so over and over again with algorithmic consistency.

I build AI systems. My company, Veriprajna, designs cognitive architectures for enterprises — the kind of deep, deterministic AI that's supposed to replace the sloppy, probabilistic shortcuts most of the industry is selling. And when I read that ruling, my first thought wasn't "this is bad for Workday." It was: most of the AI recruitment industry is built on the same rotten foundation, and almost nobody is talking about it.

1.1 Billion Rejections and a Judge Who Noticed

Let me give you the number that stopped the room when I shared it with my engineering team.

During the relevant period of the Workday case, approximately 1.1 billion job applications were rejected through Workday's software. That's not a typo. Billion, with a B.

In May 2025, a federal court granted preliminary certification of a nationwide collective action for age discrimination under the ADEA — the Age Discrimination in Employment Act. This means every person over 40 who was denied an employment recommendation through Workday's platform since September 2020 could be notified and could join the case. By July 2025, the court expanded the scope to include applicants processed through HiredScore, an AI hiring tool Workday had acquired.

When software rejects a billion applications and a court says "that software is legally your agent," the entire HR tech industry has a structural problem, not a PR problem.

I remember the argument we had internally about this. One of my engineers — sharp guy, deep ML background — said, "But Workday is just running a recommendation engine. It's like blaming Google for showing bad search results." And I said, "No. It's like blaming a staffing agency that you hired to screen candidates and that staffing agency threw out every resume from anyone who graduated before 1995."

The court drew exactly that distinction. Judge Rita Lin separated "simple tools" — spreadsheets, email — from systems that actively score, rank, and recommend candidates. Workday's AI wasn't organizing data for a human to review. It was performing the traditional employer function of deciding who advances and who doesn't. That's agency. That's liability.

How Does an Algorithm Learn to Be Ageist?

A diagram showing how AI screening systems infer age through proxy signals without ever seeing a birthdate, mapping specific resume features to age correlation and then to rejection.

This is the part that keeps me up at night, because the mechanism is so banal.

Nobody at Workday — I genuinely believe this — sat down and wrote code that says if age > 40: reject(). That would be cartoonishly illegal and trivially detectable. The real problem is subtler and, honestly, harder to fix.

When you train a machine learning model on a company's historical hiring data — their "successful employees" — you're feeding it every bias those past hiring managers ever had. If the company historically hired younger workers for engineering roles, the model learns that youth-correlated signals predict "success." Not age directly. Proxies.

Here's what an AI screening system can infer about your age without ever seeing your birthdate:

Your email domain. An @aol.com or @hotmail.com address correlates with an older user demographic. Your technology references — listing Lotus Notes or COBOL expertise pins you to a specific era. Total years of experience, where "15+ years" becomes a temporal anchor. Even career progression markers: a "Junior Programmer" title from the early 1990s tells the model exactly when you entered the workforce.
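To make the mechanism concrete, here is a minimal sketch of how proxy signals like those above could be derived as model features. The field names, domain lists, and thresholds are all illustrative assumptions, not anyone's actual feature set:

```python
def proxy_signals(resume: dict) -> dict:
    """Features that never mention age but correlate strongly with it.
    All field names, lists, and thresholds here are illustrative."""
    legacy_domains = {"aol.com", "hotmail.com"}
    legacy_tech = {"Lotus Notes", "COBOL"}
    return {
        "legacy_email": resume["email"].split("@")[-1] in legacy_domains,
        "legacy_tech": bool(legacy_tech & set(resume["skills"])),
        "long_tenure": resume["years_experience"] >= 15,
        "early_career_start": min(resume["job_start_years"]) < 1995,
    }

resume = {"email": "dev@hotmail.com", "skills": ["COBOL", "Java"],
          "years_experience": 22, "job_start_years": [1993, 2001, 2010]}
print(proxy_signals(resume))  # every flag is True: the model "knows" the age bracket
```

No single flag is decisive, but a model trained on biased outcomes learns to weight their combination exactly as if it had been given a birthdate.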

I tested this with my own team. We built a synthetic dataset — fake resumes with controlled variables — and ran them through a standard transformer-based screening pipeline. The model had never been told anything about age. But when we measured selection rates using the EEOC's Four-Fifths Rule — which flags adverse impact when a protected group's selection rate falls below 80% of the highest group's rate — the results for applicants over 40 were devastating. Selection rates half that of younger applicants. Impact ratios around 0.50, well below the 0.80 threshold.
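The Four-Fifths check itself is simple arithmetic. A minimal version, with illustrative counts rather than data from the experiment described above:

```python
def impact_ratios(selection_counts):
    """Selection rate per group and the impact ratio relative to the
    highest-rate group, per the EEOC Four-Fifths Rule."""
    rates = {g: hired / applied
             for g, (hired, applied) in selection_counts.items()}
    top = max(rates.values())
    return {g: rate / top for g, rate in rates.items()}

# Illustrative counts: (hired, applied) per group.
counts = {"under_40": (120, 400), "over_40": (60, 400)}
for group, ratio in impact_ratios(counts).items():
    flag = "ADVERSE IMPACT" if ratio < 0.80 else "ok"
    print(f"{group}: ratio={ratio:.2f} ({flag})")
```

Here the over-40 selection rate is half the under-40 rate, giving an impact ratio of 0.50 — well under the 0.80 threshold, exactly the pattern our synthetic test produced.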

The algorithm doesn't need to know your age. It just needs your email provider, your vocabulary, and your career timeline. The math does the rest.

Nobody programmed discrimination. The training data is the discrimination, crystallized into weights and parameters and served back at scale.

Why "Just Use GPT" Is the Wrong Answer

I get this constantly. From investors, from potential clients, from well-meaning CTOs who've read three blog posts about AI transformation. "Why don't you just wrap GPT-4? It's good enough."

I had an investor tell me this to my face during a pitch. He leaned back, arms crossed, and said: "Ashutosh, OpenAI has spent billions on this. You're telling me your 40-person company is going to build something better?"

I told him he was asking the wrong question. The question isn't whether GPT-4 is "better" at generating text. Of course it is. The question is whether a probabilistic text-generation engine should be making decisions that determine whether a 52-year-old software engineer gets to feed her family.

The market is flooded with what I call LLM wrappers — thin application layers that repackage the outputs of foundation models like GPT-4 or Claude and sell them as "AI recruitment solutions." They look impressive in demos. They fail catastrophically in production, and here's why.

An LLM predicts the most likely next token. That's it. It's a sophisticated autocomplete engine. It doesn't reason about whether a candidate meets a job requirement. It generates text that looks like reasoning. And in recruitment, the gap between "looks like reasoning" and "actually reasoning" is the gap between compliance and a class-action lawsuit.

There's a well-documented phenomenon called lost-in-the-middle syndrome: standard transformer architectures show high accuracy when processing information at the beginning and end of their context window, but attention drops significantly in the middle. In a 10-page resume, critical certifications or recent accomplishments buried in the middle sections are statistically more likely to be overlooked. Not because the model decided they weren't important — because the architecture literally can't pay equal attention to everything.

I wrote about this architectural limitation and our approach to solving it in the interactive version of our research.

And then there's the economic problem. LLM wrappers face what I call moat absorption — as foundation model providers release more capable base models, they inevitably integrate the features that wrappers rely on as their value proposition. Resume parsing, sentiment analysis, basic matching — OpenAI and Google will eventually offer these natively. A company that merely wraps an API is training away its own competitive edge with every customer interaction.

The Night We Broke Our Own System

I want to tell you about a Thursday night about eight months ago, because it changed how I think about everything we build.

We were testing a prototype of our recruitment screening module — our neuro-symbolic architecture, which I'll explain in a moment — against a benchmark dataset. The system was performing beautifully on accuracy metrics. Precision was high. Recall was solid. My lead ML engineer, who'd been working 14-hour days on this, was practically glowing.

Then our compliance analyst ran the fairness audit.

The system was exhibiting demographic parity violations on disability status. Not huge ones — the impact ratio was around 0.78, just barely under the 0.80 threshold. But it was there. Our own system, the one I'd been telling everyone was "bias-resilient by design," was producing discriminatory outcomes.

The room went quiet. I felt sick.

We spent the next three days tearing the pipeline apart. The culprit turned out to be a feature in our training data that we'd assumed was neutral: employment gap duration. Candidates with disabilities are statistically more likely to have employment gaps — for medical leave, for accessibility-related job transitions, for recovery periods. Our model had learned that gaps predicted lower "success," and it was penalizing disability by proxy.

We caught the bias because we were looking for it. Most companies using off-the-shelf AI recruitment tools aren't looking. They don't even know they should be.

We fixed it using adversarial debiasing — training a secondary "adversary" model to predict protected characteristics from our predictor's output, then penalizing the predictor whenever the adversary succeeds. It's an in-processing technique that forces the system to unlearn discriminatory patterns rather than just masking them in post-processing.
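To show the shape of the technique, here is a toy NumPy sketch of adversarial debiasing — not our production system. A logistic predictor is trained alongside a scalar adversary that tries to recover a protected attribute from the predictor's score; the predictor is penalized whenever the adversary succeeds. The data is synthetic: a "gap" feature acts as a proxy for the protected attribute, mirroring the employment-gap problem described above:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(X, y, a, lam, lr=0.5, steps=2000):
    """Logistic predictor plus an adversary that tries to recover the
    protected attribute `a` from the predictor's score. With lam=0 the
    adversary is ignored (plain logistic regression)."""
    n = len(y)
    w = np.zeros(X.shape[1])   # predictor weights
    u = 0.1                    # adversary's weight on the centred score
    for _ in range(steps):
        score = sigmoid(X @ w)
        a_hat = sigmoid(u * (score - 0.5))        # adversary's guess of `a`
        # Adversary descends its own cross-entropy loss.
        u -= lr * np.mean((a_hat - a) * (score - 0.5))
        # Predictor descends the task loss and ASCENDS the adversary's
        # loss, i.e. it is penalized whenever the adversary succeeds.
        g_task = X.T @ (score - y) / n
        g_adv = X.T @ ((a_hat - a) * u * score * (1 - score)) / n
        w -= lr * (g_task - lam * g_adv)
    return w

rng = np.random.default_rng(0)
n = 4000
a = rng.integers(0, 2, n).astype(float)     # protected attribute (never a feature)
gap = a + rng.normal(0, 0.5, n)             # proxy feature: employment-gap length
skill = rng.normal(0, 1, n)                 # genuinely job-relevant signal
X = np.column_stack([gap, skill])
# Historically biased labels: group a=1 was hired less, regardless of skill.
y = (skill - a + rng.normal(0, 0.5, n) > 0).astype(float)

w_plain = train(X, y, a, lam=0.0)
w_fair = train(X, y, a, lam=2.0)
print("weight on gap proxy, plain   :", round(w_plain[0], 3))
print("weight on gap proxy, debiased:", round(w_fair[0], 3))
```

The plain model learns a substantial negative weight on the gap proxy; the adversarially trained one pushes that weight toward zero while keeping the weight on the legitimate skill signal. That is the "unlearning" the technique buys you.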

But the lesson wasn't technical. The lesson was: if we, a company obsessed with fairness and verification, nearly shipped a biased system, what is everyone else shipping?

What Does "Deep AI" Actually Mean for Hiring?

An architecture diagram showing the neuro-symbolic pipeline — how a resume flows from language model extraction through a knowledge graph to a deterministic rule engine, with constitutional guardrails at three stages, producing an auditable decision trail.

When I say we build "Deep AI" instead of LLM wrappers, I don't mean we use deeper neural networks. I mean we go deeper into the problem.

Our architecture is neuro-symbolic — it combines the linguistic capabilities of neural networks with the logical rigor of symbolic reasoning. In practice, this means the LLM in our system is not the decision-maker. It's the translator.

Here's how it works, without the jargon:

When a resume enters our system, a specialized language model extracts structured facts — "this person has 5 years of Python experience," "this person holds a PMP certification," "this person worked at Company X from 2018 to 2022." These aren't interpretations. They're entity extractions, mapped to a knowledge graph that defines the relationships between skills, roles, and organizational requirements.

Then — and this is the critical part — a deterministic rule engine evaluates those extracted facts against the job requirements. Not a neural network. Not a probability distribution. Actual logic: IF experience >= 5 AND skill == Python THEN eligible = TRUE. The LLM cannot hallucinate the policy because the policy lives in code, not in weights.
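A stripped-down sketch of that rule engine, with illustrative rule names and facts (the real knowledge graph and policy set are far richer), looks like this:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    """One screening requirement: a named predicate over extracted facts."""
    name: str
    check: callable

def evaluate(facts: dict, rules: list[Rule]) -> dict:
    """Apply every rule to the extracted facts and return an auditable
    trail: which rule was checked and whether it passed."""
    trail = [{"rule": r.name, "passed": r.check(facts)} for r in rules]
    return {"eligible": all(step["passed"] for step in trail), "trail": trail}

# Facts extracted upstream by the language model (illustrative).
facts = {"skills": {"Python", "SQL"}, "years_experience": 6}

rules = [
    Rule("python_required", lambda f: "Python" in f["skills"]),
    Rule("min_5_years", lambda f: f["years_experience"] >= 5),
]

decision = evaluate(facts, rules)
print(decision["eligible"])          # deterministic: same facts, same answer
for step in decision["trail"]:
    print(step["rule"], "->", step["passed"])
```

The point is the trail: every verdict decomposes into named rules over named facts, so "why was this person rejected?" has a mechanical answer.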

Every recommendation generates an auditable logic trail. You can trace exactly which rule was triggered, by which data point, in which section of the candidate's file. When a regulator or a plaintiff's attorney asks "why was this person rejected?" — you have an answer that isn't "the model thought so."

We secure this with what we call constitutional guardrails — three layers of protection that run before, during, and after every interaction. Input rails catch adversarial prompts and PII leakage before they reach the core logic. Dialog rails enforce conversational boundaries. Output rails scan every result for hallucinations, toxicity, or policy violations before anything reaches a human recruiter.
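As a toy illustration of the rail pattern (input and output rails only; the injection regex, PII redaction, and stand-in core are all simplified assumptions, and dialog rails would sit between the two):

```python
import re

def input_rail(text: str) -> str:
    """Block or redact before anything reaches the core logic."""
    if re.search(r"ignore (all|previous) instructions", text, re.I):
        raise ValueError("blocked: prompt-injection pattern")
    # Redact obvious PII such as e-mail addresses.
    return re.sub(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b", "[REDACTED EMAIL]", text)

def output_rail(result: dict) -> dict:
    """Scan results before they reach a human recruiter."""
    if result.get("score") is None:
        raise ValueError("blocked: missing score (possible malformed output)")
    return result

def screen(raw_resume: str, core) -> dict:
    safe_input = input_rail(raw_resume)
    return output_rail(core(safe_input))

# A stand-in for the core pipeline (illustrative).
result = screen("jane@example.com, 6 years Python",
                lambda text: {"score": 0.9, "text_seen": text})
print(result["text_seen"])   # the e-mail address has been redacted
```

Each rail is a hard gate, not a suggestion: a failed check raises before anything downstream runs.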

This isn't theoretical. For the full technical breakdown of our architecture and the legal framework driving it, see our research paper.

Can You Really Make AI Hiring Fair?

People ask me this all the time, usually with a skeptical tone that implies they think the answer is no.

My honest answer: you can't make it perfectly fair. Fairness in hiring involves inherent trade-offs — mathematical ones, not just philosophical ones. Optimizing for demographic parity (equal selection rates across groups) can conflict with equalized odds (equal true positive and false positive rates). Optimizing for predictive parity (ensuring a high score means the same thing for every group) can conflict with both.
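You can see the tension in a few lines. In this toy cohort, where base rates differ between groups, a classifier that satisfies demographic parity exactly still has unequal true-positive rates (the numbers are contrived for illustration):

```python
def demographic_parity_gap(pred, group):
    """Absolute difference in selection rates between the two groups."""
    rate = lambda g: sum(p for p, gg in zip(pred, group) if gg == g) / group.count(g)
    return abs(rate(0) - rate(1))

def tpr_gap(pred, label, group):
    """One component of equalized odds: the gap in true-positive rates."""
    def tpr(g):
        pos = [p for p, y, gg in zip(pred, label, group) if gg == g and y == 1]
        return sum(pos) / len(pos)
    return abs(tpr(0) - tpr(1))

# Group 0 has a higher base rate of qualified candidates than group 1.
group = [0, 0, 0, 0, 1, 1, 1, 1]
label = [1, 1, 1, 0, 1, 0, 0, 0]
pred  = [1, 1, 0, 0, 1, 1, 0, 0]   # equal selection rate: 2 of 4 in each group

print(demographic_parity_gap(pred, group))  # 0.0 — parity satisfied
print(tpr_gap(pred, label, group))          # nonzero — equalized odds violated
```

Here group 0's true positives are caught at a rate of 2/3 while group 1's are caught at 1/1, so selection rates are identical and true-positive rates are not. With unequal base rates you generally cannot equalize both at once.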

But you can make it dramatically fairer than the status quo, which is either biased humans or biased algorithms pretending to be neutral. And you can make it auditable, which is what the law actually requires.

We use SHAP — SHapley Additive exPlanations — to assign a contribution value to every feature in every decision. "Skill X contributed +15 to this candidate's score. Employment gap contributed -3." We use LIME — Local Interpretable Model-agnostic Explanations — to test whether small changes would flip a decision. If changing a candidate's zip code changes the outcome, something is wrong.

We generate counterfactual explanations: "This candidate was not selected because they lacked certification Y. If they had certification Y, they would have scored above the threshold." That's not a black box. That's a glass box, and it's what the EEOC's May 2023 guidance demands.
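Mechanically, a counterfactual explanation is just a search over single-field edits that would flip the decision. A minimal sketch, with an illustrative scoring function and threshold standing in for the real pipeline:

```python
def counterfactual(candidate: dict, score_fn, threshold: float, edits: dict):
    """Find a single-field edit that would flip a rejection, producing an
    explanation like: 'if they had certification Y, they would pass'."""
    if score_fn(candidate) >= threshold:
        return "selected; no counterfactual needed"
    for field, value in edits.items():
        changed = {**candidate, field: value}
        if score_fn(changed) >= threshold:
            return (f"not selected; with {field}={value!r} "
                    f"the score would clear {threshold}")
    return "not selected; no single edit tested would flip the decision"

# Illustrative scoring function and candidate.
score = lambda c: 0.4 + (0.3 if c["certified"] else 0.0) + 0.05 * c["years"]
candidate = {"certified": False, "years": 4}
print(counterfactual(candidate, score, 0.75, {"certified": True, "years": 8}))
```

The same machinery doubles as a sanity check: if an edit to a feature that should be irrelevant, like a zip code, flips the decision, something upstream is wrong.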

The Three-Lines-of-Defense Model That Most Companies Don't Have

Here's something that shocked me when I started talking to enterprise HR teams about their AI tools: most of them have no idea what models they're running.

I mean this literally. I sat in a meeting with the CHRO of a Fortune 500 company — someone responsible for hiring decisions affecting tens of thousands of people annually — and asked, "Can you tell me the selection rates by demographic group for your AI screening tool?" Blank stare. "Can you tell me what model it uses?" Longer blank stare. "Can you tell me who validated it for bias?" She said, "I think the vendor handles that."

The vendor "handles" it. The same vendor who, under the Workday precedent, is now potentially liable as your agent. The same vendor who almost certainly has a clause in their contract disclaiming responsibility for discriminatory outcomes.

Enterprise AI in recruitment requires what risk management professionals call a three lines of defense model:

First line: the business units building and deploying AI. They're responsible for training data selection, blind hiring techniques that anonymize names and graduation years, and day-to-day monitoring.

Second line: risk and compliance oversight. Model registries — a central inventory of every AI model, its purpose, its data sources, its risk tier. Continuous monitoring of selection rates and impact ratios. Vendor vetting that demands documentation of bias testing, not just marketing decks.

Third line: independent audit. NYC's Local Law 144 already mandates annual bias audits by independent third parties for automated employment decision tools. Penalties start at $500 for the first offense and escalate to $1,500 per violation per day. But the real cost isn't the fine — it's what happens when a court orders your company's name sent to millions of potentially aggrieved applicants, which is exactly what the Workday collective certification enables.

Why "Sovereign AI" Is the Future of Enterprise Hiring

The Workday case is accelerating a shift I've been watching for two years: the move toward what I call sovereign AI in enterprise recruitment.

Companies are waking up to the fact that sending their proprietary hiring data to a third-party API means that data could be used to train the next generation of someone else's model. They're realizing that when a public API updates — which happens without notice — their carefully validated screening pipeline can drift overnight, producing different outcomes for the same candidates. They're understanding that general-purpose LLMs lack the domain-specific knowledge graphs needed for accurate professional assessment.

The enterprises I talk to increasingly want to own their models. Run them in their own virtual private clouds. Control when and how they update. Maintain complete audit trails that don't depend on a vendor's goodwill.

This is where we're heading at Veriprajna. We don't sell API access. We build cognitive architecture that encodes institutional knowledge, compliance rules, and deterministic logic into systems that use AI as a powerful interface — not a fallible oracle making life-altering decisions on statistical vibes.

The Thought I Can't Shake

I keep coming back to Derek Mobley. Over 100 applications. Rejected by software, often in minutes, in the middle of the night. No human ever looked at his qualifications. No one ever told him why.

And he's not unusual. He's just the one who sued.

There are millions of people — qualified, experienced, capable people — who have been filtered out of job opportunities by algorithms trained on historical prejudice, deployed without adequate testing, and operated without meaningful oversight. They didn't get a rejection letter explaining that their @hotmail.com email address correlated with an age bracket the model had learned to penalize. They just got silence, or a form email, and moved on to the next application.

The Workday ruling doesn't solve this problem. But it does something almost as important: it makes the problem expensive. And in enterprise software, expensive problems get fixed.

The question is no longer whether AI should be used in hiring. It's whether the AI you're using can survive a deposition.

I build AI for a living, and I believe deeply in its potential to make hiring more fair, more efficient, and more human. But only if we stop treating recruitment AI like a consumer product and start treating it like what it is: a high-stakes decision system that determines people's livelihoods, operating in one of the most heavily regulated domains in American law.

The black box era is over. Build accordingly.
