[Image: a transparent, auditable AI system contrasted with an opaque black-box system, set in the context of hiring decisions.]

95% of Companies Are Breaking an AI Law Most People Don't Know Exists

Ashutosh Singhal · March 18, 2026 · 12 min read

I was on a call with a Fortune 500 CHRO — Chief Human Resources Officer — in early January when she said something that stopped me mid-sentence.

"Our legal team told us it's safer to just not comply."

She wasn't being reckless. She was being rational. Her company uses an AI-powered screening tool for hiring in New York City, which means they're subject to Local Law 144 — a regulation requiring companies to publicly post bias audits of any automated tool that helps make employment decisions. Her lawyers had done the math: the penalty for non-compliance was a fine of $500 to $1,500 per violation. The penalty for compliance — actually publishing the bias audit — was handing plaintiffs' attorneys a statistical roadmap to a discrimination lawsuit.

That conversation crystallized something I'd been circling for months. We don't have an AI ethics problem. We have an AI architecture problem. And the December 2025 audit by the New York State Comptroller just proved it.

The Audit That Broke the Illusion

[Infographic: the enforcement gap, showing the city's 1 violation against the state's 17 in the same 32-company sample, plus the 75% misrouted calls and the 95% non-compliance rate.]

On December 2, 2025, New York State Comptroller Tom DiNapoli released an audit of how New York City's Department of Consumer and Worker Protection — the DCWP — had been enforcing Local Law 144. The results were devastating.

The city's own reviewers had looked at 32 employers and found exactly one instance of potential non-compliance. State auditors examined the same 32 companies using more rigorous technical methods. They found 17 violations.

That's not a rounding error. That's a 1,600% gap between what the city caught and what was actually happening.

I remember reading the audit report on my laptop at 11 PM, scrolling through the findings, and feeling a mix of vindication and dread. Vindication because this was exactly the kind of failure we'd been warning our clients about — superficial AI compliance crumbling under real scrutiny. Dread because the scale of the problem was worse than even I had expected.

When the city found 1 violation and the state found 17 in the same sample, that's not an enforcement gap — it's an enforcement fiction.

The audit uncovered something almost comically broken: 75% of test calls to the city's 311 hotline about AI hiring issues were misrouted and never reached the DCWP. The agency admitted it lacked the technical expertise to evaluate whether companies were actually using automated decision tools. They never once consulted the city's own Office of Technology and Innovation. The entire enforcement apparatus was, functionally, a Potemkin village.

Why Most Companies Chose Silence

Here's where it gets worse. A study by Cornell University, Data & Society, and Consumer Reports examined 391 employers subject to Local Law 144. Of those, only 18 had published the required bias audits. Only 13 had posted transparency notices.

That means roughly 95% of covered employers were simply ignoring the law.

My team and I spent a week going through that study, cross-referencing the findings with what we knew about the AI tools these companies were using. We had a running argument in our office about whether this was mass negligence or something more calculated. My CTO thought it was laziness — companies just hadn't gotten around to it. I disagreed.

I think most of these companies ran the audits internally, saw the numbers, and panicked.

The reason is structural. Local Law 144 requires you to publish "impact ratios" — essentially, you compare the selection rates of different demographic groups and check whether they meet the EEOC's four-fifths rule. If your tool selects men at a rate of 60% and women at 40%, that's a ratio of 0.67 — below the 0.80 threshold that signals potential disparate impact.
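
To make the arithmetic concrete, here's a minimal sketch of that impact-ratio calculation in Python. The group names and counts are illustrative, not drawn from any real audit.

```python
# A minimal sketch of the selection-rate comparison behind LL144's impact
# ratios and the EEOC four-fifths rule. Group names and counts are
# illustrative, not taken from any real audit.

def impact_ratios(selected: dict, screened: dict) -> dict:
    """Each group's selection rate divided by the highest group's selection rate."""
    rates = {group: selected[group] / screened[group] for group in screened}
    best = max(rates.values())
    return {group: round(rate / best, 2) for group, rate in rates.items()}

ratios = impact_ratios(
    selected={"men": 60, "women": 40},    # candidates the tool advanced
    screened={"men": 100, "women": 100},  # candidates the tool evaluated
)
# -> {'men': 1.0, 'women': 0.67}; women fall below the 0.80 threshold
flagged = [group for group, ratio in ratios.items() if ratio < 0.80]
```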

The problem? Most AI hiring tools built on general-purpose language models will fail this test. Not because they're maliciously biased, but because they're trained on internet-scale data that reflects decades of societal bias. When you run the numbers honestly, the bias shows up. And when you publish those numbers, you've created a legal exhibit.

That CHRO I spoke with wasn't being cynical. She was describing the rational response to a system where the tools themselves generate evidence of discrimination the moment you audit them.

What Happens When You Build AI on Vibes

I need to explain something about how most enterprise AI actually works today, because it's the root of this entire crisis.

The dominant model in the market right now is what I call the "Wrapper Economy." A consulting firm takes a foundation model — GPT-4, Claude, Gemini — wraps a thin layer of custom prompts and API calls around it, and sells it to an enterprise as a solution. Resume screening, claims processing, risk assessment — the wrapper handles the interface, but the thinking happens inside a model that nobody in the enterprise controls, audits, or fully understands.

These models work by predicting the most statistically likely next word in a sequence. They operate on what I'd call semantic plausibility — what sounds right — not forensic reality — what is right.

I learned this the hard way. Early in Veriprajna's life, before we'd fully committed to our current architecture, we ran a test where we asked a leading LLM to evaluate a set of resumes while "ignoring gender." It still discriminated. Not overtly — it didn't flag "female" as a negative. But it systematically favored resumes that mentioned certain universities, used certain phrasing patterns, and listed certain extracurricular activities — all of which were statistically correlated with gender in the model's training data.

Telling an LLM to "ignore gender" is like telling someone to not think about elephants. The correlations are baked into the weights. You can't prompt your way out of structural bias.
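
If you want to see what that proxy leakage looks like, a rough audit like the one below can surface it in historical screening data: it flags features whose distributions differ sharply across a protected attribute. The column names and the 0.2 gap threshold are placeholders for illustration, not a standard.

```python
# Rough sketch: flag features that behave as proxies for a protected attribute.
# Column names and the 0.2 threshold are illustrative assumptions.

import pandas as pd

def find_proxy_features(df: pd.DataFrame, protected: str, features: list,
                        threshold: float = 0.2) -> list:
    """Flag categorical features whose value rates differ across protected groups."""
    proxies = []
    for feat in features:
        # Within-group share of each feature value, one row per protected group
        rates = df.groupby(protected)[feat].value_counts(normalize=True).unstack(fill_value=0)
        max_gap = (rates.max() - rates.min()).max()  # largest between-group gap for any value
        if max_gap > threshold:
            proxies.append((feat, round(max_gap, 3)))
    return proxies

# e.g. find_proxy_features(history, "gender", ["university", "extracurricular"])
```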

When Colorado or the EU asks you to explain why your AI made an adverse decision, a wrapper system can only give you a post-hoc story — a plausible-sounding narrative about why it thinks it decided what it decided. That's not an explanation. That's a hallucination about its own reasoning. I wrote about this problem in depth in the interactive version of our research, where we walk through how this "auditability gap" plays out across different regulatory regimes.

The Compliance Trilemma Nobody's Talking About

[Diagram: the four overlapping AI regulations and their conflicting architectural requirements, illustrating why a single compliance approach cannot satisfy all of them simultaneously.]

Here's what keeps me up at night: it's not just New York anymore.

By mid-2026, companies operating across multiple states and in Europe will face at least four overlapping — and technically conflicting — AI regulations:

New York City's Local Law 144 demands you publish intersectional bias statistics broken down by race and sex. Colorado's SB 24-205, effective June 2026, requires a broader "reasonable care" standard and mandatory disclosure to the Attorney General if you discover algorithmic discrimination. Illinois HB 3773 bans the use of zip codes as proxies for protected classes — a technique that many bias-mitigation tools actually rely on. And the EU AI Act demands detailed documentation of training data provenance and "conformity assessments" for high-risk systems.

These aren't just different rules. They're architecturally incompatible in places.

A data-masking technique you use to comply with the Illinois zip code ban might destroy the data representativeness that the EU requires. A bias audit that satisfies New York's race-and-gender framework might fail Colorado's standard if it doesn't also account for age and disability. And none of these frameworks accept "we used GPT-4 and added some guardrails" as a conformity assessment.
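
One way to see the conflict is to write the constraints down as data and check a proposed pipeline against all of them at once. The sketch below is a toy version; the rule names and fields are simplifications of the statutes for illustration, not legal text.

```python
# Toy sketch: per-jurisdiction constraints as data, checked in one pass.
# Rule names and fields are simplified illustrations, not statutory language.

JURISDICTION_RULES = {
    "NYC_LL144":  {"must_publish": ["intersectional_impact_ratios"]},
    "CO_SB24_205": {"protected_classes": ["race", "sex", "age", "disability"]},
    "IL_HB3773":  {"banned_features": ["zip_code"]},
    "EU_AI_ACT":  {"must_document": ["training_data_provenance", "conformity_assessment"]},
}

def check_pipeline(features_used: set, published: set, documented: set) -> dict:
    """Return the constraints a proposed pipeline would violate, per jurisdiction."""
    violations = {}
    for name, rules in JURISDICTION_RULES.items():
        missed = []
        missed += [f"uses banned feature: {f}" for f in rules.get("banned_features", [])
                   if f in features_used]
        missed += [f"missing published artifact: {a}" for a in rules.get("must_publish", [])
                   if a not in published]
        missed += [f"missing documentation: {a}" for a in rules.get("must_document", [])
                   if a not in documented]
        if missed:
            violations[name] = missed
    return violations
```

A fix that satisfies one entry in that table can silently create a violation in another, which is exactly the trilemma.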

I had an investor tell me last year, "Just use GPT and add a compliance layer on top." I asked him: which compliance layer? For which jurisdiction? Tested against which metrics? He didn't have an answer, because there isn't one. You can't bolt compliance onto a system that was never designed to be auditable.

What We're Actually Building Instead

[Architecture diagram: the neuro-symbolic approach, with the neural network layer feeding into the symbolic logic layer before output, labeled with the audit trail and rule-enforcement mechanism.]

At Veriprajna, we made a bet early on that I'll admit felt lonely at the time: we rejected the wrapper model entirely. No thin API layers. No prompt-engineering-as-a-product. Instead, we build what I call Deep AI — systems engineered from the ground up for determinism, traceability, and sovereign control.

The core idea is a neuro-symbolic architecture — a system that separates the "voice" from the "brain." The neural network handles pattern recognition: reading resumes, parsing documents, identifying relevant features. But every decision passes through a symbolic logic layer — hard-coded rules derived from actual law, industry ontologies, and domain constraints — before it reaches the user.

When Illinois says you can't use zip codes as a proxy for race, our symbolic layer doesn't "try" to avoid it. It blocks it, deterministically, and logs exactly which rule was triggered and why. When a regulator asks for an explanation, we don't generate a plausible story. We provide a traceable chain of logic from input to output.
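
To make this concrete, here is a deliberately simplified sketch of a symbolic gate sitting between the neural layer and the output. Our actual rule engine is far more involved; the rule IDs, feature names, and data structures here are purely illustrative.

```python
# Simplified sketch of a deterministic symbolic gate between a neural
# feature extractor and the final output. Rule IDs and feature names are
# illustrative only.

from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class Decision:
    recommendation: str            # e.g. "advance" / "reject"
    features_used: set
    audit_log: list = field(default_factory=list)

PROHIBITED_FEATURES = {
    "zip_code": "IL-HB3773: geographic proxies for protected classes",
}

def symbolic_gate(neural_output: Decision) -> Decision:
    """Deterministically block any decision that relied on a prohibited feature and log why."""
    for feature, rule in PROHIBITED_FEATURES.items():
        if feature in neural_output.features_used:
            neural_output.audit_log.append({
                "timestamp": datetime.now(timezone.utc).isoformat(),
                "rule_triggered": rule,
                "action": "decision blocked, feature stripped",
            })
            neural_output.features_used.discard(feature)
            neural_output.recommendation = "requires_rerun_without_feature"
    return neural_output
```

The point is that the block and the log entry come from the same deterministic rule, so the audit trail is a byproduct of the decision rather than a reconstruction after the fact.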

The difference between a wrapper and Deep AI is the difference between "the model said so" and "here's the exact rule, the exact input, and the exact logical path that produced this decision."

We also insist on what we call sovereign infrastructure — deploying models on the client's own cloud, not routing sensitive data through public APIs. When you send employee records or candidate information to a third-party API, that data can be logged, embedded in future training runs, or exposed through security vulnerabilities you have no visibility into. For a company subject to GDPR, CCPA, and now these AI-specific regulations, that's an unacceptable risk.

For the full technical breakdown of how these architectures work — the graph-based traceability, the physics-informed verification layers, the edge-native deployment models — I'd point you to our detailed research paper. The engineering is deep, but the principle is simple: every output must be provably correct, not probably correct.

"But Isn't This Overkill?"

People ask me this constantly. Usually it's someone who's been sold on the wrapper model and thinks I'm overcomplicating things.

Here's my answer: the New York State Comptroller just demonstrated that even a friendly regulatory review — one conducted by the city's own agency — missed 16 of the 17 violations that state auditors later found in the same sample. The city caught 1; the state caught 17. And that was with a sample of just 32 companies.

What happens when Colorado's Attorney General starts investigating? What happens when the EU begins its conformity assessments? What happens when a plaintiff's attorney subpoenas your model's decision logs and finds that your "bias mitigation" was a system prompt that said "please be fair"?

This isn't a hypothetical. The Comptroller's audit explicitly recommended moving from passive, complaint-driven enforcement to proactive, research-driven investigation. The DCWP agreed to adopt this recommendation. The era of "nobody's checking" is over.

There's another objection I hear: "We'll just wait for the regulations to settle before investing." I understand the impulse. But the Cornell study found that the companies who waited — the 95% who didn't comply — are now sitting on a growing pile of legal exposure with no infrastructure to address it. When enforcement ramps up, they won't have months to build compliant systems. They'll have weeks.

The Night the Numbers Changed My Mind

I want to share a moment that fundamentally changed how I think about this problem.

About eight months ago, we were running a proof-of-concept for a financial services client. They wanted to test whether their existing AI screening tool — a well-known vendor's product, built on a major LLM — could pass a simulated LL144 audit. My team set up the test, ran the impact ratios, and sent me the results at about 9 PM.

I expected marginal failures — ratios of 0.75 or 0.78, close enough that some recalibration might fix them. Instead, we found impact ratios as low as 0.58 for certain intersectional categories. Not close to the threshold. Not fixable with prompt tuning. Structurally, fundamentally biased in ways that were invisible to the people using the tool every day.

I sat in my home office staring at those numbers and realized something that I think the entire industry needs to confront: the tools most companies are using right now would fail their own bias audits. The 95% non-compliance rate isn't just about companies ignoring the law. It's about companies who looked at what compliance would reveal and decided they'd rather not know.

That's not a regulatory problem. That's an engineering failure.

Where This Goes Next

The December 2025 audit is not the end of AI regulation's growing pains. It's the beginning of its adolescence. Regulators are learning from their mistakes. The next generation of enforcement won't rely on 311 hotlines and self-reported disclosures. It will use forensic tools to examine what your AI actually does — not what your compliance team says it does.

For the enterprise, this means the window for architectural decisions is closing. You can't retrofit determinism onto a probabilistic system. You can't bolt auditability onto a black box. You can't comply with four conflicting jurisdictions using a single prompt template.

The companies that will thrive in 2026 and beyond are the ones making the harder choice now: building systems where every decision has a traceable, auditable, defensible chain of logic. Not because it's easy. Not because it's cheap. Because it's the only architecture that survives contact with a regulator who actually knows what they're looking for.

The era of AI built on vibes is over. What replaces it will be defined by companies willing to engineer certainty into systems that the rest of the industry built on probability.

The CHRO I spoke with in January called me back last week. Her legal team had read the Comptroller's audit. They weren't advising non-compliance anymore. They were asking how fast we could deploy.
