AI Bias Audits That Find Root Causes, Not Just Statistical Gaps
We audit AI systems for discriminatory outcomes across hiring, lending, insurance, and healthcare, then build the mitigation pipelines that fix root causes.
Frequently Asked Questions
How much does an AI bias audit cost, and how long does it take?
Scope drives cost more than any other factor. A focused LL144-compliant audit of a single hiring tool with standard four-fifths rule analysis can be completed in two weeks. That is the checkbox. A comprehensive audit with causal proxy analysis, intersectional subgroup testing, counterfactual fairness evaluation, and documented mitigation recommendations takes 6-10 weeks depending on system complexity, data availability, and how many regulatory frameworks apply. The cost question most buyers should actually ask is what non-compliance costs. NYC LL144 penalties run $500-$1,500 per violation per day. Colorado SB 205 allows up to $20,000 per violation. In July 2025, the Massachusetts Attorney General settled an AI lending discrimination case for $2.5 million. The audit investment is a fraction of the cost of a single enforcement action.
Which fairness metric should we use if we cannot satisfy all of them simultaneously?
You cannot satisfy all of them. Chouldechova (2017) proved that when base rates differ across groups, no imperfect classifier can simultaneously achieve predictive parity (equal positive predictive value) and equal false positive and false negative rates. Kleinberg, Mullainathan, and Raghavan proved the companion result for risk scores: calibration within groups cannot coexist with balanced average scores for the positive and negative classes unless prediction is perfect or base rates are equal. The metric choice depends on the domain and the legal framework. In lending, calibration matters because risk scores need to reflect actual default probabilities for pricing to work. Equalized odds matters because disparate error rates create disparate impact liability under ECOA. In hiring, selection rate parity (demographic parity) is the starting point because the four-fifths rule operationalizes it, but equalized odds and predictive parity matter for test validation under the Uniform Guidelines. We walk through the impossibility tradeoffs for each client's specific system and regulatory context before anyone starts optimizing.
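The impossibility result can be checked with simple arithmetic. The sketch below uses the identity that ties false positive rate to base rate, positive predictive value (PPV), and false negative rate (FNR). The base rates and metric values are made-up numbers chosen for illustration, not figures from any audit.

```python
def implied_fpr(base_rate: float, ppv: float, fnr: float) -> float:
    """False positive rate forced by the identity
    FPR = p / (1 - p) * (1 - PPV) / PPV * (1 - FNR),
    where p is the group's base rate of the positive outcome."""
    return (base_rate / (1.0 - base_rate)) * ((1.0 - ppv) / ppv) * (1.0 - fnr)


# Hypothetical groups: same PPV and same FNR, different base rates.
fpr_group_a = implied_fpr(base_rate=0.30, ppv=0.70, fnr=0.20)
fpr_group_b = implied_fpr(base_rate=0.50, ppv=0.70, fnr=0.20)

print(f"group A FPR: {fpr_group_a:.3f}")
print(f"group B FPR: {fpr_group_b:.3f}")
```

With PPV and FNR held equal, the two groups end up with different false positive rates purely because their base rates differ. Equalizing one pair of metrics moves the disparity into another.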
We removed race from our model. Why does it still show disparate impact?
Because other features carry the same information. Zip code encodes residential segregation. Graduation year and technology stack vintage reconstruct age. Employment gaps correlate with disability and caregiving. Educational institution correlates with race and socioeconomic status. Commute distance correlates with neighborhood composition. Removing the label does not remove the signal. We use causal graph analysis to trace every path from protected attributes through the feature space to the model's output. Some of those paths flow through legitimate business factors (credit history genuinely predicts repayment). Others flow through proxies that encode historical discrimination. The audit identifies which paths are which, and the mitigation targets the proxy paths without disrupting the legitimate ones.
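A quick way to see a proxy at work is to ask how well a single feature predicts the protected attribute on a labeled sample. The sketch below is a simple association screen, not the causal graph analysis described above, and the zip codes and group labels are invented for illustration.

```python
from collections import Counter, defaultdict


def proxy_strength(feature_values, protected_values):
    """Return (baseline_accuracy, accuracy_using_feature) for guessing the
    protected attribute. A large gap suggests the feature acts as a proxy."""
    n = len(protected_values)
    # Baseline: always guess the most common protected group.
    baseline = Counter(protected_values).most_common(1)[0][1] / n
    # With the feature: guess the most common group within each feature value.
    by_feature = defaultdict(Counter)
    for feature, group in zip(feature_values, protected_values):
        by_feature[feature][group] += 1
    correct = sum(c.most_common(1)[0][1] for c in by_feature.values())
    return baseline, correct / n


# Hypothetical audit sample: zip code per applicant and self-reported group.
zips = ["10001"] * 4 + ["10002"] * 4
groups = ["a", "a", "a", "b", "b", "b", "b", "a"]

baseline, with_feature = proxy_strength(zips, groups)
print(f"baseline {baseline:.2f}, using zip code {with_feature:.2f}")
```

A screen like this only flags candidates. Deciding whether a flagged feature is a legitimate business factor or a proxy path still requires the causal analysis.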
Can we get sued for AI discrimination when we use a vendor's hiring tool?
Yes. Employers remain liable for discriminatory outcomes from vendor AI tools under Title VII, FCRA, and state employment laws. In Mobley v. Workday, the court held in July 2024 that an AI vendor can be directly liable for employment discrimination as an agent of the employer, and in May 2025 it conditionally certified a nationwide collective action. In the Eightfold AI class action (January 2026), both the platform and employers using it face exposure. Colorado SB 205 imposes separate obligations on deployers regardless of who built the system. Using a vendor tool does not shift the legal risk. It shifts the technical complexity, because you need to audit a system you did not build. We audit vendor AI tools using output-based testing methods that do not require access to the vendor's source code or model internals.
What is the difference between a LL144 compliance audit and a comprehensive fairness audit?
LL144 requires an independent bias audit that reports selection rates and impact ratios for sex, race/ethnicity, and their intersectional categories, with a summary of results publicly posted. That is a floor, not a ceiling. The December 2025 NY State Comptroller audit found that DCWP's enforcement was superficial: 75% of complaints were misrouted, and auditors identified 17 potential violations where DCWP found only 1. A comprehensive audit goes further: intersectional analysis extended to protected attributes LL144 does not cover, such as age and disability, causal proxy detection for features that encode protected information indirectly, counterfactual fairness testing, sensitivity analysis quantifying how robust fairness claims are to unmeasured confounders, and documented accuracy-fairness tradeoff analysis with the impossibility theorem implications spelled out. The comprehensive version is what survives scrutiny when enforcement gets serious.
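The selection-rate comparison at the core of a compliance audit is short enough to sketch. The example below computes each group's selection rate and its impact ratio against the highest-rate group, then applies the four-fifths rule as an illustrative screen. The group names and counts are hypothetical, and the 0.8 cutoff is the four-fifths rule of thumb rather than a threshold taken from the LL144 text.

```python
from collections import Counter


def impact_ratios(records):
    """records: iterable of (group, was_selected) pairs.
    Returns {group: (selection_rate, impact_ratio)} where the impact ratio
    is the group's rate divided by the highest group's rate."""
    totals, selected = Counter(), Counter()
    for group, was_selected in records:
        totals[group] += 1
        selected[group] += int(was_selected)
    rates = {g: selected[g] / totals[g] for g in totals}
    best = max(rates.values())
    return {g: (r, r / best if best else float("nan")) for g, r in rates.items()}


# Hypothetical outcomes: 50 of 100 selected in group_a, 30 of 100 in group_b.
records = (
    [("group_a", True)] * 50 + [("group_a", False)] * 50
    + [("group_b", True)] * 30 + [("group_b", False)] * 70
)

ratios = impact_ratios(records)
flagged = [g for g, (_, ratio) in ratios.items() if ratio < 0.8]
print(ratios, flagged)
```

This is the checkbox calculation. It says a gap exists but nothing about which features produce it.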
How do we test for bias when we do not have access to protected-attribute data?
This is one of the hardest practical problems in fairness auditing. All major toolkits (AIF360, Fairlearn) require protected attributes as input. In practice, many organizations cannot collect individual-level race, gender, or disability data, particularly outside employment contexts. Options include Bayesian Improved Surname Geocoding (BISG), which estimates race and ethnicity probabilities from surname and geography (used by CFPB for fair lending analysis), ecological inference methods that estimate group-level disparities from aggregate data, and output perturbation testing where you construct counterfactual inputs varying only protected-attribute proxies and measure decision changes. Each method has limitations. BISG introduces its own bias for multiracial individuals and certain ethnic groups. We select and validate the proxy method against the specific population and regulatory context, documenting the uncertainty introduced by the imputation rather than treating estimated attributes as ground truth.
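The Bayesian update at the heart of BISG fits in a few lines. The sketch below combines a surname-based prior with a geography likelihood and normalizes the result. All probabilities and group labels are hypothetical placeholders, since real implementations draw these tables from census surname and block-group data.

```python
def bisg_posterior(p_group_given_surname, p_geo_given_group):
    """Bayes update: P(group | surname, geo) is proportional to
    P(group | surname) * P(geo | group), assuming surname and geography
    are independent given group membership."""
    unnormalized = {
        g: p_group_given_surname[g] * p_geo_given_group[g]
        for g in p_group_given_surname
    }
    total = sum(unnormalized.values())
    return {g: v / total for g, v in unnormalized.items()}


# Hypothetical inputs for one applicant (not real census figures).
surname_prior = {"group_x": 0.6, "group_y": 0.4}
geo_likelihood = {"group_x": 0.001, "group_y": 0.004}

posterior = bisg_posterior(surname_prior, geo_likelihood)
print(posterior)
```

The output is a probability per group, not a label. Carrying those probabilities through the disparity calculation, rather than assigning each person to the most likely group, is one way to keep the imputation uncertainty visible.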
Do we need continuous bias monitoring, or is an annual audit sufficient?
An annual audit is what regulators require as a minimum. It is not what keeps you safe. Models interacting with feedback loops develop bias between audits: a hiring tool that rejects candidates from certain backgrounds trains on its own exclusions, reinforcing the pattern. Lending models where approval patterns influence credit bureau data create self-fulfilling prophecies. Input distributions shift as customer demographics or applicant pools change. We build continuous monitoring with sequential hypothesis testing that detects emerging fairness degradation as new decisions arrive, triggering alerts when metrics cross thresholds. The monitoring tracks both aggregate and intersectional subgroup performance separately. The practical difference: an annual audit tells you, after the fact, that the model had been biased for months. Continuous monitoring tells you within weeks.
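Sequential testing is what lets a monitor raise an alert without waiting for a fixed sample size. The sketch below uses Wald's sequential probability ratio test on a stream of binary outcomes for one subgroup. It is one possible choice of sequential test, and the baseline rate, degraded rate, and error levels are assumptions made for illustration.

```python
import math


def sprt(observations, p0, p1, alpha=0.05, beta=0.05):
    """Wald's SPRT for a Bernoulli rate.
    H0: adverse-outcome rate is p0 (baseline). H1: rate is p1 (degraded).
    Returns (decision, number_of_observations_used)."""
    upper = math.log((1 - beta) / alpha)   # cross this: accept H1, alert
    lower = math.log(beta / (1 - alpha))   # cross this: accept H0
    llr = 0.0
    for n, adverse in enumerate(observations, start=1):
        if adverse:
            llr += math.log(p1 / p0)
        else:
            llr += math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "alert", n
        if llr <= lower:
            return "no_drift", n
    return "continue", len(observations)


# Hypothetical subgroup stream: 1 marks an adverse outcome such as a false rejection.
print(sprt([1] * 10, p0=0.10, p1=0.20))
print(sprt([0] * 5, p0=0.10, p1=0.20))
```

In a monitor that tracks many subgroups, one test per subgroup would run side by side, which raises a multiple-comparisons question that this sketch does not address.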
How do we audit LLMs and generative AI for bias when traditional fairness metrics do not apply?
Traditional fairness metrics (equalized odds, demographic parity) were designed for binary classifiers with defined protected groups and measurable outcomes. Generative AI produces open-ended text, images, or recommendations where bias manifests as stereotypical associations, differential quality, or refusal patterns across demographic contexts. We evaluate LLM bias through scenario-based probing (systematically varying demographic signals in prompts and measuring output differences), benchmark suites (BBQ, StereoSet, CrowS-Pairs) adapted to the specific deployment context, and output auditing where production outputs are sampled and evaluated for disparate treatment patterns. The challenge is that LLMs show stereotype-aligned errors up to 77% of the time in ambiguous contexts (BBQ benchmark research), and these biases are harder to mitigate than classifier bias because they are distributed across billions of parameters rather than concentrated in a few features.
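Scenario-based probing can be sketched as a small harness: hold the prompt fixed, vary only the demographic signal, and compare scored outputs. In the example below, `stub_model` and `approval_score` are hypothetical stand-ins for a real LLM call and a real output evaluator, and the templates and names are invented.

```python
from itertools import product
from statistics import mean


def probe(model, templates, signal_values, score):
    """Fill each template with each demographic signal, score the model's
    response, and return per-signal mean scores plus the largest gap."""
    scores = {value: [] for value in signal_values}
    for template, value in product(templates, signal_values):
        response = model(template.format(name=value))
        scores[value].append(score(response))
    means = {value: mean(s) for value, s in scores.items()}
    return means, max(means.values()) - min(means.values())


# Hypothetical stand-ins so the harness runs without a real model.
def stub_model(prompt: str) -> str:
    return "decline" if "Applicant B" in prompt else "approve"


def approval_score(response: str) -> float:
    return 1.0 if response == "approve" else 0.0


templates = [
    "Should we interview {name} for the analyst role?",
    "Summarize the loan application from {name} and recommend a decision.",
]
means, gap = probe(stub_model, templates, ["Applicant A", "Applicant B"], approval_score)
print(means, gap)
```

With a real model, responses vary from run to run, so each prompt would be sampled several times and the gap reported with an uncertainty estimate rather than as a single number.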
What are the actual penalties for failing a bias audit or not conducting one?
Penalties are escalating rapidly across jurisdictions. NYC LL144: $500-$1,500 per violation per day, multiplied by each affected applicant. Colorado SB 205 (effective June 2026): up to $20,000 per violation under the Consumer Protection Act. EU AI Act (enforcement August 2026): up to EUR 35 million or 7% of global turnover for prohibited practices, EUR 15 million or 3% for high-risk non-compliance. Beyond statutory penalties, litigation exposure is real: the Massachusetts AG extracted $2.5 million from a single AI lending discrimination settlement in July 2025. SafeRent paid $2.275 million for housing screening algorithm discrimination. Mobley v. Workday is the first certified nationwide AI bias collective action. The cost of a comprehensive audit is a rounding error compared to any of these outcomes.
What regulatory requirements apply to AI bias in insurance underwriting?
Colorado SB 21-169 is the most prescriptive. It prohibits insurers from using external consumer data, algorithms, and predictive models in ways that produce unfairly discriminatory outcomes based on protected characteristics. Auto and health insurers must submit annual compliance reports by July 1, 2026. The Colorado Division of Insurance has proposed quantitative testing comparing models with and without estimated race variables to identify discriminatory features. Separately, 24 states have adopted the NAIC Model Bulletin requiring insurers to include transparency and fairness in their AI governance programs. The actuarial fairness question is distinct from statistical fairness: actuarially justified risk differentiation can produce statistical disparities that are legally permissible under insurance law but would be discriminatory under employment or lending law. We help insurance teams navigate this distinction with testing frameworks built for the insurance-specific regulatory context.
Build Your AI with Confidence.
Partner with a team that has deep experience in building the next generation of enterprise AI. Let us help you design, build, and deploy an AI strategy you can trust.
Veriprajna Deep Tech Consultancy specializes in building safety-critical AI systems for healthcare, finance, and regulatory domains. Our architectures are validated against established protocols with comprehensive compliance documentation.