
Apple and Goldman Lost Thousands of Disputes Into a State Nobody Knew Existed. It Cost $89 Million.
In June 2020, Apple added a form to the Apple Card dispute flow. A small thing. Before the change, you tapped "Report an Issue," landed in a Messages thread with Goldman Sachs, and your dispute went through. After the change, you had to fill out a second form once the first submission was in.
Here is what that small thing did. If you reported your problem in Messages but never completed the second form, the system decided your dispute was incomplete. It was never sent to Goldman. Nobody investigated it. No acknowledgment letter ever went out. And under Regulation Z, Section 1026.13, a lot of those first messages were valid Billing Error Notices — the kind a creditor is legally required to acknowledge within 30 days and resolve within two billing cycles.
Thousands of them just sat there. Submitted, but routed nowhere. In October 2024, the Consumer Financial Protection Bureau ordered Apple and Goldman Sachs to pay $89 million for it, and barred Goldman from launching a new credit card without a credible compliance plan. I have spent a good part of my career inside dispute operations at a card issuer, and when I read that order I did not see negligence. I saw a state machine with a hole in it, and a hole that every bank I know has somewhere in its own workflow right now.
That is the thing I want to convince you of in this essay: your dispute system has dead states you haven't found yet, and the reason you haven't found them is that you've been testing for them instead of proving they can't exist. The fix is a technique most bankers have never been offered — formal verification — and it's why we built Veriprajna's financial compliance verification practice around it.
The $89 Million Bug Was a State, Not a Mistake

Let me be precise about what went wrong, because the precision is the whole point.
The Apple-Goldman workflow had a reachable state you could describe in one line: the dispute was in FormA_Submitted AND FormB_Pending. From that state, there was no transition to Investigation_Initiated. None. A consumer who stopped after the first message lived there permanently — owed an acknowledgment by law, owed nothing by the system's own logic.
The disputes weren't lost. They were exactly where the system put them. The system was just never told that "here" was a place a dispute could die.
When my team first walked through this on a whiteboard, an engineer said the obvious thing — that it was just a bug, the kind we'd have caught in testing. I asked him how. The Apple-Goldman flow was one integration point between two systems. Most large issuers I've worked with have 10 to 15 systems touching a single dispute — the Visa VROL portal, Mastercard's GCMS, the case-management platform, the core banking ledger, the letter-generation system, the credit-bureau feed, the provisional-credit engine, and a fistful of internal routing queues. Every API change, every partner integration, every new product feature opens fresh paths through that maze.
Testing checks the paths you thought to write down. It is, by definition, a list of the failures you already imagined. The Apple-Goldman dead state was a failure nobody imagined, which is precisely why no test covered it. You cannot test your way out of a problem you can't picture. That's not a process failure. It's a mathematical ceiling.
Why Couldn't We Just Write More Tests?
I lost an afternoon to this argument, so let me save you the afternoon.
The engineer's counter was reasonable: write more test cases. Cover more paths. Get coverage up. So I drew it out. A dispute workflow isn't a line, it's a graph — branches for dispute type, for network, for whether provisional credit was issued, for which deadline regime applies, for whether a partner system acknowledged the handoff. Each branch multiplies the last. A workflow with a few dozen decision points has more reachable states than you will ever write tests for, and the dangerous ones are the combinations nobody sits down and enumerates because they look absurd until they happen.
He kept going for a while. Then I drew the combinatorial blowup — the number of states doubling, then doubling again — and he stopped. You don't out-test a state space. You run out of quarters before you run out of states.
This is the moment I started reading about formal methods seriously, and the technique that matters here has a plain description. A model checker takes a model of your workflow and an invariant — a property that must always hold — and then it explores every reachable state, exhaustively, looking for one where the property breaks. Write the invariant as "every submitted dispute reaches acknowledgment within 30 days," point the checker at the Apple-Goldman flow, and it returns the dead state in seconds, along with the exact step-by-step path that lands you there. Tools like TLA+ do this for distributed systems; the same logic applies cleanly to a regulated workflow with hard deadlines.
Testing samples the paths you imagined. A model checker visits the ones you didn't.
The Timeline Collision Nobody Owns

The part that keeps dispute managers up at night is worse than a single dead state.
A single dual-network card dispute can fire three or four regulatory and network clocks at the same time. Reg Z wants a written acknowledgment in 30 days and resolution within two billing cycles, capped at 90. Reg E, which governs debit and electronic-transfer errors, runs on a completely different schedule — provisional credit and resolution inside 10 business days, with a 45-calendar-day extension. Visa Claims Resolution runs Allocation cases up to 70 days and Collaboration cases up to 100. Mastercard's dispute framework runs 45 to 120 depending on the cycle, and the industry described its rollout as even harder to absorb than Visa's.
When those regimes collide on one dispute, compliance comes down to whoever on the dispute desk remembers which deadline governs that morning. I have watched a senior analyst apply Reg E timing to a Reg Z billing error because the two blur together under volume — and that exact confusion is one of the violations CFPB examiners look for. Citizens Bank got cited, in part, for denying billing-error notices over a missing affidavit. The deadlines are not soft. They are binary. You either acknowledged in 30 days or you didn't.
And this is the quiet reason formal verification fits financial compliance better than almost any other domain I can think of: the rules are already math. Thirty days is not a judgment call. "Provisional credit within 10 business days" is not a vibe. These are temporal constraints over a state machine, and temporal logic was built to prove exactly this kind of property — that across every possible path a dispute can take, no clock is ever blown.
I Built the Wrong Thing First
I want to tell you about the version we shipped that didn't work, because it's the version most of the market is still selling.
Our first instinct was monitoring. Build a system that tracks every dispute, watches the deadlines, and lights up red when one is about to breach. Dashboards. Alerts. A provisional-credit queue counting down — 9 days, 23 hours, go. It demoed beautifully. I was proud of it.
Then I showed it to a compliance lead at a pilot bank, and she gave it a polite shrug. It took me a few days to understand the shrug. A monitor tells you a dispute is dying while it dies. It's a smoke alarm. It assumes the dispute made it into the system to be tracked in the first place — which is the one assumption Apple-Goldman shattered, because those disputes were never tracked at all; they were in a state the tracker didn't know to watch. Monitoring is confession after the fact. It is the bank discovering its own failure a little earlier than the examiner would have. That's worth something, but it is not what she needed to put in front of her board.
Monitoring buys you the failure a few days before the examiner does. It never buys you the failure that can't happen.
That shrug cost us a rebuild, and it was the best thing that happened to the product. The market is crowded with the thing I built first. FINBOA tracks Reg E deadlines and automates provisional credit; it's good at it. Quavo automates dispute processing and posts real numbers — one credit union hit an 87% automation rate inside a month. FIS processes chargebacks through the network portals. Every one of them automates or watches the disputes that enter the system. Not one of them proves a dispute can't be lost before it enters.
Who Else Is Actually Proving Anything?
After the rebuild I went looking for who else was proving things, not watching them, and the answer is almost nobody.
The one serious name is Imandra. They do real formal verification in financial services — mathematical proofs of correctness — and their clients include Goldman Sachs itself. But their world is capital markets: exchange-matching logic, trading protocols. Consumer compliance, Reg Z, dispute workflows — that's not what they're pointed at, and on roughly $5 million in total funding they haven't pointed there.
Meanwhile the capital is pouring into the other corner of compliance. Bretton AI, formerly Greenlite, raised a $75 million Series B in February 2026 and serves OCC-regulated banks — for KYC, anti-money-laundering, and onboarding. SymphonyAI's Sensa platform cut false positives in a Spanish bank's sanctions screening by 91.8%. Alloy orchestrates identity across 800-plus institutions. Real money, real engineering — all of it aimed at financial crime and onboarding, none of it at whether your dispute resolution can violate a regulation.
That's the gap, and it's not a narrow one. The closest players to dispute work (FINBOA, Quavo) automate it without verifying it. The only verifier (Imandra) works a different market. Combine those two and you get a corner of the field that, as far as I can tell, no one is standing in: formal verification pointed at consumer dispute compliance. That's the corner we walked into.
The Sentence That Rebuilt Our Pitch
The thing that finally made the value land for me didn't come from an engineer. It came from a former examiner I sat down with.
I asked her what she actually wanted in the room during a billing-error review. She didn't say "good dashboards." She said, more or less, that she wanted to know the bank could show its process couldn't fail the deadline — not that it usually didn't. The whole posture flips on that one word. The board attestation slide that reads "we tested it and it passed" is an admission you only checked the paths you thought of. A proof says something categorically stronger: across every path the workflow allows, the invariant holds.
And the regulators are quietly walking toward this on their own. OCC Bulletin 2025-26 clarified that any quantitative method materially driving a risk or compliance decision is a "model" — and the bulletin is explicit that the sophistication of AI or machine learning doesn't exempt it from validation. Read that carefully: an automated dispute router is now examinable as a model, not just as an ops tool. And formal verification is the strongest model validation that exists. It doesn't sample the behavior. It proves the property.
Across the Atlantic, the EU AI Act classifies credit and creditworthiness AI as high-risk, with a hard compliance deadline of August 2, 2026, and extraterritorial reach that pulls in US providers serving the EU market. High-risk classification means you must demonstrate provable system properties — accuracy, robustness, oversight. That is, again, exactly what a proof delivers and a test cannot.
Isn't This Overkill for a Dispute Queue?
People ask me some version of this constantly, so let me take the three honest objections head-on.
The first is cost. Verification sounds like a science project, and the consulting alternative — a Big 4 firm redesigning your process — runs $500K to several million and hands you a redesigned process nobody mathematically checked. Set that against the downside. Apple-Goldman was $89 million. Wells Fargo's consent order, across multiple failures including dispute handling, ran to $3.7 billion. You are not buying verification against the cost of a SaaS license; you're buying it against the tail.
The second is "our volume is fine." Global chargebacks are projected to hit 337 million annually by 2026 and to climb 24% through 2028, with entirely new dispute types arriving — including disputes from autonomous AI agents buying things customers never approved. The manual desk that "handles it" today is the same desk where 42% of institutions still run compliance on manual processes, per Wolters Kluwer's Q1 2026 report. Rising volume plus manual memory is how a dead state goes unnoticed for four years.
The third is "we passed our last exam." So did Apple and Goldman, presumably, right up until they didn't. Passing an exam means an examiner sampled your paths and didn't hit the broken one. It is the same sampling problem as testing, wearing a different badge.
What I Believe Now
I went into dispute operations thinking compliance was a discipline of diligence — careful people, good checklists, deadlines met. I came out of it convinced that diligence is the wrong tool for a problem this shape. You cannot be diligent enough to cover a state space that doubles every time someone ships a feature. The careful people at Apple and Goldman didn't lose those disputes through carelessness. They lost them through a state nobody drew.
So the question I'd put to anyone running a dispute desk isn't "are your people careful." It's narrower and colder than that: can you draw the state your next dropped dispute will die in? If you can't draw it, you can't test for it, and if you can't test for it, the only honest thing left is to prove it can't exist. That's the work — proving your dispute workflows can't violate Reg Z, Reg E, and the network timelines, on every path, before an examiner finds the one you missed. If that's the assurance you actually need, that's what we build.
Apple had thousands of engineers and a dead state anyway. Engineers were never the missing variable. A proof was.


