Financial Compliance Verification
Apple and Goldman Sachs had thousands of engineers, billions in revenue, and a dispute resolution workflow that silently dropped tens of thousands of valid billing error notices into a technical void. The CFPB found it. They paid $89 million.
We build formal verification systems that mathematically prove your dispute workflows comply with Reg Z, Reg E, and card network timelines. Not testing. Not monitoring. Proof.
$89M
Apple-Goldman consent order for dispute system failures
CFPB, October 2024
337M
Projected annual chargebacks globally by 2026
Chargebacks911
42%
Of banks still rely on manual compliance processes
Wolters Kluwer, Q1 2026
The Apple-Goldman failure was not a freak accident. It is the pattern that every bank with a multi-system dispute workflow is exposed to right now.
In June 2020, Apple added a "forms feature" to the Apple Card dispute workflow. Before the change, a consumer would tap "Report an Issue," enter a Messages conversation with Goldman Sachs, and the dispute would be transmitted. After the change, consumers were asked to complete a secondary form after the initial submission.
Here is the bug: if a consumer submitted the initial dispute via Messages but did not complete the secondary form, the system logic treated the dispute as incomplete. No transmission to Goldman Sachs. No investigation. No acknowledgment letter. The consumer was held responsible for the disputed charge.
Under Regulation Z Section 1026.13, those initial Messages submissions often constituted valid Billing Error Notices. The regulation requires the creditor to acknowledge the notice within 30 days and resolve it within two billing cycles. Instead, the disputes sat in a dead state: submitted but never routed.
This is a state machine problem. The dispute workflow had a reachable state (FormA_Submitted AND FormB_Pending) from which no transition existed to (Investigation_Initiated). A TLA+ model checker would have found this dead state in seconds by exhaustively exploring every reachable state in the workflow and checking the invariant: every submitted dispute must reach investigation within 30 days.
Apple and Goldman had one integration point between two systems. Most large issuers have 10-15 systems touching a single dispute: the card network portal (Visa VROL/Mastercard GCMS), the case management platform, the core banking ledger, the letter generation system, the credit bureau reporting system, the provisional credit engine, and several internal routing queues.
Every system change, API update, or partner integration creates new paths through this workflow. Any of those paths can introduce a dead state. Testing checks the paths you anticipate. Formal verification checks all of them.
Your dispute team is juggling four overlapping regulatory and network timeline regimes simultaneously. When they conflict, compliance depends on your staff knowing which deadline governs.
| Regime | Acknowledgment | Resolution | Provisional Credit | Governs |
|---|---|---|---|---|
| Reg Z (1026.13) | 30 days | 2 billing cycles (max 90 days) | Not required during investigation | Credit card billing errors |
| Reg E (1005.11) | N/A | 10 business days (45 calendar ext.) | Within 10 business days | Debit/EFT errors |
| Visa VCR | N/A | 30-70-100 days (varies by type) | Network-specific rules | Visa-branded transactions |
| Mastercard DR | N/A | 45-120 days (varies by cycle) | Network-specific rules | Mastercard-branded transactions |
A single dual-network card dispute can trigger Reg Z, Visa VCR, and Mastercard DR requirements simultaneously. Manual processes cannot guarantee all deadlines are met for every possible dispute path.
Pull this table up the next time someone proposes a compliance automation vendor. The question is not whether they automate disputes. It is whether they can prove the automation is compliant.
| Approach | What It Does | Compliance Assurance | Gap |
|---|---|---|---|
| FINBOA | Reg E dispute tracking, deadline alerts, provisional credit automation | Tracks timelines after disputes enter the system | No verification that disputes can't be lost before entering the system. Reactive alerting, not proactive proof. |
| Quavo | End-to-end dispute automation, 87% automation rate at credit unions | Automates dispute processing steps | Automates the happy path. No guarantee the automation handles every edge case. No cross-regime timeline verification. |
| Imandra | Formal verification for exchange matching logic, trading protocols | Mathematical proofs of protocol correctness | Capital markets only. Does not address consumer compliance, Reg Z/E, or dispute workflows. |
| SymphonyAI Sensa | AI-native AML/sanctions/financial crime platform, 91.8% false positive reduction | Strong for AML and sanctions screening | Financial crime focused. Does not handle dispute resolution compliance or regulatory timeline verification. |
| Bretton AI (fka Greenlite) | KYC/AML automation, $75M Series B (Feb 2026), serves OCC-regulated banks | Regulatory-first design for onboarding compliance | Onboarding and financial crime. No dispute resolution. No formal verification. |
| FIS Disputes Direct | Chargeback processing, card network portal integration (VROL, Mastercom) | Processes chargebacks per network rules | Mechanical processing, not compliance verification. Known integration challenges: expensive customization, heavy IT maintenance. |
| Big 4 / Large SIs | Compliance program design, process re-engineering, regulatory remediation | Policy and process consulting | They redesign processes but don't verify them mathematically. Engagements run $500K-$5M+ and produce documentation, not proofs. The process they design can still have dead states. |
| Internal team + testing | Manual QA, scenario testing, periodic compliance audits | Tests known scenarios | Only checks paths you anticipate. Cannot prove absence of violations. Apple-Goldman had internal warnings and still missed the forms bug. Honest limitation: no testing approach can be exhaustive for complex workflows. |
Every approach above either automates dispute processing or manages compliance programs. None mathematically proves the workflow is compliant.
Each engagement is custom. These are the capabilities we reach for based on where your dispute compliance risk is highest.
We model your entire dispute resolution workflow as a formal state machine in TLA+. Every state your dispute can occupy, every transition between states, every handoff between systems. Then we run model checking to exhaustively verify two properties: no dispute can reach a dead state (the Apple-Goldman bug), and every dispute path satisfies Reg Z timing requirements.
When the checker finds a violation, it produces a counterexample: a specific sequence of events that leads to the failure. That counterexample tells your engineering team exactly what to fix.
A credit card dispute on a Visa-branded card triggers Reg Z (30-day ack, 90-day resolution) and Visa VCR (30-day acquirer response, 70-day allocation) simultaneously. A debit dispute adds Reg E (10-day provisional credit, 45-day resolution). We encode all applicable deadline regimes into one model and verify that no dispute path violates any of them.
When Visa or Mastercard updates their dispute timelines, we re-run verification against the new constraints. You know within hours whether your workflow still complies, instead of discovering a gap during your next exam.
Every system change creates risk. A new form field, an API version update, a partner integration change. We integrate formal verification into your change management process. Before any modification to the dispute workflow goes live, we re-verify the entire state machine.
If the change introduces a path that could violate a regulatory timeline, the deployment is blocked with a counterexample. Your compliance team sees exactly which regulation would be violated and under what conditions, before a single customer is affected.
The Apple-Goldman case was a boundary failure. Disputes were lost between Apple's system and Goldman's system. We model every handoff point in your dispute workflow: the card network portals (VROL, Mastercom, GCMS), the core banking integration, the letter generation service, the credit bureau reporting feed.
We verify that no dispute can be dropped at any boundary, even under failure modes: network timeouts, partial submissions, batch processing delays, concurrent updates. Each boundary gets a formal specification of what must be true before and after the handoff.
OCC Bulletin 2025-26 requires that AI systems driving compliance decisions be validated as models under SR 11-7. The EU AI Act classifies financial AI as high-risk with a compliance deadline of August 2, 2026. Formal verification produces the strongest possible validation artifact: a mathematical proof, not a test report.
We generate documentation that maps directly to OCC examination expectations and EU AI Act conformity assessment requirements, including the specific system properties proven, the verification methodology, and any identified limitations with counterexamples.
When CFPB examiners run Module 4 (Billing Error Resolution) of their Reg Z examination procedures, they evaluate the quality of your compliance management system and internal controls. A dashboard showing real-time verification status of every dispute workflow property replaces the typical binder of policies and test results.
Each property displays its verification status (proven, counterexample found, pending re-verification), the date of last verification, and any model changes since the last proof. The examiner sees exactly which regulatory requirements have been mathematically verified and which are still under review.
Formal verification is not a quick overlay. It requires understanding your systems deeply before modeling them. We are transparent about the timeline and what each phase requires from your team.
We catalog every system touching your dispute workflow. Core banking APIs, card network portal integrations, case management platforms, letter generation systems, credit bureau reporting feeds. For each system, we document its behavior: synchronous vs. batch, latency characteristics, failure modes, retry logic.
For COBOL mainframes and legacy core systems, we work with your technical team to understand the actual behavior, not the documented behavior. FIS Code Connect and Temenos Transact have specific limitations around real-time state synchronization that we need to capture accurately.
Your team's involvement: 2-3 hours per week from a dispute operations lead and a technical architect who knows the integration layer. We also need read access to your dispute workflow documentation and system architecture diagrams.
We encode your dispute workflow as a TLA+ state machine specification. Every state, every transition, every regulatory constraint. The specification is readable by your compliance team (TLA+ is closer to structured English than to code), and we walk them through it to confirm the model matches reality.
Then we run the TLA+ model checker. It exhaustively explores every reachable state in the workflow and verifies the safety properties: no dead states, all Reg Z timing requirements satisfied, all Reg E requirements satisfied where applicable, all card network timelines met.
What to expect: The first model-checking run almost always produces counterexamples. This is the point. Each counterexample is a specific violation path that your team can evaluate and fix. Institutions under active consent order monitoring can use these results immediately to demonstrate proactive compliance improvement.
Once the base model is verified, we layer in the cross-regime constraints: Visa VCR allocation/collaboration timelines, Mastercard dispute resolution windows, and the interaction effects when multiple regimes apply to the same dispute. This is where the complexity lives, and where manual compliance management most often fails.
We also integrate the verification into your change management workflow. This means connecting the formal model to your CI/CD pipeline or change approval process so that system modifications are re-verified before deployment.
Honest caveat: State space explosion is a real constraint in formal verification. For workflows with many concurrent systems and high branching factors, we use abstraction techniques (compositional verification, symmetry reduction) to keep the model tractable. We are upfront about which properties we can verify exhaustively and which require bounded checking.
Regulatory requirements change. Card network rules change. Your systems change. The formal model is a living artifact that we maintain and re-verify as your environment evolves. When the CFPB updates Reg Z commentary, when Visa adjusts VCR timelines, when you upgrade your core banking API, we update the model and re-run verification.
During examinations: We provide the verification artifacts, the compliance dashboard, and if needed, a technical walkthrough for examiners. The goal is to shift your compliance posture from "we believe we're compliant" to "we can demonstrate, with mathematical proof, that our workflow satisfies these specific regulatory requirements."
Answer these questions about your current dispute resolution infrastructure. The assessment identifies where formal verification would address your highest-risk gaps, and where other improvements should come first.
Question 1 of 6
Testing checks specific scenarios you think of. Formal verification checks every possible scenario, including ones you did not anticipate. Your QA team might test 200 dispute paths and find them compliant. A formal verifier explores every reachable state in the workflow and either proves compliance holds universally or produces a concrete counterexample showing exactly how a violation occurs.
The Apple-Goldman forms bug is a textbook example: the incomplete-form-plus-valid-dispute path was never in any test plan, but a TLA+ model checker would have found it in seconds.
The practical difference is that testing gives you confidence, while verification gives you proof. When a CFPB examiner asks how you know your dispute workflow satisfies the 30-day acknowledgment requirement, testing lets you say "we checked 200 scenarios." Verification lets you say "we proved it holds for all possible inputs, system states, and failure modes." That distinction matters when the examiner has seen the Apple-Goldman consent order and is looking for the same patterns in your systems.
Yes, and this is a common concern we address in the first engagement phase. Formal verification does not require replacing or modifying your core banking infrastructure. We model the behavior of your existing systems as they participate in the dispute workflow, including their latency characteristics, batch processing windows, and failure modes.
For COBOL mainframes specifically, we catalog the APIs and data schemas during the initial assessment (typically 6-8 weeks), then build the formal model around how those systems actually behave, not how they should behave. FIS Code Connect, Temenos Transact, and proprietary core systems all have specific limitations around real-time dispute state synchronization. We model those limitations explicitly.
If your mainframe processes dispute updates in nightly batches, that batch window becomes a parameter in the formal model, and we verify that the batch delay cannot cause a Reg Z timing violation under any combination of dispute submission times and processing loads. The formal model sits alongside your existing systems as a verification layer. It does not replace anything.
OCC Bulletin 2025-26 and the existing SR 11-7 framework are clear: any quantitative method that materially influences risk or compliance decisions is a "model" that requires validation. This explicitly includes AI and machine learning systems used in compliance workflows.
The validation requirements cover conceptual soundness, ongoing monitoring, and outcomes analysis. Most banks validate compliance AI through back-testing and periodic review. Formal verification goes further. It provides mathematical proof that the system satisfies its specifications, which is the strongest possible form of conceptual soundness validation.
For dispute resolution AI, this means proving that the automated workflow cannot miss a Reg Z deadline, cannot drop a dispute between systems, and cannot misroute a billing error notice. The OCC examiner handbook specifically looks for evidence that model limitations are understood and documented. Formal verification produces counterexamples when limitations exist, showing exactly which edge cases the system cannot handle. That level of transparency is what examiners want to see.
The first provable result typically arrives within 12-16 weeks. During the assessment phase (weeks 1-8), we map your dispute workflow state machine and identify the regulatory constraints to verify. During the modeling phase (weeks 8-16), we encode those constraints in TLA+ and run the model checker. The first verification run either produces a proof that your workflow satisfies Reg Z timing requirements or generates counterexamples showing specific violation paths.
Either result is immediately useful for examiner conversations. The counterexamples are arguably more valuable early on: they show you exactly where your workflow can fail before a CFPB examiner finds it.
For institutions under active examination or consent order monitoring, we prioritize the highest-risk workflows first. If your biggest exposure is the dispute acknowledgment timeline, we can have a verified model of that specific property within 8-10 weeks. Full cross-regime verification covering Reg Z, Reg E, Visa VCR, and Mastercard timelines concurrently takes 16-24 weeks depending on workflow complexity.
Yes, and the Reg Z/Reg E intersection is one of the most common sources of compliance errors we see. Reg E requires provisional credit within 10 business days and final resolution within 45 calendar days (or 90 days for certain transactions). Reg Z requires acknowledgment within 30 days and resolution within two complete billing cycles, not to exceed 90 days.
When a dispute involves both a credit card and a linked debit card or checking account, the institution must apply the correct regulation to each component. Staff frequently misapply Reg E timelines to credit transactions or vice versa.
The formal model encodes both regulatory frameworks and their applicability rules. For a given dispute, the verifier determines which regulation governs based on the transaction characteristics and proves the workflow satisfies the correct timing requirements. It also flags cases where your workflow applies Reg E timing to a Reg Z-governed transaction, which is a common examination finding.
The EU AI Act classifies AI systems used for creditworthiness evaluation and credit scoring as high-risk under Annex III. The compliance deadline is August 2, 2026. For banks operating in the EU or serving EU customers, any AI system involved in dispute resolution decisions that affects credit reporting falls under this classification.
High-risk AI systems must demonstrate accuracy, robustness, cybersecurity, and human oversight. The Act requires technical documentation showing that the system meets these requirements. Formal verification produces the strongest possible technical documentation: mathematical proofs that the system behaves as specified under all conditions.
The European Banking Authority published its analysis of AI Act implications for the banking sector in November 2025, and it specifically highlights the need for provable system properties. We produce EU AI Act compliance documentation as a standard deliverable of the verification engagement, including the required conformity assessment evidence.
The research behind this solution page, including the full technical analysis of the Apple-Goldman failure and the formal verification approach.
Engineering Absolute Compliance: Deep AI After the Apple-Goldman Sachs FailureFormal verification of financial state machines, multi-agent compliance architectures, and the regulatory case for provably correct dispute resolution systems.
The average CFPB consent order for dispute resolution failures costs $50M-$89M in penalties and remediation.
Formal verification of your dispute workflow costs a fraction of that and produces mathematical proof that your system satisfies Reg Z, Reg E, and card network timing requirements. The first verification results arrive within 12-16 weeks.