AIRLINE OPERATIONS AI
Legacy crew schedulers run column generation on a static snapshot of your network. When cascading disruptions hit and crew positions go stale, the solver optimizes a phantom airline. Southwest lost $1.2B learning this. Spirit lost $50-100M in July 2024 when its scheduling algorithm created conflicting assignments for 43% of available crews. With DOT's automatic refund rule now turning every delay past 3 hours into a mandatory cash refund, the cost of slow IROPS recovery has never been higher.
Veriprajna builds ML-powered IROPS recovery engines that augment your existing Jeppesen or IBS installation. We do not replace your solver. We handle what it cannot: cascading disruptions with uncertain crew positions, network-wide blast radius analysis, and recovery plans generated in minutes instead of hours.
$60B/year
Industry IROPS cost
IATA estimate
4-12 hours
Manual crew recovery time
Industry benchmarks
3-hour trigger
DOT mandatory auto-refund
DOT Final Rule, Oct 2024
The anatomy of an airline operational collapse, from the ops control center floor.
A winter storm grounds flights at a key station. Your crew scheduling solver runs on batch cycles, typically every 30-60 minutes. It takes a static snapshot of the network, freezes time, and computes the optimal recovery. But the network state is changing every 5 minutes. By the time the solver returns a solution, the inputs are wrong. Crews have moved. Connections have broken. The solution is invalid before anyone sees it.
This is the Optimization-Execution Gap. Your solver was designed for efficiency (the cheapest schedule in a known world), not resilience (a survivable schedule in an unknown world). The gap is manageable during isolated delays. During cascading disruptions, it becomes fatal.
Your automated crew notification system is overwhelmed. Crews stranded at outstations call the scheduling center to report their positions. Hold times hit 4 hours, then 8. The solver requires hard inputs: "Captain Smith is at Gate B7 in Denver." But Captain Smith might be at the hotel, might be on the employee bus, might have rented a car to drive to Colorado Springs. Your solver cannot work with "probably in Denver." It needs certainty. During a cascade, certainty does not exist.
This is exactly what killed Southwest in December 2022. They lost track of their own pilots and flight attendants. SkySolver was generating schedules for crews who were not where the system thought they were. The airline was optimizing a phantom network.
The number of broken crew pairings is growing exponentially, not linearly. Each cancelled flight displaces a crew, which breaks the next pairing, which strands an aircraft, which cancels the downstream flight. For a point-to-point carrier, the blast radius is uncontained because there are no hub "regeneration points" where crews and aircraft naturally reconverge.
Your solver hits its computational cliff. The branch-and-price algorithm cannot find even a feasible solution (let alone an optimal one) within the operational decision window. Your dispatchers abandon the system and start working spreadsheets and whiteboards. They are now solving an NP-hard combinatorial problem by hand, under pressure, at 3 AM. This is where the $1.2B losses happen.
Since October 2024, every domestic delay exceeding 3 hours triggers a mandatory automatic refund. Not a voucher. Not a rebooking. A cash refund within 7 business days, without the passenger requesting it. For a carrier operating 300 daily flights, a single bad day with 50 flights delayed past the 3-hour mark, at an average ticket value of $280 and 150 passengers per flight, represents $2.1M in mandatory refund exposure. The financial penalty for slow IROPS recovery just became an order of magnitude more severe.
An honest assessment of what each vendor actually delivers in 2026. Pull this up when evaluating options.
| Vendor | What They Do | Strengths | Gaps |
|---|---|---|---|
| Jeppesen (now Thoma Bravo) | CrewPlan, CrewAlert, Stratosphere (new AI layer). Industry-standard column generation solver. 100+ airline clients. | Deepest airline relationships. Decades of domain encoding. New Stratosphere AI disruption tool (Oct 2025). Now independent from Boeing with dedicated investment. | Core solver is still batch column generation. Stratosphere is predictive analytics, not ML-driven recovery. Ownership transition creates uncertainty for long-term roadmap. Mid-size carriers often get less attention than marquee accounts. |
| IBS Software (iFlight / iFlight Core) | Cloud-native ops platform. AWS co-engineering partnership. Recent wins: Korean Air, Aeroitalia, Groupe Dubreuil. | Modern cloud architecture. Agent-like disruption recovery models. AWS infrastructure for scalability. Growing rapidly in mid-market. | Agent-like models are still rule-based, not learned policies. Full iFlight implementation is a 12-18 month project. Fewer production deployments than Jeppesen. No published IROPS recovery benchmarks. |
| Optym (CrewSolver, SkyMAX) | Crew pairing optimization + integrated flight scheduling. Southwest Airlines client (SkyMAX). | Proven 3-7% crew cost reduction. Holistic schedule + crew optimization. Hyper-heuristics and ML augmentation. | Focused on planning-phase optimization (pre-departure), not real-time IROPS recovery. No published digital twin or simulation capability. Smaller client base than Jeppesen or IBS. |
| Sabre / Amadeus | GDS providers with operations modules. Deep integration with reservations and departure control. | Ecosystem integration: booking, check-in, departure control, and crew scheduling on one platform. Large install base. | Crew scheduling is a secondary capability, not their core product. Operations modules lag behind Jeppesen/IBS in solver sophistication. Innovation focused on revenue management and distribution. |
| Big 4 / Large SIs (Accenture, Deloitte, etc.) | Digital transformation consulting. Implement Jeppesen, IBS, or Sabre as part of broader ops modernization. | Project management at scale. Change management expertise. Board-level relationships. | They are implementers, not builders. They install the same vendor platforms you can contract directly. $2M-$10M engagements, 12-24 months to operational impact. Staffed with generalist consultants who rotate across industries. |
| Emerging AI players (Softlabs, Kaiban, Tech Mahindra) | AI-powered disruption management and re-accommodation automation. Mostly passenger-facing agentic AI. | Modern tech stacks. Fast deployment for passenger re-accommodation. Lower price points. | Focus on passenger-facing automation (rebooking, notifications), not operational crew recovery. Limited understanding of CBA complexity and Part 117 constraint encoding. No published FAA regulatory compliance track record. |
| Veriprajna | ML-powered IROPS recovery layer that augments existing scheduling infrastructure. Graph-based network analysis. Probabilistic crew tracking. | Purpose-built for the 15 worst IROPS days. Works with (not against) your existing Jeppesen/IBS solver. Shadow mode validation before any operational trust. Network vulnerability analysis for point-to-point carriers. | No airline production deployment track record yet. Smaller team than incumbent vendors. Cannot replace full crew planning lifecycle (day-of recovery only). Requires quality data feeds to perform. |
Five capabilities, each targeting a specific failure mode that current tools do not address.
When your column generation solver hits its computational cliff during cascading disruptions, our ML layer takes over. We use graph neural networks to encode your route network topology, crew positions, aircraft states, and active constraints into a unified representation. The GNN captures what tabular data cannot: how a disruption at one station propagates through dependency chains to affect crews and aircraft three connections downstream.
The recovery engine generates ranked recovery plans (crew swaps, deadhead repositioning, proactive cancellations) in minutes. Each plan is validated against your constraint engine before reaching a dispatcher. We reach for Graph Attention Networks specifically because the attention mechanism lets the model weigh which connections matter most in the current disruption state. A delayed inbound flight to a hub gets more attention weight than an on-time flight to a spoke with buffer time.
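To make the attention idea concrete, here is a minimal numpy sketch of a single graph-attention step over a flight-dependency graph. The node features, edge list, and weights are invented and untrained; a production layer would be a proper GAT built on a graph learning library and trained on historical disruption data.

```python
# Toy single-layer graph attention over a flight-dependency graph (numpy only).
import numpy as np

rng = np.random.default_rng(0)

# Nodes: flights. Hypothetical features per flight:
# [delay_hours, crew_position_confidence, downstream_connections/10, is_hub_departure]
X = np.array([
    [2.5, 0.4, 0.8, 1.0],   # delayed inbound to a hub
    [0.0, 0.9, 0.2, 0.0],   # on-time spoke flight with buffer
    [1.0, 0.6, 0.5, 0.0],   # flight whose crew depends on the two above
])

# Directed edges: (upstream flight, downstream flight) dependency pairs.
edges = [(0, 2), (1, 2), (0, 1)]

d_in, d_out = X.shape[1], 8
W = rng.normal(scale=0.3, size=(d_in, d_out))   # shared linear projection
a = rng.normal(scale=0.3, size=(2 * d_out,))    # attention vector

H = X @ W  # projected node features

def attention_scores(dst):
    """Softmax-normalized attention over the incoming edges of `dst`."""
    srcs = [s for s, d in edges if d == dst]
    logits = []
    for s in srcs:
        z = np.concatenate([H[s], H[dst]])
        logits.append(np.maximum(0.2 * (a @ z), a @ z))  # LeakyReLU
    logits = np.array(logits)
    weights = np.exp(logits - logits.max())
    return dict(zip(srcs, weights / weights.sum()))

# With trained weights, the delayed hub inbound (node 0) would be expected to
# earn more attention than the on-time spoke flight (node 1); here the weights
# are random, so the numbers only illustrate the mechanics.
print(attention_scores(dst=2))
```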
This solves the "data black hole" that caused Southwest's 2022 collapse. Instead of requiring a hard crew position ("Captain Smith is at Gate B7"), we model crew locations as probability distributions. If a pilot's last ACARS check-in was Denver 3 hours ago and they have a confirmed hotel reservation near the airport, we model that as: 70% at hotel, 20% at airport, 10% in transit. If they also have a boarding pass for the 6 PM to Phoenix, we factor that into their availability window.
The recovery engine works with these probability distributions rather than waiting for the certainty that never comes during a crisis. Recovery plans are scored against the most likely crew position scenarios, with fallback options pre-computed for less likely positions. Your dispatchers see: "Plan A (85% confidence, requires Captain Smith in Denver) and Plan B (95% confidence, uses a different crew but requires one deadhead)."
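A toy sketch of how plans can be scored against a position distribution rather than a hard location, following the Captain Smith example above. The distribution, plan definitions, and the simple max-based confidence rule are illustrative assumptions, not the production scoring model.

```python
# Score recovery plans against a probabilistic crew position.
from dataclasses import dataclass

# Captain Smith's position as a probability distribution, not a hard fact.
crew_position = {"hotel_DEN": 0.70, "airport_DEN": 0.20, "in_transit": 0.10}

@dataclass
class RecoveryPlan:
    name: str
    required_positions: set       # positions from which this crew can make the assignment
    fallback_confidence: float    # confidence if this crew turns out to be unavailable

plans = [
    RecoveryPlan("Plan A: reassign Capt. Smith", {"hotel_DEN", "airport_DEN"}, 0.0),
    RecoveryPlan("Plan B: reserve crew + one deadhead", set(), 0.95),
]

def confidence(plan: RecoveryPlan) -> float:
    """Probability the plan executes as written, given the position distribution."""
    p_crew_available = sum(p for loc, p in crew_position.items()
                           if loc in plan.required_positions)
    return max(p_crew_available, plan.fallback_confidence)

for plan in sorted(plans, key=confidence, reverse=True):
    print(f"{plan.name}: {confidence(plan):.0%} confidence")
```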
We map every dependency chain in your route network and identify the 5-10 "fault lines" where a single disruption creates maximum downstream damage. For a point-to-point carrier flying 300 daily departures, we compute the blast radius for each station by time of day and season. Denver at 2 PM in January is a fundamentally different risk profile than Denver at 10 AM in July.
The output is a network risk map that your planning team can use to make informed trade-offs. We might identify that adding one buffer aircraft at Denver and pre-positioning a reserve crew at Phoenix reduces your cascade exposure by 40% for the winter season at a cost of 0.3% daily utilization. That is a $200K annual investment to prevent $5-10M in potential IROPS damage. The analysis is specific to your route map, your fleet mix, and your historical disruption patterns.
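A toy illustration of the blast-radius idea using networkx: model "if this flight breaks, these downstream flights are at risk" as a directed graph and rank flights by how many descendants they can take down. The rotations below are invented; the real analysis also weights edges by connection slack, load factors, and time of day.

```python
# Rank flights by downstream blast radius in a point-to-point dependency graph.
import networkx as nx

G = nx.DiGraph()
# An edge means "if the first flight breaks, the second is at risk"
# (same tail or same crew continues on it).
rotations = [
    ("BWI-DEN", "DEN-SAN"), ("DEN-SAN", "SAN-PHX"), ("SAN-PHX", "PHX-SMF"),
    ("MDW-DEN", "DEN-AUS"), ("DEN-AUS", "AUS-MCO"),
    ("BNA-MCO", "MCO-FLL"),
]
G.add_edges_from(rotations)

def blast_radius(flight: str) -> set:
    """All downstream flights reachable through tail/crew dependencies."""
    return nx.descendants(G, flight)

ranked = sorted(G.nodes, key=lambda f: len(blast_radius(f)), reverse=True)
for f in ranked[:3]:
    print(f, "->", len(blast_radius(f)), "downstream flights at risk")
```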
A lightweight simulation environment where your ops team rehearses disruption recovery before it happens. Load last winter's actual weather data, inject your real crew rosters and fleet positions, and run: "What happens if Denver closes for 6 hours on a Thursday in January?" The simulator models cascading effects across your network, shows which crews get stranded, which pairings break, and which downstream flights are at risk.
This is not a full digital twin (which would require 12+ months and millions of dollars to build). It is a purpose-built simulation that uses your existing data feeds and focuses specifically on crew-related disruption cascades. Your dispatchers can practice recovery strategies, test pre-positioning plans, and build the muscle memory for crisis response during calm periods. Airlines that rehearse IROPS scenarios recover faster when real disruptions hit because the decision patterns are already familiar.
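A deliberately small sketch of the rehearsal question posed above ("what happens if Denver closes for 6 hours?"). The schedule rows and the two-step cascade rule are placeholders; the real simulator also models aircraft routings, minimum connection times, duty limits, and curfews.

```python
# Simulate a station closure and the first two orders of crew cascade.
from datetime import datetime, timedelta

# (flight, origin, dest, scheduled_departure, crew_id) -- invented schedule rows
schedule = [
    ("WN101", "DEN", "SAN", datetime(2026, 1, 15, 14, 30), "CRW7"),
    ("WN102", "SAN", "PHX", datetime(2026, 1, 15, 18, 0),  "CRW7"),
    ("WN201", "DEN", "AUS", datetime(2026, 1, 15, 16, 15), "CRW9"),
    ("WN301", "MCO", "FLL", datetime(2026, 1, 15, 15, 0),  "CRW4"),
]

def simulate_closure(station, start, hours):
    closed_until = start + timedelta(hours=hours)
    cancelled, stranded_crews = [], set()
    # First-order effects: departures from the closed station inside the window.
    for flt, org, dst, dep, crew in schedule:
        if org == station and start <= dep <= closed_until:
            cancelled.append(flt)
            stranded_crews.add(crew)
    # Second-order effects: later legs flown by a now-stranded crew.
    for flt, org, dst, dep, crew in schedule:
        if crew in stranded_crews and flt not in cancelled:
            cancelled.append(flt)
    return cancelled, stranded_crews

print(simulate_closure("DEN", datetime(2026, 1, 15, 14, 0), hours=6))
```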
A machine-readable encoding of your specific union contract rules alongside FAA Part 117 requirements. Part 117 sets the floor: 10 hours minimum rest, 8-9 hour flight time limits based on time of day, flight duty periods capped at 9-14 hours depending on start time and segment count. But your union CBA is where the real complexity lives.
A captain on your A320 fleet at JFK may have different rest provisions than a first officer on the same fleet at LAX, depending on CBA section carve-outs for domicile-specific rules. Reserve call-out windows, premium pay triggers, and training qualification requirements all create constraints that vary by fleet, base, and seniority bracket. We encode these as machine-executable rules that validate every recovery recommendation at the computation layer. When your union renegotiates rest rules or the FAA issues a new Part 117 interpretation, the constraint engine updates the same day, not the same quarter.
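A minimal sketch of what "machine-executable rules" means in practice. The 10-hour minimum rest is the floor set by 14 CFR 117.25; the 11-hour JFK A320 figure is an invented stand-in for a domicile-specific CBA carve-out, and the real engine covers far more than rest (duty limits, qualifications, premium pay triggers).

```python
# Encode rest rules as executable checks and validate a proposed assignment.
from dataclasses import dataclass
from datetime import datetime

@dataclass
class RestRule:
    name: str
    min_rest_hours: float
    applies_to: callable  # predicate over a proposed assignment

@dataclass
class Assignment:
    crew_id: str
    domicile: str
    fleet: str
    rest_start: datetime
    report_time: datetime

rules = [
    RestRule("FAR 117.25 rest floor", 10.0, lambda a: True),
    RestRule("CBA carve-out: A320 JFK domicile (illustrative)", 11.0,
             lambda a: a.fleet == "A320" and a.domicile == "JFK"),
]

def violations(a: Assignment) -> list[str]:
    rest = (a.report_time - a.rest_start).total_seconds() / 3600
    return [r.name for r in rules if r.applies_to(a) and rest < r.min_rest_hours]

proposed = Assignment("P123", "JFK", "A320",
                      rest_start=datetime(2026, 1, 15, 22, 0),
                      report_time=datetime(2026, 1, 16, 8, 30))
print(violations(proposed))  # 10.5h rest: passes the FAR floor, fails the CBA carve-out
```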
A side-by-side comparison of how your ops center responds with current tools vs. with Veriprajna's recovery engine running in advisory mode.
| Timeline | Legacy Process | With Veriprajna Recovery Engine |
|---|---|---|
| 2:00 PM | Denver ground stop issued. Solver begins batch re-optimization cycle (30-60 min runtime). | GNN detects the closure and immediately computes blast radius: 14 downstream flights at risk, 6 crews will miss connections within 3 hours. Dispatchers see a risk map within 90 seconds. |
| 2:15 PM | Dispatchers begin manually assessing which crews are affected. Phone calls to Denver station. | Recovery engine generates 3 ranked recovery plans. Plan A: proactively cancel 4 low-load flights to free crews for 10 high-value connections. Plan B: deadhead 2 reserve crews from Phoenix (seats confirmed on competitor carrier). Plan C: delay 6 flights by 90 min, accept DOT refund exposure on 2. |
| 3:00 PM | Solver returns first solution. Three of the assigned crews have moved since the snapshot was taken. Solution is partially invalid. Manual corrections begin. | Dispatcher approves Plan A with one modification. Constraint engine validates all crew assignments against Part 117 and CBA. Recovery plan is executing. 10 high-value connections protected. |
| 5:00 PM | Second solver run initiated with corrected crew positions. Additional flights have cascaded. Problem space has doubled. Dispatchers working whiteboards for Eastern network. | Proactive cancellations contained the disruption to Denver and two adjacent stations. Eastern network operating normally. System monitoring residual risk and adjusting as Denver reopens. |
| 9:00 PM | Network still degraded. 28 flights cancelled, 40+ delayed past 3 hours. Crew hotel costs mounting. DOT refund exposure: ~$1.7M. | 4 proactive cancellations, 8 flights delayed (none past 3 hours). Crews repositioned for tomorrow's schedule. DOT refund exposure: $0. |
This scenario is based on the disruption pattern observed in the December 2022 Southwest event, scaled to a 300-flight mid-size carrier. The specific recovery decisions would depend on your route network, fleet mix, and crew base locations. The point is not that the system is perfect. It is that generating 3 validated recovery options in 15 minutes gives your dispatchers a starting point that is better than a blank whiteboard.
From initial assessment to shadow-validated recovery engine. Total timeline: 4-8 months depending on data readiness and fleet complexity.
Weeks 1-4
We analyze your route network topology, historical IROPS data (12+ months), crew base locations, and fleet utilization patterns. The output is a network vulnerability report: your top 10 cascade risk stations ranked by blast radius, seasonal risk profiles, and a financial estimate of your annual IROPS exposure.
This phase also identifies the data feeds required for the recovery engine and assesses their quality and latency. If your crew position data has a 2-hour lag, that is the first problem to solve.
Weeks 4-8
We connect to your operational data feeds (flight status, crew positions, maintenance status) and build your airline-specific constraint engine. This involves working with your crew planning team and union representatives to digitize every CBA rule and Part 117 requirement applicable to your operation.
The constraint engine is tested against 6 months of historical crew assignments to verify it correctly flags every known violation and approves every known-valid assignment. If it disagrees with a historical human decision, we investigate whether the human was right or the rule encoding needs adjustment.
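A stripped-down sketch of that replay check: run the encoded rules over historical assignments and surface every case where the engine's verdict disagrees with the dispatcher's original call. The stub checker and records are invented for illustration.

```python
# Replay historical assignments through the rule engine and flag disagreements.
def engine_flags_violation(assignment: dict) -> bool:
    # Stand-in for the real constraint engine; here: flag rest under 10 hours.
    return assignment["rest_hours"] < 10.0

history = [
    {"id": "A1", "rest_hours": 11.0, "human_flagged": False},
    {"id": "A2", "rest_hours": 9.5,  "human_flagged": True},
    {"id": "A3", "rest_hours": 9.8,  "human_flagged": False},  # disagreement to investigate
]

disagreements = [r["id"] for r in history
                 if engine_flags_violation(r) != r["human_flagged"]]
print("Investigate:", disagreements)
```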
Weeks 8-20
The recovery engine runs in parallel with your dispatchers during every IROPS event. It generates recovery recommendations in real-time but does not execute any of them. Your dispatchers make decisions through their existing tools. After each event, we compare: what the system recommended vs. what your team did vs. what actually happened.
The goal is to demonstrate measurable improvement over at least one full disruption season (typically one winter or one summer thunderstorm season). If the system's recommendations would not have improved outcomes in at least 70% of significant IROPS events, we do not recommend proceeding to Phase 4.
Ongoing
Based on shadow-mode evidence, the system is activated as a real-time advisory tool for dispatchers. Trust is graduated: low-risk, high-frequency decisions (deadhead positioning on confirmed seats, reserve crew call-outs) can be automated first. Complex multi-station recovery scenarios remain human-approved.
We do not recommend full autonomous operation for crew scheduling decisions. Dispatchers have context the system does not: a crew member who just called in sick, a gate change that has not hit the feed yet, a maintenance issue being resolved. The system's role is to give dispatchers a strong starting point, not to replace their judgment.
Estimate your annual disruption exposure and DOT refund risk. Adjust the inputs to match your operation. The results are yours to use in budget discussions, vendor evaluations, or internal business cases.
Annual Cancellation Revenue Loss
$15.8M
Cancelled flights x passengers x ticket price
Annual DOT Auto-Refund Exposure
$18.9M
3+ hour delays x passengers x ticket price
Total Annual IROPS Exposure
$34.7M
Cancellations + refunds + estimated crew/hotel costs
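For reference, a back-of-envelope version of the formulas behind these line items. The function mirrors the three figures above; the example inputs are illustrative and are not the assumptions used to produce the numbers displayed.

```python
# Rough annual IROPS exposure: cancellations + DOT auto-refund delays + crew/hotel costs.
def irops_exposure(cancelled_flights_per_year, delayed_3h_flights_per_year,
                   avg_passengers=150, avg_ticket=280, crew_hotel_costs=0):
    cancellation_loss = cancelled_flights_per_year * avg_passengers * avg_ticket
    refund_exposure = delayed_3h_flights_per_year * avg_passengers * avg_ticket
    total = cancellation_loss + refund_exposure + crew_hotel_costs
    return cancellation_loss, refund_exposure, total

# Illustrative inputs for a mid-size carrier; adjust to your own operation.
print(irops_exposure(cancelled_flights_per_year=300,
                     delayed_3h_flights_per_year=400,
                     crew_hotel_costs=2_000_000))
```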
No. We build on top of your existing crew scheduling infrastructure, not instead of it. Your Jeppesen CrewPlan or IBS iFlight installation handles normal-day scheduling effectively. Column generation solvers are well-suited for the 350 routine days per year. The problem is the 15 worst days, when cascading disruptions push the solver past its computational cliff and your ops team reverts to spreadsheets and phone calls.
Our IROPS recovery engine sits alongside your existing solver. During normal operations, it runs in shadow mode, learning your network patterns and validating its recommendations against human decisions. When disruptions cascade beyond what the solver can handle, it generates recovery plans that your existing system validates for constraint compliance.
Integration happens through your current data feeds: ACARS position reports, flight status APIs, and crew management system exports. We do not touch your solver's codebase. Typical integration takes 3-4 weeks for read-only data access, with the recovery engine running in shadow mode within 6 weeks of project start.
We build a machine-readable constraint engine specific to your operation. Part 117 is the floor, but union CBAs are where the real complexity lives. A captain on your A320 fleet at JFK may have different rest provisions than a first officer on the same fleet at LAX, depending on CBA Section 12 vs. Section 12(b) carve-outs.
Most vendors treat these rules as configuration parameters in a settings file. We treat them as a first-class engineering problem. During the assessment phase, we work with your crew planning team and union representatives to digitize every applicable rule, including FAA Part 117.25(b) and (c) distinctions, CBA-specific rest provisions by fleet and domicile, training and qualification requirements per aircraft type, and seniority-based assignment preferences.
The constraint engine validates every recommendation the recovery engine generates before it reaches a human dispatcher. If a proposed crew swap violates any rule, it is masked at the computation layer, not caught by a human reviewer after the fact. When your CBA is renegotiated or an FAA interpretation changes, the constraint engine updates the same day.
We need four data feeds: real-time flight status (OAG, FlightAware, or your internal OCC feed), crew position reports (ACARS check-ins, crew app data, or manual position updates from your tracking system), crew roster and qualification data (exported from your crew management system, typically Jeppesen CrewAlert or IBS iFlight), and historical disruption data covering at least 12 months of IROPS events with crew recovery decisions and outcomes.
The first three feeds establish the real-time operating picture. The fourth trains the recovery engine on your specific network patterns, seasonal disruption profiles, and how your dispatchers actually recover.
Shadow mode typically starts 6-8 weeks after data access is established. The first 2-3 weeks are spent on data pipeline integration and constraint engine setup. Weeks 4-6 focus on training the network model on your historical disruption data. By week 6-8, the system is generating real-time recovery recommendations in parallel with your dispatchers, and you can begin comparing its suggestions against actual human decisions.
Accenture and Deloitte are platform implementers. They will run a 6-month discovery phase, produce a 200-page transformation roadmap, and then implement Jeppesen or IBS, the same vendors you can contract directly. Their value is project management and change management at scale. Their engagements typically run $2M-$10M and take 12-24 months before any operational impact.
We build the layer those platforms do not have. Jeppesen and IBS are excellent daily scheduling engines. Neither has production-grade ML for cascading IROPS recovery, probabilistic crew tracking, or network vulnerability analysis. A Big 4 firm will not build these capabilities because they are not a software engineering shop. They staff projects with generalist consultants who rotate across industries, not engineers who understand Graph Attention Networks and Proximal Policy Optimization.
Our engagement starts producing shadow-mode data within 8 weeks, not 8 months. You see comparative recovery plans from your actual recent disruptions. If our recommendations would not have improved outcomes, you know within the first winter season. The total engagement cost for assessment through shadow validation is $400K-$800K, depending on fleet size and data complexity.
Your existing systems remain the operational system of record throughout the engagement. Our recovery engine is advisory, not autonomous. It generates ranked recovery options that your dispatchers evaluate and approve. If our system goes offline, nothing changes for your operation because your dispatchers are already making decisions through your existing tools.
The system never executes crew swaps, cancellations, or deadhead assignments on its own. Every recommendation passes through the constraint engine (which guarantees regulatory and CBA compliance) and then through a human dispatcher who decides whether to act on it.
This is deliberate. Airlines should not hand operational authority to an unproven system. Trust is earned through months of shadow-mode validation where the system proves it consistently generates better recovery plans than the current process. Even after validation, we recommend graduated trust: automated execution only for low-risk, high-frequency decisions like deadhead positioning on confirmed seats, while complex multi-station recovery scenarios remain human-approved.
Point-to-point carriers are where this system delivers the most value, precisely because they are the most vulnerable to cascading disruptions. In a hub-and-spoke network, disruptions can be contained by firewalling the affected hub. Crews and aircraft return to the hub frequently, creating natural recovery points. A carrier like Delta can isolate an Atlanta ground stop and keep the rest of the network running because the hub structure provides built-in redundancy.
Point-to-point carriers like Southwest, Spirit, or Frontier do not have this structural advantage. An aircraft flies Baltimore to Denver to San Diego to Phoenix to Sacramento. A disruption at any station propagates down the entire chain. The crew that was supposed to fly San Diego to Phoenix is stuck in Denver. The aircraft they were supposed to meet in San Diego is stranded. The dependency graph has a much larger diameter, and the blast radius of any single disruption is uncontained.
Our network vulnerability analysis is specifically designed for this topology. We map every dependency chain in your route network, identify the stations where disruptions create maximum downstream damage, and pre-compute recovery strategies for the most likely failure scenarios. When Denver closes, the system already knows which crews to reposition and which flights to proactively cancel to contain the disruption locally rather than letting it propagate network-wide.
The DOT automatic refund rule, effective October 28, 2024, fundamentally changed the economics of cascading disruptions. Before the rule, airlines could offer travel vouchers or rebooking as the default remedy for delays and cancellations. Most passengers accepted vouchers, and the airline retained the revenue.
Now, any domestic delay exceeding 3 hours or international delay exceeding 6 hours triggers a mandatory automatic refund in the original form of payment within 7 business days. The airline cannot require the passenger to request it.
For a mid-size carrier operating 200-400 daily flights, a cascading disruption that delays 50 flights by 3+ hours now represents an immediate cash outflow, not a deferred liability. If the average ticket value on those flights is $280 with 150 passengers per flight, a single bad IROPS day can trigger $2.1M in mandatory refunds, on top of crew overtime, hotel costs, and deadhead repositioning. Before the rule, maybe 15-20% of those passengers would have pursued refunds. Now 100% are automatic. This makes every hour of faster IROPS recovery directly measurable in avoided refund exposure. The business case for a system that contains a 6-hour network meltdown to a 2-hour regional disruption is no longer theoretical.
The technical foundations behind this solution page, available as an interactive whitepaper.
The Computational Imperative: Antifragile Logistics with Graph Reinforcement Learning
Forensic analysis of the Southwest SkySolver failure, the limitations of column generation under cascading disruptions, and the technical architecture for GRL-based crew recovery with neuro-symbolic constraint enforcement.
Winter storm season starts in 10 months. Shadow mode takes 8 weeks to deploy.
For a mid-size carrier, a single severe IROPS day now costs $2-5M in cancellations, crew repositioning, and DOT mandatory refunds. The assessment phase identifies your specific exposure and proves the recovery engine's value against your actual historical disruptions.