Editorial cover visualizing the US power grid capacity crisis — the widening gap between retiring generation and surging AI-driven demand, with PJM and ERCOT as focal points.

EnergyArtificial IntelligenceTechnology

America's Power Grid Just Failed Its Biggest Test -- And Nobody Noticed

Ashutosh Singhal April 11, 202615 min read

I was on a call with an energy executive in Virginia last fall when he said something that stopped me cold.

"We have data centers requesting more power than we can physically deliver. Not next decade. Now. And every month we delay, another coal plant files for retirement."

He wasn't panicking — he'd been in the industry for thirty years. But there was something in his voice I hadn't heard before from someone that senior: resignation. Like he'd run the numbers enough times to know that the math simply didn't work anymore.

That conversation sent me down a rabbit hole that consumed my team at Veriprajna for months. What we found was worse than I expected. The largest grid operator in the United States — PJM Interconnection, which serves 65 million people across 13 states — just failed to procure enough electricity for the first time in its history. The shortfall: 6,623 megawatts. That's roughly the output of six nuclear reactors that simply don't exist. Meanwhile, in Texas, the grid operator ERCOT is drowning in 233 GW of interconnection requests — nearly three times the state's entire peak demand — with no realistic path to connect most of them.

These aren't hypothetical scenarios from a climate report dated 2050. The PJM shortfall hits in June 2027. That's eighteen months away.

What Happens When the Largest Grid in America Comes Up Short?

Let me put PJM's December 2025 capacity auction results in plain terms. Every year, PJM runs an auction where power plants bid to guarantee they'll be available when demand peaks. It's essentially the grid's insurance policy. This year, the auction cleared 134,479 MW of capacity — and came up 6,623 MW short of what's needed to maintain the reliability standard that's supposed to prevent blackouts.

The reserve margin dropped to 14.8%. The target is 20%. And capacity prices hit the regulatory ceiling of $333.44 per megawatt-day across the entire region — a price cap that was designed to protect consumers but now functions as a blinder, masking how desperate the situation actually is.

When the price cap binds across an entire 13-state region, you're not looking at a market signal. You're looking at a market scream.

What drives me crazy about the coverage of this is simple. Most articles frame it as "coal plants are retiring and renewables aren't replacing them fast enough." That's technically true but profoundly incomplete. The real story is about a mismatch so severe that no amount of conventional planning can fix it in time.

Between 2011 and 2023, PJM lost 54.2 GW of thermal capacity to retirements. Another 24 to 58 GW — up to 30% of installed capacity — is at risk of retiring by 2030. And here's the number that should keep every grid planner awake at night: replacing 1 MW of retiring coal or gas generation requires approximately 5.2 MW of solar or 14 MW of onshore wind to maintain equivalent reliability. The intermittency gap isn't a footnote. It's the whole story.

Why Is ERCOT's Interconnection Queue at 233 GW?

If PJM's crisis is about supply disappearing, Texas has the opposite problem — demand showing up faster than anyone imagined possible.

ERCOT's large-load interconnection queue hit 233 GW by late 2025. That's a 269% increase from the end of 2024. To give you a sense of scale: ERCOT's total peak demand is about 85 GW. The queue is nearly three times the entire grid.

Data centers account for 77% of those requests.

When I first saw that number, I assumed it was inflated by speculative applications — companies filing requests at multiple sites to see which one gets approved first. I was right, but only partially. The industry calls these "phantom loads," and they're a real problem. Hyperscalers submit applications across dozens of sites, clogging the engineering study process with projects that may never break ground. ERCOT recently brought in McKinsey to help sort credible requests from speculative ones, which tells you how overwhelmed the internal teams are.

But even after you strip out the phantoms, the underlying demand is staggering. And the supply side? ERCOT synchronized 23 GW of new generation in 2025 — mostly solar and batteries. The generation queue is dominated by 158 GW of solar and 175 GW of battery storage, with only 47 GW of natural gas. Texas lawmakers passed Senate Bill 6 and created a $9 billion fund to incentivize new gas plants, but roughly 35% of proposed gas projects have already withdrawn, citing global turbine shortages and permitting delays.

I wrote about this supply-demand collision in more detail in the interactive version of our research, but the takeaway is blunt: the grid cannot physically grow at the speed the AI revolution demands.

The Night I Stopped Believing in "Just Build More"

There was a specific evening — my team and I were deep into modeling the PJM retirement cliff — when one of our engineers pulled up a projection on screen and the room went quiet.

She'd mapped the retirement risk of every thermal plant in PJM against the timeline for new generation coming online. The lines crossed in 2027. Not 2030. Not 2035. The gap opened in eighteen months, and it widened every year after.

Someone said, "So we need to build about 7 GW of dispatchable generation in a year and a half."

I laughed. Not because it was funny. Because the average time to permit and build a gas plant in PJM territory is four to seven years. The average for a new transmission line is even longer.

That was the moment the thesis crystallized for me. We can't build our way out of this fast enough. The grid has to get dramatically smarter with the infrastructure it already has. And the kind of "AI" most energy companies are deploying — chatbots, basic regression models, dashboard analytics — is laughably inadequate for the problem.

The grid doesn't need another dashboard. It needs to think.

What Does "Deep AI" Actually Mean for the Grid?

Infographic showing the three classes of Deep AI models used for grid intelligence — PINNs, Graph Neural Networks, and Reinforcement Learning — with their specific grid applications.

I need to be specific here, because "AI for energy" has become one of those phrases that means everything and nothing. When I say Deep AI, I mean something very different from wrapping a large language model around a SCADA system (Supervisory Control and Data Acquisition — the industrial control systems that monitor and manage grid operations) and calling it innovation.

The electrical grid is a synchronized dynamical system. It obeys Kirchhoff's Laws (the fundamental rules governing how electrical current and voltage behave in circuits). Generators are coupled through the swing equation. Voltage, frequency, and power flow are governed by physics that don't care about your training data. Any AI system that ignores this physics is, at best, a toy.

At Veriprajna, we work with three classes of models that respect the grid's physical reality.

The first class is Physics-Informed Neural Networks — PINNs — which embed the actual differential equations governing generator behavior directly into the model's loss function. Instead of just learning patterns from historical data, the network is penalized for violating physical laws. The result: transient stability analysis that runs 87 times faster than conventional numerical solvers. For a grid operator staring down a potential cascading failure, that's the difference between predicting the blackout and living through it.

Then there are Graph Neural Networks, which treat the grid as what it actually is — a graph, with substations as nodes and transmission lines as edges. Traditional machine learning flattens this structure into a data table and loses the spatial relationships that matter most. A GNN can predict how a voltage dip at one substation propagates through the network topology in milliseconds. Our multilayer GNN architecture has achieved an F1 score (a measure of prediction accuracy that balances precision and recall) of 0.89 for identifying substations at risk of failure within 30 days.

The third class — and the one I find most promising for real-time operations — is Reinforcement Learning agents that make dispatch decisions by treating grid control as a constrained optimization problem. They learn policies that satisfy hard physical constraints — voltage limits, thermal ratings, frequency bounds — while maximizing reliability and minimizing cost.

None of this is theoretical. We've built these systems. And the gap between what they can do and what most utilities are currently using is enormous.

How Do You Find 6.6 GW Without Building a Single Power Plant?

Diagram explaining Dynamic Line Rating — showing how real-time weather and sensor data unlock hidden transmission capacity compared to static ratings.

This is the question that consumed us. And the answer starts with one of the most underappreciated technologies in the energy sector: Dynamic Line Rating.

Every transmission line in America has a "static" rating — a maximum power it's allowed to carry, based on worst-case assumptions about temperature and wind. These assumptions are deliberately conservative. On most days, the actual thermal capacity of the line is 20-40% higher than the static rating allows.

Dynamic Line Rating uses real-time weather data and IoT sensors to calculate what the line can actually handle right now, not what it could handle on the worst day of the century. We integrate computer vision and LiDAR (Light Detection and Ranging — a laser-based remote sensing technology) data to monitor conductor sag and temperature continuously.

The results are not incremental. In Indiana and Ohio, AES deployed these technologies and increased transfer capacity by 61% on 345 kV lines — at a cost of $0.39 million, compared to $1.63 million for traditional reconductoring. That's a 76% cost reduction and an 80% reduction in deployment time.

Now multiply that across PJM's 13-state footprint. You're not closing the entire 6.6 GW gap with DLR alone, but you're making a massive dent without pouring a single foundation.

The cheapest megawatt is the one already flowing through your wires that you didn't know you had.

The $163 Billion Question Nobody's Asking

The economics get genuinely alarming from here. An analysis by the Natural Resources Defense Council found that data center growth in the PJM region could drive $163 billion in cumulative capacity costs from 2028 through 2033. In Northern Illinois alone — ComEd territory — the projected impact is $21.4 billion, which translates to roughly $70 per month in additional costs for the average household.

Let me say that differently. The AI boom that's supposed to transform the economy could raise your electricity bill by $840 a year, and that's in a single utility zone.

When I present these numbers to technology executives, I watch their faces change. They understand server costs, network costs, talent costs. But most of them haven't internalized that the electricity to run their AI models is about to get dramatically more expensive — and potentially unavailable — because the grid serving their data centers is structurally short on capacity.

This is not a problem that resolves itself through market forces alone. When the PJM auction hits the price cap across the entire region, the market is telling you it's broken. The price signal that should attract new investment is being artificially suppressed, which means the investment doesn't come, which means the shortfall persists.

Can AI Actually Screen 233 GW of Interconnection Requests?

One of the projects I'm most excited about is something we've been building for the interconnection queue problem. FERC (Federal Energy Regulatory Commission) Order 2023 requires transmission providers to maintain public "heatmaps" of available capacity, but the actual study process — determining whether a specific project can connect at a specific point without destabilizing the grid — remains brutally manual.

We're deploying what I'd call agentic AI for interconnection screening. These aren't chatbots. They're autonomous reasoning systems that can ingest an interconnection application, check it against NERC (North American Electric Reliability Corporation — the body that sets reliability standards for the grid) and FERC standards, run a topological feasibility analysis using our GNN models, and assign a likelihood-of-completion score based on the project's commercial and physical readiness.

The goal is to shift ERCOT — and eventually other grid operators — from a "first-come, first-served" queue to a "first-ready, first-served" model. When you have 233 GW of requests and 23 GW of actual new generation, the ability to identify which projects are real and which are speculative isn't a nice-to-have. It's existential.

For the full technical breakdown of our architecture — including the PINN formulations, GNN topology, and RL control framework — see our research paper.

"But Can You Trust AI With the Grid?"

I hear this constantly. Usually from people who've seen enough enterprise AI demos to be skeptical, and honestly, they should be. The electrical grid is critical infrastructure. A bad recommendation from a chatbot wastes someone's afternoon. A bad recommendation from a grid control system blacks out a hospital.

This is why we refuse to deploy black-box models in operational settings. Every prediction our GNN makes comes with a graph-based explanation — it highlights the specific transmission lines and substations contributing to a risk assessment, so a human operator can verify the reasoning before acting. We call this stability-aware inference: the AI proposes, the physics constrains, and the human decides.

My team argued about this for weeks. Some of our engineers wanted to push for more autonomous control — the RL agents are genuinely better at real-time dispatch than most manual processes. But I kept coming back to the same principle: in safety-critical systems, explainability isn't a feature. It's a prerequisite.

We've also been careful about the IT/OT boundary (the divide between information technology systems and operational technology that controls physical equipment). Our architecture connects to existing distributed control systems without modifying the proven safety-critical control structures. The AI layer sits alongside the control layer, not on top of it.

The Retirement Cliff Is Predictable — If You Use the Right Models

One more thing that keeps me up at night. The 6.6 GW shortfall in PJM isn't a surprise if you have the right forecasting tools. We've built retirement prediction models using stacked LSTM (Long Short-Term Memory — a type of neural network for sequential data) networks and gradient boosting that analyze plant-level economics — CO2 emissions, fuel prices, renewable penetration in the local market, maintenance costs, regulatory exposure.

Our models predict plant retirement timing with a mean absolute percentage error of 1.07%. That level of accuracy gives grid operators a two-to-three-year warning window to intervene — with targeted capacity incentives, backstop procurement, or accelerated interconnection of replacement resources — before the reliability gap opens.

The fact that PJM was caught short in 2025 isn't because the retirement cliff was unpredictable. It's because the tools being used to predict it were inadequate.

People sometimes push back: "Isn't this just better forecasting? What's so 'deep' about it?" The depth is in what the model understands. A standard regression model sees a coal plant's age and fuel costs. Our model sees its position in the transmission topology, the renewable saturation in its pricing zone, the political environment of its state, and the cascading reliability impact of its retirement on every connected substation. That's not a spreadsheet. That's a digital twin of the grid's economic physics.

Where This Goes From Here

I don't think the PJM shortfall or the ERCOT queue crisis will be the last of their kind. I think they're the first. Every major grid operator in North America is going to face some version of this collision between retiring thermal generation, explosive AI-driven demand, and the physical limits of how fast you can build infrastructure.

The utilities that navigate this successfully won't be the ones that build the most. They'll be the ones that orchestrate the best — squeezing every available megawatt from existing lines through DLR, predicting retirements before they create emergencies, screening interconnection queues with AI instead of armies of engineers, and running real-time stability analysis in milliseconds instead of hours.

The 6,623 MW gap in PJM isn't just a number on an auction report. It's the distance between the grid we have and the grid we need. And that distance is growing every month.

The grid is the most complex machine humanity has ever built. We're asking it to power the most complex software humanity has ever built. Something has to give — and it shouldn't be the lights.

We can close that gap. Not by pretending AI is a magic wand, but by building AI systems that respect the physics, understand the topology, and earn the trust of the operators who keep the lights on. That's the work. And the grid doesn't have time for anyone to figure it out slowly.