
The Self-Driving Car Saw Her 5.6 Seconds Before Impact — And Still Couldn't Decide What She Was
I was sitting in a conference room in late 2023, watching a video that would change how I think about AI safety forever. The footage was from a Cruise robotaxi in San Francisco. A pedestrian had been struck by a human-driven car, thrown into the path of the autonomous vehicle, and pinned underneath it. The robotaxi stopped — briefly — and then began to pull over to the side of the road, dragging the woman 20 feet across the asphalt.
The room was silent. Someone on my team said, "The car thought it had a side-impact collision." And that sentence — the car thought — became the seed of everything we've been building at Veriprajna since.
Because the car didn't "think" anything. It ran a classification subroutine, got the wrong answer, and executed a pre-programmed maneuver that turned a survivable accident into something far worse. There was no reasoning. No awareness. No safety architecture that could catch a catastrophic misdiagnosis before it became a catastrophe.
This is the gap I keep trying to explain to investors, clients, and fellow engineers: the distance between an AI that performs well in demos and an AI that behaves safely when the world stops cooperating. I've started calling it the Perception-Logic Gap — the space between what an autonomous system sees and what it actually understands. And right now, that gap is killing people.
What Happened When the AI Had Nearly Six Seconds and Still Failed?

The Uber ATG crash in Tempe, Arizona in March 2018 is the case I come back to most often, because it's the purest illustration of how a probabilistic system can have all the data it needs and still make a fatal mistake.
The vehicle's sensors first registered Elaine Herzberg — a pedestrian pushing a bicycle across a dark road — approximately 5.6 seconds before impact. At 43 mph, that's roughly 350 feet of distance. More than enough for any competent braking system to stop the car.
But the AI couldn't decide what it was looking at. Over those 5.6 seconds, the perception system reclassified Herzberg repeatedly: first as an "unknown object," then as a "vehicle," then as a "bicycle." Each reclassification wasn't just a label change — it was a complete reset of the object's predicted trajectory. The system essentially developed amnesia every time it changed its mind.
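To make the failure mode concrete, here is a toy sketch (not Uber's actual code, and deliberately simplified) of the difference between a tracker that wipes its history on every reclassification and one that treats the label as metadata and keeps the kinematic track:

```python
# Toy sketch: why resetting an object's history on every reclassification
# destroys trajectory prediction, while a label-agnostic track preserves it.

class ResettingTracker:
    """Mimics the failure mode: a label change wipes the track history."""
    def __init__(self):
        self.label = None
        self.history = []          # (time_s, position_m) samples

    def observe(self, t, pos, label):
        if label != self.label:    # reclassification triggers amnesia
            self.history = []
            self.label = label
        self.history.append((t, pos))

    def velocity(self):
        # Needs at least two samples under the *current* label.
        if len(self.history) < 2:
            return None
        (t0, p0), (t1, p1) = self.history[0], self.history[-1]
        return (p1 - p0) / (t1 - t0)

class PersistentTracker(ResettingTracker):
    """Keeps the kinematic history; the label is just metadata."""
    def observe(self, t, pos, label):
        self.label = label         # update the label, keep the track
        self.history.append((t, pos))

# A pedestrian crossing at ~1.4 m/s, reclassified on every observation.
observations = [(0.0, 0.0, "unknown"), (1.0, 1.4, "vehicle"),
                (2.0, 2.8, "bicycle")]

resetting, persistent = ResettingTracker(), PersistentTracker()
for t, pos, label in observations:
    resetting.observe(t, pos, label)
    persistent.observe(t, pos, label)

print(resetting.velocity())   # None: history reset on every label change
print(persistent.velocity())  # 1.4: crossing speed recovered
```

The resetting tracker never accumulates enough history to estimate a crossing speed; the persistent one recovers it trivially. The system that mattered was running the first design.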
I remember reading the NTSB report for the first time and feeling physically sick. Not because of the outcome — though that was devastating — but because of the mechanism. The AI determined emergency braking was necessary only 1.3 seconds before impact. Physics made the rest inevitable.
An AI that can see an obstacle for nearly six seconds and still can't decide what it is doesn't have a sensor problem. It has an architecture problem.
What made it worse — what made me angry, honestly — was learning that Uber had deliberately disabled the Volvo XC90's factory-installed collision avoidance system. The car came with Automatic Emergency Braking from the manufacturer. Uber turned it off to prevent what they called "erratic vehicle behavior." They wanted a smoother ride for their experimental software, so they removed the one deterministic safety layer that might have saved a life.
That decision haunts this industry. It's the original sin of treating AI safety as a tuning problem rather than an engineering discipline.
Why Does the Same Failure Keep Happening in Different Cars?
After the Uber crash, I expected the industry to learn. Specifically, I expected companies to build architectures where a perception failure couldn't cascade into a decision failure. Where there were hard safety boundaries that no experimental software could override.
Instead, we got Cruise.
The October 2023 incident in San Francisco was different from Uber in its specifics but identical in its architecture. A human-driven Nissan struck a pedestrian, throwing her into the path of a Cruise robotaxi. The Cruise vehicle hit her and stopped. So far, the system was working — imperfectly, but within parameters.
Then the post-impact logic kicked in. The system's impact detection wasn't granular enough to distinguish between a frontal run-over and a side-impact collision. It classified the event as a side impact. And the pre-programmed response to a side impact was: pull over to the side of the road to avoid blocking traffic.
The car pulled over. With a human being pinned underneath it. It dragged her 20 feet at about 7 mph before detecting "excessive wheel slip" — which it interpreted as a mechanical fault, not a person.
I spent a week after that incident arguing with my team about what the right response architecture should have been. One of our engineers — brilliant guy, very formal-methods-oriented — kept insisting the problem was solvable with better sensor fusion. "If the system had occupancy detection under the chassis," he said, "it would have known."
He was right. But he was also missing the point. The deeper failure was that the system had no concept of uncertainty about its own diagnosis. It classified the impact, and then it acted on that classification with full confidence. There was no intermediate state of "I'm not sure what just happened, so I should do nothing until I am." The architecture didn't allow for doubt.
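A minimal sketch of what "room for doubt" looks like in a post-impact policy. The event names and confidence floor are invented for illustration; the point is the explicit default state that fires whenever the diagnosis is ambiguous:

```python
# Hypothetical uncertainty-aware post-impact policy. Thresholds and event
# names are illustrative, not from any production system.

def post_impact_action(event_probs, confidence_floor=0.95):
    """event_probs: dict mapping impact hypothesis -> probability."""
    best = max(event_probs, key=event_probs.get)
    if event_probs[best] < confidence_floor:
        # The state the Cruise architecture lacked: explicit doubt.
        return "remain_stopped_request_human_review"
    if best == "side_impact":
        return "pull_over"
    return "remain_stopped_request_human_review"   # anything else: stay put

# An ambiguous diagnosis must never trigger a confident maneuver.
print(post_impact_action({"side_impact": 0.6, "frontal_runover": 0.4}))
```

With a 60/40 split between hypotheses, the only safe output is to stay stopped and escalate. The maneuver fires only when the diagnosis clears a verified confidence floor.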
That's what I mean when I talk about the Perception-Logic Gap. It's not just about seeing better. It's about knowing when you don't know.
The Cover-Up Was the Architecture Too
What happened after the Cruise dragging incident was almost as revealing as the incident itself. Investigations found that senior leadership was "fixated on correcting the inaccurate media narrative" rather than being transparent with regulators. Employees admitted to showing regulators a video of the crash knowing that internet connectivity issues often prevented the dragging portion from playing.
Cruise eventually paid a $500,000 criminal fine for submitting false reports to the NHTSA. Their California operating permit was revoked.
I bring this up not to pile on Cruise, but because it reveals something structural about how the industry treats safety. When your AI system is a black box — when even your own engineers can't fully explain why it made a particular decision in a particular moment — the temptation to control the narrative instead of fixing the architecture becomes overwhelming.
Transparency isn't a PR strategy for autonomous vehicles. It's a technical requirement. If you can't audit every decision your AI made in a crisis, you don't have a safety system — you have a liability.
At Veriprajna, we've made explainable safety audits a core part of our architecture work. Every decision the AI makes, especially post-impact, gets logged in a deterministic, tamper-proof format that regulators can audit in real time. Not because we're more virtuous than Cruise — because we've seen what happens when the alternative is "let the video speak for itself."
I wrote about the full technical framework behind this approach in our interactive whitepaper, including the specific failure modes we've catalogued from Uber, Cruise, Tesla, and Waymo.
What Does Tesla's "Vision-Only" Bet Actually Mean for Safety?
Tesla's approach to autonomous driving is philosophically different from Uber's or Cruise's, and the failures are different too. But they rhyme.
Tesla's Full Self-Driving system relies entirely on cameras — no LiDAR, no radar. Elon Musk has called LiDAR a "crutch." The bet is that sufficiently advanced neural networks can reconstruct a full 3D understanding of the world from 2D images alone, the way human vision does.
It's an elegant idea. I even find it intellectually compelling. But between 2024 and 2025 the NHTSA opened investigations covering 2.9 million vehicles and more than 40 FSD-related crashes, and the pattern is damning.
Eighteen separate complaints involve vehicles running red lights or failing to detect signal states. Multiple reports describe cars entering opposing lanes of traffic. A fatal 2023 collision occurred during sun glare on wet asphalt — conditions where the optical signal-to-noise ratio drops below what any camera system can reliably interpret.
I call this Capability Theater: the system performs beautifully in optimal conditions, creating an illusion of competence that collapses at the edges. Sunny day, clear road, standard intersection? Flawless. Low sun angle, wet pavement, unusual pedestrian crossing? The system doesn't degrade gracefully. It fails abruptly.
The problem isn't that vision-only can't work in theory. It's that Tesla is deploying it at scale without what I'd call Assurance Gates — hard boundaries that prevent the AI from making high-stakes decisions when its confidence drops below a verified threshold. If the glare saturation exceeds a certain percentage, the system should refuse to drive, not guess harder.
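The Assurance Gate idea can be sketched in a few lines. The thresholds below are invented for illustration; in a real system they would be outputs of the verification process, not hand-tuned constants:

```python
# Illustrative Assurance Gate: a deterministic wrapper that refuses to pass
# a perception output downstream when measured signal quality or model
# confidence falls below verified floors. Numbers are made up for the sketch.

def assurance_gate(glare_saturation, detection_confidence,
                   max_glare=0.35, min_confidence=0.90):
    """Return (allowed, reason). Planning proceeds only when allowed."""
    if glare_saturation > max_glare:
        return False, "optical SNR below verified floor: degrade to safe stop"
    if detection_confidence < min_confidence:
        return False, "perception confidence below verified floor"
    return True, "within verified operating envelope"

# Low sun angle on wet asphalt: the gate refuses rather than guesses harder.
allowed, reason = assurance_gate(glare_saturation=0.6,
                                 detection_confidence=0.97)
print(allowed, reason)
```

Note that the gate is deterministic and sits outside the neural network: the model can be as probabilistic as it likes, but it never gets to act on an input the gate has ruled out.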
How Do You Prove an AI Won't Kill Someone?
This is the question that keeps me up at night. Not metaphorically — literally. There was a period last year where I was running formal verification experiments until 2 AM, trying to find the boundary between "tested enough" and "proven safe."
Traditional software testing is black-box: you run the system through N scenarios, and if it passes all of them, you ship it. But autonomous vehicles don't encounter N scenarios. They encounter the entire physical world, with all its chaos and edge cases and humans doing inexplicable things. No amount of scenario testing can cover that space.
Formal verification takes a different approach. Instead of asking "did the system pass these tests?", it asks "is there any input that could produce an unsafe output?" Tools like Marabou and α,β-CROWN can represent a neural network as a set of mathematical constraints and then search — exhaustively — for violations.
A safety property might look like this: for every possible input within a "low visibility" range, the braking command must never fall below a minimum threshold. If the solver finds a counter-example — a specific input that violates the property — you've identified a vulnerability before it kills someone.
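Here is a deliberately toy version of that question. Real tools like Marabou and α,β-CROWN prove the property symbolically over the continuous input space; this sketch sweeps a dense grid over a tiny hand-written controller just to show the shape of the query, with a bug planted so the search finds something:

```python
# Toy property check: is there ANY input in the "low visibility" range
# whose braking command falls below the required floor? All values are
# invented; a real verifier reasons over the network itself, not a grid.

MIN_BRAKE = 0.4   # required braking command in low visibility (illustrative)

def brake_command(visibility):
    """Stand-in controller: braking effort vs. visibility in [0, 1]."""
    return max(0.0, 1.0 - 1.1 * visibility)   # deliberately buggy near 0.55

def find_counterexample(lo=0.0, hi=0.6, steps=10_000):
    """Search the low-visibility range for a property violation."""
    for i in range(steps + 1):
        v = lo + (hi - lo) * i / steps
        if brake_command(v) < MIN_BRAKE:
            return v      # a concrete input that violates the property
    return None

cx = find_counterexample()
print(cx)   # a visibility value where the brake command dips below the floor
```

The grid sweep can miss violations between samples, which is exactly why the real tools matter: an SMT-style solver either proves no violating input exists or returns one, with nothing in between.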
One night, we were running verification on a perception model and the solver returned a counter-example that none of us had anticipated. A very specific combination of lighting angle and object distance that caused the braking confidence to drop to nearly zero. It wasn't a scenario any of us would have thought to test. The solver found it because it wasn't guessing — it was proving.
That moment crystallized something for me. Testing asks "does this work?" Verification asks "can this fail?" They're fundamentally different questions, and safety-critical AI demands the second one.
Testing tells you what your AI does. Verification tells you what it can never do. For safety-critical systems, only the second question matters.
The catch is that current neural networks are enormous — millions of parameters — and exhaustive verification of large networks is computationally intractable. We address this through neuron pruning: systematically removing redundant neurons that don't contribute to accuracy but make the network too complex to verify. The result is a leaner model that's both performant and mathematically provable.
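A stripped-down sketch of the pruning step, in pure Python for clarity. The layer and the scoring rule (L2 norm of each neuron's weights) are simplifications; real pipelines score neurons against a validation set and fine-tune afterward:

```python
# Simplified neuron pruning: drop the weight rows (neurons) that contribute
# least, shrinking the search space a verifier has to explore. Layer values
# and the keep count are invented for the sketch.

def prune_neurons(layer_rows, keep=2):
    """Keep the `keep` neurons (weight rows) with the largest L2 norm."""
    norms = [sum(w * w for w in row) ** 0.5 for row in layer_rows]
    ranked = sorted(range(len(layer_rows)), key=lambda i: norms[i],
                    reverse=True)
    keep_idx = sorted(ranked[:keep])        # preserve original neuron order
    return [layer_rows[i] for i in keep_idx]

# Three neurons; the middle one is numerically near-dead weight.
layer = [[0.9, -0.8], [0.01, 0.02], [0.4, 0.5]]
print(prune_neurons(layer))   # the near-zero neuron is removed
```

Every neuron removed is one fewer activation pattern the solver has to branch on, which is what moves verification from intractable to merely expensive.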
For the full technical breakdown of our verification pipeline — including the SMT solver methodology and pruning approach — see our detailed research paper.
When the Problem Isn't the AI — It's the World
Waymo has logged over 56 million miles and has significantly lower injury rates than human drivers. By most metrics, they're the industry leader. And yet, Waymo has revealed a failure mode that nobody in the autonomous vehicle industry was prepared for: the world itself refusing to cooperate.
During a 2025 power outage in Los Angeles, dozens of Waymo robotaxis became stuck at darkened intersections. The vehicles were programmed to treat dead traffic signals as four-way stops — the correct legal response. But when dozens of autonomous vehicles all arrive at the same dead intersection, each waiting politely for its turn, and each requesting remote human assistance simultaneously, you get something I've started calling the Independence Trap: every vehicle behaving correctly in isolation while collectively creating gridlock that no individual vehicle can resolve.
The remote assistance center was overwhelmed. Robotaxis were blocking other robotaxis. The system that worked perfectly with one car at one intersection collapsed when scaled to a fleet in a city-wide emergency.
And then there's the problem nobody wants to talk about publicly. During civil unrest in Los Angeles in early 2025, crowds attacked Waymo vehicles — slashing tires, breaking windows, setting cars on fire. The vehicles, programmed for "passive safety," simply stopped when surrounded by people. Which is exactly the wrong response when the people surrounding you are trying to destroy the vehicle with passengers inside.
This has led to serious discussions about what some researchers call a "Danger Escape Mode" — the ability for an autonomous vehicle to commit minor traffic infractions (mounting a curb, proceeding through a red light) to protect its passengers from violence. It requires fundamentally rethinking the AI's ethical hierarchy, and it's a problem that no amount of better sensors or faster processors can solve.
I brought this up at a meeting with a potential client, and someone said, "Can't you just use GPT to handle edge cases like that?" I think my expression said more than my words did. This is a decision architecture problem that requires formal ethical reasoning, not a chatbot.
Why Can't We Just Test Our Way to Safety?
People ask me this constantly. "If Waymo has 56 million miles of data, isn't that enough testing?"
No. And the reason is mathematical, not philosophical.
The space of possible driving scenarios is effectively infinite. You can drive 56 million miles and never encounter the specific combination of sun glare, wet asphalt, and an unusually dressed pedestrian that causes your perception system to fail. Edge cases aren't rare versions of common scenarios — they're scenarios that exist in the gaps between everything you've already seen.
This is why the regulatory landscape is shifting from "show us your test results" to "show us your safety proofs." ISO 21448, known as SOTIF — Safety of the Intended Functionality — was designed specifically to address hazards that occur when the AI is working exactly as programmed but encounters an environment it can't handle. It's not about the hardware failing. It's about the AI's inherent limitations meeting the real world.
And ISO/PAS 8800, which became the primary standard for AI in road vehicles in late 2024, goes further: it requires managing the entire AI lifecycle, from data acquisition through post-deployment monitoring. The era of "ship it and see what happens" is ending, at least for companies that want to operate legally in the EU, the US, and major Asian markets.
At Veriprajna, we structure our work around moving clients into what SOTIF calls the "Known/Safe" quadrant — systematically identifying triggering conditions, mapping environmental states that cause perception errors, and using high-fidelity simulation to inject edge cases that would be too dangerous to test on actual roads.
The Real Difference Between a Wrapper and a Solution

I've spent the last few years watching the AI industry split into two camps, and the split is getting wider.
On one side, there's the wrapper economy — companies building conversational interfaces on top of large language models, optimizing for deployment speed and user experience. Some of this work is genuinely useful. Most of it is irrelevant to safety-critical applications.
On the other side, there's what I call deep AI engineering: the integration of formal verification, sensor-fusion resilience, and deterministic safety architectures. It's slower. It's harder. It's less impressive in demos. And it's the only approach that can survive contact with the physical world.
The technical centerpiece of this shift is Bird's-Eye-View perception with Occupancy Networks. Instead of processing individual camera feeds and trying to stitch them together — a process that loses data at every seam — BEV perception transforms multi-view camera and LiDAR data into a unified 3D grid viewed from above. And instead of asking "what is this object?", occupancy networks ask "is this space occupied?"
That distinction matters enormously. If the Uber ATG system had been tracking occupied space rather than trying to classify objects, it wouldn't have mattered whether the system thought Herzberg was a pedestrian, a bicycle, or an unknown object. The space was occupied. The space was in the vehicle's path. Brake.
Similarly, if the Cruise vehicle had been running occupancy detection beneath its chassis, it would have known something was under the car regardless of how it classified the impact. The occupied space would have overridden the pull-over maneuver.
The question isn't "what is this object?" — it's "is this space occupied?" That single reframing could have prevented the two most notorious autonomous vehicle disasters of the last decade.
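The occupancy-first decision rule reduces to something almost embarrassingly simple. This is a toy grid with invented cell values, not a production planner, but it shows where the label drops out of the loop:

```python
# Sketch of an occupancy-first braking rule: the planner brakes for any
# sufficiently occupied cell in its path, whether or not the classifier
# has settled on a label. Grid values are invented for the example.

def plan(occupancy_grid, path_cells, classifier_label=None):
    """occupancy_grid: dict (x, y) -> probability the cell is occupied."""
    for cell in path_cells:
        if occupancy_grid.get(cell, 0.0) > 0.5:
            # The label (pedestrian, bicycle, unknown) never enters the
            # decision. Occupied space in the vehicle's path means brake.
            return "brake"
    return "proceed"

grid = {(0, 3): 0.97}                     # something is ahead in lane
print(plan(grid, path_cells=[(0, 1), (0, 2), (0, 3)],
           classifier_label="unknown"))
```

With this rule, the Uber system's 5.6 seconds of reclassification churn would have been irrelevant: the cell was occupied the whole time.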
We use Transformer architectures — the same fundamental technology behind GPT — but not for conversation. We use them as spatial reasoning engines that fuse heterogeneous sensor data into what we call a Shared Canvas. Temporal self-attention allows the system to remember where an object was even during temporary occlusions — a pedestrian walking behind a parked truck doesn't disappear from the model's awareness just because the cameras can't see her for two seconds.
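The temporal-persistence behavior can be sketched without any attention machinery at all. The coasting window and rates below are invented; the point is that a track survives a short occlusion by predicting forward rather than vanishing:

```python
# Minimal sketch (invented parameters) of occlusion persistence: a track
# coasts on its last known velocity instead of disappearing the moment
# the cameras lose sight of the object.

class PersistentTrack:
    def __init__(self, pos, vel, max_coast_s=3.0):
        self.pos, self.vel = pos, vel
        self.occluded_for = 0.0
        self.max_coast_s = max_coast_s   # how long the prediction is trusted
        self.alive = True

    def step(self, dt, detection=None):
        if detection is not None:        # the sensors see the object again
            self.pos, self.occluded_for = detection, 0.0
            return
        self.occluded_for += dt          # occluded: predict, don't forget
        if self.occluded_for > self.max_coast_s:
            self.alive = False           # only now does the track expire
            return
        self.pos += self.vel * dt

# A pedestrian at 1.4 m/s walks behind a parked truck for two seconds.
track = PersistentTrack(pos=0.0, vel=1.4)
for _ in range(20):                      # 2.0 s of occlusion at 10 Hz
    track.step(dt=0.1)
print(track.alive, round(track.pos, 1))  # still alive, ~2.8 m along her path
```

In the production architecture, temporal self-attention plays the role of this hand-coded coasting logic, learning how long and how far to trust a prediction rather than having it fixed as a constant.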
The $8.5 Million Lesson
The Uber ATG settlement was $8.5 million. The Cruise criminal fine was $500,000 — a number that doesn't begin to account for the operational shutdown, the reputational damage, or the human suffering. The NHTSA probe into Tesla covers 2.9 million vehicles. The global average cost of a single data breach is now $4.44 million.
When I add these numbers up, the conclusion is uncomfortable for the "move fast and break things" crowd: the cheap AI wrapper is the most expensive mistake an enterprise can make. Not because it doesn't work — it works fine in controlled environments. But the moment it encounters the uncontrolled world — the dark road, the post-impact confusion, the sun glare on wet asphalt, the angry crowd — the absence of deterministic safety architecture turns a software limitation into a human catastrophe.
People sometimes push back on our approach by saying formal verification is too slow, too expensive, too academic for real-world deployment timelines. I understand the objection. Verification is computationally expensive. Pruning networks for verifiability takes time. Building safety architectures with hard assurance gates is more work than wrapping an API.
But I'd ask those people to watch the Cruise dragging video. To read the NTSB report on Elaine Herzberg's death. To look at the 18 red-light complaints in the Tesla FSD investigation. And then tell me that "too slow" is a valid criticism of an approach designed to prevent exactly those outcomes.
The era of building autonomous systems on probabilistic hope is ending. Not because regulators are forcing it — though they are — but because the physics of the real world demands it. An AI system that navigates a thousand intersections perfectly and then runs a red light on the thousand-and-first isn't 99.9% safe. It's unsafe, period. Safety isn't a percentage. It's a property — one that either holds under all verified conditions or doesn't hold at all.
That's the shift I'm building Veriprajna around. Not better wrappers. Not faster demos. Deterministic assurance for systems where failure isn't a bug report — it's a body count.


