
Your Drone Is Not Autonomous — It's Just Automated in a World That Hasn't Tried to Kill It Yet
There's a moment I keep coming back to. We were running a test flight in a simulated GPS-denied corridor — nothing fancy, just a standard quadcopter with our navigation stack bolted on. The GPS module was physically disconnected. My engineer, who'd spent three weeks tuning the Visual Inertial Odometry pipeline, was standing next to me with his arms crossed, chewing on a pen cap. The drone lifted off, hovered, and began threading its way through the test environment using nothing but a stereo camera and an IMU.
Then I walked over and switched on a consumer-grade GPS jammer we'd bought for testing. Nothing changed. The drone didn't flinch. It didn't know there was anything to flinch about — it had never been listening to the sky in the first place.
That was the moment I understood, viscerally, what we'd been arguing about in whiteboards and Slack threads for months. The drone wasn't resilient to jamming. It was indifferent to it. And that indifference — that total independence from a signal that can be wiped out by a $50 device — is the entire point.
I'm Ashutosh, founder of Veriprajna. We build navigation and perception systems for drones that operate in environments where GPS doesn't exist, where cloud connectivity is a fantasy, and where "return to home" means nothing if you don't know where you are. I want to tell you why the word "autonomous" as the drone industry uses it is a lie, and what it actually takes to build a machine that can think for itself.
The $1 Billion-Per-Day Assumption Nobody Questions
Here's a number that should unsettle you: GPS has generated an estimated $1.4 trillion in economic benefits for the U.S. private sector since it became available for commercial use, and a loss of GPS service would cost the U.S. economy roughly $1 billion per day. We've built an entire civilization's logistics, agriculture, finance, and defense infrastructure on signals transmitted from 20,200 kilometers above the Earth — signals that arrive at your receiver with roughly the power of a 25-watt light bulb viewed from 10,000 miles away.
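If you want to sanity-check that analogy, a back-of-the-envelope link budget gets you most of the way. The transmit power and antenna gain below are rough public figures, not exact satellite specs; the point is the order of magnitude.

```python
import math

# Order-of-magnitude link budget for a GPS L1 signal at the ground.
# Assumed figures (not official specs): ~27 W transmit power, ~13 dBi satellite
# antenna gain, 20,200 km range, 0 dBi receive antenna at the L1 wavelength.
P_tx = 27.0                       # transmit power, W
G_tx = 10 ** (13.0 / 10)          # satellite antenna gain, linear
r = 20_200e3                      # range, m
wavelength = 0.1903               # L1 carrier (1575.42 MHz), m

flux = P_tx * G_tx / (4 * math.pi * r ** 2)   # power density at the ground, W/m^2
A_eff = wavelength ** 2 / (4 * math.pi)       # effective aperture of a 0 dBi antenna, m^2
P_rx = flux * A_eff                           # received power, W

print(f"received power ~ {P_rx:.1e} W ({10 * math.log10(P_rx * 1e3):.0f} dBm)")
# Roughly 3e-16 W, about -125 dBm with these assumptions; published minimums
# are around -128 dBm. A fraction of a femtowatt, which is why a cheap
# ground-based jammer can bury it.
```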
That's not a metaphor. That's the actual signal strength. And every drone manufacturer in the world has built their "autonomous" systems on top of it.
I spent years in the AI space before founding Veriprajna, and the thing that radicalized me about drone navigation was watching footage from Ukraine. FPV drones — cheap, effective, responsible for an estimated 70% of troop casualties — routinely lose GPS within 5 to 10 kilometers of front-line electronic warfare deployments. Russian systems like the R-330Zh Zhitel create near-constant area denial. When GPS goes dark, these drones don't degrade gracefully. They become, as I've started calling them, expensive paperweights.
A drone that depends on GPS for stability is not autonomous. It is automated within a permissive environment. Remove the permission, and you remove the autonomy.
This isn't just a military problem. It's a physics problem that shows up everywhere GPS signals can't reach: underground mines, urban canyons, the underside of bridges, the narrow gaps between oil storage tanks. Anywhere the signal bounces, degrades, or simply doesn't penetrate.
Why Did We Assume the Sky Would Always Be There?
I think the honest answer is convenience. GPS is magic — free, global, accurate enough for most things. When you're building a drone company, the navigation problem feels solved on day one. Plug in a GPS module, write some waypoint logic, and call it autonomous. Ship it.
The first time I pitched our approach — building navigation from the ground up using onboard vision and inertial sensing — an investor looked at me and said, "Why wouldn't you just use better GPS?" I tried to explain that "better GPS" is an oxymoron when someone is actively trying to deny you GPS. He wasn't convinced. He'd never had to think about a world where the infrastructure fails.
But the infrastructure does fail. In mining, it was never there to begin with. A drone inspecting a stope after blasting — flying through dust and potentially toxic gases in total darkness — has zero satellite signal. In oil and gas pipeline inspection, where a single failure can cost $8.5 million versus $75,000 for a repair caught early, drones need to fly in GPS shadows created by massive metallic structures. The multipath effect corrupts timing calculations and introduces position errors of several meters. Several meters, when you're flying next to a pressurized pipeline.
The industry's answer has been optical flow — a downward-facing camera that tracks ground texture. It's better than nothing. But it needs good lighting, it needs visible texture, and it only measures velocity over the ground; heading and altitude still have to come from somewhere else, which in most stacks means GPS, a compass, and a rangefinder. It's a band-aid, not a solution.
What Does It Actually Mean to Navigate Without GPS?

This is where I need to take you inside the engineering, because the solution is beautiful in the way that biology is beautiful. Think about how you navigate a dark room. You don't use GPS. You use your eyes and your inner ear — vision and your vestibular system. You see landmarks, you feel acceleration and rotation, and your brain fuses those two streams into a continuous sense of where you are.
Visual Inertial Odometry — VIO — does exactly this for a drone. A camera tracks distinctive features (corners, edges, texture) across successive frames. An Inertial Measurement Unit, or IMU, measures acceleration and rotation at extremely high frequency, often 200 to 1000 times per second. Neither sensor works alone. The camera is too slow and can't judge absolute scale. The IMU drifts catastrophically — double-integrating acceleration to get position means errors grow quadratically with time. A consumer-grade IMU can drift meters within seconds.
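To see how fast pure inertial dead-reckoning falls apart, here's a toy calculation: a single constant accelerometer bias, double-integrated. The bias value is a rough consumer-grade figure chosen for illustration, not a datasheet number.

```python
import numpy as np

# Double-integrating a constant accelerometer bias: position error grows with
# time squared. The 0.05 m/s^2 bias is a rough consumer-grade figure.
dt = 1.0 / 200.0                                  # 200 Hz IMU
bias = 0.05                                       # m/s^2
t = np.arange(0.0, 10.0 + dt, dt)                 # ten seconds

vel_err = np.cumsum(np.full_like(t, bias)) * dt   # first integration
pos_err = np.cumsum(vel_err) * dt                 # second integration

for sec in (1, 3, 10):
    print(f"after {sec:>2} s: {pos_err[int(sec / dt)]:.2f} m of drift")
# Analytically, error ~ 0.5 * bias * t^2: a few centimetres after one second,
# a couple of metres after ten.
```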
But fused together, they cancel each other's weaknesses. The IMU provides high-rate state prediction and handles rapid maneuvers where images blur. The camera anchors the drifting IMU estimate to fixed landmarks in the world. The result: drift rates as low as 1–2% of distance traveled, even in GPS-denied environments. No satellites. No external signals. Nothing to jam.
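Here's a deliberately simplified, one-dimensional sketch of that division of labor: a fixed-gain predict/correct loop rather than the full estimator, just to show why the camera correction keeps the IMU's drift bounded.

```python
import numpy as np

# Toy 1-D predict/correct loop: the IMU predicts at high rate (and drifts),
# the camera corrects at low rate (and anchors). Fixed gain for clarity; a
# real estimator (EKF or sliding-window optimizer) computes the weights itself.
rng = np.random.default_rng(0)
dt = 1.0 / 200.0                 # 200 Hz IMU
accel_bias = 0.05                # m/s^2, uncorrected, so the prediction drifts
cam_noise = 0.02                 # m, noise on the visual position fix
gain = 0.3                       # how hard each camera fix pulls the estimate

true_pos, true_vel = 0.0, 1.0    # ground truth: constant 1 m/s
est_pos, est_vel = 0.0, 1.0

for k in range(2000):            # ten seconds of flight
    est_vel += accel_bias * dt   # IMU prediction with biased acceleration
    est_pos += est_vel * dt
    true_pos += true_vel * dt
    if k % 10 == 0:              # 20 Hz camera: anchor against fixed landmarks
        cam_fix = true_pos + rng.normal(0.0, cam_noise)
        est_pos += gain * (cam_fix - est_pos)

print(f"error with fusion after 10 s: {abs(est_pos - true_pos):.2f} m")
# The same bias left uncorrected contributes ~0.5 * 0.05 * 10^2 = 2.5 m alone.
```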
I wrote about this fusion architecture in depth in the interactive version of our research, but the key insight is simpler than the math: VIO is un-jammable because it's passive. It receives light and feels inertia. There's no signal to intercept, no frequency to overwhelm, no link to sever.
The Night We Broke Our Own System
I want to be honest about something. VIO is not magic. We learned this the hard way.
About four months into development, we were testing in a warehouse — concrete floors, white walls, fluorescent lighting. The drone took off, flew beautifully for about thirty seconds, and then started drifting sideways like it was drunk. My lead engineer pulled the logs and went quiet for a long time. Then he looked up and said, "It can't see anything."
White walls. Uniform concrete. No texture, no corners, no features to track. The camera was staring at a blank canvas, and the VIO pipeline was running on pure IMU integration — which meant it was accumulating drift at a terrifying rate.
That failure taught us more than any success. We spent the next several weeks integrating two critical mitigations. First, LiDAR-VIO fusion — adding a lightweight solid-state LiDAR that provides dense geometric data even in total darkness or featureless environments. The LiDAR point cloud gives the system geometric constraints when cameras fail. Second, and this is where it gets interesting, semantic masking.
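To make the LiDAR fallback concrete before we get to semantics, here's a rough sketch of the kind of degraded-mode logic involved. The function name, thresholds, and inputs are illustrative, not our production interfaces.

```python
# Sketch of degraded-mode logic: decide which constraints feed the estimator on
# each update. Names, thresholds, and inputs are illustrative only.
def select_constraints(tracked_features, lidar_points, imu_window,
                       min_features=30, min_lidar_points=500):
    constraints = [("imu_preintegration", imu_window)]          # always available
    if len(tracked_features) >= min_features:
        constraints.append(("visual_reprojection", tracked_features))
    if len(lidar_points) >= min_lidar_points:
        # point-to-plane residuals still constrain motion on textureless surfaces
        constraints.append(("lidar_point_to_plane", lidar_points))
    degraded = len(constraints) == 1     # IMU-only: drift grows fast, slow down
    return constraints, ("DEGRADED" if degraded else "NOMINAL")
```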
Why Does a Navigation System Need to Understand What It Sees?

Standard VIO treats the world as a cloud of meaningless points. A corner is a corner whether it's on a building or on a moving truck. This creates a devastating failure mode: if the drone tracks features on a moving object and assumes they're stationary, it miscalculates its own motion to compensate. The drone thinks it's moving when it's not, or vice versa.
We had this happen during an outdoor test. A delivery truck drove through the frame, and the drone lurched sideways trying to "correct" for motion that wasn't its own. My stomach dropped. In a mine shaft or near a pipeline, that lurch is a crash.
The fix required what I think of as the leap from navigation to understanding. We run deep learning models — semantic segmentation networks — that classify every pixel in the frame. Car. Person. Tree blowing in wind. These dynamic regions get masked out of the VIO pipeline entirely. The drone only tracks static background features.
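In pipeline terms, the masking step itself is simple: check each tracked feature against the per-pixel class map and discard anything that lands on, or near, a dynamic class. A rough sketch, with illustrative class IDs rather than those of any specific model:

```python
import numpy as np

# Sketch of semantic masking: drop any tracked feature whose pixel lands on a
# dynamic class in the segmentation output, so only static structure feeds VIO.
DYNAMIC_CLASSES = {11: "person", 13: "car", 14: "truck", 15: "bus"}

def mask_dynamic_features(features_uv, seg_map, dilation=5):
    """features_uv: (N, 2) pixel coords; seg_map: (H, W) per-pixel class IDs."""
    h, w = seg_map.shape
    keep = []
    for u, v in features_uv.astype(int):
        u0, u1 = max(u - dilation, 0), min(u + dilation + 1, w)
        v0, v1 = max(v - dilation, 0), min(v + dilation + 1, h)
        patch = seg_map[v0:v1, u0:u1]          # small window around the feature
        # reject if any nearby pixel is dynamic (margin for segmentation error)
        if not np.isin(patch, list(DYNAMIC_CLASSES)).any():
            keep.append((u, v))
    return np.array(keep)

# Usage: static_features = mask_dynamic_features(tracked_features, segmentation)
```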
Geometric SLAM sees points, lines, and planes. Semantic SLAM sees "door," "wall," "truck." That difference is the difference between a system that navigates and a system that understands where it is.
This semantic layer does something else remarkable: it enables long-term navigation. Geometric features — the pixel intensity of a corner — change with lighting. The same building looks completely different at noon versus midnight. But the concept of a "window" or "door" is invariant to lighting. A drone with semantic SLAM can recognize a location visited during the day even when returning at night, as long as the semantic structure is visible.
It also enables human-centric commands. "Fly through the door." "Inspect the red tank." Not "fly to coordinate 47.3821, -122.3456." For operators in high-stress environments — a mine manager after a blast, a soldier under fire — that difference in cognitive load is enormous.
The Cloud AI Trap That Almost Got Us

Early on, before we'd fully committed to edge processing, someone on my team proposed a hybrid architecture: run the VIO locally but stream video to the cloud for semantic processing. On paper, it made sense. Cloud GPUs are powerful. Why cram everything onto a tiny embedded board?
We built a prototype. It worked in the lab, where we had perfect Wi-Fi. Then we tested it with realistic network conditions — simulated 4G with occasional dropouts — and watched the semantic mask arrive 300 milliseconds after the drone needed it. At 20 meters per second, that's six meters of blind flight. The drone was making navigation decisions based on where dynamic objects were, not where they are.
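The arithmetic is worth spelling out, because it scales badly with speed. A quick table of blind-flight distance, using illustrative speeds and tail latencies:

```python
# Distance flown on stale perception: speed times latency. Values illustrative.
for v_mps in (5, 10, 20):                    # ground speed, m/s
    for latency_ms in (100, 300, 700):       # round trip to a remote GPU
        blind_m = v_mps * latency_ms / 1000.0
        print(f"{v_mps:>2} m/s @ {latency_ms:>3} ms -> {blind_m:4.1f} m blind")
```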
That one turned into a team argument that got loud. One camp wanted to optimize the network path. I pulled rank — the only time I've done it on a technical decision — and said we're going full edge. No cloud dependency. Period.
Here's why I was so stubborn about it. In defense applications, a drone streaming video to the cloud is a radio beacon. Enemy direction-finding assets can triangulate it. You've built a "smart" drone that announces its position to everyone with an RF scanner. In industrial settings, network coverage inside a mine or between storage tanks is unreliable at best. And in both cases, the latency isn't just average latency — it's tail latency, the 99th percentile worst case, that kills you. A momentary spike from congestion or cell tower handover, and your control loop goes unstable.
If your drone's intelligence lives in the cloud, severing the network link doesn't degrade the system — it lobotomizes it. The drone doesn't get slower. It gets stupid.
Research shows that teleoperation becomes practically uncontrollable above 700 milliseconds of latency. And jitter — the variance in latency — is worse than constant delay, because control algorithms can compensate for a known lag but oscillate wildly when the lag keeps changing.
We moved everything onboard. Every neural network, every optimization loop, every decision. For the full technical breakdown of our architecture, including the specific sensor fusion approaches and algorithm comparisons, I've published our detailed research.
How Do You Run All of This on a Device That Flies?
This is the part that keeps me up at night, honestly. Running non-linear optimization for VIO simultaneously with convolutional neural networks for semantic segmentation, all at 30+ frames per second, on a board that weighs grams and draws watts — not kilowatts — is an engineering problem that has no room for sloppiness.
We build on the NVIDIA Jetson Orin NX, which delivers 100 TOPS (trillion operations per second) in an embedded form factor drawing 10 to 25 watts. That's a staggering amount of compute for something you can hold in your hand. But raw silicon isn't enough.
We use NVIDIA's TensorRT to compile our neural networks with INT8 quantization — converting 32-bit floating-point weights to 8-bit integers. This sounds like a brutal approximation, and it is, but done carefully it doubles or triples inference throughput with minimal accuracy loss. We offload feature tracking to dedicated vision accelerator cores, freeing the GPU for deep learning. The non-linear optimization backend — bundle adjustment, the mathematical heart of SLAM — runs as parallelized CUDA kernels.
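As a rough sketch of what that build step looks like (assuming the segmentation network has been exported to ONNX; the file name and calibrator are placeholders, not our actual tooling):

```python
import tensorrt as trt

# Sketch of an INT8 engine build for a segmentation network exported to ONNX.
# The calibrator is a placeholder: a real build needs an IInt8EntropyCalibrator2
# fed with representative flight frames.
logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

with open("segnet.onnx", "rb") as f:
    if not parser.parse(f.read()):
        raise RuntimeError(parser.get_error(0))

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.INT8)   # 8-bit weights and activations
config.set_flag(trt.BuilderFlag.FP16)   # allow FP16 fallback for fragile layers
# config.int8_calibrator = FlightFrameCalibrator(...)  # hypothetical calibrator

engine = builder.build_serialized_network(network, config)
with open("segnet_int8.plan", "wb") as f:
    f.write(engine)
```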
The result is a heterogeneous computing pipeline where the flight controller receives odometry updates at over 50 Hz regardless of scene complexity. The drone doesn't stutter when it enters a visually complex environment. It doesn't slow down when it needs to think harder.
What Happens When the Drone Gets Lost?
This was another fear that kept me awake. VIO gives you local consistency — "I moved 5 meters forward" — but it accumulates drift over time. Without GPS providing an absolute position fix, how do you prevent errors from compounding over a long mission?
The answer is loop closure, and it's one of the most elegant ideas in robotics. When the drone returns to a previously visited area, the system matches the current visual fingerprint against its stored map. If it recognizes where it is, it calculates the total drift accumulated since the last visit and snaps the entire trajectory back into alignment. It's like the drone's own internal GPS correction, except it comes from recognition rather than satellites.
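Stripped to its essentials, the place-recognition check is a nearest-neighbor search over compact frame descriptors. A toy version, with illustrative descriptors and thresholds:

```python
import numpy as np

# Toy place-recognition check behind loop closure: compare the current frame's
# global descriptor against stored keyframes and, on a confident match, hand
# the candidate to the backend so it can pull the trajectory back into line.
def detect_loop(current_desc, keyframe_descs, keyframe_ids, min_score=0.85):
    """current_desc: (D,) unit vector; keyframe_descs: (K, D) unit vectors."""
    scores = keyframe_descs @ current_desc            # cosine similarity
    best = int(np.argmax(scores))
    if scores[best] >= min_score:
        return keyframe_ids[best], float(scores[best])  # candidate loop closure
    return None, float(scores[best])

# On a confirmed loop, the accumulated drift is the gap between the odometry
# estimate of the revisited pose and where the map says it should be; a
# pose-graph optimizer distributes that correction along the trajectory.
```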
We use a modified version of ORB-SLAM3, whose multi-map "Atlas" framework lets it maintain several sub-maps and merge them. If the drone loses tracking during an aggressive maneuver (or gets "kidnapped," as roboticists charmingly call it), it starts building a new map. When it later recognizes a previously mapped location, it merges the maps. This makes the system remarkably resilient to exactly the kind of disruptions you'd expect in real operations.
We enhanced the standard ORB feature extraction with deep learning — SuperPoint and SuperGlue networks that find and match features even in challenging lighting where traditional computer vision fails. This hybrid approach gives us the robust mathematical backend of ORB-SLAM3 with the perceptual capability of modern neural networks.
Who Actually Needs This?
People always ask me whether this is a solution looking for a problem. It's not. The problem is screaming at us from three directions simultaneously.
In defense, GNSS denial is the first move in modern warfare. It's asymmetric — a cheap ground-based jammer neutralizes expensive aerial assets over vast areas. VIO-equipped drones can lock onto a target visually and execute autonomously even after the command-and-control link is severed. They operate in total radio silence, invisible to RF scanners. A single operator can deploy a swarm that navigates a GPS-denied corridor using nothing but onboard perception.
In mining, the environment is naturally GPS-denied. After blasting, stopes fill with dust and toxic gases. Waiting for human clearance costs money and risks lives. A VIO-enabled drone flies in immediately, inspects rock fragmentation and structural stability, and returns data in minutes instead of the days a manual survey requires. Drone operations can reduce inspection costs by up to 70% compared to traditional methods — but only if the drone can actually fly where it needs to.
In infrastructure inspection, the economics are brutal. Pipeline failures cost millions. Drones are the answer — but inspecting the underside of a bridge or the base of a tank farm puts them in GPS shadows where they can't maintain the precise station-keeping required for high-resolution imaging. VIO solves this. The drone holds position with centimeter-level precision regardless of satellite visibility, turning reactive maintenance into predictive maintenance.
The Word That Needs to Change
I've become somewhat obsessed with the distinction between "automated" and "autonomous." An automated system executes a pre-defined script based on external inputs — GPS coordinates, pilot commands. Remove the inputs, and the script crashes. An autonomous system perceives its environment, determines its state, and makes decisions without external reliance.
Almost every commercial drone on the market today is automated. The industry calls them autonomous because the word sells better. But the distinction isn't semantic — it's the difference between a system that works when everything goes right and a system that works when everything goes wrong.
The era of automated drones — reliant on fragile satellite tethers and cloud connectivity — is ending. The future belongs to systems that carry their intelligence with them.
We don't wrap APIs at Veriprajna. We don't fine-tune language models and call it robotics. We engineer the fundamental navigation and perception stacks that allow machines to exist and act in the physical world — to perceive, understand, and navigate without asking anyone's permission.
For the defense commander, the mine operator, and the infrastructure manager, this distinction isn't academic. It's the difference between a mission that succeeds and a machine that falls out of the sky.
The sky was never going to be there forever. We just built like it would be.


