[Image: Phone showing a person mid-squat with a teal skeleton overlay, left knee drifting inward, a green 'GOOD REP' badge in the corner]
Artificial Intelligence | Healthcare | Machine Learning

We Built an AI That Counted Perfect Squats. A 62-Year-Old's Knee Was Collapsing the Whole Time.

Ashutosh Singhal | May 15, 2026 | 14 min read

There's a frame I keep coming back to, frozen on a laptop somewhere in our codebase: a 33-point skeleton overlaid on a man doing a bodyweight squat in his living room. Frame 58 of about 90. It's our software's drawing — the dots on his hips, knees, ankles, the green lines connecting them. The phone camera caught it at 30 frames a second, and our system had already scored the rep.

It scored it as good.

The man was 62, eight weeks out from ACL reconstruction surgery, doing exactly the exercise his physical therapist had prescribed. And in that frame, his left knee was caving inward — drifting about four centimeters toward the midline of his body, away from the straight line his hip and ankle should have held. In a healthy 30-year-old that's a sloppy rep. In a post-ACL knee, that inward collapse is one of the specific mechanical patterns that re-tears the graft you just paid a surgeon to install.

Our AI counted it as a win. That morning is the reason I can tell you, with some confidence, what the actual hard problem in fitness AI is — and why almost nobody is solving the part that matters.

Pose estimation is the sensor. It is not the brain.

Here's the thing I wish someone had told me before we wrote a line of code: the part everyone thinks is hard is free.

Tracking a human body through a camera — finding the joints, drawing the skeleton, following 33 keypoints frame by frame — is a solved problem. Google's BlazePose and MoveNet, the MediaPipe framework, all of it is open-source and runs on a phone in your pocket. Every fitness AI company on earth is running the same handful of pose-estimation libraries. There is no moat there. I learned this the expensive way, by initially believing the opposite.

Pose estimation tells you where the knee is. It does not tell you whether the knee is in trouble.

The hard problem lives one layer up. It's the interpretation: knowing that a four-centimeter inward drift means knee valgus, that knee valgus is a re-injury risk for this specific patient with this specific history, and that it should trigger a clinician alert rather than a "nice work" animation. The sensor is commoditized. The brain is not. That gap — between sensing a movement and understanding it — is the whole game, and it's what we eventually built the exercise-verification engine for.

I didn't understand the gap until our own product fell straight into it.

The "good rep" counter that lied

[Image: Annotated squat figure showing 78-degree flexion vs 90 target, 4cm inward knee drift, and 2:1 tempo, with a pose-library-vs-clinician verdict banner]

When we started, we did what felt obvious and fast. We took the keypoints coming out of the pose model, wrote some geometry to measure joint angles, set a threshold — knee bends past this angle, count a rep — and shipped a clean little engine that counted repetitions and flashed form tips. It demoed beautifully. Reps ticked up. Green checkmarks. In a conference room it looked like a finished product.
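For a concrete picture of how little that first engine knew, here is a minimal Python sketch of the same idea: an angle from three keypoints, a threshold, a counter. It is an illustration, not our production code. The function names, the 2D (x, y) keypoint format, and the 70- and 20-degree thresholds are all assumptions made for the example.

```python
import math

def joint_angle(a, b, c):
    """Interior angle at b, in degrees, formed by points a-b-c (each an (x, y) pair)."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    norm = math.hypot(*v1) * math.hypot(*v2)
    if norm == 0:
        raise ValueError("degenerate keypoints: two joints coincide")
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    # Clamp to guard against floating-point values just outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))

def knee_flexion(hip, knee, ankle):
    """Flexion in degrees: 0 for a straight leg, growing as the knee bends."""
    return 180.0 - joint_angle(hip, knee, ankle)

def count_reps(flexion_series, down_at=70.0, up_at=20.0):
    """Naive counter: a rep is 'flexion reaches down_at, then returns to up_at'.

    Both thresholds are hypothetical. Nothing here looks at knee alignment,
    prescribed depth, or tempo.
    """
    reps, is_down = 0, False
    for flexion in flexion_series:
        if not is_down and flexion >= down_at:
            is_down = True
        elif is_down and flexion <= up_at:
            is_down = False
            reps += 1
    return reps
```

Fed a rep that bottoms out at 78 degrees, `count_reps` returns 1. It has no way to notice where the knee travels sideways or how long each phase takes.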

The squat video broke that illusion. We had the per-frame data sitting right there in a CSV — I could see the patient's left-knee horizontal position sliding medially through the descent, frames 45 to 72. I could see his knee only reached 78 degrees of flexion when the prescribed target was 90. I could see the descent took about a second and the way back up took more than twice that — a 2-to-1 ratio that screams he was dropping into the squat on momentum and then grinding to stand back up.

Three independent red flags. Insufficient depth, an inward collapse, and a compensatory tempo. Any physical therapist glancing at that clip would have stopped the session. Our software, looking at the exact same numbers, returned: rep complete, good form.
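Those three checks are simple to express once you decide to look for them. The sketch below is one hedged way to do it in Python. It assumes a hypothetical per-frame record holding knee flexion and medial drift; the field names, the 2 cm drift limit, and the 1.5 tempo limit are invented for illustration, while the 90-degree target is the prescribed one from this case.

```python
from dataclasses import dataclass

@dataclass
class Frame:
    flexion_deg: float      # knee flexion, 0 = straight leg
    medial_drift_cm: float  # knee offset toward the midline from the hip-ankle line

def rep_flags(frames, target_depth_deg=90.0, max_drift_cm=2.0, max_tempo_ratio=1.5):
    """Return the red flags raised by one rep (all limits are illustrative)."""
    if not frames:
        raise ValueError("a rep needs at least one frame")
    # The deepest frame splits the rep into descent and ascent.
    bottom = max(range(len(frames)), key=lambda i: frames[i].flexion_deg)
    descent_frames = bottom
    ascent_frames = len(frames) - 1 - bottom

    flags = []
    if frames[bottom].flexion_deg < target_depth_deg:
        flags.append("insufficient_depth")
    if max(f.medial_drift_cm for f in frames) > max_drift_cm:
        flags.append("medial_knee_drift")
    if descent_frames > 0 and ascent_frames / descent_frames > max_tempo_ratio:
        flags.append("compensatory_tempo")
    return flags
```

A rep shaped like the one in the video (78 degrees at the bottom, 4 cm of drift, an ascent twice as long as the descent) raises all three flags under these assumed limits.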

That was the failure I had personally argued for. I'd been the one saying "let's get something counting reps and iterate" — the lean thing, the shippable thing. And the lean thing was actively dangerous, because it gave a recovering patient confident positive feedback while he reinforced the precise movement his surgeon was trying to undo.

A rep counter that can't tell a safe squat from an unsafe one isn't a smaller version of the product. It's the wrong product wearing the right product's clothes.

The lesson underneath that frame: raw pose data is necessary and nowhere near sufficient. 78 degrees is not "bad" in the abstract — for a week-eight post-ACL patient who was at 60 degrees two weeks ago, it might be exactly the right amount of progress. The same number can be a success or a warning depending entirely on who is moving and where they are in recovery. A fixed threshold can't hold that. The intelligence has to.

Why the accuracy gap makes this harder, not easier

I'll add the part the demos never show you, because it took us a while to respect it. A single phone camera is not a motion-capture lab.

When researchers put monocular pose estimation head-to-head against Qualisys — the gold-standard, multi-camera marker system clinics use for real biomechanics — MediaPipe correlates around 0.80 for the lower limb and 0.91 for the upper limb. Respectable. But the error on something like knee flexion angle runs anywhere from 9 to 22 degrees in 2D, and a 2025 paper in Nature Scientific Reports was blunt that single-camera systems "do not deliver accurate depth estimations." Mirrors, dim living-room lighting, patterned leggings, a couch occluding half the body — every one of those degrades the signal further.

So you're building clinical judgment on top of a noisy sensor. That sounds like a reason to give up. It's actually the reason the interpretation layer has to be smart rather than literal. If your system treats a raw 78-degree reading as gospel, camera noise alone will have it flip-flopping between "pass" and "fail" rep to rep. The intelligence layer's job includes deciding which readings to believe: smoothing across frames, trusting the patterns that survive noise (a consistent medial drift) over the ones that don't (a one-frame jitter), and being honest about confidence. We lean on temporal models for this; a class of network called a temporal convolutional network can hit 98.7% accuracy on movement recognition in well under two milliseconds on a mid-range phone, small enough to run entirely on the device. But the architecture matters less than the principle: you don't build certainty by pretending the sensor is perfect. You build it by designing for the fact that it isn't.
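The principle can be shown without a neural network at all. The Python sketch below swaps in a much simpler technique, a sliding median followed by a persistence check, purely to illustrate "believe the sustained pattern, ignore the one-frame jitter." It is not a temporal convolutional network and not our model; the function names, the 5-frame window, the 2 cm limit, and the 9-frame persistence requirement are assumptions.

```python
from statistics import median

def median_smooth(values, window=5):
    """Centered sliding median; the window simply shrinks at the edges."""
    half = window // 2
    return [median(values[max(0, i - half): i + half + 1])
            for i in range(len(values))]

def longest_run_above(values, threshold):
    """Length of the longest consecutive run strictly above the threshold."""
    best = run = 0
    for v in values:
        run = run + 1 if v > threshold else 0
        best = max(best, run)
    return best

def sustained_drift(drift_cm, threshold_cm=2.0, min_frames=9, window=5):
    """True only if the smoothed drift stays above the limit for min_frames in a row.

    At 30 frames per second, 9 frames is about 0.3 seconds (an illustrative choice).
    """
    smoothed = median_smooth(drift_cm, window)
    return longest_run_above(smoothed, threshold_cm) >= min_frames
```

A single 6 cm spike in an otherwise steady trace is removed by the median and never triggers; fifteen consecutive frames at 3.5 cm do.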

Hasn't Hinge Health Already Solved This?

[Image: Three-column comparison: vendors that verify but are closed, vendors that bill but don't verify, and the gap for an embeddable verification engine]

Around this time I started getting the "isn't this already solved?" question, usually from investors, occasionally from myself at 2am. The MSK care giants are real and they are big. So let me be precise about what they do and where the door is still open.

Hinge Health is projecting $732 million in revenue for 2026. They built TrueMotion computer vision for joint analysis and a triage assistant called Robin, and they have the clinical results to back it — among their patients, dramatically fewer spinal fusions and knee replacements than traditional care. Sword Health bought Kaia Health for $285 million in January 2026, folding Kaia's "Motion Coach" computer vision into Sword's wearable-sensor platform, and is reportedly raising another half a billion. These are formidable companies.

They are also closed boxes. Their verification technology is welded to their own care model, sold to enterprise employers, priced and packaged as a destination — not as something you can put inside the PT platform or wellness app you're already building. If you run a physical therapy software company and you want verified exercise data inside your product, Hinge and Sword don't sell you that. They sell you the thing that competes with you.

The rest of the field splits cleanly. Peloton IQ launched form-tracking cameras in October 2025 — but it's consumer fitness, locked to Peloton hardware, with no clinical capability. Kemtai offers a browser-based B2B vision platform tracking 44 landmarks, which is genuinely useful, but it's general-fitness and its form rules are one-size-fits-all, not configurable per patient per exercise. On the billing side, Limber Health and MedBridge handle the remote-monitoring workflow well — Limber's patients complete over three times more home-exercise sessions — but they manage the paperwork of monitoring; they don't independently verify whether the exercise was done correctly.

The market has companies that verify movement but won't let you build on them, and companies that handle the billing but can't verify the movement. The gap is an engine that does the verification and lives inside your product.

That gap is not a research problem. It's an integration-engineering problem, and that distinction is the whole reason a focused team can win here.

Why Does a Rep Count Fail an RTM Audit?

For the physical therapy platforms we work with, this stopped being academic the moment the reimbursement math changed.

Only about 35% of patients fully stick to their home exercise programs — most of the rest abandon within the first month — and patients famously overreport when you ask them. Clinicians know the self-reported data is fiction. What they want is Remote Therapeutic Monitoring: a set of CMS billing codes (98975 through 98981) that pay for monitoring a patient's musculoskeletal recovery between visits. Real recurring revenue.

The catch is what RTM documentation requires. To bill it and survive an audit, you need device-gathered data — with timestamps, tied to an actual clinical decision that changed the treatment plan. A rep count doesn't clear that bar. "Patient did 30 squats" is not the same as "patient's movement quality declined, triggering a protocol adjustment on this date." This is exactly where our rep-counter would have left a clinic exposed. I remember a PT telling us, politely, that she couldn't bill on our data because it told her that he exercised, not how — and "how" was the entire clinical and legal point.
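One way to see the difference between "that he exercised" and "how" is to write the record down. The Python sketch below is an illustration only: it is not a CMS schema or a billing format, and every field name and value in it is hypothetical. It simply mirrors the elements named above (a timestamp, device-gathered movement-quality data, and a linked clinical decision) and reports which ones a record lacks.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional, Tuple

@dataclass(frozen=True)
class MonitoringRecord:
    patient_ref: str                 # opaque identifier (hypothetical)
    exercise: str
    recorded_at: str                 # device timestamp, ISO 8601
    reps_completed: int
    # None means movement quality was never assessed; () means assessed, nothing found.
    quality_findings: Optional[Tuple[str, ...]] = None
    clinical_decision: Optional[str] = None  # treatment change tied to this data

def missing_elements(record):
    """List which of the elements described above are absent from a record."""
    missing = []
    if not record.recorded_at:
        missing.append("timestamp")
    if record.quality_findings is None:
        missing.append("movement-quality data")
    if record.clinical_decision is None:
        missing.append("linked clinical decision")
    return missing

# "Patient did 30 squats": a count and nothing else.
count_only = MonitoringRecord("pt-001", "squat", "2026-03-02T14:05:00Z", 30)

# The same session with quality data and the decision it triggered.
verified = MonitoringRecord(
    "pt-001", "squat", "2026-03-02T14:05:00Z", 30,
    quality_findings=("medial_knee_drift",),
    clinical_decision="protocol adjustment",
)

print(missing_elements(count_only))
print(json.dumps(asdict(verified)))
```

The count-only record is the one our first engine produced: timestamped, and silent on everything a clinician would act on.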

Then the bar got lower and the stakes got higher at the same time. The CMS 2026 Final Rule added two new codes — 98979 and 98985 — and dropped the monitoring threshold from 16 days down to as few as 2, and the management-time minimum from 20 minutes to 10. Suddenly far more patients are billable. But the documentation standard didn't relax: it still demands device data tied to treatment decisions. More clinics rushing into RTM means more clinics generating exactly the kind of thin, count-only data that doesn't hold up. The reimbursement opportunity and the documentation risk grew in the same breath.

There's a regulatory edge running alongside this that shapes how you build. The FDA's January 2026 guidance lets a tool positioned purely as "wellness" sidestep medical-device classification — fine for a step-counter. But the moment you make a clinical claim, which RTM verification inherently does, you're potentially in software-as-a-medical-device territory, with the validation and oversight that implies. So the same engine has to be architected to live on either side of that line depending on what the customer is claiming, which is not a decision you want to discover late.

So the verified-exercise engine isn't a nice-to-have feature for these platforms. It's the difference between RTM revenue that survives an audit and RTM revenue that becomes a clawback.

The other buyer in the room: wellness without surveillance

[Image: Privacy pipeline: video and 33 keypoints stay on the phone via on-device inference; only an aggregate compliance token reaches the employer]

I almost under-served this second buyer, and I'm glad we didn't — the human material is just as sharp.

Corporate wellness directors are staring at a different version of the same problem. Musculoskeletal issues cost an employer somewhere around $486 a year per employee in direct spend, plus an estimated $3,105 in lost productivity: $3,591 a head. As many as 36% of MSK surgeries are unnecessary, a roughly $90 billion drag on the workforce. The market response is enormous: corporate wellness spending is hitting $100 billion in 2026, and while 83% of large employers offered virtual MSK care in 2025, nearly all of them (96%) plan to by 2027.

And yet only about a quarter of employees actually use the programs they're offered. More than half say they're reluctant to share health data with their employer at all. After years of Fitbit-shaking, step-faking gaming scandals, employees have learned that "wellness verification" often means "my boss is watching." A CHRO on one of our early pilots asked me, in that joking-but-not-joking way, whether what we'd built "felt like surveillance." It was the most useful question anyone asked us, because the honest first answer was: it could.

That objection rewired our architecture. The same skeletal keypoints that make exercise verification possible are, legally, a minefield — body-movement patterns can re-identify a person the way a fingerprint does, which puts pose data squarely in the conversation about biometric privacy laws like Illinois's BIPA and Europe's GDPR. There were 107 new BIPA class actions filed in Illinois in 2025 alone. No one has sued a fitness app over pose estimation yet, but the legal theory is fully assembled and waiting.

The fix for "does this feel like surveillance" turned out not to be a privacy policy. It was a design constraint: the raw video and the skeleton never leave the phone.

We process the movement on-device and transmit only aggregate compliance signals — the prescribed work was done, at this quality, on this date — never the video, never the keypoints. The employer gets verified participation. The employee's body data stays on the employee's device. That's the only version of corporate exercise verification I'd actually want used on me.
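As a rough sketch of that boundary, assume a hypothetical on-device session object that holds both the raw keypoints and a per-rep quality score. The only thing that crosses the network is the small dictionary the function returns. The class, the field names, the 0.7 pass score, and the quality bands are all invented for this example and are not our actual wire format.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class OnDeviceSession:
    session_date: date
    prescribed_reps: int
    rep_quality: list                             # one score in [0, 1] per completed rep
    keypoints: list = field(default_factory=list) # raw per-frame keypoints, device only

def compliance_signal(session, pass_score=0.7):
    """Reduce a session to an aggregate signal: work done, quality band, date.

    The returned dict is built field by field, so raw keypoints cannot
    leak into it by accident.
    """
    done = len(session.rep_quality)
    good = sum(1 for q in session.rep_quality if q >= pass_score)
    share = good / done if done else 0.0
    if share >= 0.8:
        band = "high"
    elif share >= 0.5:
        band = "medium"
    else:
        band = "low"
    return {
        "date": session.session_date.isoformat(),
        "prescribed_work_done": done >= session.prescribed_reps,
        "quality_band": band,
    }
```

Building the payload from an explicit allow-list of fields, instead of serializing the session and stripping things out, is the design choice that matters here: a field added to the session later stays on the device by default.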

"But can't a phone camera never be accurate enough?"

People raise the accuracy ceiling as a reason this can't work, and I take it seriously — I just think it argues for the opposite conclusion. Yes, a monocular camera will never match a $100,000 marker lab. But clinical usefulness doesn't require lab-grade angles. It requires reliably catching the patterns that matter: a consistent inward knee drift, a depth that's regressing week over week, a tempo that signals compensation. Those are robust signals even through camera noise, if the interpretation layer is built to find them and ignore the jitter. The goal was never to measure 90.0 degrees. It was to know that this rep, for this person, today, is moving the wrong direction.

The other question I get is whether the new wave of "agentic AI" everyone's hearing about at conferences makes custom biomechanics work obsolete. It's the opposite. An autonomous health agent that adjusts a patient's exercise plan is only as trustworthy as the data underneath it. Point an agent at rep counts and it will confidently optimize a patient deeper into injury. The exercise-intelligence layer is the thing that makes agentic monitoring safe rather than reckless. The smarter the agent on top, the more it matters that the verification beneath it actually understands movement.

What that frozen frame taught me

We rebuilt the engine around a single idea: the threshold isn't a number, it's a clinical decision, and it belongs to the clinician. A squat-depth target keyed to "week 8 post-ACL" is not the same target as "70-year-old, post-knee-replacement" or "30-year-old corporate athlete." So we made the thresholds configurable — set by the clinician, per patient, per exercise — and we built the output to be the thing RTM billing and a nervous wellness buyer both actually need: structured, timestamped, protocol-mapped, privacy-respecting compliance data. Camera in, clinically meaningful decision out. If you want the full picture of how that pipeline fits together, it lives on our solutions page.
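To make "per patient, per exercise" concrete, here is a small Python sketch in which the same 78-degree rep gets two different verdicts depending on whose protocol it is checked against. The patient keys, the `Protocol` fields, and every number in the table are hypothetical; in practice a clinician would author these values, and this is not our actual configuration format.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Protocol:
    """Targets a clinician sets for one patient on one exercise."""
    target_depth_deg: float
    max_drift_cm: float

# Hypothetical entries, keyed by (patient, exercise). All values are invented.
PROTOCOLS = {
    ("patient-a", "squat"): Protocol(target_depth_deg=75.0, max_drift_cm=2.0),
    ("patient-b", "squat"): Protocol(target_depth_deg=90.0, max_drift_cm=3.0),
}

def evaluate_rep(patient_key, exercise, peak_flexion_deg, drift_cm):
    """Check one rep against that patient's protocol; return a list of findings."""
    protocol = PROTOCOLS[(patient_key, exercise)]
    findings = []
    if peak_flexion_deg < protocol.target_depth_deg:
        findings.append("below_target_depth")
    if drift_cm > protocol.max_drift_cm:
        findings.append("medial_knee_drift")
    return findings

# Same measurement, different patients, different meaning.
print(evaluate_rep("patient-a", "squat", 78.0, 1.0))
print(evaluate_rep("patient-b", "squat", 78.0, 1.0))
```

The measurement code never changes between the two calls. Only the clinician's table does, which is where the decision belongs.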

I still think about that 62-year-old man doing his squats, trusting a green checkmark that had no idea his knee was folding. He did everything right. He showed up, he did the work, he followed instructions. The software failed him quietly, in a way only a trained eye would have caught — and the whole promise of putting AI in his living room was that the trained eye would always be there.

A rep counter watches a body move. An exercise-verification engine knows what the movement means. Between those two sentences is a recovering knee, a billable clinical decision, and an employee deciding whether to trust the thing watching them. That's not a gap in the technology. It's the entire job.
