
We Spent Months Building AI That Predicts Materials Before Synthesizing Them — Here's Why the Lab of the Future Won't Start With a Beaker
There's a moment I keep coming back to. It was late on a Thursday, and I was staring at a spreadsheet someone had sent over — a log of every compound a mid-size pharma team had physically synthesized and tested over the previous eighteen months. Thousands of rows. Reagent costs, synthesis hours, characterization results. And next to each row, a column labeled "Outcome." The vast majority said the same thing: Fail.
Not "interesting negative result." Not "informative dead end." Just: fail. Compound didn't bind. Material wasn't stable. Reaction didn't yield. Thousands of experiments, millions of dollars, and the team was essentially back where they started — except now they knew what didn't work. Which is something, I suppose, if you write it down. Most teams don't even do that.
That spreadsheet crystallized something I'd been circling for a while at Veriprajna. The way most R&D labs discover new materials and molecules is fundamentally broken — not because the scientists aren't brilliant, but because the method itself has hit a wall that no amount of brilliance can scale. The search space for drug-like molecules is estimated between 10⁶⁰ and 10¹⁰⁰. The number of atoms in the observable universe is roughly 10⁸⁰. We are asking human beings to find needles in haystacks that are literally larger than the cosmos, and handing them tweezers.
I'm going to tell you why we built what we built, what we got wrong along the way, and why I believe the era of guess-and-check science — what's often called the "Edisonian approach" — is ending. Not gradually. Abruptly.
Why Is the Edison Method Still the Default in R&D?
Thomas Edison tested thousands of carbon filaments before finding one that glowed long enough to be useful. That story gets told as a parable about persistence. What gets left out is that even Nikola Tesla, Edison's contemporary, pointed out that "a little theory and calculation" could have saved 90% of the labor. Edison eventually came around to more structured methods himself. But his legacy — brute-force trial and error — somehow became the foundational methodology of modern pharmaceutical and materials research.
High-Throughput Screening, or HTS, was supposed to industrialize this. Automate the guessing. Test a million compounds instead of a thousand. And for a while, it worked — or at least, it felt like it was working. But the hit rates kept dropping. The false positives kept climbing. The compounds that "worked" in the screen turned out to be toxic, insoluble, or impossible to manufacture at scale. A standard HTS campaign might test 10⁶ compounds. Even if you scaled that to a billion — 10⁹ — you'd still have explored only one part in 10⁵¹ of a 10⁶⁰ space: roughly 10⁻⁴⁹ percent, and that's against the smallest estimate of chemical space.
The Edison Method in modern chemistry is like trying to map the Pacific Ocean by dipping a teaspoon into the water at random intervals.
The financial consequences are brutal. The cost of developing a new drug hit approximately $2.23 billion per asset in 2024. The internal rate of return on pharma R&D cratered to 1.2% in 2022 before rebounding to 5.9% in 2024 — and that recovery was largely driven by a single class of drugs (GLP-1 agonists), not by any systemic improvement in how discovery works. This decline even has a name: Eroom's Law. Moore's Law spelled backwards. Every decade, drug discovery gets slower and more expensive.
I remember a conversation with a materials scientist — someone I deeply respect — who told me, flatly, "We know we're wasting most of our budget. But we don't know which experiments to skip." That sentence haunted me. Because the honest answer is: you can know. You just need a different kind of instrument. Not a better pipette. A better map.
The Night the Simulation Disagreed With the Lab
When my team first started building physics-informed models for materials prediction, we had a humbling experience. We'd trained a model to predict the stability of a class of inorganic compounds. The model flagged a particular composition as thermodynamically unstable — essentially, it said "don't bother synthesizing this." But the literature suggested otherwise. A published paper claimed this composition was promising.
We argued about it for two days. Half the team wanted to trust the model. The other half said we were being arrogant — who were we to override published experimental results? We eventually dug deeper into the paper and found that the "promising" result was based on a metastable phase that degraded within hours under real operating conditions. The model had been right. The paper had technically been right too — the material existed — but it was useless for any practical application.
That was the moment I understood the difference between Physics-Informed Machine Learning and the kind of AI that dominates the headlines. Most AI in the market right now — the stuff built on top of large language models — learns from correlations in data. It's pattern matching at scale. And it's extraordinary for many tasks. But molecules aren't sentences. They're three-dimensional graphs with geometric constraints, electron orbitals, chirality, and thermodynamic boundaries that don't care what patterns GPT-4 has seen in its training corpus.
Physics-Informed ML embeds the actual laws of physics — conservation of mass, thermodynamic equations, quantum mechanical constraints — directly into the model's architecture. Instead of needing millions of data points to learn that energy is conserved, the model knows it from the start. This means three things that matter enormously in practice:
It needs far less training data. It can extrapolate beyond its training distribution without generating physically impossible results. And it doesn't hallucinate molecules that violate basic chemistry.
I wrote about this distinction in much more depth in the interactive version of our research, but the core insight is simple: if your AI can propose a molecule that violates conservation of mass, it's not doing science. It's doing autocomplete.
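To make the conservation-of-mass point concrete, here's a deliberately toy sketch of the idea in Python. A real physics-informed model bakes the constraint into the architecture or trains against a differentiable physics residual; this stand-in just adds a penalty term to the loss so that predictions which create or destroy mass are punished regardless of how well they fit the data. All names and numbers here are illustrative, not our production system.

```python
def physics_informed_loss(pred_product_masses, true_property, pred_property,
                          reactant_mass, lam=10.0):
    """Toy sketch: total loss = data-fit error + a penalty for
    violating conservation of mass. The lambda weight and the
    squared-violation form are illustrative choices."""
    data_loss = (pred_property - true_property) ** 2
    mass_violation = (sum(pred_product_masses) - reactant_mass) ** 2
    return data_loss + lam * mass_violation

# Two equally "accurate" predictions: one conserves mass, the other
# conjures 2 g of product out of nothing. Only the physics term
# tells them apart.
ok = physics_informed_loss([18.0, 28.0], 1.0, 0.9, reactant_mass=46.0)
bad = physics_informed_loss([18.0, 30.0], 1.0, 0.9, reactant_mass=46.0)
print(ok < bad)  # True
```

The point of the sketch: a pure pattern-matcher sees no difference between `ok` and `bad`, because both fit the observed property equally well. The physics term does.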
What Happens When You Close the Loop?

Here's where things get interesting — and where we spent most of our engineering effort. Predicting materials computationally is the first step. But prediction alone is still an open loop. You simulate, you get a result, a human interprets it, another human designs the next experiment, someone books time on the synthesizer, and weeks later you have one data point. The bottleneck isn't the AI. It's the human in the middle.
The real transformation happens when you close the loop: the AI predicts, a robot synthesizes, sensors characterize the result, the data feeds back into the model, and the AI selects the next experiment. No human in the middle. Design-Make-Test-Analyze, running continuously.
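The shape of that loop is easy to sketch, even if the real engineering is not. Below is a deliberately minimal Python skeleton of Design-Make-Test-Analyze with no human in the middle. The `propose` and `run_robot` functions are stubs I've invented for illustration: the model is reduced to "exploit near the best result so far," and the robot plus sensors are reduced to a hidden response curve.

```python
import random

def propose(history):
    """Design step (stub): exploit near the best observed candidate,
    with a little Gaussian exploration noise."""
    if not history:
        return random.random()
    best_x, _ = max(history, key=lambda kv: kv[1])
    return min(1.0, max(0.0, best_x + random.gauss(0, 0.1)))

def run_robot(x):
    """Make + Test steps (stub): a hidden response curve standing in
    for synthesis and characterization."""
    return 1.0 - (x - 0.6) ** 2

random.seed(1)
history = []  # Analyze: every result feeds straight back into Design
for _ in range(20):
    x = propose(history)
    history.append((x, run_robot(x)))

best_x, best_y = max(history, key=lambda kv: kv[1])
print(f"best candidate after {len(history)} experiments: {best_x:.2f}")
```

Every real system replaces these stubs with a physics-informed model, robotic synthesis, and characterization hardware, but the control flow — propose, execute, record, repeat — is exactly this loop.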
The A-Lab at Lawrence Berkeley National Laboratory demonstrated this at scale. Their autonomous system synthesized 41 novel inorganic compounds in 17 days — work that would have taken human researchers months. When a synthesis failed, the AI analyzed the X-ray diffraction pattern, adjusted precursor ratios and heating profiles, and retried. A 71% success rate for novel materials, achieved by a system that corrects its own mistakes in real time.
But the mathematical engine underneath this — the part that makes the whole thing work — is something called Active Learning with Bayesian Optimization. And it's worth understanding, because it's the reason closed-loop labs are not just faster than human-led labs, but fundamentally more efficient.
Why Does Bayesian Optimization Beat Random Screening?

Traditional screening is random. You pick compounds from a library, test them, and hope. Bayesian Optimization does something radically different: it builds a probabilistic model of the entire search space, including what it doesn't know, and then strategically selects the experiment that will teach it the most.
The model predicts two things for every untested point: an expected value (how good this material might be) and an uncertainty (how confident the model is in that prediction). Then an acquisition function — think of it as the AI's decision-making strategy — balances exploration (investigating uncertain regions) against exploitation (refining areas that look promising).
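You can see the whole mechanism in a few dozen lines of Python. This is a toy: the surrogate below is an inverse-distance-weighted stand-in for a real Gaussian process (its "uncertainty" is just the distance to the nearest tested point), the objective is a made-up response curve, and the acquisition function is a simple upper confidence bound. The structure, though, is the real thing: predict a mean and an uncertainty everywhere, then pick the untested point that maximizes their weighted sum.

```python
import math
import random

def objective(x):
    # Hidden "lab response" the optimizer is trying to maximize
    # (invented for this sketch; peak at x = 0.7).
    return math.exp(-(x - 0.7) ** 2 / 0.02)

def surrogate(x, X, Y):
    """Toy stand-in for a Gaussian process: inverse-distance-weighted
    mean, with uncertainty growing with distance to the nearest
    tested point."""
    d = [abs(x - xi) for xi in X]
    nearest = min(d)
    if nearest < 1e-12:
        return Y[d.index(nearest)], 0.0
    w = [1.0 / di for di in d]
    mu = sum(wi * yi for wi, yi in zip(w, Y)) / sum(w)
    return mu, nearest

def acquisition(x, X, Y, kappa=2.0):
    # Upper confidence bound: exploitation (mu) + exploration (sigma).
    mu, sigma = surrogate(x, X, Y)
    return mu + kappa * sigma

random.seed(0)
X = [random.random() for _ in range(2)]  # two seed "experiments"
Y = [objective(x) for x in X]
grid = [i / 200 for i in range(201)]     # candidate compositions
for _ in range(15):
    # Each "experiment" is the untested point the model will learn
    # the most from, not a random draw.
    x_next = max((x for x in grid if x not in X),
                 key=lambda x: acquisition(x, X, Y))
    X.append(x_next)
    Y.append(objective(x_next))

best = max(X, key=objective)
print(f"best composition found: {best:.3f}")
```

Seventeen strategically chosen experiments against a 201-point grid. That ratio, not raw speed, is the whole argument.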
This is where it gets elegant. The ANI-1x machine learning potential achieved DFT-level accuracy — that's Density Functional Theory, the gold standard of computational chemistry — while using only 10% of the data that naive random sampling would require. And Cost-Informed Bayesian Optimization can reduce reagent costs by up to 90% by factoring in the price of each experiment when deciding what to test next. If two experiments offer similar information, but one costs $5,000 in reagents and the other costs $50, the system picks the cheap one.
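The cost-informed variant is a one-line change in spirit: rank candidates by expected information per dollar rather than information alone. The toy below mirrors the $5,000-versus-$50 example directly; the acquisition values and assay names are invented for illustration.

```python
def cost_aware(acquisition_value, cost_usd):
    """Rank experiments by expected information per dollar (toy)."""
    return acquisition_value / cost_usd

# Two candidates with nearly identical information value but very
# different reagent costs. The cheap one wins by orders of magnitude.
candidates = {
    "expensive_assay": cost_aware(0.92, 5000.0),
    "cheap_assay": cost_aware(0.88, 50.0),
}
best = max(candidates, key=candidates.get)
print(best)  # cheap_assay
```

Real implementations fold the cost model into the acquisition function itself, but the decision logic is exactly this division.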
Active Learning doesn't just find answers faster. It asks better questions.
One thing that consistently surprises people: in this framework, failed experiments are some of the most valuable data you can generate. In the Edisonian model, a negative result gets buried in a lab notebook. In Active Learning, every failure sharpens the model's understanding of where the boundaries are. It maps the dead ends of chemistry — permanently — so the organization never wastes resources on those paths again. That topological knowledge of the failure landscape is intellectual property that compounds over time.
The "Just Use GPT" Problem

I need to address something directly, because I hear it constantly. Investors, potential clients, even some scientists say: "Why not just use GPT-4 for this? It knows chemistry."
It doesn't. Not in the way that matters.
Large language models represent molecules as text strings — typically SMILES notation, which is a linear encoding of a three-dimensional structure. This is like describing a building by reading its address aloud and expecting someone to understand the floor plan. LLMs are "permutation sensitive" (the order of characters matters), while molecules are permutation invariant (the order you list the atoms is irrelevant). Benchmarks consistently show that Graph Neural Networks, which model molecules as actual 3D graphs with nodes and edges, outperform LLMs on property prediction tasks involving geometric structure.
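Here's the permutation problem in miniature. The toy below lists the bonds of a fragment of ethanol in two different orders: as text the two listings differ, but any order-insensitive graph representation sees the same molecule. A real system needs full graph canonicalization (or a GNN, which is invariant by construction); this sorted-bond trick only handles reorderings, and the atom labels are invented for the example.

```python
# The same bonds, listed in two different orders — as an LLM would
# see two different strings.
ethanol_a = [("C1", "C2"), ("C2", "O1")]
ethanol_b = [("O1", "C2"), ("C2", "C1")]

def canonical(bonds):
    """Toy order-invariant form: sort within each bond, then sort
    the bond list. (Real canonicalization must also solve graph
    isomorphism; this sketch does not.)"""
    return sorted(tuple(sorted(b)) for b in bonds)

print(ethanol_a == ethanol_b)                        # False: text-sensitive
print(canonical(ethanol_a) == canonical(ethanol_b))  # True: same graph
```

A model that operates on the left-hand representation has to waste capacity learning that the two are identical. A model that operates on the right-hand one gets that fact for free.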
The right architecture — the one we advocate for and build — is hybrid. LLMs are brilliant orchestrators. They can parse scientific literature, extract synthesis recipes, generate experimental protocols, and reason about high-level strategy. But for the heavy lifting of molecular design, stability analysis, and property prediction, you need Graph Neural Networks constrained by physics. The LLM is the project manager. The GNN is the engineer. You need both, and you need to know which one to trust with which task.
Many current AI offerings in science are wrappers around public LLM APIs. A wrapper cannot enforce conservation of mass. It cannot navigate a 10¹⁰⁰ search space with Bayesian rigor. It cannot integrate with the robotic hardware that closes the loop.
For the full technical breakdown of these architectural decisions — including how multi-fidelity optimization fuses cheap simulation data with expensive experimental results — see our detailed research paper.
The Plumbing Nobody Talks About
There's a dirty secret in autonomous lab research: the AI is often the easy part. The hard part is getting the spectrometer to talk to the liquid handler to talk to the hotplate to talk to the AI. Laboratory instruments from different vendors speak different proprietary languages. Without a universal translation layer, your self-driving lab is a brain in a jar.
This is why the SiLA 2 standard — Standardization in Lab Automation — matters so much, and why we spent an unglamorous but critical amount of time on middleware. SiLA 2 treats every instrument as a microservice. The AI sends a high-level command ("Dispense 5ml") without needing to know the serial port protocol of the specific robot arm. It runs on modern web protocols, supports cloud connectivity, and — crucially — can wrap legacy instruments. A twenty-year-old HPLC can become part of an autonomous loop.
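The architectural pattern is the classic adapter: one abstract interface, with each instrument (modern or legacy) wrapped to speak it. The sketch below is in the spirit of SiLA 2's feature model but is not the actual SiLA API; the class names, the REST and serial details, and the command format are all invented for illustration.

```python
from abc import ABC, abstractmethod

class Instrument(ABC):
    """Hypothetical common interface: the AI issues high-level
    commands and never sees vendor protocols."""
    @abstractmethod
    def dispense(self, volume_ml: float) -> str: ...

class ModernLiquidHandler(Instrument):
    def dispense(self, volume_ml):
        # A modern instrument might expose a web API directly.
        return f"REST call: dispense {volume_ml} ml"

class LegacyHPLCWrapper(Instrument):
    """Adapter that translates the same high-level command into a
    made-up proprietary serial protocol for an old instrument."""
    def dispense(self, volume_ml):
        return f"serial bytes: CMD:DSP:{int(volume_ml * 1000)}uL"

# The orchestrator treats both identically.
responses = [d.dispense(5.0)
             for d in (ModernLiquidHandler(), LegacyHPLCWrapper())]
for r in responses:
    print(r)
```

That's the entire trick: the twenty-year-old HPLC joins the loop because its adapter, not the AI, knows about the serial port.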
Before any physical robot moves, we simulate the entire experiment in a Digital Twin — a virtual replica of the lab that validates timing, collision paths, and logistics. When the real experiment runs, we compare sensor data against the twin's predictions to catch anomalies: a clogged pipette, a drifting temperature sensor. The twin is the safety net that makes autonomy trustworthy.
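The anomaly check itself is conceptually simple: compare each sensor reading against the twin's forecast and flag deviations beyond a tolerance. The sketch below uses a relative-tolerance rule and invented temperature data; a production system would use statistical process control or a learned residual model rather than a fixed threshold.

```python
def check_against_twin(predicted, measured, tol=0.05):
    """Flag readings that deviate from the digital twin's forecast by
    more than a relative tolerance (toy rule, illustrative only)."""
    anomalies = []
    for t, (p, m) in enumerate(zip(predicted, measured)):
        if abs(m - p) > tol * max(abs(p), 1e-9):
            anomalies.append((t, p, m))
    return anomalies

# Twin predicts a smooth temperature ramp; the real sensor jumps at
# step 3 (say, a failing heater controller).
twin = [25.0, 30.0, 35.0, 40.0, 45.0]
real = [25.1, 29.9, 35.2, 47.5, 45.3]
anomalies = check_against_twin(twin, real)
print(anomalies)  # [(3, 40.0, 47.5)]
```

The experiment halts, a human gets paged, and the model never trains on corrupted data. That's what "trustworthy autonomy" cashes out to in practice.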
I'll be honest: this is the part of the work that nobody writes breathless press releases about. But it's the part that determines whether a closed-loop lab actually works in production or just works in a demo.
The Numbers That Changed My Mind
I came into this work as a skeptic of hype. AI in drug discovery has been "five years away" for twenty years. What changed my mind wasn't theory. It was specific results.
Exscientia got AI-designed small molecules into Phase I clinical trials in roughly 12 months, against an industry average of 4–5 years. Insilico Medicine moved a fibrosis candidate from target discovery to preclinical candidate in under 18 months, at a fraction of the typical cost. The A-Lab's 41 compounds in 17 days. The Broad Institute's predictive toxicity models that filter out dangerous compounds before synthesis, saving millions in downstream failure costs.
These aren't projections. They're results. And they share a common architecture: simulation before synthesis, physics-informed models, closed feedback loops, and systematic capture of negative data.
People sometimes ask me whether this eliminates the need for wet labs entirely. It doesn't. The wet lab is still where truth lives — where the model's predictions meet reality. What changes is the ratio. Instead of running ten thousand experiments to find one hit, you run fifty. The wet lab becomes a validation instrument, not a search engine.
Others worry about job displacement — will this replace scientists? In my experience, the opposite happens. The scientists I've worked with who adopt these tools don't become less important. They become more strategic. They spend their time on the problems that require human judgment — interpreting unexpected results, designing new experimental paradigms, asking questions the AI hasn't been trained to ask — instead of pipetting their way through a library of compounds they already suspect won't work.
The Art Became Engineering
I think about that spreadsheet often. All those rows of "Fail." Each one represented someone's hypothesis, someone's afternoon, someone's budget. In the Edisonian model, those failures were the cost of doing business — inevitable, expected, and largely invisible in the final accounting.
In the model we're building, every one of those failures would have been predicted. Not all of them — I'm not claiming omniscience. But enough of them that the spreadsheet would be a fraction of its length, and the "Outcome" column would look very different.
The search space for new molecules and materials is incomprehensibly vast. No amount of human intuition, no fleet of liquid handlers, no billion-dollar HTS campaign can meaningfully explore it through physical experimentation alone. The math simply doesn't allow it. What the math does allow is intelligent navigation — using physics-constrained models to simulate before synthesizing, Bayesian optimization to ask the right questions in the right order, and robotic automation to close the loop between prediction and reality.
The Edisonian era produced extraordinary things. But it was a methodology born from a time when theory couldn't keep up with experiment. We no longer live in that time. The theory is here. The compute is here. The robotics are here. The only thing left is the institutional willingness to stop treating R&D as an art practiced by gifted individuals and start treating it as an engineering discipline powered by deterministic systems.
Don't guess and check. Simulate and select.
That's not a slogan. It's an economic imperative. Every material tested physically that could have been ruled out computationally is money set on fire. And the organizations that understand this first won't just move faster — they'll make the old way of working economically unviable for everyone else.


