Fabless Semiconductor · Verification · RISC-V

Your first-silicon success rate is 14%. The math on LLM-generated RTL is worse.

We build custom verification pipelines that wrap fine-tuned open-weight LLMs around your existing formal engine (JasperGold, VC Formal, Questa Formal, or SymbiYosys) and run entirely on your own hardware. No RTL leaves your network. No vendor lock-in. Opinionated about SystemVerilog assertions, honest about what formal can and cannot prove, and fluent in RISC-V, AXI4, and 3nm tape-out economics.

14%

first-silicon success

Wilson / Siemens 2024

$10–40M

mask set, 5nm to 3nm

SemiAnalysis 2024

70%

respins caused by spec drift

Wilson / Siemens 2024

Your team is already using LLMs on Verilog. The bug classes it cannot catch are the ones that kill silicon.

The 2024 Wilson Research Group / Siemens EDA Functional Verification study put first-silicon success at 14%, the lowest number in twenty years of tracking. In 2020 it was 32%. The cause is not lazy engineering. It is complexity outpacing the verification tools, a spec that mutates faster than the testbench, and a new class of failure that generalist LLMs introduce into RTL. We see five hallucination modes in HDL code the industry has not yet named cleanly.

Class 1

Syntactic hallucination

Code that does not compile. Caught by Verilator, Icarus, or the synthesis front-end in seconds. This is the class the industry already knows how to handle.

Class 2

Semantic hallucination (blocking vs non-blocking)

LLMs trained on Python and C write Verilog as if statements execute sequentially. They use blocking assignments (=) inside clocked always_ff blocks where non-blocking (<=) is required. The simulator may schedule events in an order that masks the race. Synthesis produces different logic. Silicon deadlocks.

// What the LLM wrote. Simulates "fine" in some simulators.
always_ff @(posedge clk) begin
  stage2 = stage1; // blocking
  stage3 = stage2; // now sees the NEW stage2, not the old one
end
// The designer wanted a 2-cycle pipeline. The silicon ships a 1-cycle bypass.
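The fix is mechanical once you see it. A minimal sketch of the corrected block, assuming the same stage1/stage2/stage3 pipeline registers:

```systemverilog
// Corrected version: non-blocking assignments give the intended 2-cycle pipeline.
// Both right-hand sides are sampled before any flop updates, so stage3 receives
// the pre-edge value of stage2 on every clock.
always_ff @(posedge clk) begin
  stage2 <= stage1;  // non-blocking
  stage3 <= stage2;  // sees the OLD stage2, as the designer intended
end
```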
Class 3

Protocol hallucination (AXI, AHB, TileLink, PCIe)

The code compiles and passes 90% of directed tests. Then it asserts WVALID before AWREADY, or holds VALID high while flipping data, or violates a sub-clause buried on page 84 of the AMBA spec. The chip works on the internal test harness and hangs the moment it is connected to a third-party memory controller. We catch this with pre-verified SVA libraries for each protocol, not with more simulation cycles.
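To make "pre-verified SVA library" concrete, here is a hedged sketch of one AXI4 write-channel stability rule. Signal names (aclk, aresetn, wvalid, wready, wdata) are assumptions about your interface; the real library covers the full channel set, not this one clause:

```systemverilog
// Once WVALID is asserted, it must stay high and WDATA must hold stable
// until WREADY accepts the beat. Dropping VALID or flipping data mid-handshake
// is exactly the bug class described above.
property p_wvalid_stable;
  @(posedge aclk) disable iff (!aresetn)
    (wvalid && !wready) |=> (wvalid && $stable(wdata));
endproperty
a_wvalid_stable: assert property (p_wvalid_stable);
```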

Class 4

Vacuity hallucination (the dangerous one)

The LLM generates an SVA property. The formal engine proves it. You ship. The property was trivially true because the antecedent never fires. This is worse than no verification, because you have a certificate that says "proven" on a buggy design. Any formal flow that does not run vacuity checks is theater. Siemens has been warning about this since 2017 and the field still ships tools without it.

// LLM-generated "grant follows request" property
property p_grant;
  @(posedge clk) req |-> ##[1:$] gnt;
endproperty
// If the LLM also set an assume that forces req = 0 always,
// the formal engine "proves" this property in milliseconds.
// The real arbiter is broken. The certificate is worthless.
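One cheap defense is to require the antecedent to be reachable before trusting any proof. A minimal sketch against the same req/gnt signals, which most formal engines can discharge alongside the assertion:

```systemverilog
// If either cover is unreachable under the current assumptions, p_grant is
// vacuous (fully or in its interesting window) and the "proven" result
// certifies nothing about the arbiter.
c_req_fires:   cover property (@(posedge clk) req);          // antecedent can fire at all
c_req_waits:   cover property (@(posedge clk) req && !gnt);  // the ##[1:$] window is exercised
```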
Class 5

CDC / metastability blind spot

LLMs see signal names, not clock domains. They connect a 2 GHz CPU domain signal directly to a 400 MHz peripheral domain flop, skip the double-flop synchronizer, and simulation cannot catch it because RTL sim does not model metastability. Accellera opened a CDC/RDC/Glitch interoperability standard in 2024 precisely because the fragmentation across SpyGlass, Questa CDC, and Conformal CDC was breaking sign-off.
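The repair the LLM skipped is a two-flop synchronizer. A minimal single-bit sketch, with hypothetical signal names (clk_400m, rst_n, signal_2g) standing in for your actual domains:

```systemverilog
// Standard two-flop synchronizer for a single-bit level signal crossing
// into the 400 MHz domain. Multi-bit buses need a handshake or an async
// FIFO instead; never synchronize bus bits independently.
logic sync_ff1, sync_ff2;
always_ff @(posedge clk_400m or negedge rst_n) begin
  if (!rst_n) begin
    sync_ff1 <= 1'b0;
    sync_ff2 <= 1'b0;
  end else begin
    sync_ff1 <= signal_2g;  // may go metastable; never consume directly
    sync_ff2 <= sync_ff1;   // settled value, safe to use in this domain
  end
end
```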

Why this matters in dollars: 70% of respins are caused by spec changes, not pure logic bugs (2024 Wilson / Siemens data), so a verification flow that only catches logic bugs addresses a subset of the problem. Classes 2 through 5 above are the ones that still blow tape-outs, because they bypass simulation and only show up in silicon. A 5nm respin is $10M in masks plus a 3 to 6 month schedule slip. On an 18-month product cycle, a 6-month slip can erase half of lifetime revenue.

The vendor landscape a fabless DV lead is actually choosing from

Your real alternatives are not theoretical. They are the three EDA giants (who you almost certainly already pay), six well-funded agentic AI startups pitching you at DVCon and DAC, Big 4 systems integrators, and the specialist formal consultancies. We have no product to sell against them. We help you pick, integrate, and operate the right combination.

Option What they actually do Strengths Honest gaps
Cadence
JasperGold, Cerebrus AI Studio, ChipStack Super Agent
What they do: Gold-standard formal engine. Multi-block RL-driven digital implementation. Agentic AI super agent announced Feb 2026.
Strengths: JasperGold is the reference formal tool. Deep foundry integration. ~30% of EDA market.
Honest gaps: Historical JasperGold baseline pricing ($225K base + $45K/seat) is out of reach for most early-stage RISC-V / AI accelerator startups. Cloud-first agentic features do not meet IP-sensitive on-prem requirements.
Synopsys
VC Formal, DSO.ai, AgentEngineer
What they do: L4 agentic workflow (AgentEngineer, March 2026), claimed 2 to 5x productivity. RL-based design space exploration. $35B Ansys acquisition adds multiphysics.
Strengths: Deepest customer base. Every large fabless already has a VC Formal contract. AgentEngineer is the most credible vendor agentic stack today.
Honest gaps: Opinionated custom flows are not their business. They will not tell you to use an open-weight model or SymbiYosys. Small shops get templated attention.
Siemens EDA
Questa Formal, Questa CDC, Catapult HLS
What they do: Strong Questa formal and CDC franchise. Publishes the Wilson study. Deepest automotive ISO 26262 track record.
Strengths: Automotive qualification expertise. Good CDC / RDC story. Tool qualification packages ready.
Honest gaps: Agentic AI story lags Cadence and Synopsys. Less RISC-V ecosystem focus.
ChipAgents
$74M total, Feb 2026
What they do: Multi-agent RTL design and verification. DVCon 2026 demo of multi-agent Root Cause Analysis with no human in the loop.
Strengths: Strongest pure-play agentic story. Matter Ventures (TSMC-backed), Bessemer, Micron, MediaTek, Ericsson on the cap table.
Honest gaps: Cloud platform; on-prem / air-gapped deployment pathway is unclear for IP-sensitive customers. Integration into an existing Jenkins/CI sign-off flow is still DIY.
Normal Computing
$85M+ total, Mar 2026
What they do: Auto-formalization: an LLM translates engineer intent into formal properties and proves them. Samsung Catalyst led the last round. ARIA Scaling Compute programme.
Strengths: Closest peer on the LLM + formal thesis. Claims half of the top 10 semiconductor design firms are using Normal EDA. Delivered real silicon (CN101).
Honest gaps: Product, not consultancy. Not a fit if you need custom fine-tuning on your proprietary RTL corpus or integration into a legacy flow you will not rip out.
Axiomise
Specialist formal consultancy
What they do: formalISA app deployed across Ibex, CVA6, cheriot-ibex, 0riscy, cv32e40p, WARP-V. Found 65+ bugs in Ibex including six debug-unit branch bugs.
Strengths: The most credible RISC-V formal verification track record in the industry. Real, publishable bug finds. Deep ISA expertise.
Honest gaps: Small team. Formal methods only; no LLM-assisted SVA generation, no on-prem LLM story, no integration with the agentic AI wave.
Big 4 / large SIs
Accenture, Deloitte, Wipro, HCL
What they do: Large VLSI / verification services practices. Headcount on the shelf.
Strengths: Scale. Offshore delivery. Existing MSA with your procurement.
Honest gaps: Body-shop economics. Opinionated AI verification architecture is not their business. The partner who sold you the engagement has never written an SVA property in their life.
Veriprajna
Vendor-neutral custom build
What they do: Fine-tune an open-weight coder LLM on your RTL corpus, wrap it around whichever formal engine you already own, wire it into your Jenkins/CI, add vacuity and coverage metrics. All on your hardware.
Strengths: No product to push. On-prem / air-gapped by default. RISC-V, AXI4, RISC-V debug, and formal coverage economics are our comfort zone. Honest about what formal can and cannot do.
Honest gaps: We do not replace your formal engine. We do not ship a qualified ISO 26262 tool of our own. Spec drift and organizational change are problems consulting cannot solve; we can only design around them.

Pricing, funding, and product information reflect public disclosures through early 2026. Always verify current terms directly with each vendor.

What we build

Every engagement is custom. These are the five shapes most fabless customers end up asking for, and the opinionated choices we make inside each.

1. On-prem LLM + formal glue layer

A fine-tuned open-weight coder model (Qwen 2.5 Coder, DeepSeek Coder, Llama 3.3, or Mistral Large) running on your own H100 or H200 cluster, wrapped around whichever formal engine you already own. Zero RTL ever leaves your network.

What we reach for: vLLM for inference, LoRA adapters per IP family so the base weights stay shared, local RAG over your spec documents and past bug history, a thin orchestration layer that calls JasperGold, VC Formal, Questa Formal, or SymbiYosys through their Tcl/Python APIs. The LLM never runs the solver. It writes properties and interprets counter-examples.

Why this is not a hosted API: because your RTL is crown-jewel IP and your CISO is not signing a data processing agreement with a US or EU startup founded last year.

2. RISC-V formal harness and SVA library

Pre-built SystemVerilog assertion libraries for AXI4, AXI4-Lite, APB, AHB, and TileLink compliance, plus RISC-V pipeline hazard detection, Load-Store Unit scoreboarding, debug unit correctness, and CSR access checking, tuned to your custom extension ISA.
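To make the pipeline-hazard piece concrete, a hedged sketch of one load-use hazard property from such a library. Every signal name here (ex_is_load, ex_rd, id_rs1, stall_id, fwd_mem_rs1) is a placeholder for your pipeline's actual interface:

```systemverilog
// A load in EX whose destination matches a source register in ID must
// trigger a stall or a forwarding path, never a silent stale read of
// the register file. x0 is excluded because it is never a real hazard.
property p_load_use_hazard;
  @(posedge clk) disable iff (!rst_n)
    (ex_is_load && (ex_rd != '0) && (ex_rd == id_rs1))
      |-> (stall_id || fwd_mem_rs1);
endproperty
a_load_use_hazard: assert property (p_load_use_hazard);
```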

The reference point: Axiomise found 65+ bugs in the Ibex core through formal, including six debug-unit branch bugs that simulation missed. Formal works on RISC-V. The bottleneck is the scarcity of engineers who can write the assertions. We build the library so your team does not have to.

Honest caveat: a curated assertion library is more reliable than LLM-from-scratch generation but still cannot prove the absence of every bug class. We pair it with COI (cone of influence) and mutation-based coverage analysis.

3. Vendor-neutral tool selection and pilot

Your DV lead is getting pitched by ChipAgents, Normal Computing, MooresLabAI, Silimate, Bronco AI, and the in-house Cadence and Synopsys agentic products. Six products, six different claims, zero independent benchmarks on your actual RTL.

What we do: run a structured four-week bake-off on your codebase under NDA. Same test suite, same bug budget, same coverage targets. Honest report comparing bug-finding rate, false-positive rate, setup effort, integration debt, and the pricing terms each vendor actually offered you.

Why buyers trust us with this: we do not resell any of these products. If the right answer is "stay with JasperGold and add a thin LLM assist," we will say so.

4. Agentic RTL review in your CI

Every pull request that touches RTL gets reviewed by a multi-agent pipeline before a human looks at it. One agent lints and checks style. A second runs a formal property set derived from the changed files. A third checks CDC and RDC paths. A fourth generates a human-readable summary with counter-example traces where properties failed.

Opinionated choice: we run the agents inside your existing CI (Jenkins, GitLab, BuildKite, whichever). We do not replace your CI with a new platform. The agents are services the pipeline calls. When you fire us, you keep the pipeline.

What we refuse to build: an agent that auto-merges RTL without a human review. Silicon is not a microservice. You cannot ship a hotfix to a chip.

5. Chiplet / 3D-IC thermal-aware floorplanning (RL placement, where appropriate)

This is the one place we think reinforcement learning for placement is actually worth deploying. The incumbents (Cadence Cerebrus, Synopsys DSO.ai) are tuned for monolithic 2D SoCs. The chiplet / UCIe wave has opened up a new class of floorplanning problem (inter-chiplet wire length, thermal stacking, bump pitch constraints) where the public tooling is immature.

What we build: a hybrid simulated-annealing + RL floorplanner on top of OpenROAD for the chiplet partitioning phase, with thermal constraints as a first-class reward term. Benchmarked against published ISPD / ICCAD results before we touch your design.

We acknowledge the AlphaChip controversy directly. Igor Markov's 2023 critique showed Google Circuit Training taking 32 hours where a tuned simulated annealing run took 12.5 hours and a Cadence commercial tool took 0.05 hours. We do not pitch RL as a replacement for tuned SA on well-understood problems. We use it where the design space is genuinely new and human intuition has no priors to draw on.

How we work

Every engagement starts with a two-week scoping phase on a small block of your RTL before we touch anything larger. We would rather walk away at week two than burn your schedule on a bad fit. The typical cadence for a full build:

1

Scoping · 2 weeks

Read your spec, walk through your existing flow, pick one representative block (often a bus interface, arbiter, or a single RISC-V pipeline stage) and run our baseline formal harness on it. Output: a written report with the bug classes we see, the assertions we would build, and a cost estimate for the full engagement. If the answer is "you should keep doing what you are doing," we say so and bill for the two weeks only.

2

Infrastructure · 4 to 6 weeks

On-prem LLM stack deployed on your cluster. Base model fine-tuned with LoRA adapters on your RTL corpus. RAG indexed over your specs and past bug database. Hooks into your formal engine, your Jenkins/CI, and your issue tracker. We instrument everything with proof coverage, vacuity, and bounded-depth metrics from day one.

3

Assertion library and bring-up · 6 to 10 weeks

We port or write the SVA library (protocol compliance, pipeline, CDC) for your top 3 to 5 IP blocks. We run the formal regression. We triage findings with your DV lead. Your team owns every assertion by the end of the phase. No black boxes.

4

Handover · 2 to 4 weeks

Your engineers run the flow for two full sprints with us watching. We document every opinionated choice we made so the next person can understand why. We exit. Optional retainer for regression tuning if you prefer.

Timelines are honest ranges, not sales numbers. A 2-stage pipeline block can be done in three weeks. A full RISC-V core with custom extensions runs closer to five months. We say so up front and we do not squeeze to hit an artificial date.

Respin exposure calculator

Three inputs. It tells you the mask-cost exposure, the expected schedule slip, and the revenue at risk on one silicon respin at your node. The numbers come from the 2024 Wilson Research Group / Siemens study, recent SemiAnalysis mask cost data, and typical 18-month product cycles. Use it in your next tape-out readiness review. The result recommends specific actions you can take without hiring us.

Questions DV leads and CTOs actually ask

These are real questions from fabless and RISC-V customers. Each answer adds depth not covered in the sections above.

Does any RTL or GDSII leave our network?

No. Every deployment architecture we ship runs on your hardware. Fine-tuned model weights live on your cluster. LoRA adapters with your IP-specific tuning live behind your firewall. vLLM inference runs on your GPUs. RAG indexes your spec documents from your own document store. Our engineers access the environment through your standard VPN and SSO with audit logging. For defense, aerospace, and SCIF customers we ship the entire stack on signed offline update bundles and do not require any outbound connection from the environment. The one exception is the initial base-model download, which is done on an unclassified system and then transferred in. If you need a stricter air gap than that, we have done it.

How do we know the LLM-generated assertions are not vacuous?

Vacuity is the failure mode we worry about most, and it is the reason every formal flow we ship runs a three-layer check. First, the formal engine's native vacuity check (JasperGold and VC Formal both have one; SymbiYosys needs a wrapper we provide). Second, a mutation-based sanity check where we inject a bug into the design and confirm the assertion fires. An assertion that passes vacuity but does not catch injected bugs is not buying you anything. Third, a COI (cone of influence) report showing exactly which signals each property reaches. If a property has an empty COI it is dead code and we delete it. These are the same metrics Siemens has been publishing about in Verification Horizons since 2017 and we treat them as table stakes.

We are an automotive customer targeting ISO 26262 ASIL D. Can we use this flow for sign-off?

Not directly for sign-off, and we will not pretend otherwise. ISO 26262 requires tool qualification (TCL2 or TCL3 depending on how you use the tool) with a documented qualification package. Synopsys, Cadence, and Siemens all ship qualified flows; a custom LLM-assisted tool is not on that list. What we do build for automotive customers is an AI-assist layer that runs alongside the qualified tool, not in place of it. The qualified tool still produces the sign-off evidence. Our layer accelerates assertion authoring, reviews properties for vacuity, and flags CDC paths for human inspection. The qualification chain on your signed-off tool is untouched. ASIL D customers should also plan on a documented independence review between the assist layer and the qualified verification, which we help you structure.

Why shouldn't we just buy ChipAgents or Normal Computing instead?

You might. Both are well-funded, technically credible, and have real customers. The reason teams come to us after evaluating them is usually one of three things. First, the cloud deployment model did not clear their security review (common). Second, they needed fine-tuning on a proprietary custom-extension ISA that the product team could not prioritize. Third, they wanted a custom integration into an existing Jenkins / regression / sign-off flow that the product team cannot support without a six-figure professional services engagement. If none of those apply to you, the product is probably the right answer and we will say so. If they do apply, we build the custom layer and leave you with a system your own engineers can maintain. On pilots, we recommend putting all three options on the same RTL for four weeks. The bake-off is cheap compared to a wrong bet.

What's your stance on the AlphaChip / Markov controversy for RL placement?

We think Igor Markov's critique was technically correct on the specific numbers. Google Circuit Training at 32 hours versus tuned simulated annealing at 12.5 hours and a Cadence commercial tool at 0.05 hours is not a story of RL winning placement for mainstream SoCs. That does not mean RL is useless for silicon. It means the 2020 framing was wrong. The places where we think RL placement earns its compute today are chiplet and 3D-IC floorplanning where the design space is genuinely new, thermal-aware analog layout where existing tools are weak, and transfer learning across closely related RISC-V IP families where an agent trained on your previous generation gives you a warm-start. We do not pitch RL placement against DSO.ai or Cerebrus on a monolithic digital SoC at 5nm. That is a fight we would lose and you would pay for.

How do you handle the fact that 70% of respins come from spec changes, not logic bugs?

Honestly, this is the hardest problem in verification and no AI tool solves it cleanly. What we do is treat the spec as a first-class input to the verification flow. The LLM watches the spec repo (Confluence, Google Docs, Git, whichever you use) and flags properties whose underlying assumption has changed. When a reviewer marks a section of the spec as revised, the dependent properties get re-run automatically and the delta report goes to the DV lead before the next regression closes. This does not eliminate spec drift. Nothing does. It makes the drift visible in hours instead of in silicon. The single biggest win we see on this is catching "spec changed two sprints ago and nobody re-ran the affected formal properties" before it propagates through the hierarchy.

We already own JasperGold. Should we replace it?

No. JasperGold is the best commercial formal engine and we use it when the customer already owns it. What we add is the LLM-assist layer on top of it (assertion generation, counter-example interpretation, vacuity sanity checks) and a CI integration that most teams have not taken the time to build cleanly. The return on your existing JasperGold investment goes up, not down. If you do not own JasperGold and cannot justify the base + per-seat pricing, we will typically recommend a hybrid of Questa Formal (cheaper per seat) for bulk regression and SymbiYosys (open-source) for automated property debug. We have shipped this stack to RISC-V IP startups where a JasperGold purchase was not an option.

How small a team can this work for?

We have built useful flows for a 6-person RISC-V IP startup and we have built for a 400-person AI accelerator company. The lower bound is the presence of at least one engineer who is comfortable reading SVA and interpreting a formal counter-example trace. If nobody on the team can read an SVA property, no LLM-assisted flow is going to close that gap, and you should hire or contract for that skill before engaging us or anyone else. Beyond that baseline, the engagement scales with how much RTL is in scope. A single bus-interface block is a six-week job. A full RISC-V core with custom extensions and an interconnect fabric is four to six months.

Technical research

The interactive whitepapers that inform this page. Each is the deeper technical treatment of a single thesis, written for the DV lead who wants to see the math, the references, and the opinionated choices we made.

Your next tape-out is a $10M to $40M bet. We build verification flows that earn it back.

On-prem LLM + formal engine integration, RISC-V assertion libraries, and vendor-neutral tool selection for fabless teams at 7nm through 2nm.

Two-week paid scoping on a block of your RTL before any larger commitment. If we do not see value, we say so and bill only the scoping phase.

Verification audit

  • Review of current formal coverage and vacuity posture
  • Bug-class gap analysis against your last three bug reports
  • Tool bake-off against one agentic AI startup on your codebase
  • Written report with specific, prioritized recommendations

Custom build

  • On-prem LLM stack with LoRA adapters on your RTL corpus
  • SVA library for AXI4, TileLink, RISC-V pipeline, and CDC paths
  • CI integration (Jenkins, GitLab, BuildKite) with coverage metrics
  • Full handover with documentation, no black boxes