As artificial intelligence reshapes software development, a small startup is betting that the industry's next big bottleneck won't be writing code: it will be trusting it.
Theorem, a San Francisco-based company that emerged from Y Combinator's Spring 2025 batch, announced Tuesday it has raised $6 million in seed funding to build automated tools that verify the correctness of AI-generated software. Khosla Ventures led the round, with participation from Y Combinator, e14, SAIF, Halcyon, and angel investors including Blake Borgesson, co-founder of Recursion Pharmaceuticals, and Arthur Breitman, co-founder of blockchain platform Tezos.
The funding arrives at a pivotal moment. AI coding assistants from companies like GitHub, Amazon, and Google now generate billions of lines of code annually. Enterprise adoption is accelerating. But the ability to verify that AI-written software actually works as intended has not kept pace, creating what Theorem's founders describe as a widening "oversight gap" that threatens critical infrastructure from financial systems to power grids.
"We're already there," said Jason Gross, Theorem's co-founder, when we asked whether AI-generated code is outpacing human review capacity. "If you asked me to review 60,000 lines of code, I wouldn't know how to do it."
Why AI is writing code faster than humans can verify it
Theorem's core technology combines formal verification, a mathematical technique that proves software behaves exactly as specified, with AI models trained to generate and check proofs automatically. The approach transforms a process that historically required years of PhD-level engineering into something the company claims can be completed in weeks or even days.
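For readers unfamiliar with the idea, a minimal Lean 4 sketch shows what "proving software behaves exactly as specified" can look like on a toy scale. It is an illustration only and is not drawn from Theorem's tooling: the function `double` and the theorem name `double_eq_two_mul` are hypothetical, and the sketch assumes a recent Lean 4 toolchain in which the `omega` tactic is available.

```lean
-- Hypothetical example: a tiny "implementation" to be verified.
def double (n : Nat) : Nat := n + n

-- The specification, stated for every natural number at once.
-- Lean accepts this file only if the proof below checks.
theorem double_eq_two_mul (n : Nat) : double n = 2 * n := by
  unfold double   -- replace `double n` with its definition, `n + n`
  omega           -- discharge the remaining linear arithmetic goal
```

Unlike a test, which samples a handful of inputs, the theorem covers all inputs; the cost is that someone, or some model, has to produce the proof.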
Formal verification has existed for decades but remained confined to the most mission-critical applications: avionics systems, nuclear reactor controls, and cryptographic protocols. The technique's prohibitive cost, often requiring eight lines of mathematical proof for every single line of code, made it impractical for mainstream software development.
Gross knows this firsthand. Before founding Theorem, he earned his PhD at MIT working on verified cryptography code that now powers the HTTPS security protocol protecting trillions of internet connections daily. That project, by his estimate, consumed fifteen person-years of work.
"Nobody prefers to have incorrect code," Gross said. "Software verification has just not been economical before. Proofs used to be written by PhD-level engineers. Now, AI writes all of it."
How formal verification catches the bugs that traditional testing misses
Theorem's system operates on a principle Gross calls "fractional proof decomposition." Rather than exhaustively testing every possible behavior, which is computationally infeasible for complex software, the technology allocates verification resources in proportion to the importance of each code component.
The approach recently identified a bug that slipped past testing at Anthropic, the AI safety company behind the Claude chatbot. Gross said the technique helps developers "catch their bugs now without expending a lot of compute."
In a recent technical demonstration called SFBench, Theorem used AI to translate 1,276 problems from Rocq (a formal proof assistant) to Lean (another verification language), then automatically proved each translation equivalent to the original. The company estimates a human team would have required roughly 2.7 person-years to complete the same work.
"Everyone can run agents in parallel, but we're also able to run them sequentially," Gross explained, noting that Theorem's architecture handles interdependent code, where solutions build on each other across dozens of files, that trips up conventional AI coding agents limited by context windows.
How one company turned a 1,500-page specification into 16,000 lines of trusted code
The startup is already working with customers in AI research labs, electronic design automation, and GPU-accelerated computing. One case study illustrates the technology's practical value.
A customer came to Theorem with a 1,500-page PDF specification and a legacy software implementation plagued by memory leaks, crashes, and other elusive bugs. Their most urgent problem: improving performance from 10 megabits per second to 1 gigabit per second, a 100-fold increase, without introducing additional errors.
Theorem's system generated 16,000 lines of production code, which the customer deployed without ever manually reviewing it. The confidence came from a compact executable specification, a few hundred lines that generalized the massive PDF document, paired with an equivalence-checking harness that verified the new implementation matched the intended behavior.
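The pairing of a short executable specification with an equivalence-checking harness can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the customer's or Theorem's actual system: the length-prefixed record format and the names `parse_spec`, `parse_fast`, `outcome`, and `check_equivalence` are all hypothetical, and the harness uses randomized comparison, which is weaker than the formal proofs the company describes.

```python
import random


def parse_spec(data: bytes) -> list[bytes]:
    """Reference parser (the "executable specification"), written for clarity.

    Hypothetical format: each record is one length byte followed by that
    many payload bytes.
    """
    records = []
    rest = data
    while rest:
        length = rest[0]
        if len(rest) - 1 < length:
            raise ValueError("truncated record")
        records.append(rest[1:1 + length])
        rest = rest[1 + length:]  # re-slices the remaining input each time
    return records


def parse_fast(data: bytes) -> list[bytes]:
    """Optimized parser: walks an index instead of re-slicing the input."""
    records = []
    i, n = 0, len(data)
    while i < n:
        length = data[i]
        end = i + 1 + length
        if end > n:
            raise ValueError("truncated record")
        records.append(data[i + 1:end])
        i = end
    return records


def outcome(parser, data: bytes):
    """Normalize a parser run so that results and rejections are comparable."""
    try:
        return ("ok", parser(data))
    except ValueError:
        return ("error", None)


def check_equivalence(trials: int = 2000, seed: int = 0):
    """Return the first input on which the parsers disagree, or None."""
    rng = random.Random(seed)
    for _ in range(trials):
        size = rng.randrange(0, 12)
        data = bytes(rng.randrange(0, 8) for _ in range(size))
        if outcome(parse_spec, data) != outcome(parse_fast, data):
            return data
    return None


if __name__ == "__main__":
    print("counterexample:", check_equivalence())
```

The design point is that a reviewer reads only the short reference parser; the longer, faster implementation earns trust by agreeing with it, including on inputs that must be rejected.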
"Now they’ve a production-grade parser working at 1 Gbps that they’ll deploy with the arrogance that no data is misplaced throughout parsing," Gross stated.
The security risks lurking in AI-generated software for critical infrastructure
The funding announcement arrives as policymakers and technologists increasingly scrutinize the reliability of AI systems embedded in critical infrastructure. Software already controls financial markets, medical devices, transportation networks, and electrical grids. AI is accelerating how quickly that software evolves, and how easily subtle bugs can propagate.
Gross frames the challenge in security terms. As AI makes it cheaper to find and exploit vulnerabilities, defenders need what he calls "asymmetric defense": protection that scales without proportional increases in resources.
"Software security is a delicate offense-defense balance," he said. "With AI hacking, the cost of hacking a system is falling sharply. The only viable solution is asymmetric defense. If we want a software security solution that can last for more than a few generations of model improvements, it will be via verification."
Asked whether regulators should mandate formal verification for AI-generated code in critical systems, Gross offered a pointed response: "Now that formal verification is cheap enough, it might be considered gross negligence to not use it for guarantees about critical systems."
What separates Theorem from other AI code verification startups
Theorem enters a market where numerous startups and research labs are exploring the intersection of AI and formal verification. The company's differentiation, Gross argues, lies in its singular focus on scaling software oversight rather than applying verification to mathematics or other domains.
"Our tools are useful for systems engineering teams, working close to the metal, who need correctness guarantees before merging changes," he said.
The founding team reflects that technical orientation. Gross brings deep expertise in programming language theory and a track record of deploying verified code into production at scale. Co-founder Rajashree Agrawal, a machine learning research engineer, focuses on training the AI models that power the verification pipeline.
"We're working on formal program reasoning so that everyone can oversee not just the work of an average software-engineer-level AI, but really harness the capabilities of a Linus Torvalds-level AI," Agrawal said, referencing the legendary creator of Linux.
The race to verify AI code before it controls everything
Theorem plans to use the funding to expand its team, increase compute resources for training verification models, and push into new industries including robotics, renewable energy, cryptocurrency, and drug synthesis. The company currently employs four people.
The startup's emergence signals a shift in how enterprise technology leaders may need to evaluate AI coding tools. The first wave of AI-assisted development promised productivity gains: more code, faster. Theorem is wagering that the next wave will demand something different: mathematical proof that speed doesn't come at the cost of safety.
Gross frames the stakes in stark terms. AI systems are improving exponentially. If that trajectory holds, he believes superhuman software engineering is inevitable, capable of designing systems more complex than anything humans have ever built.
"And without a radically different economics of oversight," he said, "we will end up deploying systems we don't control."
The machines are writing the code. Now someone has to check their work.