How To Verify AI Compute You Do Not Control

When you send a prompt to an AI model running in a data center, you are trusting a machine you do not own to do three things honestly. Run the model you asked for, and not a cheaper one. Run it correctly, and not halfway. Return the real output, and not a plausible guess. You have no way to check any of it. The answer looks the same whether the provider ran a frontier model on your full input or a tiny substitute on half of it. For most of the history of cloud computing this did not matter, because the buyer and the provider trusted each other and reputation filled the gap. It matters enormously the moment the computation is worth money and the provider is a stranger.

This is not a new problem. It is one of the oldest problems in distributed systems wearing new clothes. In 2015, four researchers at the National University of Singapore (Loi Luu, Jason Teutsch, Raghav Kulkarni, and Prateek Saxena) published one of the first academic papers about Ethereum and named the problem at its core the verifier's dilemma. A blockchain asks every node to re-execute every transaction to check it. When a transaction is cheap to verify, everyone checks and the system is secure. When a transaction is expensive to verify, rational nodes skip the check to stay competitive, and the network can no longer guarantee the result. Verification that costs as much as the original work is verification nobody does. That same dilemma now sits under every attempt to build a decentralized compute network. The promise of proof of useful work collapses in production if the network cannot cheaply confirm that the useful work was actually done.

Three families of answers have emerged. They are not variations on one idea. They are three different things to trust.

Run It Again

The oldest answer is the simplest. To check a computation, do it again and compare. The obvious problem is that re-running the whole job costs as much as the job. The fix came from blockchains. In a 2017 whitepaper, Jason Teutsch and Christian Reitwiessner described Truebit, a system built on a verification game. Instead of re-running an entire computation, two parties who disagree about the result play a game that narrows the disagreement down to a single step, and a referee checks only that one step. The technique is called refereed delegation, and it is the same machinery underneath optimistic rollups like Arbitrum and Optimism, where the blockchain itself is the referee. A dispute over an entire block of transactions resolves by re-running one instruction.
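The narrowing step is easier to see in code. What follows is a minimal sketch of a bisection game, not Truebit's actual protocol: the names `run_trace`, `first_disputed_step`, and `referee` are hypothetical, the toy `step` function stands in for one instruction, and both parties hold their full traces in memory, where a real system would have them commit to hashes of intermediate states and reveal only the midpoints the search asks for.

```python
from typing import Callable, List

MOD = 1_000_003  # arbitrary modulus for the toy step function (an assumption)


def step(state: int) -> int:
    """One 'instruction' of the toy computation."""
    return (31 * state + 7) % MOD


def run_trace(fn: Callable[[int], int], start: int, n_steps: int) -> List[int]:
    """Return every intermediate state; trace[i] is the state after i steps."""
    trace = [start]
    for _ in range(n_steps):
        trace.append(fn(trace[-1]))
    return trace


def first_disputed_step(trace_a: List[int], trace_b: List[int]) -> int:
    """Binary-search for a step whose input both sides accept but whose output they dispute.

    Precondition: the traces agree at index 0 and disagree at the last index.
    """
    lo, hi = 0, len(trace_a) - 1  # invariant: agree at lo, disagree at hi
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if trace_a[mid] == trace_b[mid]:
            lo = mid
        else:
            hi = mid
    return hi


def referee(fn: Callable[[int], int], trace_a: List[int], trace_b: List[int]) -> str:
    """Re-run only the single disputed step and name the party whose claim matches."""
    i = first_disputed_step(trace_a, trace_b)
    correct = fn(trace_a[i - 1])  # the only step the referee ever executes
    if trace_a[i] == correct:
        return "A"
    if trace_b[i] == correct:
        return "B"
    return "neither"


# Party A is honest. Party B corrupts the state at step 600 and carries on from there.
honest = run_trace(step, 1, 1000)
cheat = honest[:600] + run_trace(step, (honest[600] + 1) % MOD, 400)

disputed = first_disputed_step(honest, cheat)
winner = referee(step, honest, cheat)
print(disputed, winner)
```

A thousand-step dispute is settled in about ten rounds of comparison and one call to `step` by the referee, which is the whole economic point of the game.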

Machine learning broke this in two places. A neural network is not a tidy sequence of instructions; it is a giant graph of operations, so finding the single disputed step is hard. And the same model run on two different graphics cards does not produce identical numbers, because floating-point arithmetic is not associative and different hardware reorders it. Two honest providers would disagree by default, and the referee could not tell honest divergence from fraud.
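The divergence does not need a graphics card to reproduce. The sketch below uses ordinary double-precision floats in Python. The helper names `sequential_sum` and `tree_sum` are illustrative, and the tree reduction only stands in for the kind of reordering parallel hardware might perform; it is not a model of any specific device.

```python
# Floating-point addition is not associative: grouping changes the rounding.
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c   # 0.6000000000000001
right = a + (b + c)  # 0.6


def sequential_sum(xs):
    """Add left to right, the way a simple loop would."""
    total = 0.0
    for x in xs:
        total += x
    return total


def tree_sum(xs):
    """Add in pairs, the shape a parallel reduction might use."""
    if len(xs) == 1:
        return xs[0]
    mid = len(xs) // 2
    return tree_sum(xs[:mid]) + tree_sum(xs[mid:])


# Same four numbers, two honest summation orders, two different answers.
values = [1e16, 1.0, -1e16, 1.0]
print(left == right)           # False
print(sequential_sum(values))  # 1.0
print(tree_sum(values))        # 0.0
```

Neither order is dishonest, and neither returns the exact sum of 2.0. A referee comparing the two outputs bit for bit would see a dispute where there is none.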

Gensyn, one of the networks now testing this idea in production, published its answer in February 2025. Its system, Verde, pairs a two-level bisection game with a library called RepOps, for Reproducible Operators, that forces the floating-point operations into a fixed order so every honest machine produces bitwise-identical output whether it is an A100, an H100, or a laptop. With reproducibility guaranteed, the bisection game binary-searches a compute graph down to the first operation where two providers disagree, and the referee re-runs only that one operation. The total overhead is roughly double the original work, compared with the roughly four-orders-of-magnitude overhead of cryptographic proofs. The catch is the trust assumption. Refereed delegation is secure only if at least one verifier is honest. A purely optimistic version, where fraud is assumed absent until someone challenges it, leaves a small chance of fraud slipping through if every participant is merely rational. Hyperbolic's Proof of Sampling protocol, published in May 2024, tightens this with game theory: it challenges work at random, with a probability tuned so that honesty becomes the only rational strategy, at an overhead under one percent.
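The arithmetic behind a tuned challenge probability can be sketched with a toy payoff model. This is an illustrative simplification, not Hyperbolic's published analysis: it assumes a provider who cheats spends nothing on compute, is always caught when challenged, and then forfeits both the reward and a staked deposit. The function names and all the numbers are hypothetical.

```python
def expected_payoffs(p, compute_cost, reward, stake):
    """Return (honest, cheating) expected payoff under challenge probability p."""
    honest = reward - compute_cost
    # A cheater keeps the reward when unchallenged and loses the stake when challenged.
    cheating = (1 - p) * reward - p * stake
    return honest, cheating


def min_challenge_probability(compute_cost, reward, stake):
    """Smallest p at which cheating stops paying more than honesty.

    Solving reward - compute_cost >= (1 - p) * reward - p * stake for p
    gives p >= compute_cost / (reward + stake).
    """
    return compute_cost / (reward + stake)


# Hypothetical units: a job costs 1.0 to run, pays 1.5, and the stake is 198.5.
threshold = min_challenge_probability(1.0, 1.5, 198.5)

honest_hi, cheat_hi = expected_payoffs(0.01, 1.0, 1.5, 198.5)   # challenge 1 job in 100
honest_lo, cheat_lo = expected_payoffs(0.001, 1.0, 1.5, 198.5)  # challenge 1 job in 1000
print(threshold, honest_hi > cheat_hi, honest_lo > cheat_lo)
```

In this toy model the extra compute spent on checking is about `p` times the cost of the work, so a large enough stake pushes the required challenge rate, and with it the overhead, below one percent.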

Prove It

The second answer refuses to trust anyone at all. Instead of re-running the work, the provider attaches a mathematical proof that the work was done correctly, and the proof is cheap to check even though the computation was expensive to perform. This is verifiable computation, an idea that runs back through decades of theory on interactive proofs, and the cryptographic tool that made it practical is the zero-knowledge proof, the same primitive that secures Zcash. Applied to machine learning it has a name: zkML.
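The gap between doing work and checking it can be shown with something far simpler than a zero-knowledge proof. The sketch below is Freivalds' check for matrix multiplication, a classic randomized test. It is not zero-knowledge, it is not how any zkML system works, and it is included only to illustrate that verification can be cheaper than recomputation. Recomputing an n-by-n product the naive way takes on the order of n cubed operations; each round of the check takes on the order of n squared. The matrices and helper names are made up for the example.

```python
import random


def matmul(a, b):
    """Naive matrix product: the expensive work the provider claims to have done."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def matvec(m, x):
    """Matrix-vector product: the only operation the checker needs."""
    return [sum(row[k] * x[k] for k in range(len(x))) for row in m]


def freivalds_check(a, b, claimed, rounds=32, rng=random):
    """Probabilistically test whether claimed == a @ b without computing a @ b.

    A correct product always passes. A wrong one survives each round with
    probability at most 1/2, so at most 2**-rounds overall.
    """
    for _ in range(rounds):
        r = [rng.randint(0, 1) for _ in range(len(claimed[0]))]
        if matvec(a, matvec(b, r)) != matvec(claimed, r):
            return False
    return True


rng = random.Random(0)
n = 8
A = [[rng.randint(-5, 5) for _ in range(n)] for _ in range(n)]
B = [[rng.randint(-5, 5) for _ in range(n)] for _ in range(n)]

good = matmul(A, B)
bad = [row[:] for row in good]
bad[3][4] += 1  # a single wrong entry in the claimed result

accepted_good = freivalds_check(A, B, good, rng=rng)
accepted_bad = freivalds_check(A, B, bad, rng=rng)
print(accepted_good, accepted_bad)
```

Integer matrices keep the comparison exact. A zero-knowledge proof goes much further than this, covering arbitrary computations and hiding the inputs, but the asymmetry it relies on is the same one.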

The appeal is total. A valid proof guarantees the stated model produced the stated output, with no assumption about honest majorities or rational actors, and it can do this without revealing the model weights or the input data. The provider proves it ran the model without showing you the model. The field has moved fast. The first demonstrations arrived with SafetyNets in 2017. By 2022, Daniel Kang and collaborators showed trustless deep-network inference with zero-knowledge proofs at meaningful scale. EZKL, which compiles a standard ONNX model into a provable circuit, can prove a small network in seconds and was audited by Trail of Bits in 2025. Modulus Labs benchmarked the proving systems in a report titled The Cost of Intelligence and put a verified chess engine on-chain. Giza and RISC Zero built competing compilers, and in 2024 researchers demonstrated zkLLM, zero-knowledge proofs for large language models.

The wall is cost. Generating a proof of a computation still costs orders of magnitude more than the computation itself, which is the verifier's dilemma again, except now the burden lands on the prover instead of the verifier. For a small model the proof takes seconds. For a frontier language model it is not yet practical. The entire engineering race in zkML is the race to drive that multiple down, and it has been falling every year. The day it becomes cheap, trust becomes unnecessary.

Trust The Box

The third answer is the most pragmatic and the most uncomfortable. Do not prove the computation and do not re-run it. Run it inside a piece of hardware that is built to resist tampering, and have the hardware sign a statement about what it ran. This is a trusted execution environment, and in 2024 it arrived for AI in force.

In April 2024, NVIDIA made confidential computing generally available on the H100, the first graphics card with a trusted execution environment built in. An H100 in confidential mode boots only NVIDIA-signed firmware, encrypts the data moving between processor and card, and produces a signed attestation report. The report is a cryptographic statement of exactly what hardware and software will run the job, and anyone can check it against NVIDIA's records before sending a single byte of data. The performance cost is under seven percent for large language model inference, and it shrinks toward zero as the model grows. Two months later, Apple shipped the same idea at consumer scale. Apple's Private Cloud Compute, announced in June 2024, runs its larger models on Apple-silicon servers whose every software image is published to a public, tamper-evident log, and an iPhone will refuse to send data to any server that cannot cryptographically prove it is running the publicly inspected code.
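The client side of that check has two parts: confirm the report was signed by the vendor, then confirm the software it describes is on a published list. The sketch below shows only that shape. It is not NVIDIA's or Apple's API. Real attestation uses asymmetric signatures chained to a vendor root certificate, while this example substitutes an HMAC with a shared demo key so that it runs on the standard library alone. Every name and value in it (`VENDOR_KEY`, `firmware_measurement`, `verify_attestation`) is hypothetical.

```python
import hashlib
import hmac
import json

# Hypothetical stand-in for the vendor's signing key. Real reports are signed
# with an asymmetric key and verified against the vendor's public certificate.
VENDOR_KEY = b"demo-vendor-key"


def sign_report(report: dict, key: bytes) -> str:
    """Produce a signature over a canonical encoding of the report."""
    payload = json.dumps(report, sort_keys=True).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_attestation(report: dict, signature: str, key: bytes, allowed: set) -> bool:
    """Accept only a correctly signed report that names a publicly listed image."""
    expected = sign_report(report, key)
    if not hmac.compare_digest(expected, signature):
        return False  # report was altered or not signed by the vendor key
    return report.get("firmware_measurement") in allowed


# The 'public log' of inspected software images, reduced to a set of hashes.
published = {hashlib.sha256(b"firmware-v1").hexdigest()}

good_report = {"device": "gpu-0",
               "firmware_measurement": hashlib.sha256(b"firmware-v1").hexdigest()}
good_sig = sign_report(good_report, VENDOR_KEY)

# Same signature, but the report now claims a different image.
tampered = dict(good_report,
                firmware_measurement=hashlib.sha256(b"backdoor").hexdigest())

# Properly signed, but the image was never published for inspection.
unlisted = {"device": "gpu-1",
            "firmware_measurement": hashlib.sha256(b"firmware-v0").hexdigest()}
unlisted_sig = sign_report(unlisted, VENDOR_KEY)

ok = verify_attestation(good_report, good_sig, VENDOR_KEY, published)
forged = verify_attestation(tampered, good_sig, VENDOR_KEY, published)
unknown = verify_attestation(unlisted, unlisted_sig, VENDOR_KEY, published)
print(ok, forged, unknown)
```

Only the first report should be trusted with data. Both failure cases are rejected before anything sensitive leaves the client.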

The trade is explicit. A trusted execution environment is cheap, fast, available today, and verifiable through attestation. But the thing you are trusting is the silicon vendor. You are trusting that NVIDIA, or Intel, or AMD, or Apple built the root of trust honestly and that nobody has broken it. That is a far smaller and better-audited trusted party than a random compute provider, but it is a trusted party, which is the exact thing the first two answers were built to remove.

The three families line up on a single axis. Re-execution trusts that at least one verifier is honest, or that the incentives make cheating irrational. Cryptographic proof trusts only mathematics. Trusted hardware trusts the company that made the chip. Cost runs in the other direction. Hardware attestation is nearly free, re-execution roughly doubles the work, and proof multiplies it by thousands, for now. There is no winner yet, and there may never be a single one, because a network training a model, a phone answering a question, and a contract settling a bet have different tolerances for cost, latency, and trust.

What is settled is that the question is no longer theoretical. The verifier's dilemma was named for money in 2015, and the answer to it for money was the same answer cryptographers spent twenty-seven years building toward, which was to remove the trusted party entirely. The same removal is now under way for computation itself. A decentralized network of machines that have never met, training and running the models the next decade will depend on, is only possible if one of these three answers holds at scale. Deciding which one is the question this publication exists to answer.