Everyone knows the thought experiment, even if they don’t know its name. The infinite monkey theorem: sit enough monkeys at enough typewriters for long enough and, almost surely, one of them bangs out the complete works of Shakespeare. It usually gets wheeled out to illustrate the power of infinity, or randomness, or both.
But the interesting thing about the monkeys isn’t that they’d get there eventually. It’s why they never do in practice. The problem was never the typing. The problem is that nobody is reading the pages. There’s no one standing behind the monkeys saying “warmer,” “colder,” “that line is from Hamlet — keep it.” Random generation without selection is just noise, forever.
Now hand one of the monkeys a compiler.
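The gap between “noise, forever” and “someone reading the pages” can be put in numbers. The sketch below is a version of the idea behind Richard Dawkins’ “weasel” program, using his Hamlet target line. It assumes a 27-key typewriter (26 capitals plus space). It compares blind typing, which needs about 27^28 pages per hit on average, with cumulative selection, where a scorer keeps the closest page each round. The mutation rate, brood size and seed are arbitrary illustrative choices.

```python
import random

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "  # assumed 27-key typewriter
TARGET = "METHINKS IT IS LIKE A WEASEL"   # the Hamlet line Dawkins used


def score(candidate: str) -> int:
    """The 'reader behind the monkey': count characters that match the target."""
    return sum(c == t for c, t in zip(candidate, TARGET))


def mutate(parent: str, rate: float, rng: random.Random) -> str:
    """Copy the parent, retyping each character at random with probability `rate`."""
    return "".join(rng.choice(ALPHABET) if rng.random() < rate else c for c in parent)


def evolve(offspring: int = 100, rate: float = 0.05, seed: int = 1,
           max_generations: int = 10_000) -> tuple[str, int]:
    """Cumulative selection: keep the best page each generation and retype from it."""
    rng = random.Random(seed)
    parent = "".join(rng.choice(ALPHABET) for _ in TARGET)
    for generation in range(1, max_generations + 1):
        children = [mutate(parent, rate, rng) for _ in range(offspring)]
        # The parent stays in the pool, so a bad generation can never lose ground.
        parent = max([parent] + children, key=score)
        if parent == TARGET:
            return parent, generation
    return parent, max_generations


# Blind typing: one specific 28-character page out of 27**28 equally likely ones.
blind_pages = len(ALPHABET) ** len(TARGET)
best, generations = evolve()
print(f"blind typing: about {blind_pages:.1e} pages per hit on average")
print(f"with selection: {best!r} after {generations} generations")
```

The typing process is identical in both cases: random characters, struck blindly. The only thing that changes is that something scores each page and the best one survives.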
The thing people keep being surprised by
I keep seeing people express genuine surprise that software is the fastest-moving frontier in AI. Coding assistants went from autocomplete to writing, testing and fixing whole features in about two years. Meanwhile, the same models still can’t be trusted to run your marketing strategy or settle a contract dispute. Why the gap?
To me, the answer is so obvious it’s strange it isn’t said more often: code has an oracle. Most work doesn’t.
A compiler is an oracle, and so is the test suite that runs after it. Each gives an instant, automatic, unambiguous verdict. The code compiles or it doesn’t. The tests pass or they fail. There’s no committee, no taste, no “well, it depends.” Yes or no.
That single property changes everything, because it converts brute force from noise into search. The monkeys with a compiler are no longer typing at random — every attempt gets graded, the failures are discarded, and the system climbs. You can generate a thousand candidate solutions, throw away the 999 that don’t compile or don’t pass the tests, and keep the one that does. Run that loop fast enough, and it looks like genius. It’s actually just selection pressure with a very fast clock. It’s the difference between random mutation and evolution: the mutations were always there, but evolution only happens once something is sorting them.
That, in a sentence, is the Oracle Principle: the speed and autonomy of AI in a domain is governed by the strength of its oracle.
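The generate-and-test loop described above (throw away what doesn’t compile or pass, keep what does) fits in a few lines of Python. This is a minimal sketch under stated assumptions. The task (`square`), the three hard-coded candidates standing in for a batch of generated attempts, and the helper names `oracle` and `select` are all hypothetical. The oracle is Python’s built-in `compile()` followed by a handful of test cases.

```python
from typing import Callable, Optional

# Hypothetical task: implement square(x). These test cases act as the oracle.
TESTS = [(0, 0), (3, 9), (-4, 16)]

# Stand-ins for a batch of generated attempts; in practice there would be thousands.
CANDIDATES = [
    "def square(x) return x * x",         # syntax error: never compiles
    "def square(x):\n    return x + x",   # compiles, but fails the tests
    "def square(x):\n    return x ** 2",  # compiles and passes
]


def oracle(source: str) -> bool:
    """Instant, automatic yes/no: does it compile, and does it pass every test?"""
    try:
        code = compile(source, "<candidate>", "exec")
    except SyntaxError:
        return False
    namespace: dict = {}
    try:
        # A real harness would sandbox untrusted code and add timeouts; this one does not.
        exec(code, namespace)
        square: Callable[[int], int] = namespace["square"]
        return all(square(x) == expected for x, expected in TESTS)
    except Exception:
        return False


def select(candidates: list[str]) -> Optional[str]:
    """Generate-and-test: discard everything the oracle rejects, keep the first survivor."""
    return next((c for c in candidates if oracle(c)), None)


survivor = select(CANDIDATES)
print(survivor)
```

Nothing in `select` is clever. All of the intelligence it appears to have comes from `oracle` saying no to the first two attempts.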
Code isn’t the only one
Once you see it, you see oracles everywhere: there’s one wherever AI is advancing quickly, and the absence of one everywhere it’s stuck.
Mathematics. Formal proof checkers like Lean are oracles. A proof is valid, or it isn’t, and the checker says which. That’s precisely why AI maths has accelerated — the machine can verify its own steps.
Games. The rules, plus the final win or loss, make a perfect oracle. It’s no accident that the breakthroughs came in chess and Go first: a system can play itself millions of times and learn purely from who won.
Geometry and simulation. A CAD kernel is an oracle of sorts: the model is watertight or it isn’t; the assembly clashes or it doesn’t. An FEA solver converges or it diverges. These are exactly the kinds of engineering loops where brute-force optimisation already runs beautifully, because something objective grades each attempt.
Protein folding, chip layout, logistics. All the same shape: a hard, cheap, automatic check sitting underneath the search.
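The mathematics case is the easiest of these to see directly. In the minimal Lean 4 illustration below, the theorem name is arbitrary and the statement is deliberately trivial. The file checks because the kernel accepts `Nat.add_comm` as a proof of the statement. The commented-out line would be rejected if uncommented, because `rfl` cannot prove a false equation.

```lean
-- Lean 4: the kernel accepts this proof, so the file checks.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- Uncommenting the next line makes the check fail: `rfl` cannot prove 2 + 2 = 5.
-- example : 2 + 2 = 5 := rfl
```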
The common thread is verifiability, not intelligence. Karpathy made roughly the same point — the tasks AI is conquering are the ones where success can be checked automatically. Strong oracle, fast progress.
When the oracle lies
There’s a subtler trap I learned the hard way, and it’s the most important caveat to everything above. An optimiser is only ever as good as the model underneath it. Optimisation is hill climbing — and if the hill is a model rather than reality, the optimiser will happily climb to the top of a hill that doesn’t exist.
Run a free optimisation on a tyre model and, given the chance, it often won’t find the fastest setup. It’ll find the weakest part of the model — the corner of the curve the equations describe badly — and exploit it, because that’s where the cheap “gains” are. It hands you an answer that is optimal for the model and nonsense on the track. And the answer usually isn’t obviously wrong; it’s just non-obvious enough that you can’t tell by looking whether it’s a real insight or the model quietly breaking.
In racing, this is especially acute, because the sport lives in the nonlinear regime. Tyres and combustion, the places where lap time is actually decided, are also the places where the underlying physics is still half black art, validated on a dynamometer or a rig that is not the race. A linear, well-behaved problem would be easy; nobody would need a race engineer. The whole game is the nonlinearity, and that’s exactly where the model is least trustworthy and the optimiser is most dangerous.
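The tyre-model failure described above can be reproduced in miniature. In this toy sketch, “reality” is a simplified Pacejka Magic Formula curve (the E term set to zero, with illustrative coefficients), and grip peaks at a moderate slip ratio before falling away. The “model” is hypothetical: it matches reality exactly up to the slip ratio the rig validated (an assumed 0.15), then drifts optimistic beyond it. A brute-force grid search is then let loose on each curve.

```python
import math
from typing import Callable

VALIDATED_MAX_SLIP = 0.15  # hypothetical: the rig only tested slip ratios up to here


def true_grip(slip: float) -> float:
    """Stand-in for reality: a simplified Magic Formula curve that peaks and then falls."""
    return math.sin(1.9 * math.atan(10.0 * slip))


def model_grip(slip: float) -> float:
    """Stand-in for the tyre model: exact on the rig, increasingly wrong beyond it."""
    extrapolation_error = 5.0 * max(0.0, slip - VALIDATED_MAX_SLIP) ** 2
    return true_grip(slip) + extrapolation_error


def optimise(objective: Callable[[float], float],
             lo: float = 0.0, hi: float = 0.5, steps: int = 500) -> float:
    """Brute-force grid search: return the slip ratio that maximises `objective`."""
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return max(grid, key=objective)


best_for_model = optimise(model_grip)
best_for_track = optimise(true_grip)
print(f"model's optimum: slip {best_for_model:.3f}, real grip {true_grip(best_for_model):.2f}")
print(f"real optimum:    slip {best_for_track:.3f}, real grip {true_grip(best_for_track):.2f}")
```

The optimiser does its job perfectly. It runs straight to the edge of the search range, where the model’s error is largest, and reports more grip than the real tyre can ever produce. Nothing in the output marks that answer as the model breaking rather than a genuine insight.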
So the lesson generalises into a warning: a proxy oracle can be gamed. A compiler can’t lie to you; passing the test is the ground truth. But a simulation, a tyre model, or an LLM grading its own output against a loose rubric are only approximations of reality, and a powerful enough search will find the gap between the proxy and the truth and drive a bus through it. AI researchers now call this reward hacking, but I was watching optimisers do it to tyre models twenty years ago. The fix was never a better optimiser. It was an engineer who knew the curve well enough to look at the “optimal” answer and say, “That’s not real; that’s the model breaking.”
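The same gap appears with a loose rubric. In this minimal, hypothetical sketch, the task is “sort a list”. The proxy oracle only checks that the output is in non-decreasing order, and never checks that it is a permutation of the input. The search simply takes the cheapest candidate the oracle accepts. The candidates and helper names are illustrative.

```python
# Candidate "sort" implementations a search might produce, cheapest first.
CANDIDATES = {
    "return nothing": lambda xs: [],
    "return the input": lambda xs: list(xs),
    "actually sort": lambda xs: sorted(xs),
}

PROBES = [[3, 1, 2], [5, 5, 0, -1], []]


def proxy_oracle(sort_fn) -> bool:
    """Loose rubric: the output must be in non-decreasing order. Nothing more."""
    outputs = (sort_fn(xs) for xs in PROBES)
    return all(all(a <= b for a, b in zip(out, out[1:])) for out in outputs)


def true_oracle(sort_fn) -> bool:
    """Ground truth: the output must equal the correctly sorted input."""
    return all(sort_fn(xs) == sorted(xs) for xs in PROBES)


def search(oracle) -> str:
    """Return the cheapest candidate the oracle accepts (dicts keep insertion order)."""
    return next(name for name, fn in CANDIDATES.items() if oracle(fn))


print("proxy picks:", search(proxy_oracle))
print("truth picks:", search(true_oracle))
```

An empty list is always in order, so the proxy happily accepts “return nothing”. Only the ground-truth check forces the search to the implementation that actually sorts.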
The flip side — and why it matters to you
Now run the principle backwards. Where is there no oracle?
A true oracle — an automatic, cheap, unambiguous check that is the ground truth, like a compiler or a proof checker? Expect rapid, possibly near-autonomous progress. Build for it.
A proxy oracle — a model, simulation or rubric that only approximates reality? Useful, but it can be gamed. The expert’s job is to know where the model breaks and to verify the verifier before trusting the answer.
No oracle — strategy, leadership, most management? Expect augmentation, not replacement. Keep the expert in the loop, because the expert is the oracle — the one supplying the yes or no the machine can’t generate for itself.
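The three cases above can be read as a routing rule for anything an AI system produces. The sketch below is one hypothetical way to encode that rule. It assumes a proxy answer can be checked against the envelope its proxy was validated on, such as the slip range a tyre model was rig-tested over. The class names, fields and outcome labels are illustrative, not a prescribed policy.

```python
from dataclasses import dataclass
from enum import Enum


class Oracle(Enum):
    TRUE = "true"    # ground-truth check, e.g. a compiler, test suite or proof checker
    PROXY = "proxy"  # approximation, e.g. a simulation, tyre model or rubric
    NONE = "none"    # no automatic check at all


@dataclass
class Candidate:
    passed_check: bool  # did the available oracle say yes?
    value: float        # the parameter the optimiser chose
    validated_range: tuple[float, float] = (float("-inf"), float("inf"))


def route(oracle: Oracle, c: Candidate) -> str:
    """Decide what happens to an AI-produced answer, following the three cases."""
    if oracle is Oracle.NONE:
        return "expert decides"        # the expert is the oracle
    if not c.passed_check:
        return "reject"
    if oracle is Oracle.TRUE:
        return "accept"                # passing the check is the ground truth
    lo, hi = c.validated_range
    if lo <= c.value <= hi:
        return "provisionally accept"  # inside the envelope the proxy was validated on
    return "expert review"             # outside it: possibly the model breaking


print(route(Oracle.TRUE, Candidate(True, 0.0)))
print(route(Oracle.PROXY, Candidate(True, 0.5, (0.0, 0.15))))
print(route(Oracle.NONE, Candidate(True, 0.0)))
```

The envelope check is the crudest form of verifying the verifier. It cannot tell a real insight from the model breaking, but it can at least flag the answers that deserve an expert’s eye.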
The monkeys were never the problem. The typewriters weren’t either. The missing piece was always the thing standing behind them, able to say yes or no. Find that in your own work, and you’ll know exactly where AI is about to get very good, very fast — and where it still needs you.
Thanks for reading Mark’s Substack! Subscribe for free to receive new posts and support my work.