Understanding Is About Constraints
The Strawberry Problem
Ask an LLM “How many Rs are there in strawberry?” and it’ll probably get it wrong. This became a whole thing in 2024. People passed it around as proof that AI doesn’t understand anything.
The technical explanation is pretty straightforward. LLMs don’t see individual characters. They see tokens. “Strawberry” becomes something like “str,” “aw,” “berry.” The model literally never looks at the letters one by one.
But that’s not the interesting part. The interesting part is that the same model that gets this wrong can write a Python script to count the letters and get the right answer. It can explain what the letter R is. It can describe exactly how you’d go about counting character occurrences in a string. Ask it to break the word apart letter by letter and it usually gets there.
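The gap is easy to demonstrate. Here’s a minimal version of the counting script the model can write correctly, even while failing the question itself:

```python
def count_letter(word: str, letter: str) -> int:
    """Count occurrences of a letter by checking each character one by one."""
    return sum(1 for ch in word if ch == letter)

print(count_letter("strawberry", "r"))  # → 3
```

The script works precisely because it operates character by character, which is exactly the view the tokenizer denies the model.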
So the model has the knowledge. It has the rules. It has everything it needs. It just doesn’t reliably use them.
The Heuristic Problem
Anthropic put out a study in March 2025, “On the Biology of a Large Language Model”, where they used attribution graphs to trace what the model actually does when it’s thinking through a problem.
What they found is that the model takes shortcuts. A lot. When you ask it “What’s the capital of the state containing Dallas?”, sometimes it does genuine two-step reasoning. You can see it internally activate “Texas” as an intermediate step before getting to “Austin.” But sometimes it just pattern-matches straight to the answer and skips the reasoning entirely. Both paths coexist. The model has the general rule and it has the shortcut, and which one it uses on any given run is not something you can predict.
It gets worse. When the model hits a math problem it can’t actually compute, it sometimes just picks an answer and then fabricates reasoning steps to justify it. The Anthropic researchers could see, literally see inside the network, that no computation had happened. The model was bullshitting. It claimed to have run a calculation. It hadn’t.
This isn’t randomness. It’s the model taking the computationally cheaper path, the one that got reinforced during training because it worked most of the time. Heuristics beat thoroughness, statistically, across the training distribution. So the model learned to lean on heuristics.
There was this great exchange between Apple and the research community in mid-2025 that illustrates this perfectly. Apple published “The Illusion of Thinking”, where they tested reasoning models on puzzles like Tower of Hanoi. As the puzzles got harder, accuracy didn’t just drop. It collapsed. Zero. Their conclusion: LLMs can’t reason.
Then someone put out a response paper called “The Illusion of the Illusion of Thinking” and basically said, hold on. They asked the models to write a Lua function that solves 15-disk Tower of Hanoi instead of making them enumerate every move. And the models nailed it. They could generate correct recursive solutions to problems way beyond where Apple claimed total failure.
The models understand the algorithm. They can encode it in code. They just can’t sit there and execute it step by step for hundreds of moves without screwing up somewhere. The knowledge is there. The reliable execution isn’t. But is having knowledge alone enough to count as true understanding?
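The rebuttal’s point is concrete: the algorithm itself is tiny. Here’s a Python sketch of the standard recursive solution (the rebuttal paper asked for Lua, but the structure is identical):

```python
def hanoi(n, source, target, spare, moves):
    """Standard recursive Tower of Hanoi: move n disks from source to target."""
    if n == 0:
        return
    hanoi(n - 1, source, spare, target, moves)   # clear the way
    moves.append((source, target))               # move the largest disk
    hanoi(n - 1, spare, target, source, moves)   # restack on top of it

moves = []
hanoi(15, "A", "C", "B", moves)
print(len(moves))  # 2**15 - 1 = 32767
```

Eight lines to encode the knowledge. 32,767 moves to execute it by hand. That asymmetry is the whole dispute between the two papers.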
What People Actually Mean by “Understanding”
People treat understanding as binary. The AI understands or it doesn’t. Ghost in the machine or fancy autocomplete. Pick one.
I think that’s the wrong frame entirely.
When people say “the LLM doesn’t understand,” what they’re really getting at, whether they know it or not, is that it can’t follow how constraints interact. How one rule limits another, how those limitations cascade, and how that whole chain carves out a specific space of what’s possible and what isn’t.
Understanding isn’t holding a single rule in isolation. Anyone can do that. Understanding is tracking how Rule A constrains Rule B constrains Rule C, and seeing what that implies ten steps down the line. Stack enough constraints together and they create a landscape. A space shaped by all these interacting boundaries. Navigating that landscape without contradiction is what understanding actually feels like from the inside.
And what people call a “world model” is just this: a coherent set of constraints that interlock without contradicting each other. Each piece limits the others in ways that track with how reality actually works.
AI systems are fundamentally Bayesian. They interface with reality through training data, they update themselves with whatever information they can extract, and they use that to make better predictions. The deeper the network, the higher-level the features it can pull out. And those higher-level features? They’re really just higher-level constraints about how the world works. What relates to what. What’s possible. What follows from what.
How you encode these constraints matters enormously. You can do it symbolically, with formal logic and proof assistants. You can approximate it in a massive deep network that learns to represent constraints implicitly. Either way, the principle is the same. Once you layer enough constraints on top of each other, once they’re all interacting coherently, limiting each other, forcing each other into alignment, you basically get the world. You get reality, or at least a workable approximation of it. That’s what a world model is. Not some separate data structure. Not some magical representation. It’s the space carved out by all your constraints, interacting.
And once you have that, you can navigate this constraint space and get new information without going back to reality. You can reason about things you were never trained on. Because the information was always there, latent, implicit in the interactions between constraints. Every constraint you add doesn’t just limit what’s possible. It makes the remaining space more specific, more informative. Constraints don’t just restrict. They generate knowledge. Ruling things out is the same as ruling things in.
I find the project of encoding constraints foundational. Whether you do it with symbolic systems or learned representations or both, the point is the same. You don’t always need more data. You need to be able to follow the implications of the data you already have through the constraint landscape. The information is already there in the structure. You just have to navigate it. And navigating takes compute. This is why inference-time compute improves reasoning: a single forward pass can only get you so far into a constraint landscape, bounded by the depth of the network and whatever heuristic shortcuts it picked up in training. But when you give the model more time to think, let it chain steps together, you’re letting it traverse further through constraint space than any single pass could reach.
System 2 Is Navigation
If this framing makes sense, it connects to something we already know about how humans think.
Daniel Kahneman’s Thinking, Fast and Slow laid out the two-system model. System 1 is fast, automatic, heuristic. System 2 is slow, deliberate, effortful. Most of our thinking is System 1. System 2 is expensive and we avoid using it when we can.
I think System 2, at its core, is constraint navigation. It’s the process of identifying relevant constraints, stacking them, and walking through their interactions step by step. That’s what it feels like when you’re really thinking hard about something. You’re holding multiple things in tension and tracing through what they imply together.
There’s neuroscience that backs this up. In 2016, Constantinescu, O’Reilly, and Behrens published a study in Science showing that grid cells in the entorhinal cortex, the neurons we use for spatial navigation, also activate when people navigate abstract conceptual spaces. They had subjects learn relationships in a two-dimensional “bird space” where the axes were neck length and leg length. The same hexagonal firing patterns that help a rat know where it is in a maze showed up when humans were moving through this completely abstract space.
Behrens et al. followed up in 2018 with a review in Neuron called “What Is a Cognitive Map?” arguing that the hippocampal-entorhinal system isn’t just for physical navigation. It’s a general-purpose system for organizing knowledge of any kind. Abstract relationships, social hierarchies, conceptual structures. Other work has confirmed that the same system encodes non-spatial relational knowledge using map-like coding.
We literally navigate conceptual spaces with our spatial navigation hardware. The topology of the space is defined by constraints, the same way walls and paths define a physical space. Understanding has a navigational structure. That’s not a metaphor. That appears to be how the brain actually does it.
System 2 reasoning is just the deliberate, effortful version of this. Instead of glancing at the constraint landscape and guessing (System 1), you actually trace the paths. You map out the constraints and walk through them carefully.
We Have the Same Problem
The thing that gets overlooked in the “AI doesn’t understand” conversation is that we have the exact same issue.
Humans are heuristic machines. System 1 runs the show most of the time. Kahneman and Tversky spent decades cataloging the systematic errors this produces. Anchoring. Availability bias. Confirmation bias. The whole list. Every cognitive bias is basically us doing the strawberry thing: reaching for a fast pattern match when we should be walking the constraint chain.
The difference is we learned to build scaffolding.
Writing. Mathematics. Formal logic. Diagrams. Spreadsheets. Programming languages. These are all external systems that let us walk constraint chains reliably when our own wetware can’t keep up. We don’t follow constraint landscapes deterministically in our heads. We externalize. We write down the steps. We use notation to track implications that we’d otherwise lose. We reach for tools.
We do this so naturally that we forget we’re doing it. But it’s everywhere. You can’t do serious math without writing it down. You can’t debug complex code without running it. You can’t plan a large project without some kind of external representation. Our heuristic brains need help, and we built entire civilizations of tools to provide it.
So when I look at LLMs struggling to execute the Tower of Hanoi step by step, I don’t think “they can’t reason.” I think “they don’t have scaffolding.” They’re doing everything in their heads. No ability to externalize. No ability to check their work against a formal system. No ability to reach for a symbolic tool when the heuristic path isn’t reliable enough.
Ask a human to do 25-disk Tower of Hanoi in their head, no paper, no physical disks, no computer, and they’d fail too. We’d just never put ourselves in that position, because we know to reach for a tool.
What to Build
The field is already building scaffolding for LLMs, actually. It just doesn’t talk about it that way.
Look at how AI labs tackled ARC-AGI-2. The approach that worked: generate a massive amount of synthetic data that looks like the problem, train on it with reinforcement learning, search at test time. Create millions of grid puzzles with similar structure, then throw compute at it.
But think about what that actually is. The synthetic data is carving out a constraint space. Each puzzle says “here’s another way these transformations work, here’s another boundary on what’s possible.” The RL is searching that space for programs, transformations that satisfy the constraints implied by the input-output pairs. That’s program synthesis. That’s constraint navigation. The labs are already doing it.
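Stripped to its bones, that search looks something like this. A toy sketch, not real ARC (real ARC tasks use 2-D grids and a far richer primitive set), with a handful of made-up list primitives standing in for transformations. Each input-output pair acts as a constraint, and the search returns the first composition that satisfies all of them:

```python
from itertools import product

# Toy primitives standing in for grid transformations (purely illustrative).
PRIMITIVES = {
    "reverse":   lambda xs: xs[::-1],
    "sort":      sorted,
    "increment": lambda xs: [x + 1 for x in xs],
    "double":    lambda xs: [x * 2 for x in xs],
}

def synthesize(examples, max_depth=3):
    """Brute-force search for a composition of primitives that satisfies
    every input-output constraint. Returns a readable program, not weights."""
    for depth in range(1, max_depth + 1):
        for names in product(PRIMITIVES, repeat=depth):
            def program(xs, names=names):
                for name in names:
                    xs = PRIMITIVES[name](xs)
                return xs
            if all(program(i) == o for i, o in examples):
                return names
    return None

examples = [([3, 1, 2], [4, 2, 6]), ([5, 4], [8, 10])]
print(synthesize(examples))  # → ('reverse', 'double')
```

Note what comes out the other end: an inspectable program you can read, verify, and hand to another system, which is exactly what the train-new-weights approach doesn’t give you.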
They’re just doing it the hard and expensive way. You pay once for massive synthetic generation, then again for training, and what you get at the end is new weights. The “program” is encoded in there somewhere, baked into parameters, but you can’t inspect it, verify it, or hand it to another system. All that compute, and you end up with a new program you can’t even look at.
François Chollet, who created ARC-AGI, started a whole company around this realization. His lab Ndea is built on the premise that program synthesis and deep learning are equally important. Instead of burning compute to inductively approximate constraint spaces through millions of examples, you use deep learning’s pattern recognition to guide a direct search for discrete programs that actually explain the data. The intuition steers the rigor. Same destination as the brute-force approach, but you get there with a fraction of the compute, and you end up with something you can actually read.
Two directions follow from this, and both are about doing it more efficiently.
First, give models access to symbolic systems that enforce constraints deterministically. This is where proof assistants and formal verification come in. Bend2 by The Higher Order Company is doing something interesting here. It’s a programming language backed by a proof assistant where every generated function has to pass rigorous tests or full proofs at compile time. The LLM generates the candidate. The formal system checks it. The LLM proposes, the formal system verifies.
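The shape of that loop is worth making explicit. A minimal sketch, with hand-written lambdas standing in for LLM proposals (a hypothetical stand-in; in a real system the generator is a model and the verifier is a test suite or proof checker):

```python
def verify(candidate, test_cases):
    """Deterministic checker: the candidate must satisfy every constraint."""
    try:
        return all(candidate(i) == o for i, o in test_cases)
    except Exception:
        return False

def propose_and_check(proposals, test_cases):
    """Proposer/verifier loop. The generator is allowed to be unreliable,
    because the checker rejects anything that violates a constraint."""
    for candidate in proposals:
        if verify(candidate, test_cases):
            return candidate
    return None

# Stand-in for a stream of LLM proposals: two wrong guesses, one right.
proposals = [
    lambda n: n + n,     # fails on (3, 9)
    lambda n: n ** 3,    # fails on (2, 4)
    lambda n: n * n,     # satisfies every constraint
]
tests = [(2, 4), (3, 9), (10, 100)]
winner = propose_and_check(proposals, tests)
print(winner(7))  # → 49
```

The division of labor is the point: the proposer only has to be right sometimes, because the verifier is right always.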
This already works in practice. DeepMind’s AlphaProof pairs a neural network with the Lean proof assistant to do mathematical reasoning. It hit silver-medal level at the 2024 International Math Olympiad, and the paper was published in Nature. The neural network doesn’t need to be reliable at logic, because Lean catches every error. Creativity from the network. Rigor from the proof checker. No hallucinations possible, because every step is formally verified.
Second, improve the models themselves. Anthropic’s interpretability work shows you can distinguish faithful reasoning from fabricated reasoning by looking inside the model. When Claude actually computes a square root, there are identifiable intermediate features. When it’s making something up, there aren’t. If you could turn that kind of internal signal into something the model can act on, some kind of awareness of when it’s on solid ground versus when it’s guessing, that would change things. A model that knows when to reach for a tool instead of winging it.
I also think there’s a lot of room in training. Models right now are mostly trained on “what follows what.” They’d benefit from richer training on “what causes what.” Causal structure. How changes propagate through systems over time. How one thing constraining another thing produces a third thing. That’s the core of constraint navigation, and I think targeted training on those kinds of relationships could go a long way.
This Is an Engineering Problem
The “does AI truly understand?” debate, at least as it’s usually framed, is a dead end. It treats understanding as binary while also treating it as something mystical and undefinable.
The constraint navigation frame makes it an engineering problem. How reliably can a system propagate interacting constraints without contradiction? You can measure that. You can build toward it.
Humans aren’t perfect at it either. We’re heuristic machines that learned to build tools. Writing, algebra, calculus, formal logic, programming, proof assistants. The whole history of human intellectual achievement is just us building better scaffolding for navigating constraint spaces that our raw cognition can’t handle on its own.
The future of AI reasoning probably looks the same. Neural flexibility paired with symbolic rigor. Not one or the other. Both, the way it’s always been for us.