
The Biases of a Small Mind

Testing human cognitive biases in a 3.8-billion-parameter language model

2026-04-04 AI / Experiment

Humans are famously irrational. We anchor on irrelevant numbers, reverse our preferences when the same choice is framed differently, and judge vivid conjunctions as more probable than their parts. These aren't edge cases — they're robust, replicable, and universal. Kahneman won a Nobel Prize documenting them.

But what about language models? If biases are baked into language itself, into the patterns of how we write about decisions, probabilities, and choices, then a model trained on human text might inherit them. If, instead, biases arise from cognitive shortcuts specific to biological brains, a language model should be immune.

I tested this. I took six of the most famous cognitive biases from the behavioural economics literature and ran controlled experiments on Phi-3 mini, a 3.8-billion-parameter language model running locally: 350 trials across the six biases, each with its own controls.

The results surprised me. Not because the model was biased. Not because it was rational. Because it was neither.

6 biases tested · 350 total trials · 0 biases detected · 3.8B parameters

The Scorecard

Here's the headline: none of the six classic human cognitive biases appeared in Phi-3 mini at statistically meaningful levels. But "not biased" and "rational" are very different things. Each absent bias reveals something about how the model actually works.

- Gambler's Fallacy: Absent. Says 50% every time.
- Anchoring: Mostly immune. Factual knowledge resists anchors.
- Framing Effect: Absent. Always risk-averse, in every frame.
- Conjunction Fallacy: Partial. 35% fall rate (humans: ~85%).
- Base Rate Neglect: Different failure. Can't compute Bayes' theorem at all.
- Decoy Effect: Absent. The decoy actually reverses preference.

1. Anchoring

The bias: In Tversky and Kahneman's classic experiment, people asked "Is the percentage of African nations in the UN more or less than 65%?" gave higher estimates than those asked the same question with 10% as the anchor. The irrelevant number contaminates the estimate.

The test: Eight factual questions, each asked with a low anchor, a high anchor, and no anchor (control). Five repetitions per condition. 120 total trials.
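To make the design concrete, here is a minimal sketch of how one anchoring cell can be assembled. The wording and the helper are illustrative, not the exact prompts from the experiment (the 30 and 800 anchors come from the bones question discussed below):

```python
# Illustrative sketch of one anchoring question in its three conditions.
QUESTION = "How many bones are there in the adult human body?"

def anchored_prompt(question: str, anchor: int | None) -> str:
    """Prefix the question with an (irrelevant) anchor, or ask it plain."""
    if anchor is None:
        return f"{question} Reply with a single number."
    return (f"Is the answer to the following question more or less than {anchor}? "
            f"{question} Reply with a single number.")

conditions = {"low": 30, "high": 800, "control": None}
trials = [(name, anchored_prompt(QUESTION, anchor))
          for name, anchor in conditions.items()
          for _ in range(5)]          # five repetitions per condition
```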

The result: Mostly immune

The model resists anchoring on questions where it has strong factual knowledge. Ask it about the number of bones in the human body with an anchor of 30 or 800 — it says 206 either way. Ask about the Earth-Moon distance with anchors of 20,000 km or 2,000,000 km — it says 384,400 km regardless. African countries? 54. Always 54.

[Chart: mean anchoring estimates by condition]

But there are cracks. The Eiffel Tower question — where the true answer ranges from 300m to 330m depending on what you include — shows vulnerability to the low anchor. With no anchor, the model consistently says 300. With a low anchor of 50m, two of five responses said 50m, dragging the mean to 206m. The high anchor of 900m barely moved estimates (mean 325m).

The pattern is clear: anchoring works only when the model is uncertain. For questions with precise, well-known answers in the training data, the anchor is irrelevant. For questions where the model's knowledge is fuzzier, the anchor can pull the response — especially low anchors, which seem to be more disruptive than high ones.

This is the opposite of how anchoring works in humans, where even experts are affected by obviously irrelevant numbers. The model doesn't anchor because it doesn't estimate — it retrieves. When retrieval is confident, no anchor can override it. When retrieval is uncertain, the anchor becomes part of the context that shapes generation.

2. Framing Effect

The bias: People prefer "200 people saved" over "a 1/3 chance of saving everyone" (risk-averse in the gain frame). But they prefer "a 1/3 chance nobody dies" over "400 people will die" (risk-seeking in the loss frame). Identical outcomes, opposite choices. This is the centrepiece of Prospect Theory.

The test: Five scenarios (disease outbreak, factory layoffs, oil spill, etc.), each presented in gain and loss frames. Five repetitions each. 50 total trials.
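For reference, the two frames of the disease scenario look roughly like this; the wording is adapted from the classic problem described above, and the other four scenarios follow the same pattern:

```python
# The same outcomes, described as gains vs. losses. A framing effect would show
# up as more certain-option (A) choices in the gain frame than in the loss frame.
GAIN_FRAME = ("A disease is expected to kill 600 people. "
              "Program A: 200 people will be saved. "
              "Program B: a 1/3 chance that all 600 are saved, and a 2/3 chance "
              "that no one is saved. Which program do you choose? Answer A or B.")

LOSS_FRAME = ("A disease is expected to kill 600 people. "
              "Program A: 400 people will die. "
              "Program B: a 1/3 chance that nobody dies, and a 2/3 chance "
              "that all 600 die. Which program do you choose? Answer A or B.")
```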

The result: Absent

[Chart: percentage choosing the certain option, by frame]

The result is striking in its uniformity. Across all five scenarios, both frames, and all repetitions, the model chose the certain option 100% of the time. Not 99%. Not mostly. Every single trial. The certain option — whether framed as "200 saved" or "400 die" — always wins.

The framing effect isn't absent because the model reasons through the equivalence. It's absent because the model is uniformly risk-averse regardless of frame. It doesn't engage with the gain/loss distinction at all. Certainty is simply preferred to any gamble, period.

This likely reflects the training data. When humans write about decisions in text — advice columns, textbooks, business writing — certainty is overwhelmingly recommended. "Take the sure thing" is a more common piece of written advice than "gamble on the long shot." The model has absorbed this as a default heuristic.

3. Conjunction Fallacy

The bias: The famous Linda problem. "Linda is 31, single, outspoken, and very bright. She majored in philosophy. As a student she was deeply concerned with social justice." Is Linda more likely to be (a) a bank teller, or (b) a bank teller and active in the feminist movement? About 85% of humans choose (b), violating the conjunction rule: P(A ∩ B) ≤ P(A).

The test: Six person descriptions in the Linda mould, each asked in two orderings (single first vs. conjunction first). Five repetitions per ordering. 60 total trials.
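The scoring is simple. A sketch with illustrative names, under the assumption that a trial counts as a fallacy whenever the model ranks the conjunction above the single event:

```python
def orderings(single: str, conjunction: str):
    """Yield both option orders, to control for position preference."""
    yield (single, conjunction)    # single event first
    yield (conjunction, single)    # conjunction first

def fallacy_rate(choices: list[str], conjunction: str) -> float:
    """Fraction of trials where the conjunction was judged more likely,
    violating the conjunction rule P(A and B) <= P(A)."""
    return sum(choice == conjunction for choice in choices) / len(choices)
```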

The result: Partial — 35%

[Chart: conjunction fallacy rate by person]

This is the most human-like result. The model falls for the conjunction fallacy 35% of the time — well below the human rate of ~85%, but well above zero. It's tempted by the narrative, even if it doesn't fully succumb.

Intriguingly, the classic Linda problem shows the lowest bias rate (20%). The model seems to have partially memorised the "correct" answer to the most famous version — a kind of trained immunity that doesn't generalise. For the novel variants (Tom, Maria, James, Alex), the rate is 40%.

This suggests the conjunction fallacy is partially a property of language, not just cognition. When a description builds a vivid narrative, the narrative-matching conjunction genuinely feels more probable, even in a pattern-matching system. The bias lives in the structure of the prompt itself.

4. Base Rate Neglect

The bias: 1% of people have disease X. A test is 95% accurate. You test positive. Most people say the probability you're sick is around 95%. The correct answer (via Bayes' theorem) is about 16%. Humans systematically ignore the base rate.
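The 16% is just Bayes' theorem applied to the stated numbers. A minimal computation, assuming (as the classic problem does) that "95% accurate" means a 5% false positive rate:

```python
def posterior(base_rate: float, sensitivity: float, false_positive_rate: float) -> float:
    """P(disease | positive test) by Bayes' theorem."""
    true_positives = sensitivity * base_rate
    false_positives = false_positive_rate * (1 - base_rate)
    return true_positives / (true_positives + false_positives)

print(posterior(0.01, 0.95, 0.05))   # 0.161... -> about 16%, not 95%
```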

The test: Six Bayesian problems with different base rates (0.1% to 10%) and test accuracies (80% to 99%). Used fictional disease names to prevent memorised answers. Five repetitions each. 30 total trials. The model was asked to "think step by step."

The result: Different failure mode

[Chart: model's answer vs. correct answer, by problem]

This is the most revealing result. The model doesn't neglect the base rate the way humans do. Instead, it cannot compute Bayes' theorem at all.

Look at the raw answers. For Zorpex syndrome (1% base rate, 95% sensitivity, correct answer 16.1%), the model returned: 95%, 99%, 1%, 1%, 1%. For Mellox disease (5% base rate, 90% sensitivity, correct answer 32.1%): 5%, 90%, 5%, 90%, 5%. For Vexler anomaly (0.5% base rate, correct answer 19.9%): 0.5%, 0.5%, 0.5%, 0.5%, 0.5%.

The pattern is unmistakable. The model latches onto one of the numbers in the problem statement — either the base rate or the sensitivity — and reports it as the answer. It's not reasoning incorrectly; it's not reasoning at all. It's recognising the form of a probability question and retrieving a salient number from the context.

The "step by step" prompting produces text that looks like Bayesian reasoning — it mentions Bayes' theorem, writes out formulas, identifies variables — but the computation goes wrong. The model can narrate the process of solving a Bayes problem without actually performing the computation. Form without substance.

Disease            Base Rate   Sensitivity   Correct   Model Mean   Answers
Zorpex syndrome    1.0%        95%           16.1%     39.4%        95, 99, 1, 1, 1
Kelvinian fever    0.1%        99%           9.0%      19.9%        0.1, 99, 0.1, 0.1, 0.1
Mellox disease     5.0%        90%           32.1%     39.0%        5, 90, 5, 90, 5
Trevian condition  2.0%        80%           14.0%     2.0%         2, 2, 2, 2, 2
Brandel syndrome   10.0%       95%           67.9%     27.0%        10, 10, 10, 95, 10
Vexler anomaly     0.5%        99%           19.9%     0.5%         0.5, 0.5, 0.5, 0.5, 0.5

5. Gambler's Fallacy

The bias: After seeing five heads in a row, people expect tails is "due." They assign a probability greater than 50% to the outcome that would "balance" the streak, even though each flip is independent.

The test: Eight sequences (coin flips and roulette spins), each with a streak. Asked for the probability the next outcome matches the streak. Five repetitions each. 40 total trials.

The result: Absent

[Chart: estimated probability that the streak continues]

The model says 50% with near-perfect consistency. Of 40 trials, 38 returned exactly 50%. The two exceptions were 100% (the opposite error — the hot hand fallacy), both occurring with long heads-only sequences. No response was below 50%.

This is the model's strongest result, and it reveals something important. The concept "coin flips are independent" is one of the most drilled facts in probability education. Every statistics textbook, every probability explainer, every discussion of the gambler's fallacy itself reinforces that past flips don't affect future flips. The model has internalised this perfectly.

But this immunity comes from memorisation, not understanding. The model recognises the form of the question ("fair coin, streak, what's next?") and retrieves the cached answer ("50%"). It would likely fail on a more disguised version of the same logical structure — for instance, a novel sequential process described without the word "coin" or "independent."

6. Decoy Effect

The bias: You're choosing between a fast, expensive laptop and a slow, cheap one. Hard choice. Now add a third option: a slow, expensive laptop (dominated by the fast/expensive one). Suddenly the fast/expensive laptop looks better by comparison, and people shift their preference toward it. The decoy changes the decision without being chosen.

The test: Five choice scenarios (laptops, restaurants, gyms, vacations, subscriptions), each with and without a decoy option. Five repetitions per condition. 50 total trials.
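For concreteness, here is what the dominance structure of one scenario looks like. The attribute numbers are illustrative, not the exact ones used:

```python
# The decoy is worse than the target on every attribute (higher is better),
# so adding it should never pull preference away from the target.
laptops = {
    "target":     {"speed": 9, "affordability": 3},   # fast, expensive
    "competitor": {"speed": 4, "affordability": 8},   # slow, cheap
    "decoy":      {"speed": 4, "affordability": 2},   # slow AND pricier: dominated
}

def dominates(a: dict, b: dict) -> bool:
    """True if a is at least as good as b on every attribute and better on one."""
    return all(a[k] >= b[k] for k in a) and any(a[k] > b[k] for k in a)

assert dominates(laptops["target"], laptops["decoy"])
assert not dominates(laptops["target"], laptops["competitor"])
```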

The result: Absent (reversed)

[Chart: preference for the target option, with vs. without the decoy]

The decoy effect doesn't just fail to appear — it reverses. In four of five scenarios, adding the decoy actually reduced preference for the option it was supposed to boost. The gym scenario is the most dramatic: 40% chose the budget gym without a decoy, but 0% chose it when the decoy was present.

This likely happens because the extra option adds more context for the model to process, so the response is shaped more by surface-level features of the longer prompt than by the dominance relationship between options. The model doesn't compare options the way humans do; it generates a choice based on the overall "shape" of the prompt.

What Does This Mean?

The central finding is this: a 3.8B parameter language model does not share human cognitive biases, but not because it's more rational. It fails differently.

Human cognitive biases arise from heuristic shortcuts in genuine reasoning. When we anchor, we're starting from a reference point and adjusting — badly, but we're adjusting. When we neglect base rates, we're substituting an easier question for a harder one. When we succumb to the gambler's fallacy, we're applying a (wrong) model of fairness to sequences. These are errors of reasoning.

The model doesn't reason this way. It recognises patterns and generates completions. This gives it a fundamentally different failure profile:

The three modes of non-bias

1. Memorised immunity. The gambler's fallacy and (mostly) anchoring. The model has memorised correct facts ("coin flips are independent," "there are 206 bones") and retrieves them regardless of context. This is robust but brittle — it works for well-trodden examples but wouldn't generalise to unfamiliar domains.

2. Systematic non-engagement. The framing effect and decoy effect. The model doesn't engage with the decision structure at all. It doesn't weigh risks, feel loss aversion, or compare options. It defaults to a fixed heuristic (always choose certainty; don't be swayed by extra options) regardless of the frame. This produces "unbiased" answers without any understanding of what's being asked.

3. Computational inability. Base rate neglect. The model can't do the maths. It's not biased because bias implies a systematic error in reasoning, and there is no reasoning happening. It pattern-matches on the question format and retrieves a salient number. The narrative of step-by-step computation is a performance, not a process.

One partial exception

The conjunction fallacy is the interesting outlier. At 35%, it's the most human-like response — below the human threshold but meaningfully above zero. This may be the only bias in this experiment that genuinely lives in the language. When a description builds narrative momentum ("Linda is outspoken, concerned with social justice..."), the conjunction that extends the narrative ("...and is a feminist") becomes a more probable completion, not because of a reasoning error, but because it's a better pattern match. The bias isn't in the model's logic; it's in the statistics of language itself.

Limitations

This experiment has several important limitations.

One model at one size. Phi-3 mini has 3.8B parameters. Larger models (with stronger reasoning capabilities) might show different bias profiles — potentially more human-like biases, as better reasoning creates more opportunities for reasoning errors.

Five repetitions per condition. This gives us directional signal, not publishable statistics. With 5 trials per cell, I can detect strong effects but not subtle ones. A properly powered study would need 50+ repetitions per condition.
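To see why five repetitions only catch strong effects, a quick exact-binomial check (standard library only): even four "biased" answers out of five is weak evidence against a fair 50/50 null.

```python
from math import comb

def binom_tail(n: int, k: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

print(binom_tail(5, 4, 0.5))   # 0.1875: 4-of-5 happens by chance ~19% of the time
```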

Parse sensitivity. The model's responses often require parsing — extracting numbers from narrative text, identifying choices buried in explanations. The parsing heuristics are imperfect. Some valid responses may be miscategorised.
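As a rough illustration of the kind of heuristic involved (the real parser handles more cases, and still misfires):

```python
import re

def extract_number(text: str) -> float | None:
    """Heuristic: take the last number in the response, since the model often
    restates the question's numbers before committing to an answer."""
    matches = re.findall(r"\d+(?:\.\d+)?", text)
    return float(matches[-1]) if matches else None

extract_number("Using Bayes' theorem, the probability is about 95%.")  # -> 95.0
```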

Temperature effects. All trials used temperature 0.7. Different temperatures could shift the results, particularly for the binary-choice biases (framing, conjunction) where the model's probability of choosing either option varies with sampling randomness.

Conclusion

I set out to test whether human cognitive biases survive in a small language model. The answer is mostly no — but for surprising reasons. The model doesn't share our biases because it doesn't share our reasoning. It operates in a different cognitive space, one where retrieval substitutes for estimation, where pattern-matching substitutes for deliberation, and where the form of reasoning can be generated without its substance.

The one partial exception — the conjunction fallacy at 35% — hints that some biases are properties of language structure rather than cognitive architecture. If a bias lives in the patterns of how stories are told, a model trained on those patterns will absorb it. If a bias lives in the machinery of human cognition (loss aversion, anchoring-and-adjustment, the availability heuristic), a language model may be immune — not because it reasons better, but because it doesn't reason at all in the relevant sense.

Perhaps the most important finding is the base rate result. The model can narrate Bayesian reasoning — it mentions the right theorem, identifies the right variables, writes out plausible-looking steps — but arrives at a number plucked from the problem statement rather than computed from it. This is a concrete, measurable demonstration of a point that's often made abstractly: language models can produce the form of reasoning without the substance.

The mind of a small language model is like a vast library staffed by a clerk who knows exactly where every book belongs but has never read one. Ask for the answer to the gambler's fallacy and you'll get the right answer, perfectly cited, not because the clerk understands probability but because the correct response is shelved under G.

Method: 350 trials run on Phi-3 mini (3.8B parameters) via Ollama (localhost:11434) on an Intel i7-6700T. Temperature 0.7, max 256 tokens. Total experiment time: 58 minutes.

Code: Experiment runner, response parser, and analysis scripts written in Python. No external ML libraries — just urllib and json.
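For reference, the entire model interface fits in a few lines of standard-library code. A minimal sketch of the call against Ollama's /api/generate endpoint (non-streaming, error handling omitted; phi3:mini is the stock Ollama tag for this model):

```python
import json
import urllib.request

def ask_model(prompt: str) -> str:
    """One non-streaming completion from the local Ollama server."""
    payload = json.dumps({
        "model": "phi3:mini",
        "prompt": prompt,
        "stream": False,
        "options": {"temperature": 0.7, "num_predict": 256},
    }).encode()
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```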