Why does AI lie?
A practical look at hallucinations, missing context, and why reliable AI products need retrieval, provenance, and restraint.
To understand why, we first have to understand how.
Artificial intelligence is often treated as the pinnacle of modern society, but human efforts to make machines think have existed for centuries. In 1770, a Hungarian inventor named Wolfgang von Kempelen unveiled a chess-playing automaton that toured Europe and defeated some of the sharpest minds of the era, Napoleon Bonaparte among them. The machine was housed in a cabinet with a carved wooden figure seated above it, and it won consistently for decades. Eventually, it turned out that there was a skilled chess player hidden inside, working the mechanism from a concealed compartment.
Centuries later, the endeavor became a reality. Machines learned to play chess for real. Not through understanding the game, but through pattern recognition across tens of millions of positions. A modern chess engine does not think in any meaningful sense. It evaluates millions of positions per second and selects the strongest move it finds, largely by brute force. Nobody accuses it of lying when it moves a piece. It does exactly what it was built to do.
So why does an LLM, a large language model, sometimes make things up?
AI takes many forms, and the comparison is worth sitting with, because it shows how different two systems that both get called “AI” can be. A chess engine has one job, a fixed space of legal moves, and a clear win condition. A language model has none of these, because today it is meant for literally everything. Programming? Translating? Advanced calculus? Sure. Its job is to produce text that sounds like a reasonable response to whatever you ask. That is what makes it useful across so many things, and it is also why something that looks a lot like lying was always going to be part of the picture.
How it works
You may have heard that AI systems are unreliable because no one knows how they work and that “we” can’t see what’s going on inside the program. This is poorly worded and not quite true. Let’s take a look:
Picture a field of one billion levers arranged in rows that stretch as far as the eye can see. In front of you is a screen showing a number. Your goal is to get that number to exactly 500.
You do not know what any individual lever does. Each one adds or subtracts some amount from the total, and levers in later rows multiply the effects of earlier ones. With a billion of them, inspecting each is not an option. What you can do is tell a computer to start making adjustments and watch the screen.
It shows 1,687. Too high. The computer adjusts: -997. Too low. It adjusts again. 1,587. Too high. Down. And so it goes, thousands of iterations, each guided by whether the last result moved closer to 500 or further away. After tens of thousands of attempts, you ask the computer one last time to go a little higher, and the screen shows 500.
Now, do you know which levers the computer switched, in what order, when, and why? No, you don’t, and the computer doesn’t understand it any more than you do. What you have is a system whose internal logic you cannot trace, that reliably delivers the output you wanted.
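The lever field can be sketched in a few lines of Python. This is a toy under stated assumptions, not how real training works: it uses 1,000 levers instead of a billion, each lever simply adds or subtracts a hidden amount (no rows multiplying each other), and the names make_machine, screen, and tune are made up for this illustration. What it shows is that the loop only ever looks at the screen, never at what an individual lever does.

```python
import random

def make_machine(num_levers, rng):
    # Hidden effect of each lever. The tuning loop below never reads these.
    effects = [rng.uniform(-1.0, 1.0) for _ in range(num_levers)]

    def screen(levers):
        # The only thing the operator can observe: one number.
        return sum(e * p for e, p in zip(effects, levers))

    return screen

def tune(screen, levers, target, rng, tolerance=0.01, max_steps=10_000):
    error = target - screen(levers)
    steps = 0
    while abs(error) > tolerance and steps < max_steps:
        steps += 1
        i = rng.randrange(len(levers))   # pick any lever
        old = levers[i]
        # Nudge it by a random amount, smaller as the screen gets closer.
        levers[i] = old + rng.uniform(-1.0, 1.0) * abs(error)
        new_error = target - screen(levers)
        if abs(new_error) < abs(error):
            error = new_error            # closer to the target: keep the change
        else:
            levers[i] = old              # further away: undo it
    return steps

rng = random.Random(0)
screen = make_machine(1000, rng)
levers = [rng.uniform(-10.0, 10.0) for _ in range(1000)]
steps = tune(screen, levers, target=500.0, rng=rng)
final_reading = screen(levers)
print(f"screen shows {final_reading:.2f} after {steps} adjustments")
```

When the loop finishes, the reading sits within the tolerance of 500, and the list of lever positions explains nothing about why.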
Those levers are what machine learning researchers call weights or parameters. The largest modern language models have hundreds of billions of them. Training the model means running that same iterative process, except instead of converging on a number, it converges on something far harder to specify: the ability to predict what word comes next.
Feed the model every book, article, and web page you can find. Let it predict the next word. Check how wrong it was. Adjust the weights accordingly. Run this loop across trillions of words and the program gradually gains the ability to predict language.
What comes out of the machine is a system that has absorbed the statistical patterns of human language so thoroughly that, given any fragment of text, it can produce a continuation that sounds right. Not because it understands the meaning. Because it has seen enough sentences to know what fits.
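To make “knowing what fits” concrete, here is a deliberately tiny stand-in. It is not how a large language model is built: instead of adjusting billions of weights, it just counts which word followed which in a three-sentence corpus. The corpus and the function names train and continue_text are invented for this sketch.

```python
from collections import Counter, defaultdict

def train(text):
    # For every word, count which words came right after it.
    follows = defaultdict(Counter)
    words = text.lower().split()
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows

def continue_text(follows, start, length):
    # Keep appending the most frequent follower of the last word.
    words = [start]
    for _ in range(length):
        options = follows.get(words[-1])
        if not options:
            break  # this word was never seen, so there is nothing to predict
        words.append(options.most_common(1)[0][0])
    return " ".join(words)

corpus = (
    "the cat sat on the mat . "
    "the cat chased the mouse . "
    "the cat sat on the rug ."
)
model = train(corpus)
print(continue_text(model, "the", 4))
```

The continuation reads like a sentence, yet nothing in the code knows what a cat is. It only knows what tends to come next.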
Human language, for all its expressiveness, follows a surprisingly small number of patterns. The way we structure a question, build an argument, or close a paragraph repeats across millions of texts. A model trained on enough of them gets very good at reproducing those patterns. Good enough, often, that it sounds like it knows what it is talking about.
Sometimes it does. And sometimes it does not. At its core it is still just a computer program, nothing else. It does not have consciousness in any sense, not that we would be able to define what consciousness even is anyway. For our purposes it might as well be sentient. If it quacks like a duck and behaves like a duck, then it probably is a duck. But is it actually? Nobody knows. Now you are most likely wondering what ducks have to do with the fact that AI blatantly lies to you. Let me explain:
Where lying comes from
When a model does not have enough signal to answer confidently, it does not say so. It does not have a mechanism for expressing uncertainty the way you do. What it has is the ability to predict plausible-sounding continuations, and it uses that ability regardless of whether it has any real basis for doing so.
Researchers call this hallucination. The model generates text that is coherent, well-structured, and factually wrong. It is not lying in any intentional sense. It is producing what the statistical patterns of language suggest should come next, and those patterns do not always lead to the truth. In other words, it is doing what we built it to do, which is to predict language, not to verify its veracity.
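A rough way to picture this: imagine the model has scored three candidate answers. Picking the top one always yields an answer, whether the scores are far apart or nearly tied. The sketch below is an illustration only. The scores, the labels A, B, and C, and the 0.6 cutoff are invented, and real models do not expose their confidence this neatly.

```python
import math

def softmax(scores):
    # Turn raw scores into probabilities that sum to 1.
    peak = max(scores.values())
    exps = {k: math.exp(v - peak) for k, v in scores.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}

def always_answer(scores):
    # Plain prediction: return the top candidate, however weak the lead.
    probs = softmax(scores)
    return max(probs, key=probs.get)

def answer_or_abstain(scores, min_confidence=0.6):
    # Same prediction, with a hypothetical cutoff for admitting doubt.
    probs = softmax(scores)
    best = max(probs, key=probs.get)
    return best if probs[best] >= min_confidence else "I'm not sure"

strong_signal = {"A": 6.0, "B": 1.0, "C": 0.5}   # one candidate clearly ahead
weak_signal = {"A": 1.1, "B": 1.0, "C": 0.9}     # nearly a three-way tie

print(always_answer(weak_signal))
print(answer_or_abstain(weak_signal))
print(answer_or_abstain(strong_signal))
```

With the weak scores, the first function still commits to an answer, and only the version with the cutoff declines.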
This was a bigger problem for earlier models than it is now. Training methods have become significantly more refined, and the feedback loops that shape a model’s behavior now include explicit correction of fabrications. Models have also gotten better calibrated about when to say “I’m not sure” rather than filling a gap with something plausible-sounding. The hallucination problem, as the dominant complaint about AI, belongs mostly to 2022 and 2023.
What replaced it is a different problem.
The internet problem
Most AI assistants now search the web before answering factual questions. The thought behind it is very reasonable: rather than relying on training data that may be years out of date, the model pulls in current information.
The problem is that the internet is not a curated source of correct information. It is everything humans have ever written online, including what is wrong, outdated, published in bad faith, and simply made up. A page that confidently states a false date or a fabricated statistic looks the same to the model as one that does not. The model reads text. It cannot assess a source’s credibility by instinct the way you might, and it does not automatically cross-reference a claim against other sources.
To verify every claim the model retrieves, you would need to run additional searches for each one. To verify those, more searches still. Fetching twenty pages to answer a question could fan out into hundreds of lookups. At current infrastructure costs, that is not economically realistic at scale. Even companies spending billions on compute cannot verify everything for every query.
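The fan-out is easy to put numbers on. The sketch below assumes, purely for illustration, that every fetched page contains five checkable claims and that each claim costs one extra search. Neither figure comes from a real system, and total_lookups is a made-up helper.

```python
def total_lookups(pages, claims_per_page, searches_per_claim, rounds):
    """Count page fetches when every claim on every fetched page is re-checked."""
    total = pages       # pages fetched to answer the question itself
    frontier = pages    # pages whose claims have not been checked yet
    for _ in range(rounds):
        # Every claim on every unchecked page triggers new searches,
        # and each search brings back one more page to check.
        frontier = frontier * claims_per_page * searches_per_claim
        total += frontier
    return total

no_checking = total_lookups(20, 5, 1, rounds=0)
one_round = total_lookups(20, 5, 1, rounds=1)
two_rounds = total_lookups(20, 5, 1, rounds=2)
print(no_checking, one_round, two_rounds)
```

With these made-up numbers, one round of checking turns 20 lookups into 120, and checking the checkers turns it into 620.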
So when an AI gives you something wrong today, it usually did not invent it from nothing. It found it somewhere. The source was wrong, or the model misread it, or the information was accurate at some point and has since changed. The blame has shifted from the model to the ecosystem it draws from, and to the fact that we humans have a tendency to lie.
The fix, and what it costs
There is a setup that mostly solves this. Build a private, curated database on a specific topic. Connect your AI model to that database only. Cut off internet access entirely. Force the model to draw only from what is in the database, and always keep the database up to date.
A system built this way, say for plant care, standardized legal definitions, or equipment maintenance, will rarely give you a wrong answer, because the only source it can draw from is controlled and maintained. The hallucination rate drops close to zero, though the model can still misread or misstate what it retrieves.
The cost is scope. That system answers questions only about plants. Not about your lease. Not about the news. Narrow scope is what makes the reliability possible. Reliable general knowledge is, for now, a different thing entirely.
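In miniature, the setup looks something like this. Everything here is a placeholder: the two database entries stand in for vetted documents, the word-overlap matching stands in for a real retrieval step, and CURATED, tokens, and answer are names made up for the sketch. The part that matters is the last branch, where the system refuses instead of guessing.

```python
import re

# Hypothetical curated entries. A real database would hold maintained, vetted text.
CURATED = {
    "watering snake plant": "[vetted watering guidance for snake plants]",
    "repotting orchid": "[vetted repotting guidance for orchids]",
}

def tokens(text):
    # Lowercase words only, so punctuation and case do not matter.
    return set(re.findall(r"[a-z]+", text.lower()))

def answer(question, min_overlap=2):
    asked = tokens(question)
    best_key, best_score = None, 0
    for key in CURATED:
        score = len(asked & tokens(key))   # shared words with this entry
        if score > best_score:
            best_key, best_score = key, score
    if best_score < min_overlap:
        # Nothing in the database matches well enough: say so.
        return "I don't have that in my database."
    return CURATED[best_key]

print(answer("How much watering does a snake plant need?"))
print(answer("What does my lease say about subletting?"))
```

The plant question is answered from the database. The lease question is refused, which is the narrow scope at work.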
What you can do
The model does not know what you do not know. Read that sentence again, because everything in this section follows from it.
When you ask a broad question, the model produces the statistically most likely response across everything it has seen on that topic. If you ask about tenant rights in the United States, you get the general answer, the one applicable to the widest range of people who might ask that question. You do not automatically get the answer for your state, your lease type, your exact situation, and the clause you are unsure about. The model does not know you need that. You did not ask for it.
Prompting is the practice of closing that gap. Tell the model where you are. Tell it what you already know and what you want to find out. Tell it what kind of answer you need and what you want to avoid. The more precisely you define the question, the less room there is for the averaged response to miss what you are after. It is the difference between walking into a doctor’s office and saying “I feel bad” versus telling them where it hurts, when it started, and what makes it worse.
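One way to make this a habit is to fill in the same handful of fields every time. The helper below is only a sketch: build_prompt and its field names are invented, and the tenant details, including the state, are hypothetical.

```python
def build_prompt(question, location, known, goal, answer_format, avoid):
    # Each line closes one gap the model cannot fill by itself.
    lines = [
        f"Question: {question}",
        f"Where I am: {location}",
        f"What I already know: {known}",
        f"What I want to find out: {goal}",
        f"Kind of answer I need: {answer_format}",
        f"Please avoid: {avoid}",
    ]
    return "\n".join(lines)

vague = "What are my rights as a tenant?"
precise = build_prompt(
    question=vague,
    location="Oregon, USA (hypothetical)",
    known="I have a 12-month fixed-term lease",
    goal="whether my landlord can raise the rent before the lease ends",
    answer_format="a short summary plus what I should verify with a local source",
    avoid="general advice that is not specific to my state",
)
print(precise)
```

The vague version is the equivalent of “I feel bad”. The precise one tells the model where it hurts.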
Some people argue that prompting matters less than it used to: that it is an archaic way of forcing the model to give better answers, and that modern models have gotten good enough at reading intent to make precise phrasing unnecessary. For casual questions, there is something to that. For anything where accuracy and knowledge matter over long workflows, the gap has not closed. At the time of writing, human experience and judgment are still needed.
At CAIROS, prompting and working around the limitations of AI is something we do every day. We have put together two resources: a set of instructions you can apply directly to your AI assistant to reduce hallucination and stop it from guessing when it should say it does not know, and a practical guide to writing prompts that produce precise answers instead of averaged ones. You can download both here.