Your customers are screenshotting it. Your lawyers are already watching.
In June 2023, a law firm was fined for submitting fictitious legal research to court. The citations looked real. The case law sounded authoritative. Every word of it was fabricated by ChatGPT. The lawyers did not catch it. The judge did.
That is not an edge case. That is a preview.
Here is what nobody selling you an AI platform wants to say out loud: RAG does not eliminate hallucination. It shifts it. It moves the failure point from the model’s training data to your retrieval system — and then adds two brand-new failure points on top. Your system can now hallucinate at retrieval, at augmentation, and at generation. Three bites at the apple. All of them invisible to the user receiving the confident, grammatically perfect, completely fabricated answer.
Hallucination is not a bug in reasoning. It is a fundamental consequence of how these systems work. They optimise for what sounds plausible. Not for what is true.
When Retrieval Fails
When your RAG system fails at retrieval — because your chunks are the wrong size, your embedding model is too weak, or the right document ranked at position 15 when you only send the top 10 — your model does not say “I don’t know.” It fills the gap. It generates a plausible-sounding answer from whatever context it received. And it delivers that answer with the same confidence it delivers correct ones.
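One common defence at this stage is a hard similarity threshold. The sketch below is illustrative only: the `(chunk, score)` result shape, the 0.75 cutoff, and the `confident_chunks` helper are assumptions, not any specific vector store's API, and the right threshold has to be tuned on your own data.

```python
# Hypothetical minimum similarity score; tune against your own corpus.
MIN_SCORE = 0.75

def confident_chunks(results, min_score=MIN_SCORE):
    """Pass only chunks above the threshold. An empty result is a
    signal to answer 'I don't know', not to let the model improvise."""
    return [chunk for chunk, score in results if score >= min_score]

# Illustrative retrieval results: (chunk_text, similarity_score) pairs.
results = [
    ("The refund window is 30 days.", 0.91),
    ("Unrelated shipping clause.", 0.41),
]
context = confident_chunks(results)
```

The point of the gate is not the number 0.75. It is that a weak match is dropped on the floor instead of being handed to a model that will happily write around it.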
When Augmentation Fails
When it fails at augmentation — because too many documents exceed the context window, or conflicting sources appear in the same prompt — the model does not flag the contradiction. It picks arbitrarily, blends incorrectly, or invents a synthesis that satisfies neither source. Researchers call this contextual hallucination. Your users call it a wrong answer. Your legal team calls it liability.
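Context overflow, at least, is checkable before the prompt is sent. A hedged sketch of a budget check at the augmentation stage, assuming a crude four-characters-per-token estimate; real systems should use the model's own tokenizer, and the 8,000-token budget here is invented for illustration:

```python
CONTEXT_BUDGET = 8_000  # hypothetical token budget for retrieved context

def estimate_tokens(text):
    # Rough heuristic: ~4 characters per token. Replace with the
    # model's real tokenizer in production.
    return len(text) // 4

def fit_to_budget(docs, budget=CONTEXT_BUDGET):
    """Keep the highest-ranked documents that fit whole.
    Never silently truncate mid-document."""
    kept, used = [], 0
    for doc in docs:
        cost = estimate_tokens(doc)
        if used + cost > budget:
            break
        kept.append(doc)
        used += cost
    return kept

docs = ["a" * 400, "b" * 400, "c" * 400]  # ~100 estimated tokens each
trimmed = fit_to_budget(docs, budget=250)  # keeps the first two, drops the third
```

Dropping whole documents by rank is a deliberate choice here: a truncated half-document is exactly the kind of ambiguous context that invites a fabricated synthesis.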
When Generation Fails
When it fails at generation — because models sample outputs based on statistical patterns, not on truth — it produces what the training data said was most probable. Not what the retrieved document actually said. The retrieved document was right there in the prompt. The model ignored it.
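A crude grounding tripwire can run after generation. This sketch flags answer sentences whose content words barely overlap the retrieved context; the regexes, the 0.5 overlap cutoff, and the `ungrounded_sentences` helper are all illustrative, and a lexical heuristic like this will miss paraphrases. It is a tripwire, not proof of faithfulness.

```python
import re

def content_words(text):
    """Lowercased words longer than three characters."""
    return {w for w in re.findall(r"[a-z']+", text.lower()) if len(w) > 3}

def ungrounded_sentences(answer, context, min_overlap=0.5):
    """Flag answer sentences poorly supported by the retrieved context."""
    ctx_words = content_words(context)
    flagged = []
    for sentence in re.split(r"(?<=[.!?])\s+", answer.strip()):
        words = content_words(sentence)
        if not words:
            continue
        overlap = len(words & ctx_words) / len(words)
        if overlap < min_overlap:
            flagged.append(sentence)
    return flagged

context = "The refund window is thirty days from purchase."
answer = ("Refunds are allowed within thirty days from purchase. "
          "Shipping is always free worldwide.")
suspect = ungrounded_sentences(answer, context)  # flags the shipping claim
```

Anything this flags can be routed to a stronger check, an entailment model or a human, before it reaches a user.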
The Consequences Are Already Here
In healthcare, a hallucinated treatment recommendation can harm patients. In law, fabricated case citations undermine entire cases. In finance, organisations are legally responsible for decisions based on AI outputs — whether those outputs are accurate or fabricated.
And here is the part that should keep you awake: hallucinations propagate to thousands of users before anyone notices. Users who catch the system making up facts will stop trusting it — even when it is correct. You do not get a second chance at that trust.
What Effective Defence Looks Like
There are seven specific interventions that catch hallucinations before your users do — at retrieval, at augmentation, and at generation. Four of them: hard confidence thresholds that filter irrelevant documents before they reach the model. Post-generation claim extraction that breaks every response into individual factual statements and links each one to source evidence. Corrective RAG with fallback mechanisms that return “I don’t have enough information” instead of guessing. And evaluation frameworks that measure retrieval precision and generation faithfulness as separate metrics — because aggregate accuracy hides which component is failing.
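The separate-metrics point can be made concrete in a few lines. Both functions and their hand-labelled inputs below are illustrative; a production pipeline would derive the relevance and support labels from human review or an evaluator model, not hard-code them.

```python
def retrieval_precision(retrieved_ids, relevant_ids):
    """Fraction of retrieved documents that were actually relevant."""
    if not retrieved_ids:
        return 0.0
    return len(set(retrieved_ids) & set(relevant_ids)) / len(retrieved_ids)

def generation_faithfulness(claims):
    """Fraction of extracted claims supported by source evidence."""
    if not claims:
        return 0.0
    return sum(1 for c in claims if c["supported"]) / len(claims)

# Illustrative hand-labelled data.
precision = retrieval_precision(["d1", "d2", "d3", "d4"], ["d1", "d4"])  # 0.5
faithfulness = generation_faithfulness([
    {"claim": "Refund window is 30 days", "supported": True},
    {"claim": "Policy applies worldwide", "supported": False},
])  # 0.5
```

An aggregate accuracy score would blur these two numbers together. Kept separate, a low precision points you at chunking and embeddings; a low faithfulness points you at prompting and generation.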
The full prevention, detection, and evaluation framework is in the report below. It is the difference between an AI system your organisation can stand behind — and one your lawyers will be explaining to a judge.