6  Reliable AI needs a New Paradigm

Hallucinations, in the context of AI systems, refer to the phenomenon where an AI model generates outputs that appear plausible and coherent, but do not accurately reflect the real-world facts or the intended purpose of the system. These hallucinations can manifest in various forms, such as fabricated information, nonsensical responses, or outputs that are inconsistent with the input data. Hallucinations can occur in a wide range of AI applications, from natural language processing to computer vision and beyond.

Addressing the issue of hallucinations in AI is crucial for ensuring the reliable and trustworthy deployment of these systems. Hallucinations can lead to erroneous decision-making, false conclusions, and potentially harmful outcomes, especially in critical applications such as healthcare, finance, and public safety, just to mention a few. By understanding the causes and mechanisms behind hallucinations, researchers and developers can work to mitigate these issues, improving the overall robustness and reliability of AI systems.

The main thesis of this article is that, although hallucinations can be reduced or mitigated with a variety of practical approaches, the core issue is a fundamental flaw in the assumptions about the nature of language and truth that are intrinsic to the prevalent language modeling paradigms used today. If this thesis is correct, we won’t be able to solve AI hallucinations entirely with incremental improvements to current tech; we will need a new machine learning paradigm altogether.

What are Hallucinations in AI?

The term “hallucination” in the context of AI refers to the phenomenon where a large language model (LLM) or other generative AI system produces outputs that appear plausible and coherent, but do not accurately reflect reality or the intended purpose of the system. These hallucinations manifest as the generation of false, inaccurate, or nonsensical information that the AI presents with confidence, as if it were factual.

First, a caveat. Unlike human hallucinations, which involve perceiving things that are not real, AI hallucinations are associated with the model producing unjustified responses or beliefs, rather than perceptual experiences. The name “hallucination” is therefore imperfect, and it often leads to mistakes as people tend to anthropomorphize these models and make erroneous assumptions about how they work and the causes of these failures. However, we will stick to this name in this article because it is the prevalent nomenclature used everywhere people talk about AI. Just keep in mind we’re talking about something completely different from what the term “hallucination” means in general.

Before diving into the why of AI hallucinations, let’s distinguish them from other typical failures of generative models, such as out-of-distribution errors or biased outputs.

Out-of-distribution errors occur when an AI model is presented with input data that is significantly different from its training data, causing it to produce unpredictable or nonsensical outputs. In these cases, the model’s limitations are clear, and it is evident that the input is outside of its capabilities. This is just an error of generalization, which usually points to one of two things: 1) the model’s hypothesis space is too constrained to entirely capture the actual distribution of data, or 2) the available data or training regimes are insufficient to pinpoint a general-enough hypothesis.

Hallucinations are more insidious than out-of-distribution errors because they happen within the input distribution, where the model is supposedly well-behaved. Even worse, due to the stochastic nature of generative models, hallucinations tend to happen at random, which means that for the same input the model may hallucinate only once in 100 runs, making the problem almost impossible to evaluate and debug.
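To make this randomness concrete, here is a minimal sketch in Python. The next-token distribution is made up for illustration (a hypothetical model that picks the right answer 99% of the time), and `sample_answers` is just a helper name chosen for this example; it is not how any particular LLM is implemented.

```python
import random

# Hypothetical next-token distribution for one fixed prompt (made-up numbers):
# the correct answer gets 99% of the probability mass, a wrong one gets 1%.
next_token_probs = {"1950": 0.99, "1980": 0.01}

def sample_answers(probs, n, seed):
    """Sample n answers for the same prompt, as a stochastic decoder would."""
    rng = random.Random(seed)
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=n)

answers = sample_answers(next_token_probs, n=10_000, seed=0)
wrong = answers.count("1980")
print(f"wrong answers: {wrong} out of {len(answers)}")
```

The prompt never changes, yet a small fraction of the runs come back wrong. If you only test the prompt a handful of times, you will most likely never see the failure.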

Biased outputs, on the other hand, arise when an AI model’s training data or algorithms contain inherent biases, leading to the generation of outputs that reflect those biases, such as stereotypes or prejudices. These are often not hallucinations, but realistic reproductions of the very human biases that pervade our societies: The model is producing something that reflects the reality underlying its training data. It’s just that such reality is an ugly one. Dealing with biases in AI is one of the most important challenges in making AI safe, but it is a completely different problem that we can tackle in a future issue.

Hallucinations, in contrast, involve the AI model generating information that is not necessarily biased, but completely fabricated or detached from reality. This makes the problem of detecting them far harder, because the model’s responses appear confident and coherent, and there is no obvious telltale sign that helps human evaluators quickly identify them.

Real-World Implications of AI Hallucinations

The occurrence of hallucinations in AI systems, and particularly in large language models (LLMs), can have significant consequences, especially in high-stakes applications such as healthcare, finance, or public safety. For example, a healthcare AI model that incorrectly identifies a malignant skin lesion as benign can doom a patient. On the other hand, identifying a benign skin lesion as malignant could lead to unnecessary medical interventions, also causing harm to the patient. Similarly, in the financial sector, hallucinated outputs from an AI system could result in poor investment decisions with potentially devastating economic impacts.

However, even in low-stakes applications, the insidious nature of hallucinations makes them a fundamental barrier to the widespread adoption of AI. For example, imagine you’re using an LLM to generate summaries from audio transcripts of a meeting, extracting relevant talking points and actionable items. If the model tends to hallucinate once in a while, either failing to extract one key item, or worse, producing a spurious item, it will be virtually impossible for anyone to detect that without manually reviewing the transcript, thus rendering the whole application of AI in this domain useless.

For this reason, one of the key challenges in addressing the real-world implications of language model hallucinations is the difficulty in effectively communicating the limitations of these systems to end-users. LLMs are trained to produce fluent, coherent outputs that can appear plausible, even when they are factually incorrect. If the end-users of an AI system are not sufficiently informed to review the output of the system with a critical eye, they may never spot any hallucinations. This leads to a chain of mistakes as the errors from the AI system propagate upstream through the layers of decision makers in an organization. Ultimately, you could be making a very bad decision that seems entirely plausible given all the available information because the source of the error –an AI hallucination– is impossible to detect.

Thus, the development and deployment of LLMs that are prone to hallucination raise important ethical considerations. There is a need for responsible AI development practices that prioritize transparency, accountability, and the mitigation of potential harms. This includes establishing clear guidelines for testing and validating LLMs before real-world use, as well as implementing robust monitoring and oversight mechanisms to identify and address hallucinations as they arise.

Crucially, there are absolutely zero generative AI systems today that can guarantee they don’t hallucinate. This tech is simply unreliable in fundamental ways, so every actor in this domain, from developers to users, must be aware that there will be hallucinations in their system, and must have guardrails in place to deal with the output of unreliable AIs. This is especially perverse because we are used to software just working. Whenever software doesn’t do what it should, that’s a bug. But hallucinations are not a bug of AI, at least in the current paradigm. As we will see in the next section, they are an inherent feature of the way generative models work.

Why Do Hallucinations Happen?

There are many superficial reasons for hallucinations, from data and modeling problems to issues with prompting. However, the underlying cause of all hallucinations, at least in large language models, is that the current language modeling paradigm used in these systems is, by design, a hallucination machine. Let’s unpack that.

Generative AI models, including LLMs, rely on capturing statistical patterns in their training data to generate outputs. Rather than storing explicit factual claims, LLMs implicitly encode information as statistical correlations between words and phrases. This means the models do not have a clear, well-defined understanding of what is true or false; they can only generate plausible-sounding text.

The reason this mostly works is that generating plausible-sounding text has a high probability of reproducing something that is true, provided you trained on mostly truthful data. But LLMs are trained on vast corpora of text data from the internet, which contain inaccuracies, biases, and even fabricated information. So these models have “seen” many true sentences and thus picked up correlations between words that tend to generate true sentences, but they’ve also seen many variants of the same sentences which are slightly or even entirely wrong.

So one of the primary reasons for the occurrence of hallucinations is the lack of grounding in authoritative knowledge sources. Without a strong foundation in verified, factual knowledge, the models struggle to distinguish truth from falsehood, leading to the generation of hallucinated outputs. But this is far from the only problem. Even if you only train on factual information (assuming there is enough of such high-quality data to begin with), the statistical nature of language models makes them susceptible to hallucinating.

Suppose your model has only seen truthful sentences, and learned the correlations between words in these sentences. Imagine there are two very similar sentences, both factually true, that differ in just a couple of words –maybe a date and a name, for example “Person A was born in year X” and “Person B was born in year Y”. Given the way these models work, the probability of generating a mixed-up sentence like “Person B was born in year X” is only slightly smaller than generating either of the original sentences.
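Here is a small sketch of that mix-up, using a bigram model as a deliberately crude stand-in for a language model. The names, years, and helper functions (`train_bigram`, `sentence_probability`) are placeholders chosen for this example. Keep in mind that a bigram model only looks at the previous word, so in this toy setting the mixed-up sentences come out exactly as likely as the true ones; a real LLM conditions on much more context, which is what lowers the probability of the mix-up without bringing it to zero.

```python
from collections import Counter, defaultdict

# Toy corpus: two true sentences that differ only in a name and a year.
corpus = [
    "alice was born in 1950",
    "bob was born in 1980",
]

def train_bigram(sentences):
    """Count word-to-next-word transitions, with start and end markers."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts

def sentence_probability(counts, sentence):
    """Probability of a sentence under the maximum-likelihood bigram model."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    prob = 1.0
    for prev, nxt in zip(tokens, tokens[1:]):
        total = sum(counts[prev].values())
        if total == 0:
            return 0.0  # the previous word was never seen in training
        prob *= counts[prev][nxt] / total
    return prob

model = train_bigram(corpus)
for s in [
    "alice was born in 1950",  # true, seen in training
    "bob was born in 1980",    # true, seen in training
    "bob was born in 1950",    # false, never seen
    "alice was born in 1980",  # false, never seen
]:
    print(f"{s!r}: {sentence_probability(model, s)}")
```

The model was trained only on true sentences, and it still assigns a healthy probability to the two false ones. Nothing in the counts records which name goes with which year.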

What’s going on here is that the statistical model implicitly assumes that small changes in the input (the sequence of words) lead to small changes in the output (the probability of generating a sentence). In more technical terms, the statistical model assumes a smooth distribution, which is necessary because the amount of data the model needs to encode is orders of magnitude bigger than the memory (i.e., number of parameters) in the model. Thus, the model has to compress the training corpus, and compression implies losing some of the information.

In other words, statistical language models inherently assume that sentences very similar to what they have seen in the training data are also plausible sentences. They encode a smooth representation of language, and that’s fine, as long as you don’t equate plausible with factual. See, these models weren’t designed with factuality in mind. They were originally designed for tasks like translation, where plausibility and coherence are all that matters. It’s only when you turn them into answering machines that you run into a problem.

The problem is there is nothing smooth about facts. A sentence is either factual or not; there are no degrees of truthfulness (for the most part; let’s not get dragged into epistemological discussions here). But LLMs cannot, by design, define a strict frontier between true and false sentences. All the frontiers are fuzzy, so there is no clear cutoff point where you can say that if a sentence has a perplexity above some value X, then it is false. And even if you could define such a threshold, it would be different for every sentence.
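To see why a perplexity cutoff doesn’t work, here is a rough sketch. Perplexity is the exponential of the average negative log-probability per token, so a lower value means the model finds the text more plausible. The per-token probabilities below are invented for illustration and do not come from any real model.

```python
import math

def perplexity(token_probs):
    """Perplexity: exp of the average negative log-probability per token."""
    avg_neg_log = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log)

# Hypothetical per-token probabilities (made-up numbers):
# a true sentence that contains one rare, surprising token...
true_but_rare = [0.9, 0.05, 0.6]
# ...and a false sentence where every token is a common continuation.
false_but_fluent = [0.9, 0.4, 0.6]

print("true but rare:   ", perplexity(true_but_rare))
print("false but fluent:", perplexity(false_but_fluent))
```

With these numbers, the false sentence gets the lower perplexity, so any threshold that accepts the true sentence also accepts the false one. Perplexity measures how surprising the text is to the model, not whether it is correct.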

You may ask why we can’t avoid using this “smooth” representation altogether. The reason is that you want to generate sentences that are not in the training set. This means you need to somehow guess that some sentences you have never seen are also plausible, and guessing means you have to make some assumptions. The smooth hypothesis is very reasonable (and computationally convenient, as these models are trained with gradient descent, which requires smoothness in the loss function), again, as long as you don’t care about factuality. If you don’t compress the training data in this smooth, lossy way, you will simply not be able to generate novel sentences at all.

In summary, this is the underlying reason why the current paradigm of generative AI will always hallucinate, no matter how good your data is and how elaborate your training procedures or guardrails are. The statistical language modeling paradigm, at its core, is a hallucination machine. It concocts plausible-sounding sentences by mixing and matching words that it has seen together in similar contexts in the training set. It has no inherent notion of whether a given sentence is true or false. All it can tell is that the sentence looks like those that appear in the training set.

Now, a silver lining could be the idea that even if some false sentences will unavoidably be generated, we can train the system to minimize their occurrence by showing it lots and lots of high-quality data. That is, can we push the probability of a hallucination to a value so low that, in practice, it almost never happens? Unfortunately, recent research suggests that if there is a sentence that can be generated at all, no matter how low its base probability, then there is a prompt that will generate it with almost 100% certainty. This means that if we introduce malicious actors into our equation, we can never be sure our system can’t be jailbroken.

Mitigating Hallucinations in AI

So far we’ve argued that hallucinations are inherently impossible to eliminate completely. But this doesn’t mean we can’t do anything about it in practice. I want to end this article with a short summary of mitigation approaches that are being used today by researchers and developers.

One key strategy is to incorporate external knowledge bases and fact-checking systems into the AI models. By grounding the models in authoritative, verified information sources, the risk of generating fabricated or inaccurate outputs can be reduced.
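As a very rough sketch of what grounding can look like, the snippet below checks a generated claim against a tiny lookup table before accepting it. Everything here is hypothetical: `KNOWLEDGE_BASE` stands in for a real verified source, and `check_birth_year` is a name made up for this example. Real systems have to extract claims from free text and query far larger sources, which is where most of the difficulty lies.

```python
from typing import Optional

# Hypothetical verified knowledge base: person -> birth year.
KNOWLEDGE_BASE = {"alice": 1950, "bob": 1980}

def check_birth_year(name: str, claimed_year: int) -> Optional[bool]:
    """Return True if the claim is supported, False if it is contradicted,
    and None if the knowledge base has nothing to say about this person."""
    known_year = KNOWLEDGE_BASE.get(name.lower())
    if known_year is None:
        return None
    return known_year == claimed_year

print(check_birth_year("Bob", 1980))    # supported by the knowledge base
print(check_birth_year("Bob", 1950))    # contradicted: the mixed-up sentence
print(check_birth_year("Carol", 1970))  # unknown: cannot be verified
```

Note the third case: a claim the knowledge base doesn’t cover can be neither confirmed nor rejected, which is one reason grounding reduces the risk rather than removing it.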

Researchers are also exploring ways to develop more robust model architectures and training paradigms that are less susceptible to hallucinations. This may involve techniques like increasing model complexity, incorporating explicit reasoning capabilities, or using specialized training data and loss functions.

Enhancing the transparency and interpretability of AI models is also crucial for addressing hallucinations. By making the models’ decision-making processes more transparent, it becomes easier to identify and rectify the underlying causes of hallucinations.

Alongside these technical approaches, the development of standardized benchmarks and test sets for hallucination assessment is crucial. This will enable researchers and developers to quantify the prevalence and severity of hallucinations, as well as compare the performance of different models in this regard. Thus, if you can’t completely eliminate the problem, at least you can quantify it and make informed decisions about where and when it is safe enough to deploy a generative model.
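Quantifying the problem can start very simply. Assuming you have a set of model outputs that human reviewers have labeled as hallucinated or not, the sketch below estimates the hallucination rate together with a 95% Wilson score interval. The labels are made up, and `wilson_interval` is a helper written for this example rather than taken from a library.

```python
import math

def wilson_interval(hallucinated, total, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = hallucinated / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return max(0.0, center - half), min(1.0, center + half)

# Hypothetical review: 100 outputs checked, none flagged as hallucinated.
labels = [False] * 100
low, high = wilson_interval(sum(labels), len(labels))
print(f"observed rate: {sum(labels) / len(labels):.3f}")
print(f"95% interval: [{low:.3f}, {high:.3f}]")
```

Even with zero hallucinations observed in 100 reviewed outputs, the upper end of the interval is still around 3.7%. A clean test run on a small sample says much less about the true rate than it may seem, so the size of the benchmark matters as much as the score.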

Finally, addressing the challenge of hallucinations in AI requires an interdisciplinary approach, involving collaboration between AI researchers, domain experts, and authorities in fields like scientific reasoning, legal argumentation, and other relevant disciplines. By fostering cross-disciplinary knowledge sharing and research, the understanding and mitigation of hallucinations can be further advanced.

Conclusion

The issue of hallucinations in AI systems, particularly in large language models, poses a significant challenge for the reliable and trustworthy deployment of these powerful technologies. Hallucinations, where AI models generate plausible-sounding but factually inaccurate outputs, can have serious consequences in high-stakes applications and undermine user trust.

The underlying causes of hallucinations stem from the fundamental limitations of current language modeling approaches, including the lack of grounding in authoritative knowledge sources, the reliance on statistical patterns in training data, and the inherent difficulty in reliably distinguishing truth from falsehood using statistics alone. These limitations highlight the need for more advanced techniques that can better understand the nuances of language and factual claims, probably involving some fundamental paradigm shifts in machine learning that take us beyond what pure statistical models can achieve.

Mitigating hallucinations in practice requires a multifaceted approach, involving the incorporation of external knowledge bases and fact-checking systems, the development of more robust model architectures and training paradigms, the leveraging of human-in-the-loop and interactive learning strategies, and the improvement of model transparency and interpretability. Standardized benchmarks and test sets for hallucination assessment, as well as interdisciplinary collaboration between AI researchers, domain experts, and authorities in related fields, will be crucial for advancing the understanding and mitigation of this challenge.

I hope this article has given you some food for thought. We went a bit deep into the technical details of how generative models work, but that is necessary to understand why these issues are so hard to solve. If you like this type of article, please let me know with a comment, and feel free to suggest future topics that interest you.