Fewer Hallucinations and Better UX with Claim References

When working with Large Language Models (LLMs), hallucinations remain an open challenge. A hallucination occurs when the system produces misleading or false output during an interaction with a user. We've been researching potential solutions and believe that incorporating claim references is a promising route. In this article, we explore several techniques that tap into the power of references to improve the reliability of LLMs, along with the challenges that come with them.

Understanding the challenge: What's the deal with hallucinations? 

How to mitigate these risks?

The Role of Referencing

References in articles reinforce credibility. Whether in a scientific paper, a news story, or a blog post, they serve as proof that backs up the information presented. They allow readers, regardless of their expertise, to verify facts and figures against trusted sources and to delve deeper into the subject when they want to. This builds trust and encourages transparency.

Similarly, references play the same role in Large Language Models (LLMs) and other Machine Learning systems: they act as a validation tool. When an LLM includes citations, it not only makes the information easier to verify but also guides users to the original sources, increasing confidence in both the AI and the content it produces.

Referencing techniques

  1. Procedure-injected References
    References are added to the end of the generated answer without any LLM intervention (a minimal sketch follows after this list). This technique improves the user experience, since the context is cited and users can validate the output against it. However, it doesn't reduce hallucinations: it has no effect on what the LLM generates, and it doesn't specify which parts of the text correspond to each reference.

  2. LLM-Generated References
    References are produced by the LLM itself as part of its output (see the prompt-based sketch in the next section). This works as a context-reinforcement mechanism, similar to Chain-of-Thought techniques: requiring the model to cite the source of each claim grounds the output more firmly in the provided context, reducing the risk of hallucinations while also giving users a better way to validate what was generated. However, this is a considerably more complex task, as it requires adjusting the LLM's input, its configuration, or both.
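To make the first technique concrete, here is a minimal sketch of procedure-injected references: the sources of the retrieved context chunks are simply appended to the answer after generation, with no involvement from the model. The `ContextChunk` structure and `append_references` helper are illustrative assumptions, not part of any specific framework.

```python
# Minimal sketch of procedure-injected references (illustrative, not a
# specific framework): citations are appended after generation, so the
# LLM never sees or produces them.
from dataclasses import dataclass


@dataclass
class ContextChunk:
    text: str
    source: str  # e.g. a document title or URL


def append_references(answer: str, chunks: list[ContextChunk]) -> str:
    """Append the sources of all retrieved chunks to the generated answer."""
    references = "\n".join(f"[{i + 1}] {chunk.source}" for i, chunk in enumerate(chunks))
    return f"{answer}\n\nReferences:\n{references}"


if __name__ == "__main__":
    retrieved = [
        ContextChunk("LLMs can produce false or misleading outputs.", "llm-handbook.md"),
        ContextChunk("Citations let readers verify claims.", "style-guide.md"),
    ]
    print(append_references("LLMs may hallucinate, so citations help users verify answers.", retrieved))
```

Because the model never sees or produces the citations, this is cheap to bolt onto any pipeline, but it only tells users where the context came from, not which claim each source supports.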

Approaches to LLM-generated referencing:
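As an illustration, one possible approach is purely prompt-based: number the context chunks, instruct the model to append the matching [n] marker to every claim, and to stay silent on anything it cannot attribute. The sketch below shows the idea; the prompt wording and the `call_llm` placeholder are our own assumptions and should be adapted to whichever completion API you use.

```python
# Minimal sketch of prompt-based, LLM-generated references: each context
# chunk gets an index, and the model is instructed to cite the index of
# the supporting chunk after every claim it makes.

def build_referencing_prompt(question: str, chunks: list[str]) -> str:
    numbered = "\n".join(f"[{i + 1}] {chunk}" for i, chunk in enumerate(chunks))
    return (
        "Answer the question using ONLY the numbered context below.\n"
        "After every claim, add the [n] marker of the chunk that supports it.\n"
        "If no chunk supports a claim, do not make that claim.\n\n"
        f"Context:\n{numbered}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


def call_llm(prompt: str) -> str:
    # Placeholder: swap in whichever completion API you use
    # (OpenAI, Anthropic, a local model, ...).
    raise NotImplementedError


if __name__ == "__main__":
    prompt = build_referencing_prompt(
        "Why do citations matter?",
        ["Citations let readers verify claims.", "References increase transparency."],
    )
    print(prompt)  # inspect the prompt
    # answer = call_llm(prompt)  # e.g. "Citations let readers verify claims [1]. ..."
```

The same idea can also be pushed further than prompting, for instance by adjusting the model's configuration or training, but the inline [n] convention keeps the output easy to parse and validate downstream.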

Overall challenges of LLM-generated references

  1. False negatives: Incomplete referencing
    The inherent limitations of LLMs mean that it's not guaranteed that every claim will be accurately referenced.

    Even if we provide the proper context, the LLM uses it, and it doesn't invent claims or references, the system may still fail to reference claims that should be referenced. Tuning the model and/or the provided context may help reduce these false negatives.

  2. False positives: Hallucinations are still something to deal with
    Even if requiring references helps ground the LLM and therefore reduces hallucinations, there is no guarantee that hallucinations will be eliminated entirely.
    There is still a risk that both the generated content and its reference are absent from the provided context, in which case we consider it a hallucination. Further work is required to find mechanisms that mitigate hallucinations more completely (the sketch after this list shows a simple post-hoc check for both failure modes).
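As a partial safeguard against both failure modes, a lightweight post-hoc check can flag sentences that carry no citation marker (possible false negatives) and markers that point at chunks which don't exist (one class of false positives). The sketch below is a simplistic illustration based on the [n] marker convention assumed in the earlier prompt sketch; it does not verify that a cited chunk actually supports the claim.

```python
# Minimal sketch of a post-hoc reference check, assuming the [n] marker
# convention from the previous sketch. It flags sentences with no marker
# (possible false negatives) and markers that cite non-existent chunks
# (one class of false positives). It does NOT verify that a cited chunk
# actually supports the claim.
import re

MARKER = re.compile(r"\[(\d+)\]")


def check_references(answer: str, num_chunks: int) -> dict[str, list[str]]:
    issues: dict[str, list[str]] = {"unreferenced": [], "invalid": []}
    for sentence in filter(None, (s.strip() for s in answer.split("."))):
        cited = [int(m) for m in MARKER.findall(sentence)]
        if not cited:
            issues["unreferenced"].append(sentence)  # no citation at all
        elif any(not 1 <= n <= num_chunks for n in cited):
            issues["invalid"].append(sentence)  # cites a chunk that doesn't exist
    return issues


if __name__ == "__main__":
    answer = (
        "Citations let readers verify claims [1]. "
        "LLMs never make mistakes [7]. "
        "Transparency improves trust."
    )
    print(check_references(answer, num_chunks=2))
    # {'unreferenced': ['Transparency improves trust'], 'invalid': ['LLMs never make mistakes [7]']}
```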

Conclusions

Incorporating claim references in Large Language Models (LLMs) holds promise for enhancing content reliability and reducing errors. Referencing techniques such as Procedure-injected and LLM-Generated References add a layer of user verification, enhancing both user experience and credibility. Nevertheless, the intricate task of grounding and referencing in LLMs comes with challenges, including the computational cost of fine-tuning and the difficulty of ensuring accurate references for all claims. Despite progress in grounding efforts, an ongoing discussion about the robustness and dependability of these models is essential. Hallucination risks, particularly for users without specialized expertise, emphasize the need for continuous updates, rigorous testing, and complementary strategies to bolster reliability.