Reranking Embeddings to Improve Information Retrieval
We can use the power of embeddings -- numerical representations of words or entire documents -- to better understand user questions. Here's how the process works:
Creation: We use embedding models (often built on Large Language Models, LLMs) to convert content into embeddings, so that similar concepts end up with closely related values.
Storage: Embeddings are then stored in specialized databases designed for vector similarity search.
Matching: A user's question is turned into an embedding. The system then compares it against the database to find the closest content embeddings.
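The three steps above can be sketched in a few lines of Python. Everything here is illustrative: the `embed` function below is a toy bag-of-characters stand-in for a real embedding model, and the in-memory list stands in for a vector database.

```python
import math

# Toy stand-in for a real embedding model (hypothetical; a production
# system would call a model API or an embedding library instead).
def embed(text: str) -> list[float]:
    # Bag-of-characters vector over lowercase letters, just to illustrate
    # that similar texts map to nearby vectors.
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# 1. Creation: convert content into embeddings.
documents = [
    "How to reset your password",
    "Quarterly revenue report",
    "Password recovery steps",
]
store = [(doc, embed(doc)) for doc in documents]  # 2. Storage

# 3. Matching: embed the question, rank stored content by similarity.
query = embed("I forgot my password")
ranked = sorted(store, key=lambda item: cosine(query, item[1]), reverse=True)
print(ranked[0][0])
```

With this toy model, both password-related documents score far above the revenue report, which is exactly the behavior a vector database exploits at scale.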
However, we found a challenge: embeddings with high similarity scores sometimes fail to provide information relevant to the user's question. To address this, we've explored an approach known as embedding reranking.
Understanding the Challenge
Embeddings can sometimes provide content that, while semantically similar, may not be contextually relevant for answering specific user queries. This can lead to a situation where the LLM receives content that doesn't aid in producing a meaningful answer. The risk? LLMs might "hallucinate", or produce answers based on irrelevant content, which could result in misleading or incorrect answers. Therefore, it's crucial to integrate methods that will enhance the quality of the content.
Embedding reranking is one of these methods. After the initial retrieval of potential embeddings, reranking serves as an additional filter. It goes through the top-k results (those with the highest similarity to the query) and re-evaluates them for relevance. The reranking process assigns a new score to each embedding, considering its importance in answering the user's question. Those that are deemed irrelevant are filtered out.
By reranking, we're left with quality content that, when provided to an LLM, results in a more accurate answer. This method ensures that the content used for generating answers is both relevant and of high quality.
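The reranking step described above can be sketched as a generic re-scoring filter. The `relevance` argument is a hypothetical scoring function: in practice it could be a cross-encoder model or an LLM call; here a simple word-overlap score stands in for it, and the 0.2 threshold is an arbitrary illustrative choice.

```python
def rerank(query: str, candidates: list[str], relevance,
           threshold: float = 0.5) -> list[str]:
    """Re-score the top-k candidates and drop those below the threshold."""
    scored = [(doc, relevance(query, doc)) for doc in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [doc for doc, score in scored if score >= threshold]

# Toy relevance function for illustration: fraction of query words that
# appear in the document (a real system would use a model instead).
def word_overlap(query: str, doc: str) -> float:
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0

top_k = [
    "reset your password via the settings page",
    "our password policy requires 12 characters",
    "the cafeteria menu changes weekly",
]
kept = rerank("how do I reset my password", top_k, word_overlap, threshold=0.2)
print(kept)  # only the document that actually answers the question survives
```

Only the surviving documents are passed to the answer-generating LLM, which is what keeps irrelevant content out of its context.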
Using LLMs for Reranking
In our research on reranking, we ran several experiments and found that the most effective method uses LLMs: compared with standard embedding retrieval, an LLM offers a deeper analysis of a document's relevance to a question.
We experimented with gpt-3.5-turbo to filter relevant text using custom prompts. This improved answer accuracy by removing irrelevant documents and focusing the model on the task, though it added latency to the overall workflow. That latency was often offset later on, since fewer embeddings were passed to the LLM that generates the final answer. In the end, this method gave the model better context, boosting its performance.
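A minimal sketch of this prompt-based filtering might look as follows. `call_llm` is a placeholder for an actual chat-completion call (e.g. to gpt-3.5-turbo), and the prompt wording is an illustrative assumption, not our exact production prompt; only the yes/no filtering logic is shown.

```python
# Hypothetical prompt template for yes/no relevance filtering.
PROMPT = (
    "Question: {question}\n"
    "Document: {document}\n"
    "Does the document help answer the question? Reply YES or NO."
)

def llm_filter(question: str, documents: list[str], call_llm) -> list[str]:
    """Keep only the documents the LLM judges relevant to the question."""
    relevant = []
    for doc in documents:
        reply = call_llm(PROMPT.format(question=question, document=doc))
        if reply.strip().upper().startswith("YES"):
            relevant.append(doc)
    return relevant

# Stub LLM for demonstration: answers YES when the document mentions
# "password". A real call would send the prompt to a chat model.
def stub_llm(prompt: str) -> str:
    doc_part = prompt.split("Document: ")[1].split("\n")[0]
    return "YES" if "password" in doc_part.lower() else "NO"

docs = ["password reset instructions", "holiday schedule"]
kept = llm_filter("How do I reset my password?", docs, stub_llm)
print(kept)
```

Because each document costs one LLM call, this is where the latency discussed below comes from; batching several documents into one prompt is a common way to reduce it.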
Challenges Associated with Reranking
Hallucinations: Using an LLM for reranking could make the model produce false or inaccurate results, although our experiments showed minimal occurrences.
Latency Issues: Querying the LLM adds delay to the process, especially when handling a large number of embeddings. However, filtering out irrelevant embeddings can later reduce this delay during the answering phase.
Overall, our findings suggest that the improved accuracy and relevance of answers outweigh these challenges.
Embedding reranking is like a superpower when it comes to improving the retrieval of relevant documents. It can even flag questions that cannot be answered from the existing information database, since in those cases every retrieved document gets filtered out. For this reason, we believe it is an essential step when working with embeddings. We plan to support this technique within our ecosystem of tools. If you have any ideas or questions, or would like to know more about embedding reranking capabilities in your AI solutions, contact us.