The MLnotes Newsletter

What's Order-Preserve RAG (OP-RAG)?

Why use RAG in the Era of Long-Context Language Models? Part 2

Mehdi Allahyari
Oct 07, 2024

In recent years, the field of natural language processing has seen remarkable advancements, particularly in the development of large language models (LLMs) with increasingly expansive context windows. Last week we introduced the research from Li et al. (2024) comparing RAG against long-context (LC) LLMs.

Today, we’ll delve into another intriguing paper by Nvidia researchers, "In Defense of RAG in the Era of Long-Context Language Models." Interestingly, this paper reaches the opposite conclusion: with the mechanism it proposes, RAG delivers higher-quality answers than relying solely on long-context LLMs.

The issue

Despite the impressive capabilities of long-context LLMs, there are inherent limitations to processing extremely large amounts of text in a single pass. As the context window expands, the model's ability to focus on relevant information can diminish, potentially leading to a degradation in answer quality. This phenomenon highlights a critical trade-off between the breadth of available information and the precision of the model's output.

P.S. We have also previously covered this in this post:

The Loss-in-the-Middle Problem in RAG🙀

Angelina Yang and Mehdi Allahyari · January 18, 2024

Did you know that even models as advanced as GPT can sometimes struggle to provide accurate responses when faced with questions involving extensive context?


Order-Preserve RAG (OP-RAG)

In light of these challenges, researchers have proposed a novel approach called Order-Preserve Retrieval-Augmented Generation (OP-RAG). This method aims to combine the strengths of both RAG and long-context LLMs while mitigating their respective weaknesses.

The key idea of OP-RAG lies in its preservation of the original order of retrieved chunks from the source document. Unlike traditional RAG, which typically arranges retrieved chunks by relevance, OP-RAG maintains the sequential structure of the original text. This approach has shown significant improvements in answer quality, particularly when dealing with a large number of retrieved chunks.

At first, this might seem counterintuitive. Upon further reflection, however, it makes sense: keeping the chunks in document order also preserves the logical flow of the original content.

Source: Vanilla RAG versus the proposed order-preserve RAG. As shown in the example, a long document is cropped into 13 chunks, {c_i}, i = 1, …, 13. The similarity score is appended to each chunk. We retrieve the top 4 chunks with the highest similarity scores. Vanilla RAG places the chunks in score-descending order, whereas the proposed order-preserve RAG places the chunks in the order they appear in the original document.
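To make the mechanism concrete, here is a minimal Python sketch of the reordering step. This is our illustration, not the authors' implementation: chunking and embedding are assumed to have happened upstream, and cosine similarity stands in for whatever relevance score the retriever actually uses.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def op_rag_retrieve(query_vec: np.ndarray,
                    chunk_vecs: list[np.ndarray],
                    chunks: list[str],
                    k: int = 4) -> list[str]:
    """Score all chunks, keep the top-k, then restore document order."""
    scores = [cosine(query_vec, v) for v in chunk_vecs]
    # Vanilla RAG stops here: top-k indices in score-descending order.
    top_k = sorted(range(len(chunks)), key=scores.__getitem__, reverse=True)[:k]
    # OP-RAG: re-sort the selected indices by their original position,
    # so the chunks reach the LLM in the order they appear in the document.
    return [chunks[i] for i in sorted(top_k)]
```

Note that the only difference from vanilla RAG is the final sorted(top_k): the same k chunks are selected, but they are concatenated in document order rather than score order before being passed to the LLM.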

The Inverted U-Shaped Curve: Finding the Sweet Spot
