Retrieval-augmented generation

Retrieval-augmented generation (RAG) is a type of information retrieval process. It modifies interactions with a large language model (LLM) so that the model responds to queries with reference to a specified set of documents, using this information in preference to what is drawn from its own vast, static training data. This allows LLMs to use domain-specific and/or updated information. Use cases include giving a chatbot access to internal company data, or restricting factual answers to an authoritative source.

Process
The RAG process is made up of four key stages. First, all the data must be prepared and indexed for use by the LLM. Thereafter, each query passes through a retrieval, an augmentation, and a generation phase.

Indexing
The data to be referenced must first be converted into LLM embeddings, numerical representations in the form of large vectors. RAG can be used on unstructured (usually text), semi-structured, or structured data (for example knowledge graphs). These embeddings are then stored in a vector database to allow for document retrieval.
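The sketch below illustrates this indexing stage in Python. The hash-based embed() function is a stand-in for a trained embedding model, and the plain NumPy matrix stands in for a vector database; both are assumptions made for illustration, not choices prescribed by RAG itself.

```python
import hashlib
import numpy as np

def embed(text: str, dim: int = 256) -> np.ndarray:
    """Toy embedding: hash each token into one of `dim` buckets, then L2-normalize."""
    vec = np.zeros(dim)
    for token in text.lower().split():
        vec[int(hashlib.md5(token.encode()).hexdigest(), 16) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

documents = [
    "The warranty covers manufacturing defects for two years.",
    "Returns are accepted within 30 days with a receipt.",
    "Support is available by phone on weekdays from 9 to 5.",
]

# The "vector database": one embedding per row, queried later by similarity search.
index = np.stack([embed(doc) for doc in documents])
```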

Retrieval
Given a user query, a document retriever is first called to select the most relevant documents, which will be used to augment the query. This is done by encoding the query as a vector embedding and comparing it with the vectors of the source documents. The comparison can be done using a variety of methods, which depend in part on the type of indexing used.
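A sketch of this comparison, reusing the embed() helper, index and documents from the indexing example above: with L2-normalized vectors the dot product equals cosine similarity, and the highest-scoring documents are returned. Exact search is shown here; large collections typically use approximate nearest-neighbour indexes instead.

```python
import numpy as np

def retrieve(query: str, doc_vecs: np.ndarray, k: int = 3) -> list[int]:
    """Return the indices of the k documents most similar to the query."""
    scores = doc_vecs @ embed(query)      # cosine similarity for normalized vectors
    return list(np.argsort(-scores)[:k])  # highest scores first

# Example: the warranty passage should rank first for this query.
top = retrieve("How long is the warranty?", index, k=2)
retrieved = [documents[i] for i in top]
```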

Augmentation
The retrieved information is then fed into the LLM via prompt engineering of the user's original query. Newer implementations can also incorporate dedicated augmentation modules with abilities such as expanding queries into multiple domains, and using memory and self-improvement to learn from previous retrievals.
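A sketch of a simple augmentation step follows; the prompt template and its instructions are illustrative assumptions, and real systems vary widely in how they format the retrieved context.

```python
def build_prompt(question: str, passages: list[str]) -> str:
    """Splice the retrieved passages into a prompt around the user's question."""
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer the question using only the context below. "
        "If the context is not sufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

# Example, continuing from the retrieval sketch:
prompt = build_prompt("How long is the warranty?", retrieved)
```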

Generation
Finally, the LLM generates output based on both the query and the retrieved documents. Some models incorporate extra steps to improve the output, such as re-ranking the retrieved information, context selection, and fine-tuning.
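The sketch below ties the three per-query phases together. Here llm_generate() is a hypothetical placeholder for whatever model API a given system calls, and refinements such as re-ranking or context selection would slot in between retrieval and generation.

```python
def llm_generate(prompt: str) -> str:
    # Hypothetical stand-in for a real LLM call (e.g. a hosted chat-completion endpoint).
    raise NotImplementedError("plug in an LLM client here")

def answer(question: str) -> str:
    """End-to-end RAG query, reusing the helpers defined in the sketches above."""
    doc_ids = retrieve(question, index, k=2)                          # retrieval
    prompt = build_prompt(question, [documents[i] for i in doc_ids])  # augmentation
    return llm_generate(prompt)                                       # generation
```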

Challenges
If the external data source is large, retrieval can be slow. The use of RAG does not completely eliminate the general challenges faced by LLMs, including hallucination.