Document datasets already have structure. Take advantage of it.
Building retrieval-augmented generation (RAG) applications involves several layers of challenges, and document retrieval is a significant part of the workflow. How best to approach retrieval depends on the use case, but one resource is available in most of them: the structure already present in the document dataset. Using that structure can improve retrieval quality.
RAG systems often struggle to identify the best set of documents for a nuanced input prompt, especially when they rely solely on vector search for candidate selection. Yet documents themselves provide clues about where to find more information on a topic, through citations, cross-references, and hyperlinks. In this article, we explore a new data model, linked documents, that parses and preserves direct references to other texts, so that referenced documents can be retrieved together with the documents that cite them, even when vector search alone would miss them.
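To make the idea concrete, here is a minimal sketch of what a linked-document model might look like. It is an illustration under stated assumptions, not the implementation discussed in this article: the `LinkedDocument` class, the `href` regex, and the `seed_search` function are all hypothetical, and `seed_search` uses simple word overlap as a stand-in for a real vector search.

```python
import re
from dataclasses import dataclass, field

# Assumption: references appear as HTML href attributes holding document ids.
HREF = re.compile(r'href="([^"]+)"')


@dataclass
class LinkedDocument:
    """Hypothetical record: a document plus the ids it links to."""
    doc_id: str
    text: str
    links: set[str] = field(default_factory=set)


def parse_document(doc_id: str, html: str) -> LinkedDocument:
    """Keep the text and preserve every outgoing reference found in it."""
    return LinkedDocument(doc_id, html, set(HREF.findall(html)))


def seed_search(query: str, docs: dict[str, LinkedDocument], k: int = 1) -> list[str]:
    """Stand-in for vector search: rank documents by word overlap with the query."""
    query_words = set(query.lower().split())

    def overlap(doc: LinkedDocument) -> int:
        return len(query_words & set(re.findall(r"\w+", doc.text.lower())))

    ranked = sorted(docs.values(), key=overlap, reverse=True)
    return [doc.doc_id for doc in ranked[:k]]


def retrieve(query: str, docs: dict[str, LinkedDocument], k: int = 1) -> list[str]:
    """Return the seed documents, then any documents they link to (one hop)."""
    results = seed_search(query, docs, k)
    for seed_id in list(results):
        for target in sorted(docs[seed_id].links):
            if target in docs and target not in results:
                results.append(target)
    return results


corpus = {
    "intro": parse_document(
        "intro",
        'Overview of billing limits. See <a href="quotas">the quota table</a>.',
    ),
    "quotas": parse_document("quotas", "Table of per-plan request ceilings."),
    "faq": parse_document("faq", "How to reset a password."),
}

retrieved = retrieve("billing limits", corpus)
print(retrieved)
```

In this toy corpus, the `quotas` document shares no words with the query, so the similarity stand-in ranks it no higher than the unrelated `faq` document. It is still returned, because the `intro` document links to it directly.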
By using the structure that documents already have and preserving their connections through document linking, we can improve the performance and accuracy of RAG systems. Let’s look more closely at how document linking can enrich retrieval in AI applications.