Create a Retrieval Pipeline
Build an LLM retrieval pipeline with Metal (RAG)
What is RAG (Retrieval-Augmented Generation)?
RAG combines LLMs with external collections of data, like the Datasources we’ll create next. It works in two steps:
- Retrieval: The system searches an external source (a Datasource) for data relevant to the query.
- Augmented Generation: The retrieved data is passed to the model along with the query, so the model can craft a more informed response.
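The two steps can be illustrated with a minimal, self-contained Python sketch. It is not Metal's implementation: the bag-of-words "embedding", the cosine ranking, and the helper names `retrieve` and `build_prompt` are hypothetical stand-ins, and the actual LLM call is left out. It only shows how retrieved text is placed in front of the question.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (hypothetical stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Step 1 (Retrieval): rank chunks by similarity to the query and keep the best top_k."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]

def build_prompt(query: str, context: list[str]) -> str:
    """Step 2 (Augmented Generation): prepend the retrieved text to the question for the LLM."""
    context_block = "\n".join(f"- {c}" for c in context)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context_block}\n"
        f"Question: {query}"
    )

chunks = [
    "Invoices are due within 30 days of receipt.",
    "Our office is closed on public holidays.",
    "Late invoices incur a 2% monthly fee.",
]
question = "When are invoices due?"
prompt = build_prompt(question, retrieve(question, chunks))
print(prompt)
```

In a real pipeline, `prompt` would be sent to the LLM. The unrelated chunk about public holidays is filtered out before generation.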
1. Create a Datasource
A Datasource is the collection of files that Metal preprocesses to feed your application.
To start, go to your organization Dashboard, navigate to the Datasources tab, and click the Add Datasource button to create a new Datasource.
2. Add an Index and Connect it to a Datasource
Indexes are where your preprocessed data will be transformed and made queryable for your application.
To set up an Index, go to the Indexes tab and click the Add Index button to create a new Index.
You can then connect your Index to the existing Datasource.
3. Add Files to your Datasource
Go to the Datasources tab and click the Upload File button to upload your files.
Accepted file formats are PDF, DOC, DOCX, XLSX, and CSV. Each uploaded file becomes a Data Entity.
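Before uploading a folder of files, it can help to filter out unsupported formats locally. The following sketch is a hypothetical helper and not part of Metal. It checks only the file extension against the formats listed above and does not inspect file contents. Metal performs its own validation on upload.

```python
from pathlib import Path

# Formats listed as accepted for Datasource uploads: PDF, DOC, DOCX, XLSX, CSV.
ACCEPTED_EXTENSIONS = {".pdf", ".doc", ".docx", ".xlsx", ".csv"}

def is_accepted(path: str) -> bool:
    """Return True if the file extension is an accepted format (case-insensitive check)."""
    return Path(path).suffix.lower() in ACCEPTED_EXTENSIONS

def split_uploadable(paths: list[str]) -> tuple[list[str], list[str]]:
    """Partition paths into (uploadable, rejected) based on extension only."""
    ok, rejected = [], []
    for p in paths:
        (ok if is_accepted(p) else rejected).append(p)
    return ok, rejected

files = ["report.PDF", "notes.txt", "sales.xlsx", "slides.pptx", "data.csv"]
ok, rejected = split_uploadable(files)
print("upload:", ok)
print("skip:", rejected)
```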
After you’ve added a file, Metal preprocesses the data, splits it into chunks, and generates the embeddings used for retrieval.
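The guide does not specify Metal's chunking strategy or embedding model. The sketch below is therefore only an illustration of the general idea under stated assumptions. It uses fixed-size word windows with overlap and a hashing-trick vector in place of a learned embedding. The names `chunk_words` and `hash_embed` and the parameters `size`, `overlap`, and `dim` are hypothetical.

```python
import hashlib
import math

def chunk_words(text: str, size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into windows of `size` words; consecutive windows share `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):  # last window reached the end of the text
            break
    return chunks

def hash_embed(text: str, dim: int = 64) -> list[float]:
    """Hashing-trick vector: each word increments one of `dim` buckets, then L2-normalize."""
    vec = [0.0] * dim
    for word in text.lower().split():
        bucket = int(hashlib.md5(word.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec

document = " ".join(f"word{i}" for i in range(120))
chunks = chunk_words(document, size=50, overlap=10)
embeddings = [hash_embed(c) for c in chunks]
print(len(chunks), len(embeddings[0]))
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from either side. Normalizing each vector lets a plain dot product serve as cosine similarity at query time.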