Overview
LLM retrieval is the process of making your data searchable by LLM-based applications. Your data is embedded into a vector space, and the resulting embeddings are indexed into a database. This lets you run semantic searches on your data, which is the foundation for the use cases below.
Use Cases
| Title | Description | Demo |
|---|---|---|
| Semantic Search | LLMs let your search match on semantic meaning, not just keywords | Multi Tenant search |
| Chatbots | Use LLMs to create chatbots that can answer questions on your data | RAG Chatbot |
| Question Answering | Build chat apps that answer questions on top of your unstructured data | Chat App |
| Tabular Analysis | Analyze tabular data with LLMs | Financial Analysis Chatbot |
| Clustering | Uncover hidden trends within your unstructured data | Clustering Tool |
| Image Search | Search for images based on semantic meaning | Image Retrieval with CLIP |
Usage
Indexing
We provide APIs to easily push data into our system. We support the following “primitive” data types for ingestion:
- Image URLs (.jpg, .png, .gif, .bmp, .tiff)
- Text (string)
- Embeddings (number[])
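As a rough illustration of how the three primitive types above could be told apart before ingestion, here is a minimal Python sketch. The function name `detect_input_type` and the returned labels are hypothetical and are not part of the actual API; the only detail taken from this page is the list of image extensions.

```python
from urllib.parse import urlparse

# Image extensions listed in the Indexing section.
IMAGE_EXTENSIONS = (".jpg", ".png", ".gif", ".bmp", ".tiff")


def detect_input_type(value):
    """Classify a value as 'image_url', 'text', or 'embedding'.

    Hypothetical helper for illustration; the real API may decide this differently.
    """
    if isinstance(value, str):
        parsed = urlparse(value)
        # Treat http(s) URLs whose path ends in a known image extension as images.
        if parsed.scheme in ("http", "https") and parsed.path.lower().endswith(IMAGE_EXTENSIONS):
            return "image_url"
        return "text"
    if isinstance(value, (list, tuple)) and value:
        # An embedding is a non-empty sequence of numbers (bool is excluded on purpose).
        if all(isinstance(x, (int, float)) and not isinstance(x, bool) for x in value):
            return "embedding"
    raise TypeError("expected a string or a non-empty list of numbers")


print(detect_input_type("https://example.com/cat.png"))
print(detect_input_type("a photo of a cat"))
print(detect_input_type([0.12, -0.53, 0.88]))
```

Checking the URL path rather than the whole string means that query parameters after the file name do not affect the result.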
File Importing
We also provide APIs to easily push larger collections of data into our system. We support the following file types for ingestion: .csv, .docx, .pdf, .pptx, .txt, .xlsx
Each imported file is then processed as follows:
- The file runs through a series of metadata extractors and augmenters
- The file is converted into a text representation, via OCR if applicable
- The text is split into overlapping 500-token chunks
- The chunks are embedded with the chosen embeddings model (ada, clip, etc.)
- The embeddings are indexed into our vector database
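The chunking step can be sketched in Python as follows. This is an illustration under stated assumptions, not the actual implementation: the page only specifies overlapping 500-token chunks, so the overlap of 50 tokens, the whitespace tokenizer, and the name `chunk_tokens` are all hypothetical.

```python
from typing import List


def chunk_tokens(tokens: List[str], chunk_size: int = 500, overlap: int = 50) -> List[List[str]]:
    """Split a token list into chunks of at most `chunk_size` tokens.

    Consecutive chunks share `overlap` tokens. The overlap value is an
    assumption; the source only says the chunks overlap.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(tokens):
            break
    return chunks


# Whitespace splitting stands in for a real model tokenizer here.
text = " ".join(f"word{i}" for i in range(1200))
chunks = chunk_tokens(text.split())
print([len(c) for c in chunks])
```

Overlap keeps a sentence that straddles a chunk boundary fully present in at least one chunk, at the cost of embedding some tokens twice.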
Searching
Run semantic search out of the box with our API. We support the following search term types:
- Images (.jpg, .png, .gif, .bmp, .tiff)
- Text (string)
- Embeddings (number[])
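Whatever the search term type, the query ends up as an embedding that is compared against the indexed embeddings. The sketch below shows one common way to do that, ranking by cosine similarity over an in-memory list. It assumes cosine similarity and brute-force scanning purely for illustration; the `Document` class, `search` function, and `top_k` parameter are hypothetical names, and the actual service may use a different metric and an approximate index.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Document:
    """A record that stores an embedding and its metadata (hypothetical shape)."""
    embedding: List[float]
    metadata: Dict[str, str] = field(default_factory=dict)


def cosine_similarity(a: List[float], b: List[float]) -> float:
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)


def search(index: List[Document], query: List[float], top_k: int = 3) -> List[Tuple[float, Document]]:
    """Return the top_k documents most similar to the query embedding."""
    scored = [(cosine_similarity(query, doc.embedding), doc) for doc in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


index = [
    Document([1.0, 0.0], {"title": "a"}),
    Document([0.0, 1.0], {"title": "b"}),
    Document([0.7, 0.7], {"title": "c"}),
]
for score, doc in search(index, [1.0, 0.1], top_k=2):
    print(round(score, 3), doc.metadata["title"])
```

A linear scan like this is fine for small collections; larger indexes typically rely on approximate nearest-neighbor structures instead.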
Definitions
| Term | Definition |
|---|---|
| Document | A record that stores an embedding and its metadata. |
| Embedding | A vector representation of your data. |
| Index | A database of your embeddings. |
| Indexing | Pushing data (raw text, files, images, etc.) into our system. |
| Search | An operation to run semantic searches on your index. |
| Tuning | A mechanism to improve the quality of your embeddings for your particular use case. |