AI Extraction
Metal Attribute Extraction allows you to easily pull and organize information from unstructured data.
Overview
Attribute Extraction identifies specific information within unstructured data and converts it into a structured format. It scans Datasources, isolates the attributes you define, and returns them as structured fields.
Title | Description | Demo |
---|---|---|
Content Organization | Structure scattered data into defined formats. | Organize Data Formats |
Tabular Data Extraction | Extract tabular data from PDFs and images. | Tabular Data Extraction |
Financial Analysis Chatbot | Analyze financial documents and answer questions. | Financial Analysis Chatbot |
Compare Documents | Compare documents and identify differences. | Comparing Insurance Policies |
Usage
Add Datasource
Create a datasource by calling the Add datasource endpoint to define the field attributes to extract. Use the description parameter to guide the LLM.
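As a rough sketch of what such a request body might look like, the Python below assembles a datasource definition in which every attribute carries a description. The key names (`name`, `attributes`, `type`, `description`) and the helper `build_datasource_payload` are assumptions made for illustration, not the documented schema; consult the Add datasource endpoint reference for the actual field names.

```python
import json


def build_datasource_payload(name, attributes):
    """Build a hypothetical request body for the Add datasource endpoint.

    `attributes` maps an attribute name to a (type, description) pair.
    The key names below are illustrative assumptions, not the real schema.
    """
    fields = []
    for attr_name, (attr_type, description) in attributes.items():
        # The description is what guides the LLM, so reject empty ones early.
        if not description.strip():
            raise ValueError(f"attribute {attr_name!r} needs a description")
        fields.append(
            {"name": attr_name, "type": attr_type, "description": description}
        )
    return {"name": name, "attributes": fields}


payload = build_datasource_payload(
    "insurance-policies",
    {
        "policy_number": ("string", "The policy identifier printed on the first page."),
        "annual_premium": ("number", "Total yearly premium in USD, digits only."),
    },
)
print(json.dumps(payload, indent=2))
```

Specific, concrete descriptions (where the value appears, what unit it uses) tend to give the LLM more to work with than a bare attribute name.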
Get Datasource
Retrieve a datasource by calling the Get datasource endpoint, which returns the datasource with the corresponding id.
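A lookup by id could be prepared along these lines. This is a minimal sketch: the base URL is a placeholder, and the `/datasources/{id}` path, the `Authorization` header, and the helper `build_get_datasource_request` are all assumptions rather than details taken from the API reference. The request is built but not sent.

```python
from urllib.parse import quote
from urllib.request import Request

# Placeholder base URL; replace with the value from the API reference.
API_BASE = "https://api.example.com/v1"


def build_get_datasource_request(datasource_id, api_key):
    """Prepare (but do not send) a GET request for one datasource by id."""
    if not datasource_id:
        raise ValueError("datasource_id is required")
    # Percent-encode the id so it is safe to place in the URL path.
    url = f"{API_BASE}/datasources/{quote(datasource_id, safe='')}"
    # The header name is an assumption; use whatever the API specifies.
    headers = {"Authorization": f"Bearer {api_key}"}
    return Request(url, headers=headers, method="GET")


request = build_get_datasource_request("ds_123", "YOUR_API_KEY")
print(request.get_method(), request.full_url)
```

Passing the prepared request to `urllib.request.urlopen` would send it; the response body would then be expected to contain the datasource whose id was requested.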
File Uploading (Add Data Entities)
We support the following file types for Attribute Extraction:
- .pdf
- .csv
- .docx
- .xlsx
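Before uploading, it can be convenient to filter files client-side against the supported types listed above. The sketch below does this by extension only; the helper name `is_supported` is hypothetical, and treating extensions as case-insensitive is an assumption about how uploads are handled.

```python
from pathlib import Path

# Extensions listed above as supported for Attribute Extraction.
SUPPORTED_EXTENSIONS = {".pdf", ".csv", ".docx", ".xlsx"}


def is_supported(filename):
    """Return True if the file's final extension is a supported type."""
    # Lowercasing makes "report.PDF" match ".pdf" (an assumption).
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


files = ["policy.PDF", "rates.xlsx", "notes.txt", "archive.tar.gz"]
uploadable = [name for name in files if is_supported(name)]
print(uploadable)
```

This checks the file name only, not the file contents, so it is a convenience filter rather than a validation step.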
Upon upload, these files run through the following pipeline:
- The file is converted into a text representation, using OCR where applicable.
- The text runs through a series of metadata extractors and augmenters.
- The extracted attributes are stored in our database, ready to use in your indexes.
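The three stages above can be pictured with a toy pipeline. Everything here is a stand-in chosen for illustration and does not describe Metal's internals: OCR is stubbed as a byte decode, regular expressions take the place of the LLM-based extractors, and a dictionary plays the role of the database. All function names are hypothetical.

```python
import re

# In-memory stand-in for the database that holds extracted attributes.
ATTRIBUTE_STORE = {}


def to_text(raw, needs_ocr):
    """Stage 1: produce a text representation (OCR is stubbed out)."""
    if needs_ocr:
        # A real pipeline would call an OCR engine; this stub only decodes.
        return raw.decode("utf-8", errors="replace")
    return raw.decode("utf-8")


def extract_attributes(text, patterns):
    """Stage 2: toy extractor that uses regexes in place of an LLM."""
    attributes = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, text)
        attributes[name] = match.group(1) if match else None
    return attributes


def process_upload(entity_id, raw, patterns, needs_ocr=False):
    """Run the stages in order and return the stored attributes."""
    text = to_text(raw, needs_ocr)
    attributes = extract_attributes(text, patterns)
    ATTRIBUTE_STORE[entity_id] = attributes  # Stage 3: persist the result
    return attributes


result = process_upload(
    "entity-1",
    b"Policy No: AB-1234\nPremium: 1200 USD",
    {
        "policy_number": r"Policy No:\s*(\S+)",
        "annual_premium": r"Premium:\s*(\d+)",
    },
)
print(result)
```

Keeping each stage as a separate function mirrors the order of the pipeline: text conversion first, extraction second, storage last.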
Definitions
Glossary
Term | Definition |
---|---|
Attribute | A specific piece of information identified and extracted from the raw data. |
Attribute Extraction template | An outline of the specified attributes and their descriptions. |
Data Entity | A unique entry in a Datasource. For example, an uploaded file together with its extracted attributes. |
Datasource | A collection of Data Entities addressing a specific data concern. It can come from an integration, a grouping of files, and so on. |
OCR (Optical Character Recognition) | A technology that recognizes and converts different types of documents, such as scanned paper documents, PDF files, or images, into editable and searchable text. |