Overview

Attribute Extraction identifies specific information within unstructured data and converts it into a structured format. It scans Datasources, isolates the attributes you define, and returns them as structured fields.

Title                       Description                                        Demo
Content Organization        Structure scattered data into defined formats.     Organize Data Formats
Tabular Data Extraction     Extract tabular data from PDFs and images.         Tabular Data Extraction
Financial Analysis Chatbot  Analyze financial documents and answer questions.  Financial Analysis Chatbot
Compare Documents           Compare documents and identify differences.        Comparing Insurance Policies

Usage

Add Datasource

Create a datasource by calling the Add datasource endpoint to define the field attributes to extract. Use the description parameter to guide the LLM.
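
As a rough illustration of the kind of request body this step implies, the sketch below builds a payload that lists the attributes to extract and attaches a description to each one. The field names (name, attributes, description) and the helper build_add_datasource_payload are assumptions made for this example, not the documented schema; check the Add datasource endpoint reference for the real parameter names.

```python
import json


def build_add_datasource_payload(name, attributes):
    """Build a JSON-serializable body for a hypothetical Add datasource request.

    The shape of this payload is an assumption for illustration only.
    """
    for attr in attributes:
        # A description is what guides the LLM, so reject attributes without one.
        if not attr.get("description"):
            raise ValueError(f"attribute {attr.get('name')!r} needs a description")
    return {"name": name, "attributes": attributes}


payload = build_add_datasource_payload(
    "invoices",
    [
        {
            "name": "invoice_number",
            "description": "The unique invoice identifier, usually printed near the top of the first page.",
        },
        {
            "name": "total_amount",
            "description": "The final amount due including tax, as a decimal number.",
        },
    ],
)
print(json.dumps(payload, indent=2))
```

Specific descriptions (where the value tends to appear, what format it should take) generally give the model more to work with than a bare field name.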

Get Datasource

Retrieve a datasource by calling the Get datasource endpoint with its id. The response contains the datasource that matches that id.

File Uploading (Add Data Entities)

We support the following file types for Attribute Extraction:

  • .pdf
  • .csv
  • .docx
  • .xlsx
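
A client can check a file's extension against this list before attempting an upload. The following is a minimal sketch; the helper name is_supported is hypothetical, and treating extensions as case-insensitive is an assumption rather than documented behavior.

```python
from pathlib import Path

# File types listed above as supported for Attribute Extraction.
SUPPORTED_EXTENSIONS = {".pdf", ".csv", ".docx", ".xlsx"}


def is_supported(filename: str) -> bool:
    """Return True if the file's final extension is in the supported list."""
    # Assumption: extensions are compared case-insensitively.
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


candidates = ["report.PDF", "ledger.xlsx", "photo.png", "notes"]
accepted = [name for name in candidates if is_supported(name)]
print(accepted)
```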

Upon upload, these files run through the following pipeline:

  1. The file is converted into a text representation, using OCR where applicable.
  2. The text runs through a series of metadata extractors and augmenters.
  3. The extracted attributes are stored in our database, ready to use in your indexes.
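
The three stages above can be pictured as a simple chain of functions. This is a conceptual sketch only: every name in it (ProcessedFile, convert_to_text, extract_attributes, store, run_pipeline) is hypothetical, the text conversion is a plain UTF-8 decode standing in for real parsing and OCR, and the "database" is an in-memory dict. It is not how the service is implemented.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessedFile:
    filename: str
    text: str = ""
    attributes: dict = field(default_factory=dict)


def convert_to_text(filename, raw):
    # Stage 1: stand-in for text conversion. A real pipeline would parse the
    # file format and apply OCR to scanned input; here we only decode bytes.
    return ProcessedFile(filename=filename, text=raw.decode("utf-8", errors="replace"))


def extract_attributes(doc, extractors):
    # Stage 2: run each extractor over the text and collect its result.
    for name, extractor in extractors.items():
        doc.attributes[name] = extractor(doc.text)
    return doc


def store(doc, database):
    # Stage 3: persist the attributes, keyed by filename.
    database[doc.filename] = doc.attributes


def run_pipeline(filename, raw, extractors, database):
    doc = convert_to_text(filename, raw)
    doc = extract_attributes(doc, extractors)
    store(doc, database)
    return doc


database = {}
extractors = {"word_count": lambda text: len(text.split())}
result = run_pipeline("note.csv", b"a,b\n1,2", extractors, database)
print(database)
```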

Definitions

Glossary

Term                                 Definition
Attribute                            A specific piece of information identified and extracted from the raw data.
Attribute Extraction template        An outline of the specified attributes and their descriptions.
Data Entity                          A unique entry in a Datasource, e.g. an uploaded file and its extracted attributes.
Datasource                           A collection of Data Entities addressing a specific data concern. This could come from an integration, a grouping of files, etc.
OCR (Optical Character Recognition)  A technology that recognizes and converts different types of documents, such as scanned paper documents, PDF files, or images, into editable and searchable text.
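
To show how these terms relate to one another, here is one possible way to model them as Python dataclasses. The class and field names (AttributeSpec, DataEntity, Datasource, template, entities) are illustrative assumptions and do not reflect the actual API objects.

```python
from dataclasses import dataclass, field


@dataclass
class AttributeSpec:
    """One line of an Attribute Extraction template: what to extract and how to describe it."""
    name: str
    description: str


@dataclass
class DataEntity:
    """A unique entry in a Datasource: an uploaded file plus its extracted attributes."""
    source_file: str
    attributes: dict[str, str] = field(default_factory=dict)


@dataclass
class Datasource:
    """A collection of Data Entities that share one template."""
    name: str
    template: list[AttributeSpec]
    entities: list[DataEntity] = field(default_factory=list)

    def add_entity(self, entity: DataEntity) -> None:
        # Only accept attributes that the template declares.
        expected = {spec.name for spec in self.template}
        unknown = set(entity.attributes) - expected
        if unknown:
            raise ValueError(f"attributes not in template: {sorted(unknown)}")
        self.entities.append(entity)


policies = Datasource(
    name="insurance-policies",
    template=[AttributeSpec("policy_number", "The identifier printed on the policy cover page.")],
)
policies.add_entity(DataEntity("policy_a.pdf", {"policy_number": "P-1001"}))
print(len(policies.entities))
```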