Index Document

Adds a new document to the indexing queue for processing and inclusion in the search index.

Endpoint

POST /index

Request Body

| Field | Type | Description |
|---|---|---|
| data | string or object | The main data to be indexed. Can be a URI, base64-encoded data, or a structured object. |
| type | string | Specifies the type of data being sent to ensure correct parsing. |
| id | string | A unique identifier for the document. |
| metadata | object (optional) | Additional metadata for the document. |

Possible type values

| Type | Description | Use Case |
|---|---|---|
| document_uri | URL pointing to the document | Web pages, remotely hosted documents |
| base64 | Base64-encoded document data | Binary files, images |
| plain_text | Raw text content | Articles, blog posts |
| structured | JSON or other structured data | API responses, database exports |
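
The fields and type values above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the documented API: `build_index_request` and `ALLOWED_TYPES` are hypothetical names, and the non-empty check on `id` is an assumption, since the source does not define the ID format.

```python
import json

# Type values taken from the table above.
ALLOWED_TYPES = {"document_uri", "base64", "plain_text", "structured"}


def build_index_request(data, doc_type, doc_id, metadata=None):
    """Return a request body dict for POST /index (hypothetical helper)."""
    if doc_type not in ALLOWED_TYPES:
        raise ValueError(f"unsupported type: {doc_type!r}")
    # Assumption: the ID format is not documented, so only reject empty
    # or non-string values here.
    if not isinstance(doc_id, str) or not doc_id:
        raise ValueError("id must be a non-empty string")
    body = {"data": data, "type": doc_type, "id": doc_id}
    # metadata is optional, so include it only when provided.
    if metadata is not None:
        body["metadata"] = metadata
    return body


body = build_index_request(
    "https://example.com/document/123",
    "document_uri",
    "doc_123",
    metadata={"title": "Introduction to AI"},
)
print(json.dumps(body, indent=2))
```

Rejecting an unknown type locally avoids a round trip that would likely end in a 400 response.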

Usage Examples (Request Body)

Index a Web Page

{
  "data": "https://example.com/document/123",
  "type": "document_uri",
  "id": "doc_123",
  "metadata": {
    "title": "Introduction to AI",
    "author": "Jane Doe",
    "publication_date": "2024-03-15",
    "tags": ["artificial intelligence", "machine learning"],
    "category": "Technology",
    "content_type": "text/html/landing-page"
  }
}
The data field contains the URL of the web page to be indexed. The metadata field carries additional information about the document; it is added directly to the search index metadata.
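
A request body like this one could be sent from Python using only the standard library. This is a sketch under stated assumptions: the host `https://search.example.com` is a placeholder (the source gives only the path `POST /index`), the helper names are hypothetical, and any authentication the service may require is omitted because the source does not describe it.

```python
import json
import urllib.request

# Assumption: placeholder host; replace with the real service address.
BASE_URL = "https://search.example.com"


def make_index_http_request(body, base_url=BASE_URL):
    """Build (but do not send) a POST /index request carrying a JSON body."""
    payload = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/index",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit(body, base_url=BASE_URL):
    """Send the request and return the decoded JSON response."""
    request = make_index_http_request(body, base_url)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


request = make_index_http_request(
    {
        "data": "https://example.com/document/123",
        "type": "document_uri",
        "id": "doc_123",
        "metadata": {"title": "Introduction to AI"},
    }
)
print(request.get_method(), request.full_url)
```

Building the request separately from sending it makes it possible to inspect the URL, headers, and payload without touching the network.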

Index Plain Text Content

{
  "data": "Artificial intelligence (AI) is transforming industries across the globe...",
  "type": "plain_text",
  "id": "ai_overview_001",
  "metadata": {
    "title": "Overview of AI",
    "word_count": 150,
    "language": "en"
  }
}
The data field contains the raw text content to be indexed. The metadata field carries additional information about the document; it is added directly to the search index metadata.

Index Structured Data

{
  "data": [
    {
      "title": "Product Launch Announcement",
      "type": "plain_text"
    },
    {
      "description": "Company XYZ is excited to announce the launch of our latest product...",
      "type": "plain_text"
    },
    {
      "legal_document": "https://example.com/legal/product_launch_terms.pdf",
      "type": "document_uri"
    },
    {
      "category": "Technology",
      "type": "plain_text"
    },
    {
      "tags": ["product launch", "technology", "innovation"],
      "type": "plain_text"
    },
    {
      "date": "2024-04-01",
      "type": "plain_text"
    }
  ],
  "type": "structured",
  "id": "product_launch_001",
  "metadata": {
    "author": "Company XYZ",
    "publication_date": "2024-03-15",
    "content_type": "structured blog",
    "search_space": "global"
  }
}
The data field contains structured data in JSON format: an array of items, each pairing one content field with its own type. A list value such as tags is converted to a list of plain_text entries. The metadata field carries additional information about the document; it is added directly to the search index metadata.

Note: As with the main indexer, each item in the data array supports only the types plain_text, document_uri, and base64.
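
The per-item type restriction can be checked before submission. The sketch below uses a hypothetical helper, `validate_structured_items`. The rule that each item carries exactly one content field next to `type` is an assumption inferred from the example above, not something the source states.

```python
# Item-level types named in the note above.
ITEM_TYPES = {"plain_text", "document_uri", "base64"}


def validate_structured_items(items):
    """Return a list of human-readable problems; empty means no issues found."""
    errors = []
    for index, item in enumerate(items):
        if not isinstance(item, dict):
            errors.append(f"item {index}: expected an object")
            continue
        item_type = item.get("type")
        if item_type not in ITEM_TYPES:
            errors.append(f"item {index}: unsupported type {item_type!r}")
        # Assumption: one content field per item, as in the example above.
        content_keys = [key for key in item if key != "type"]
        if len(content_keys) != 1:
            errors.append(
                f"item {index}: expected one content field, found {len(content_keys)}"
            )
    return errors


items = [
    {"title": "Product Launch Announcement", "type": "plain_text"},
    {"legal_document": "https://example.com/legal/product_launch_terms.pdf",
     "type": "document_uri"},
    {"nested": {"a": 1}, "type": "structured"},  # not allowed at item level
]
for problem in validate_structured_items(items):
    print(problem)
```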

Index a base64 Image File

{
  "data": "data:image/jpeg;base64,/9j/4AAQSkZJRgABAQEAYABgAAD/4Q...",
  "type": "base64",
  "id": "image_001",
  "metadata": {
    "title": "Beautiful Sunset",
    "description": "A stunning view of the sunset over the ocean...",
    "location": "Beach",
    "photographer": "John Smith"
  }
}
The data field contains the base64-encoded image data. The metadata field carries additional information about the image; it is added directly to the search index metadata.
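
The data value in this example has the shape of a data URI: a MIME type, the `;base64,` marker, and the encoded bytes. A minimal sketch of producing such a string follows. `to_data_uri` is a hypothetical helper, and the source does not say whether the API also accepts base64 without the `data:` prefix.

```python
import base64


def to_data_uri(raw: bytes, mime_type: str) -> str:
    """Encode raw bytes as a data URI like the one in the example above."""
    encoded = base64.b64encode(raw).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# The first four bytes of a typical JPEG file, used as stand-in content.
jpeg_header = bytes([0xFF, 0xD8, 0xFF, 0xE0])
uri = to_data_uri(jpeg_header, "image/jpeg")
print(uri)  # data:image/jpeg;base64,/9j/4A==
```

In practice the bytes would come from reading the image file in binary mode, for example with `pathlib.Path(path).read_bytes()`.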

Index a PDF Document

{
  "data": "https://example.com/document/annual_report_2023.pdf",
  "type": "document_uri",
  "id": "annual_report_2023",
  "metadata": {
    "title": "Annual Report 2023",
    "author": "Company XYZ",
    "publication_date": "2024-02-15",
    "category": "Finance",
    "content_type": "application/pdf"
  }
}
The data field contains the URL of the PDF document to be indexed. The metadata field carries additional information about the document; it is added directly to the search index metadata.

Response

| Field | Type | Description |
|---|---|---|
| Index_Status | string | The current status of the indexing process. |

Possible Index_Status values

| Status | Description |
|---|---|
| queued | Document is in the indexing queue |
| processing | Document is being processed |
| indexed | Document has been successfully indexed |
| failed | Indexing process failed |

Example Response

{
  "Index_Status": "queued"
}
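
A client may want to read the response defensively, since an unexpected status value usually signals a version mismatch. This is a sketch with a hypothetical helper, `parse_index_status`, that accepts only the four documented values.

```python
import json

# Status values listed in the table above.
KNOWN_STATUSES = {"queued", "processing", "indexed", "failed"}


def parse_index_status(response_text: str) -> str:
    """Extract Index_Status from a JSON response body and validate it."""
    payload = json.loads(response_text)
    status = payload.get("Index_Status")
    if status not in KNOWN_STATUSES:
        raise ValueError(f"unexpected Index_Status: {status!r}")
    return status


status = parse_index_status('{"Index_Status": "queued"}')
print(status)  # queued
```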

Error Responses

  • 400 Bad Request: Invalid input data. Examples:
    • Missing required fields
    • Invalid data type
    • Invalid document ID format
  • 409 Conflict: A document with the given ID already exists
  • 413 Payload Too Large: Document size exceeds the allowed limit
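
The documented error codes can be turned into a single exception type on the client side. The following is a sketch only: `IndexRequestError` and `raise_for_index_status` are hypothetical names, and treating any 2xx code as success is an assumption, because the source does not state which status code a successful request returns.

```python
class IndexRequestError(Exception):
    """Raised when POST /index returns a non-success status code."""

    def __init__(self, status_code, reason):
        super().__init__(f"{status_code}: {reason}")
        self.status_code = status_code
        self.reason = reason


# Meanings copied from the list above.
DOCUMENTED_ERRORS = {
    400: "Invalid input data",
    409: "Document with the given ID already exists",
    413: "Document size exceeds the allowed limit",
}


def raise_for_index_status(status_code):
    """Return None on success; raise IndexRequestError otherwise."""
    # Assumption: any 2xx code counts as success.
    if 200 <= status_code < 300:
        return
    reason = DOCUMENTED_ERRORS.get(status_code, "Undocumented error")
    raise IndexRequestError(status_code, reason)


try:
    raise_for_index_status(409)
except IndexRequestError as error:
    print(error)  # 409: Document with the given ID already exists
```

A caller can branch on `status_code`, for example choosing a new ID after a 409 or splitting the document after a 413.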

This endpoint initiates the indexing process for a document. The actual indexing may take some time to complete, depending on the size and complexity of the document. Use the Get Index Status endpoint to check the current status of the indexing process.
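
Because indexing is asynchronous, a client typically polls until the document leaves the queue. The sketch below keeps the status lookup as an injected callable, `fetch_status`, because this page does not give the path or parameters of the Get Index Status endpoint. Treating indexed and failed as final states, and the interval and attempt limits, are assumptions.

```python
import time

# Assumption: these two statuses are final; queued and processing are not.
TERMINAL_STATUSES = {"indexed", "failed"}


def wait_until_done(fetch_status, interval_seconds=2.0, max_attempts=30,
                    sleep=time.sleep):
    """Call fetch_status() until it returns a terminal status or attempts run out."""
    for _ in range(max_attempts):
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        sleep(interval_seconds)
    raise TimeoutError(f"no terminal status after {max_attempts} attempts")


# Usage with a stand-in for the real status call.
fake_responses = iter(["queued", "processing", "indexed"])
final = wait_until_done(lambda: next(fake_responses), sleep=lambda _: None)
print(final)  # indexed
```

Passing `sleep` as a parameter keeps the loop testable without real delays.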