Index Document

Adds a new document to the indexing queue for processing and inclusion in the search index.

Endpoint

POST /index

Request Body

Field	Type	Description
data	string or object	The main data to be indexed. Can be a URI, base64-encoded data, raw text, or a structured object.
type	string	Specifies the type of data being sent to ensure correct parsing.
id	string	A unique identifier for the document.
metadata	object (optional)	Additional metadata for the document.
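
As an illustration of how the fields above fit together, here is a minimal sketch that builds (but does not send) a request using only the Python standard library. The base URL `https://api.example.com` and the `Content-Type: application/json` header are assumptions, not taken from this page, and any authentication your deployment requires is not shown. `build_index_request` is a hypothetical helper name.

```python
import json
import urllib.request

# Assumption: hypothetical base URL; substitute your deployment's host.
BASE_URL = "https://api.example.com"


def build_index_request(data, type_, doc_id, metadata=None, base_url=BASE_URL):
    """Build (but do not send) a POST /index request with a JSON body."""
    body = {"data": data, "type": type_, "id": doc_id}
    if metadata is not None:  # metadata is optional, so omit it when absent
        body["metadata"] = metadata
    return urllib.request.Request(
        url=f"{base_url}/index",
        data=json.dumps(body).encode("utf-8"),
        # Assumption: the API expects a JSON content type.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_index_request(
    "https://example.com/document/123",
    "document_uri",
    "doc_123",
    metadata={"title": "Introduction to AI"},
)
print(req.get_method(), req.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send it; the sketch stops short of that so it can be run without network access.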

Possible `type` values

Type	Description	Use Case
document_uri	URL pointing to the document	Web pages, remotely hosted documents
base64	Base64 encoded document data	Binary files, images
plain_text	Raw text content	Articles, blog posts
structured	JSON or other structured data	API responses, database exports
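
A client may want to catch obviously malformed bodies before sending them. The sketch below checks a body against the two tables above. It assumes that `data`, `type`, and `id` are required (only `metadata` is marked optional) and that the four listed `type` values are the complete set; `validate_index_body` is a hypothetical helper, not part of the API.

```python
# Type values listed in the table above.
ALLOWED_TYPES = {"document_uri", "base64", "plain_text", "structured"}
# Assumption: every field not marked optional is required.
REQUIRED_FIELDS = ("data", "type", "id")


def validate_index_body(body):
    """Return a list of problems found in a request body (empty if none)."""
    problems = [
        f"missing required field: {name}"
        for name in REQUIRED_FIELDS
        if name not in body
    ]
    type_ = body.get("type")
    if type_ is not None and type_ not in ALLOWED_TYPES:
        problems.append(f"unsupported type: {type_!r}")
    if "metadata" in body and not isinstance(body["metadata"], dict):
        problems.append("metadata must be an object")
    return problems


print(validate_index_body({"data": "x", "type": "pdf"}))
```

The server remains the authority on what is valid; a check like this only saves a round trip for the most common mistakes.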

Usage Examples (Request Body)

Index a Web Page

{
  "data": "https://example.com/document/123",
  "type": "document_uri",
  "id": "doc_123",
  "metadata": {
    "title": "Introduction to AI",
    "author": "Jane Doe",
    "publication_date": "2024-03-15",
    "tags": ["artificial intelligence", "machine learning"],
    "category": "Technology",
    "content_type": "text/html/landing-page"
  }
}

The data field contains the URL of the web page to be indexed. The metadata field provides additional information about the document, which is added directly to the search index metadata.

Index Plain Text Content

{
  "data": "Artificial intelligence (AI) is transforming industries across the globe...",
  "type": "plain_text",
  "id": "ai_overview_001",
  "metadata": {
    "title": "Overview of AI",
    "word_count": 150,
    "language": "en"
  }
}

The data field contains the raw text content to be indexed. The metadata field provides additional information about the document, which is added directly to the search index metadata.

Index Structured Data

{
  "data": [
    {
      "title": "Product Launch Announcement",
      "type": "plain_text"
    },
    {
      "description": "Company XYZ is excited to announce the launch of our latest product...",
      "type": "plain_text"
    },
    {
      "legal_document": "https://example.com/legal/product_launch_terms.pdf",
      "type": "document_uri"
    },
    {
      "category": "Technology",
      "type": "plain_text"
    },
    {
      "tags": ["product launch", "technology", "innovation"],
      "type": "plain_text"
    },
    {
      "date": "2024-04-01",
      "type": "plain_text"
    }
  ],
  "id": "product_launch_001",
  "metadata": {
    "author": "Company XYZ",
    "publication_date": "2024-03-15",
    "content_type": "structured blog",
    "search_space": "global"
  }
}

The data field contains structured data in JSON format. Each entry declares its own type; the tags array will be converted to a list of plain_text values. The metadata field provides additional information about the document, which is added directly to the search index metadata.

Note: As in the main indexer, the only types supported for entries inside structured data are plain_text, document_uri, and base64.
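
The per-entry type restriction in the note can be checked client-side. The sketch below assumes that structured data is a list of objects, each holding exactly one content key plus a `type` key; that shape is inferred from the structured example and is not stated as a rule on this page. `check_structured_entries` is a hypothetical helper.

```python
# Entry types named in the note above.
ENTRY_TYPES = {"plain_text", "document_uri", "base64"}


def check_structured_entries(entries):
    """Return (index, message) pairs for entries that look malformed."""
    errors = []
    for i, entry in enumerate(entries):
        entry_type = entry.get("type")
        if entry_type not in ENTRY_TYPES:
            errors.append((i, f"unsupported entry type: {entry_type!r}"))
        # Assumption: each entry carries exactly one content key besides "type".
        content_keys = [key for key in entry if key != "type"]
        if len(content_keys) != 1:
            errors.append(
                (i, f"expected one content key, found {len(content_keys)}")
            )
    return errors


entries = [
    {"title": "Product Launch Announcement", "type": "plain_text"},
    {"payload": {"a": 1}, "type": "structured"},  # nested structured: rejected
    {"type": "base64"},  # no content key: rejected
]
for index, message in check_structured_entries(entries):
    print(index, message)
```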

Index a base64 Image File

{
  "data": "data:image/jpeg;base64,/9j/4AAQSkZJRgABAQEAYABgAAD/4Q...",
  "type": "base64",
  "id": "image_001",
  "metadata": {
    "title": "Beautiful Sunset",
    "description": "A stunning view of the sunset over the ocean...",
    "location": "Beach",
    "photographer": "John Smith"
  }
}

The data field contains the base64-encoded image data. The metadata field provides additional information about the image, which is added directly to the search index metadata.
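
For reference, a value shaped like the `data` string in this example can be produced from raw bytes with Python's standard `base64` module. Whether the API requires the `data:<mime>;base64,` prefix or also accepts a bare base64 string is not stated here; the sketch simply mirrors the example. `to_data_uri` and `build_image_body` are hypothetical helper names.

```python
import base64


def to_data_uri(raw: bytes, mime_type: str) -> str:
    """Encode raw bytes as a data URI, mirroring the example's format."""
    encoded = base64.b64encode(raw).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def build_image_body(raw: bytes, mime_type: str, doc_id: str, metadata=None):
    """Assemble a request body for a base64 document."""
    body = {"data": to_data_uri(raw, mime_type), "type": "base64", "id": doc_id}
    if metadata:
        body["metadata"] = metadata
    return body


# The first four bytes of a JPEG file stand in for real image data.
jpeg_header = b"\xff\xd8\xff\xe0"
print(to_data_uri(jpeg_header, "image/jpeg"))
```

In practice the bytes would come from reading the file in binary mode, for example `open(path, "rb").read()`.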

Index a PDF Document

{
  "data": "https://example.com/document/annual_report_2023.pdf",
  "type": "document_uri",
  "id": "annual_report_2023",
  "metadata": {
    "title": "Annual Report 2023",
    "author": "Company XYZ",
    "publication_date": "2024-02-15",
    "category": "Finance",
    "content_type": "application/pdf"
  }
}

The data field contains the URL of the PDF document to be indexed. The metadata field provides additional information about the document, which is added directly to the search index metadata.

Response

Field	Type	Description
Index_Status	string	The current status of the indexing process.

Possible `Index_Status` values

Status	Description
queued	Document is in the indexing queue
processing	Document is being processed
indexed	Document has been successfully indexed
failed	Indexing process failed

Example Response

{
  "Index_Status": "queued"
}
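
On the client side, the status values can be modelled as an enumeration so that unexpected values fail loudly. This is a sketch only; treating `indexed` and `failed` as terminal states is an assumption drawn from their descriptions, and `IndexStatus`, `parse_index_status`, and `is_terminal` are hypothetical names.

```python
from enum import Enum


class IndexStatus(str, Enum):
    """Status values listed in the table above."""
    QUEUED = "queued"
    PROCESSING = "processing"
    INDEXED = "indexed"
    FAILED = "failed"


# Assumption: these two states do not change once reached.
TERMINAL_STATUSES = {IndexStatus.INDEXED, IndexStatus.FAILED}


def parse_index_status(response_body: dict) -> IndexStatus:
    """Extract Index_Status; raises KeyError or ValueError on bad input."""
    return IndexStatus(response_body["Index_Status"])


def is_terminal(status: IndexStatus) -> bool:
    return status in TERMINAL_STATUSES


status = parse_index_status({"Index_Status": "queued"})
print(status.value, is_terminal(status))
```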

Error Responses

400 Bad Request: Invalid input data
Examples:
- Missing required fields
- Invalid data type
- Invalid document ID format
409 Conflict: Document with the given ID already exists
413 Payload Too Large: Document size exceeds the allowed limit
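
One way a client might surface these errors is to map the documented status codes to a single exception type. The sketch assumes that any 2xx code means success (the success code is not stated on this page) and that other codes may also occur; `IndexRequestError` and `raise_for_index_status` are hypothetical names.

```python
# Error codes documented above, with their meanings.
DOCUMENTED_ERRORS = {
    400: "Bad Request: invalid input data",
    409: "Conflict: document with the given ID already exists",
    413: "Payload Too Large: document size exceeds the allowed limit",
}


class IndexRequestError(Exception):
    """Raised when POST /index returns a non-success status code."""

    def __init__(self, status: int, message: str):
        super().__init__(f"{status} {message}")
        self.status = status


def raise_for_index_status(status: int) -> None:
    """Raise IndexRequestError unless the status code indicates success."""
    if 200 <= status < 300:  # assumption: any 2xx means the request was accepted
        return
    message = DOCUMENTED_ERRORS.get(status, "Undocumented error")
    raise IndexRequestError(status, message)


try:
    raise_for_index_status(409)
except IndexRequestError as err:
    print(err)
```

A 409 can then be handled separately from a 400, for instance by choosing a new `id` instead of fixing the payload.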

This endpoint initiates the indexing process for a document. The actual indexing may take some time to complete, depending on the size and complexity of the document. Use the Get Index Status endpoint to check the current status of the indexing process.
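
Because indexing is asynchronous, a client typically polls until the document reaches a final state. The sketch below shows one such loop. The path and response shape of the Get Index Status endpoint are not described on this page, so the lookup is passed in as a callable (`fetch_status`, hypothetical) that returns the current status string; the interval and timeout defaults are arbitrary.

```python
import time


def wait_until_indexed(fetch_status, interval=2.0, timeout=60.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_status() until it returns 'indexed' or 'failed'.

    Raises TimeoutError if neither state is reached within `timeout` seconds.
    """
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status in ("indexed", "failed"):
            return status
        if clock() >= deadline:
            raise TimeoutError(f"document still {status!r} after {timeout}s")
        sleep(interval)


# Demonstration with a stand-in for the real status lookup.
fake_statuses = iter(["queued", "processing", "indexed"])
final = wait_until_indexed(lambda: next(fake_statuses), sleep=lambda s: None)
print(final)
```

Injecting `sleep` and `clock` keeps the loop testable without real delays; production code would leave both at their defaults.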