Index Document
Adds a new document to the indexing queue for processing and inclusion in the search index.
Endpoint
Request Body
| Field | Type | Description |
| --- | --- | --- |
| data | string or object | The main data to be indexed. Can be a URI, base64 encoded data, or a structured object. |
| type | string | Specifies the type of data being sent to ensure correct parsing. |
| id | string | A unique identifier for the document. |
| metadata | object (optional) | Additional metadata for the document. |
Possible type values
| Type | Description | Use Case |
| --- | --- | --- |
| document_uri | URL pointing to the document | Web pages, remotely hosted documents |
| base64 | Base64 encoded document data | Binary files, images |
| plain_text | Raw text content | Articles, blog posts |
| structured | JSON or other structured data | API responses, database exports |
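As a rough illustration of how the fields and type values above fit together, the following Python sketch assembles a request body and rejects unknown type values before anything is sent. The helper name `build_index_request` and its checks are assumptions made for this example; they are not part of the API, and the server may apply different or stricter validation.

```python
import json

# Type values copied from the "Possible type values" table.
ALLOWED_TYPES = {"document_uri", "base64", "plain_text", "structured"}


def build_index_request(data, type_, doc_id, metadata=None):
    """Assemble a request body dict (hypothetical client-side helper)."""
    if type_ not in ALLOWED_TYPES:
        raise ValueError(f"unsupported type: {type_!r}")
    if not isinstance(doc_id, str) or not doc_id:
        raise ValueError("id must be a non-empty string")
    body = {"data": data, "type": type_, "id": doc_id}
    # metadata is optional, so it is only included when provided.
    if metadata is not None:
        body["metadata"] = metadata
    return body


body = build_index_request(
    "https://example.com/document/123",
    "document_uri",
    "doc_123",
    metadata={"title": "Introduction to AI"},
)
print(json.dumps(body, indent=2))
```

Validating on the client only saves a round trip; the 400 Bad Request response remains the authoritative check.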
Usage Examples (Request Body)
Index a Web Page
{
  "data": "https://example.com/document/123",
  "type": "document_uri",
  "id": "doc_123",
  "metadata": {
    "title": "Introduction to AI",
    "author": "Jane Doe",
    "publication_date": "2024-03-15",
    "tags": ["artificial intelligence", "machine learning"],
    "category": "Technology",
    "content_type": "text/html/landing-page"
  }
}
The data field contains the URL of the web page to be indexed. The metadata field provides additional information about the document which will be directly added to the search index metadata.
Index Plain Text Content
{
  "data": "Artificial intelligence (AI) is transforming industries across the globe...",
  "type": "plain_text",
  "id": "ai_overview_001",
  "metadata": {
    "title": "Overview of AI",
    "word_count": 150,
    "language": "en"
  }
}
The data field contains the raw text content to be indexed. The metadata field provides additional information about the document which will be directly added to the search index metadata.
Index Structured Data
{
  "data": [
    {
      "title": "Product Launch Announcement",
      "type": "plain_text"
    },
    {
      "description": "Company XYZ is excited to announce the launch of our latest product...",
      "type": "plain_text"
    },
    {
      "legal_document": "https://example.com/legal/product_launch_terms.pdf",
      "type": "document_uri"
    },
    {
      "category": "Technology",
      "type": "plain_text"
    },
    {
      "tags": ["product launch", "technology", "innovation"],
      "type": "plain_text"
    },
    {
      "date": "2024-04-01",
      "type": "plain_text"
    }
  ],
  "id": "product_launch_001",
  "metadata": {
    "author": "Company XYZ",
    "publication_date": "2024-03-15",
    "content_type": "structured blog",
    "search_space": "global"
  }
}
The data field contains structured data in JSON format. Each item pairs one content field with a type; the tags array is converted to a list of plain_text values. The metadata field provides additional information about the document which will be directly added to the search index metadata.
Note: As with the main indexer, the only supported types are plain_text, document_uri, and base64.
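The note above can be checked on the client before a structured payload is sent. The sketch below is a hypothetical helper, not part of the API: it assumes the restriction applies to the type of each item in the data array, and that each item carries exactly one content key besides type, as in the example. Both assumptions are read off the example rather than stated by the API.

```python
# Types named in the note; assumed here to apply to each structured item.
SUPPORTED_ITEM_TYPES = {"plain_text", "document_uri", "base64"}


def validate_structured_items(items):
    """Return a list of problems; an empty list means the items look well-formed."""
    problems = []
    for i, item in enumerate(items):
        item_type = item.get("type")
        if item_type not in SUPPORTED_ITEM_TYPES:
            problems.append(f"item {i}: unsupported type {item_type!r}")
        # Assumption from the example: one content key alongside "type".
        content_keys = [key for key in item if key != "type"]
        if len(content_keys) != 1:
            problems.append(
                f"item {i}: expected 1 content key, found {len(content_keys)}"
            )
    return problems


items = [
    {"title": "Product Launch Announcement", "type": "plain_text"},
    {"tags": ["product launch", "technology"], "type": "plain_text"},
    {"summary": "Quarterly numbers", "type": "structured"},  # nested type: rejected
]
for problem in validate_structured_items(items):
    print(problem)
```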
Index a base64 Image File
{
  "data": "data:image/jpeg;base64,/9j/4AAQSkZJRgABAQEAYABgAAD/4Q...",
  "type": "base64",
  "id": "image_001",
  "metadata": {
    "title": "Beautiful Sunset",
    "description": "A stunning view of the sunset over the ocean...",
    "location": "Beach",
    "photographer": "John Smith"
  }
}
The data field contains the base64 encoded image data. The metadata field provides additional information about the image which will be directly added to the search index metadata.
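One way to produce a data value in the shape shown above is to base64 encode the raw bytes and prepend a data URI header. This is a minimal sketch using Python's standard library; the helper name `to_data_uri` is made up for illustration, and whether the API also accepts bare base64 without the `data:<mime>;base64,` prefix is not stated here, so the sketch simply mirrors the example.

```python
import base64


def to_data_uri(raw: bytes, mime_type: str) -> str:
    """Encode raw bytes as a data URI like the one in the example above."""
    encoded = base64.b64encode(raw).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# The first four bytes of a JPEG file, used as a stand-in for real image data.
jpeg_header = bytes([0xFF, 0xD8, 0xFF, 0xE0])
print(to_data_uri(jpeg_header, "image/jpeg"))
# prints "data:image/jpeg;base64,/9j/4A=="
```

In practice the bytes would come from reading the file in binary mode, for example `open(path, "rb").read()`.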
Index a PDF Document
{
  "data": "https://example.com/document/annual_report_2023.pdf",
  "type": "document_uri",
  "id": "annual_report_2023",
  "metadata": {
    "title": "Annual Report 2023",
    "author": "Company XYZ",
    "publication_date": "2024-02-15",
    "category": "Finance",
    "content_type": "application/pdf"
  }
}
The data field contains the URL of the PDF document to be indexed. The metadata field provides additional information about the document which will be directly added to the search index metadata.
Response
| Field | Type | Description |
| --- | --- | --- |
| Index_Status | string | The current status of the indexing process. |
Possible Index_Status values
| Status | Description |
| --- | --- |
| queued | Document is in the indexing queue |
| processing | Document is being processed |
| indexed | Document has been successfully indexed |
| failed | Indexing process failed |
Example Response
{
  "Index_Status": "queued"
}
Error Responses
- 400 Bad Request: Invalid input data
  - Examples:
    - Missing required fields
    - Invalid data type
    - Invalid document ID format
- 409 Conflict: Document with the given ID already exists
- 413 Payload Too Large: Document size exceeds the allowed limit
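A client might translate these status codes into a single exception type so that callers can branch on the code. The sketch below is illustrative only: `IndexDocumentError` and `raise_for_index_error` are hypothetical names, the messages are copied from the list above, and treating every code of 400 or higher as an error is an assumption, since the success status code is not documented in this section.

```python
# Meanings copied from the documented error responses.
ERROR_MEANINGS = {
    400: "Invalid input data",
    409: "Document with the given ID already exists",
    413: "Document size exceeds the allowed limit",
}


class IndexDocumentError(Exception):
    """Hypothetical exception carrying the HTTP status code."""

    def __init__(self, status_code: int):
        self.status_code = status_code
        meaning = ERROR_MEANINGS.get(status_code, "Unexpected error")
        super().__init__(f"{status_code}: {meaning}")


def raise_for_index_error(status_code: int) -> None:
    """Raise IndexDocumentError for any status code of 400 or higher."""
    if status_code >= 400:
        raise IndexDocumentError(status_code)


try:
    raise_for_index_error(409)
except IndexDocumentError as err:
    print(err)  # prints "409: Document with the given ID already exists"
```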
This endpoint initiates the indexing process for a document. The actual indexing may take some time to complete, depending on the size and complexity of the document. Use the Get Index Status endpoint to check the current status of the indexing process.
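Because indexing is asynchronous, a client typically polls until the status reaches indexed or failed. The following sketch shows one possible polling loop. The Get Index Status endpoint is not described in this section, so the loop takes a caller-supplied `fetch_status` function, and it assumes that function returns a dict with the same Index_Status field as the response above. The interval and attempt limit are arbitrary illustrative defaults.

```python
import time

# Statuses after which no further change is expected.
TERMINAL_STATUSES = {"indexed", "failed"}


def wait_until_done(fetch_status, doc_id, interval_s=2.0, max_attempts=30,
                    sleep=time.sleep):
    """Poll fetch_status(doc_id) until a terminal status is returned.

    fetch_status is a hypothetical callable wrapping the Get Index Status
    endpoint; it is assumed to return {"Index_Status": "<status>"}.
    """
    for attempt in range(max_attempts):
        status = fetch_status(doc_id)["Index_Status"]
        if status in TERMINAL_STATUSES:
            return status
        # Wait between attempts, but not after the final one.
        if attempt < max_attempts - 1:
            sleep(interval_s)
    raise TimeoutError(f"{doc_id} not finished after {max_attempts} attempts")


# Usage with a stand-in for the real status call.
fake_responses = iter(["queued", "processing", "indexed"])
final = wait_until_done(
    lambda _doc_id: {"Index_Status": next(fake_responses)},
    "doc_123",
    sleep=lambda _seconds: None,  # skip real waiting in this demonstration
)
print(final)  # prints "indexed"
```

Passing `sleep` as a parameter keeps the loop easy to test; in real use the default `time.sleep` applies.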