Search Documents
Searches for documents based on the provided query and parameters.
Endpoint
Request Body
Field | Type | Description |
---|---|---|
query | string | The search query or data. Can be a URI, base64-encoded data, or plain text. |
search_type | string | Specifies the type of search to perform. Default is "hybrid_vector_graph". |
search_space | string | Defines the search space. Default is "global". |
search_params | object (optional) | Additional parameters specific to the search type or search space. |
additional_filters | array of objects (optional) | Additional search filters. |
page | object | Pagination information. |
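As a minimal sketch of the request body described above, the following Python helper assembles the JSON payload. It assumes the defaults listed in the table and omits optional fields that are not supplied; the function name `build_search_request` is hypothetical and not part of the API.

```python
import json
from typing import Any, Optional


def build_search_request(
    query: str,
    search_type: str = "hybrid_vector_graph",  # documented default
    search_space: str = "global",  # documented default
    search_params: Optional[dict[str, Any]] = None,
    additional_filters: Optional[list[dict[str, Any]]] = None,
    page: Optional[dict[str, int]] = None,
) -> dict[str, Any]:
    """Assemble a request body; optional fields are included only when set."""
    body: dict[str, Any] = {
        "query": query,
        "search_type": search_type,
        "search_space": search_space,
    }
    if search_params:
        body["search_params"] = search_params
    if additional_filters:
        body["additional_filters"] = additional_filters
    if page:
        body["page"] = page
    return body


body = build_search_request("artificial intelligence applications")
print(json.dumps(body, indent=2))
```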
Possible search_type values
Search Type | Speed | Accuracy | Description | Use Case |
---|---|---|---|---|
dense | Fast | Medium | Pure Semantic Search (Dense Vector) | Captures the semantic meaning of the query and documents. |
sparse | Fast | Low | Keyword Search (Sparse Vector) | Captures the keyword-based relevance of the query and documents. |
graph | Slow | High | Knowledge Graph Search (Graph) | Captures the relationships between entities and concepts in the documents, providing more factual and contextual information. |
hybrid_vector | Medium | High | Semantic (Dense) + Keyword (Sparse) Search | Combines the strengths of semantic and keyword-based search. Provides a balance between semantic and keyword information. |
hybrid_vector_graph | Slow | Highest | Semantic (Dense) + Keyword (Sparse) + Graph Search (Default) | Combines the strengths of semantic, keyword-based, and graph search. Provides the most comprehensive and accurate results. |
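The trade-offs in this table can be encoded directly if a client needs to pick a search type programmatically. The sketch below is hypothetical: `pick_search_type` only restates the table's qualitative Speed and Accuracy ratings and returns the most accurate type within a speed budget. It does not reflect measured performance.

```python
# Qualitative (speed, accuracy) ratings copied from the search_type table.
SEARCH_TYPES = {
    "dense": ("Fast", "Medium"),
    "sparse": ("Fast", "Low"),
    "graph": ("Slow", "High"),
    "hybrid_vector": ("Medium", "High"),
    "hybrid_vector_graph": ("Slow", "Highest"),
}

SPEED_RANK = {"Fast": 0, "Medium": 1, "Slow": 2}
ACCURACY_RANK = {"Low": 0, "Medium": 1, "High": 2, "Highest": 3}


def pick_search_type(slowest_acceptable: str) -> str:
    """Return the most accurate search type no slower than the given speed.

    Ties on accuracy are broken in favor of the faster type.
    """
    limit = SPEED_RANK[slowest_acceptable]
    candidates = [
        (name, speed, accuracy)
        for name, (speed, accuracy) in SEARCH_TYPES.items()
        if SPEED_RANK[speed] <= limit
    ]
    name, _, _ = max(
        candidates,
        key=lambda c: (ACCURACY_RANK[c[2]], -SPEED_RANK[c[1]]),
    )
    return name


print(pick_search_type("Fast"))  # dense
```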
Possible search_space values
Search Space | Description | Use Case |
---|---|---|
global | Complete site-wide search (includes everything) | General search across all content types. |
textual_content | All textual content | Search only within textual content like articles, blogs, etc. |
media | Media only: documents, images, and anything else stored as media | Search only within media content such as images and documents. |
custom_file_type | Specific media types (e.g., Images, Documents, Videos) | Search within specific media types. Example: Search within documents only |
custom_file_extension | Specific file extensions | Search within specific file extensions. Example: Search within .docx document files only |
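Because an invalid search type or search space results in a 400 Bad Request, a client may prefer to catch these mistakes before sending the request. This is a minimal client-side sketch; the helper name `validate_search_options` is hypothetical, and the allowed values are copied from the two tables above.

```python
VALID_SEARCH_TYPES = {"dense", "sparse", "graph", "hybrid_vector", "hybrid_vector_graph"}
VALID_SEARCH_SPACES = {
    "global",
    "textual_content",
    "media",
    "custom_file_type",
    "custom_file_extension",
}


def validate_search_options(body: dict) -> None:
    """Raise ValueError for search_type or search_space values not in the tables."""
    # Missing fields fall back to the documented defaults.
    search_type = body.get("search_type", "hybrid_vector_graph")
    search_space = body.get("search_space", "global")
    if search_type not in VALID_SEARCH_TYPES:
        raise ValueError(f"invalid search_type: {search_type!r}")
    if search_space not in VALID_SEARCH_SPACES:
        raise ValueError(f"invalid search_space: {search_space!r}")


validate_search_options({"query": "x", "search_space": "media"})  # passes silently
```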
search_params object
Field | Type | Description |
---|---|---|
doc_id | string | Limits the search to a specific document ID. Can be used to retrieve related chunks from the same document. |
return_metadata | boolean | Whether to return metadata along with the search results. Default is false. |
Note:
- Other implementations for search_params are yet to be added.
- Setting return_metadata to true returns metadata with each search result, including fields such as title, description, doc_type, and URL. This is useful for displaying results with additional context, but it can reduce search performance because it increases the response size.
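A request that limits the search to a single document and asks for metadata could set search_params as in this sketch. The query text and the doc_id value "12345" are illustrative.

```python
import json

search_params = {
    "doc_id": "12345",        # restrict the search to this document's chunks
    "return_metadata": True,  # include title, description, doc_type, and URL
}

body = {
    "query": "artificial intelligence applications",
    "search_params": search_params,
}

payload = json.dumps(body)
print(payload)
```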
additional_filters array
Each object in the additional_filters array has a field key (the metadata field to match) and a value key (the value that field must have), as in the Additional Filters example below.
These filters provide a way to send additional filtering criteria to the backend search engine. For example, if the metadata field drupal_content_type is present in the documents, the filter {"field": "drupal_content_type", "value": "article"} limits the search to documents whose drupal_content_type is set to article.
Note:
- These filters are specific to the search engine implementation and may vary based on the search engine used.
- Your documents must include the corresponding metadata for these filters to have any effect.
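Conceptually, each filter keeps only documents whose metadata field equals the given value. The sketch below applies that rule to an in-memory list to illustrate the intended effect. It assumes simple equality matching on the field and value keys and that all filters must match (AND semantics), which may differ from how a particular search engine evaluates them; `apply_filters` and the sample documents are hypothetical.

```python
from typing import Any


def apply_filters(
    docs: list[dict[str, Any]], filters: list[dict[str, Any]]
) -> list[dict[str, Any]]:
    """Keep documents whose metadata matches every filter (assumed AND semantics)."""
    return [
        doc
        for doc in docs
        if all(doc.get("metadata", {}).get(f["field"]) == f["value"] for f in filters)
    ]


docs = [
    {"doc_id": "d123", "metadata": {"drupal_content_type": "article"}},
    {"doc_id": "p001", "metadata": {"drupal_content_type": "page"}},
    {"doc_id": "m042", "metadata": {}},  # missing the field, so it never matches
]
filters = [{"field": "drupal_content_type", "value": "article"}]

print([d["doc_id"] for d in apply_filters(docs, filters)])  # ['d123']
```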
page object
Field | Type | Description |
---|---|---|
current_page | int | The page of results to return. |
results_per_page | int | The number of results to return per page. Default is 10. |
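The pagination fields relate through simple arithmetic. Assuming total_pages is total_results divided by results_per_page and rounded up (consistent with the example response, where 14 results at 5 per page span 3 pages) and that pages are numbered from 1, a client can compute the page count and check for a next page as follows. Both helper names are hypothetical.

```python
import math


def total_pages(total_results: int, results_per_page: int = 10) -> int:
    """Number of pages needed to hold all results (ceiling division)."""
    return math.ceil(total_results / results_per_page)


def has_next_page(current_page: int, total_results: int, results_per_page: int = 10) -> bool:
    """True if another page follows current_page (assumes pages start at 1)."""
    return current_page < total_pages(total_results, results_per_page)


print(total_pages(14, 5))       # 3
print(has_next_page(3, 14, 5))  # False
```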
Usage Examples (Request Body)
Basic Search
This performs a basic search across all content types using the default search type and search space.
Different Search Type
This performs a search using the graph search type.
Different Search Space
This performs a search within textual content only.
This performs a search within media content only, which is useful for searching images, documents, and other items in the media library.
Different Search Parameters
This performs a search within the document with ID "12345". A document ID is a unique identifier for a document in the search index, such as a page ID, a media ID, or any other unique identifier. Use it to retrieve related chunks from the same document.
Note: doc_id has the highest priority and overrides search_space, search_type, and additional_filters.
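The requests described above can be expressed as the following payloads. This is a sketch: the query text is an assumption, and each payload sets only the fields its description mentions, leaving the rest at their defaults.

```python
import json

QUERY = "artificial intelligence applications"  # illustrative query text

examples = {
    # Default search type (hybrid_vector_graph) and space (global).
    "basic": {"query": QUERY},
    # Knowledge graph search only.
    "graph_search": {"query": QUERY, "search_type": "graph"},
    # Restrict to textual content such as articles and blogs.
    "textual_only": {"query": QUERY, "search_space": "textual_content"},
    # Restrict to the media library (images, documents, etc.).
    "media_only": {"query": QUERY, "search_space": "media"},
    # Search within one document; doc_id overrides type, space, and filters.
    "single_document": {"query": QUERY, "search_params": {"doc_id": "12345"}},
}

for name, body in examples.items():
    print(name, json.dumps(body))
```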
Additional Filters
{
"query": "artificial intelligence applications",
"additional_filters": [
{
"field": "drupal_content_type",
"value": "article"
}
]
}
Combined Options
For more complex searches, you can combine multiple options like so.
{
"query": "artificial intelligence applications",
"search_type": "hybrid_vector",
"search_space": "textual_content",
"additional_filters": [
{
"field": "drupal_content_type",
"value": "article"
}
],
"page": {
"current_page": 1,
"results_per_page": 10
}
}
- A hybrid_vector search across all textual content.
- Limited to the "article" content type.
- Paginated results with 10 results per page.
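This section does not state the endpoint URL or HTTP method. Assuming a JSON POST to a placeholder URL (`https://example.com/api/search` is hypothetical), the combined request above could be prepared with Python's standard library as follows. The request object is built but not sent.

```python
import json
import urllib.request

SEARCH_URL = "https://example.com/api/search"  # hypothetical endpoint URL

body = {
    "query": "artificial intelligence applications",
    "search_type": "hybrid_vector",
    "search_space": "textual_content",
    "additional_filters": [{"field": "drupal_content_type", "value": "article"}],
    "page": {"current_page": 1, "results_per_page": 10},
}

# POST is an assumption; check the endpoint documentation for the actual method.
request = urllib.request.Request(
    SEARCH_URL,
    data=json.dumps(body).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# To send it, urllib.request.urlopen(request) returns the HTTP response.
```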
Response
Field | Type | Description |
---|---|---|
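results | array of objects | List of search results. Each result contains a document ID (doc_id), a relevant chunk of content, and a relevance score, plus a metadata object when return_metadata is true. |
page | object | Pagination information about the returned results: current_page, results_per_page, total_pages, and total_results. |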
Example Response
{
"results": [
{
"doc_id": "d123",
"content": "This is a sample document about artificial intelligence...",
"score": 1.0
},
{
"doc_id": "b783",
"content": "Machine learning is a subset of AI that focuses on data and algorithms...",
"score": 0.98
},
{
"doc_id": "h142",
"content": "AI is transforming industries with its applications in various fields...",
"score": 0.96
},
{
"doc_id": "i980",
"content": "The future of AI is bright with new advancements and applications...",
"score": 0.90
},
{
"doc_id": "f645",
"content": "Deep learning is a powerful tool in the AI toolkit with many applications...",
"score": 0.89
}
],
"page": {
"current_page": 1,
"results_per_page": 5,
"total_pages": 3,
"total_results": 14
}
}
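A client typically reads the results in order and uses the page object to decide whether to request more. The sketch below parses a shortened copy of the example response; treating current_page < total_pages as "more pages available" is an assumption about how these fields relate.

```python
import json

raw = """
{
  "results": [
    {"doc_id": "d123", "content": "This is a sample document about artificial intelligence...", "score": 1.0},
    {"doc_id": "b783", "content": "Machine learning is a subset of AI...", "score": 0.98}
  ],
  "page": {"current_page": 1, "results_per_page": 5, "total_pages": 3, "total_results": 14}
}
"""

response = json.loads(raw)

# Each result carries a document ID, a matching chunk, and a relevance score.
for result in response["results"]:
    print(f'{result["score"]:.2f}  {result["doc_id"]}  {result["content"][:40]}')

page = response["page"]
next_page = page["current_page"] + 1 if page["current_page"] < page["total_pages"] else None
print("next page:", next_page)
```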
Example Response with return_metadata enabled
{
"results": [
{
"doc_id": "d123",
"content": "This is a sample document about artificial intelligence...",
"score": 1.0,
"metadata": {
"title": "Artificial Intelligence",
"description": "A brief overview of artificial intelligence and its applications...",
"doc_type": "article",
"url": "https://example.com/article/artificial-intelligence"
}
},
{
"doc_id": "b783",
"content": "Machine learning is a subset of AI that focuses on data and algorithms...",
"score": 0.98,
"metadata": {
"title": "Machine Learning",
"description": "An introduction to machine learning and its applications...",
"doc_type": "article",
"url": "https://example.com/article/machine-learning"
}
},
{
"doc_id": "h142",
"content": "AI is transforming industries with its applications in various fields...",
"score": 0.96,
"metadata": {
"title": "AI Transforming Industries",
"description": "A look at how AI is transforming industries with real-world examples...",
"doc_type": "pdf",
"url": "https://example.com/article/ai-transforming-industries.pdf"
}
},
{
"doc_id": "i980",
"content": "The future of AI is bright with new advancements and applications...",
"score": 0.90,
"metadata": {
"title": "Future of AI",
"description": "A glimpse into the future of AI and the possibilities it holds...",
"doc_type": "video",
"url": "https://example.com/video/future-of-ai.mp4"
}
},
{
"doc_id": "f645",
"content": "Deep learning is a powerful tool in the AI toolkit with many applications...",
"score": 0.89,
"metadata": {
"title": "Deep Learning",
"description": "An in-depth look at deep learning and its applications in AI...",
"doc_type": "image",
"url": "https://example.com/article/deep-learning.jpg"
}
}
],
"page": {
"current_page": 1,
"results_per_page": 5,
"total_pages": 3,
"total_results": 14
}
}
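Because metadata appears only when return_metadata is true, client code should not assume the key exists. This hypothetical formatter, `format_result`, falls back to the doc_id and content when metadata is absent.

```python
from typing import Any


def format_result(result: dict[str, Any]) -> str:
    """Render one result, using metadata only when the API returned it."""
    metadata = result.get("metadata")
    if metadata is None:  # return_metadata was false or not set
        return f'{result["doc_id"]}: {result["content"]}'
    title = metadata.get("title", result["doc_id"])
    url = metadata.get("url", "")
    return f'{title} ({metadata.get("doc_type", "unknown")}) {url}'.strip()


with_meta = {
    "doc_id": "i980",
    "content": "The future of AI is bright with new advancements and applications...",
    "score": 0.90,
    "metadata": {
        "title": "Future of AI",
        "doc_type": "video",
        "url": "https://example.com/video/future-of-ai.mp4",
    },
}
without_meta = {"doc_id": "d123", "content": "This is a sample document...", "score": 1.0}

print(format_result(with_meta))
print(format_result(without_meta))
```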
Error Responses
- 400 Bad Request: Invalid input data. Examples:
  - Invalid search type
  - Invalid search space
  - Invalid pagination parameters
- 404 Not Found: No results found for the given query.
- 500 Internal Server Error: Search engine error or database connection issue.
- 503 Service Unavailable: Search service is temporarily unavailable.
Note: Generic API errors like "Endpoint not found" are managed at a level above this API and are not included here.
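A client can branch on the documented status codes. The sketch below is one possible policy, not behavior defined by the API: the hypothetical `handle_error_status` treats 404 as an empty result set and retries only on 503, since that status is described as temporary.

```python
def handle_error_status(status: int) -> str:
    """Map a documented error status to a client action (the policy is an assumption)."""
    if status == 400:
        return "fix_request"      # invalid search type, search space, or pagination
    if status == 404:
        return "show_no_results"  # no matches for the query
    if status == 500:
        return "report_error"     # search engine or database problem
    if status == 503:
        return "retry_later"      # service is temporarily unavailable
    return "unhandled"            # anything this section does not document


print(handle_error_status(503))  # retry_later
```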
This endpoint lets you search across your indexed documents using various search types and parameters. By combining search types, search spaces, search parameters, filters, and pagination, it can cover a wide range of search requirements.