Search Documents

Searches for documents based on the provided query and parameters.

Endpoint

GET /search

Request Body

Field	Type	Description
query	string	The search query/data. Can be a URI, base64 encoded data, or plain text.
search_type	string	Specifies the type of search to perform. Default is "hybrid_vector_graph".
search_space	string	Defines the search space. Default is "global".
search_params	object (optional)	Additional search type/space related parameters.
additional_filters	array of objects (optional)	Additional search filters.
page	object	Pagination information.

Possible `search_type` values

Search Type	Speed	Accuracy	Description	Use Case
dense	Fast	Medium	Pure Semantic Search (Dense Vector)	Captures the semantic meaning of the query and documents.
sparse	Fast	Low	Keyword Search (Sparse Vector)	Captures the keyword-based relevance of the query and documents.
graph	Slow	High	Knowledge Graph Search (Graph)	Captures the relationships between entities and concepts in the documents, providing more factual and contextual information.
hybrid_vector	Medium	High	Semantic (Dense) + Keyword (Sparse) Search	Combines the strengths of semantic and keyword-based search. Provides a balance between semantic and keyword information.
hybrid_vector_graph	Slow	Highest	Semantic (Dense) + Keyword (Sparse) + Graph Search (Default)	Combines the strengths of semantic, keyword-based, and graph search. Provides the most comprehensive and accurate results.

Possible `search_space` values

Search Space	Description	Use Case
global	Complete site-wide search (includes everything)	General search across all content types.
textual_content	All textual content	Search only within textual content like articles, blogs, etc.
media	Media only, such as documents, images, and anything else handled as media	Search only within media content like images, documents, etc.
custom_file_type	Specific media types (e.g., Images, Documents, Videos)	Search within specific media types. Example: search within documents only.
custom_file_extension	Specific file extensions	Search within specific file extensions. Example: search within .docx document files only.

`search_params` object

Field	Type	Description
doc_id	string	Limits the search to a specific document ID. Can be used to retrieve related chunks from the same document.
return_metadata	boolean	Whether to return metadata along with the search results. Default is false.

Note:
- Other implementations for search_params are yet to be added.
- Setting return_metadata to true returns metadata along with the search results. Metadata includes fields like title, description, doc_type, and URL. This is useful for displaying search results with additional information, but it may affect search performance because it increases the response size.

`additional_filters` array

Each object in the additional_filters array should have the following structure:

{
  "field": "string",
  "value": "string"
}

These filters provide a way to send additional filtering criteria to the backend search engine. For example:

{
  "field": "drupal_content_type",
  "value": "article"
}

If the metadata field drupal_content_type is present in the documents, this filter will limit the search to only those documents where the drupal_content_type is set to article.

Note:
- These filters are specific to the search engine implementation and may vary based on the search engine used.
- You need to add such metadata to your documents to make use of these filters.
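
To make the filter semantics concrete, here is a minimal client-side sketch of the matching rule described above: a document is kept only when the named metadata field is present and equals the given value. The function name `matches_filters` and the sample documents are hypothetical, and combining several filters with AND is an assumption. The real filtering happens in the backend search engine and may differ.

```python
def matches_filters(metadata, filters):
    """Return True when every filter's field exists in metadata with that value.

    Assumption: multiple filters are combined with AND.
    """
    return all(metadata.get(f["field"]) == f["value"] for f in filters)


# Hypothetical documents, used only to illustrate the rule.
documents = [
    {"doc_id": "a1", "metadata": {"drupal_content_type": "article"}},
    {"doc_id": "a2", "metadata": {"drupal_content_type": "page"}},
    {"doc_id": "a3", "metadata": {}},  # field missing, so it never matches
]
filters = [{"field": "drupal_content_type", "value": "article"}]

kept = [d["doc_id"] for d in documents if matches_filters(d["metadata"], filters)]
print(kept)  # ['a1']
```

Documents without the metadata field are excluded, which is why the field has to be added at indexing time.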

`page` object

Field	Type	Description
current_page	int	The current page number of results.
results_per_page	int	The number of results to return per page. Default is 10.

Usage Examples (Request Body)

Basic Search

{
  "query": "artificial intelligence applications"
}

This will perform a basic search across all content types with default search type and space.
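
The defaults that apply to a basic search can be spelled out with a small helper. This is a sketch only: `build_search_body` is a hypothetical name, the allowed values are copied from the tables above, and starting at `current_page` 1 is an assumption, since no default is documented for it.

```python
import json

# Values copied from the search_type and search_space tables.
SEARCH_TYPES = {"dense", "sparse", "graph", "hybrid_vector", "hybrid_vector_graph"}
SEARCH_SPACES = {
    "global", "textual_content", "media",
    "custom_file_type", "custom_file_extension",
}


def build_search_body(query, search_type="hybrid_vector_graph",
                      search_space="global", current_page=1,
                      results_per_page=10):
    """Build a request body with the documented defaults made explicit.

    current_page=1 is an assumed default; the others come from the tables.
    """
    if search_type not in SEARCH_TYPES:
        raise ValueError(f"unknown search_type: {search_type!r}")
    if search_space not in SEARCH_SPACES:
        raise ValueError(f"unknown search_space: {search_space!r}")
    return {
        "query": query,
        "search_type": search_type,
        "search_space": search_space,
        "page": {
            "current_page": current_page,
            "results_per_page": results_per_page,
        },
    }


body = build_search_body("artificial intelligence applications")
print(json.dumps(body, indent=2))
```

Validating the values locally catches the "Invalid search type" and "Invalid search space" cases before a request is sent.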

Different Search Type

{
  "query": "artificial intelligence applications",
  "search_type": "graph"
}

This will perform a search using the graph search type.

Different Search Space

{
  "query": "artificial intelligence applications",
  "search_space": "textual_content"
}

This will perform a search within textual content only.

{
  "query": "artificial intelligence applications",
  "search_space": "media"
}

This will perform a search within media content only. It is useful for searching images, documents, etc. in the media library.

Different Search Parameters

{
  "query": "artificial intelligence applications",
  "search_params": {
    "doc_id": "12345"
  }
}

This will perform a search within the document with ID "12345". A document ID is a unique identifier for a document in the search index. It can be a page ID, media ID, or any other unique identifier. This is useful for retrieving related chunks from the same document.

Note: The doc_id is the highest priority parameter and will override search_space, search_type, and additional_filters.
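
One way to read this precedence rule is that the overridden fields have no effect once doc_id is set. The sketch below expresses that reading on the client side. `effective_body` is a hypothetical helper, and how the backend actually resolves the conflict is not described here, so treat it as an illustration rather than the server's logic.

```python
# Fields that the note above says doc_id overrides.
OVERRIDDEN_BY_DOC_ID = ("search_type", "search_space", "additional_filters")


def effective_body(body):
    """Return a copy of body without the fields that doc_id takes priority over."""
    doc_id = (body.get("search_params") or {}).get("doc_id")
    if not doc_id:
        return dict(body)
    return {k: v for k, v in body.items() if k not in OVERRIDDEN_BY_DOC_ID}


request_body = {
    "query": "artificial intelligence applications",
    "search_type": "graph",
    "search_space": "media",
    "search_params": {"doc_id": "12345"},
}
print(effective_body(request_body))
```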

Additional Filters

{
  "query": "artificial intelligence applications",
  "additional_filters": [
    {
      "field": "drupal_content_type",
      "value": "article"
    }
  ]
}

This will perform a search within the "article" content type only.

Combined Options

For more complex searches, you can combine multiple options like so.

{
  "query": "artificial intelligence applications",
  "search_type": "hybrid_vector",
  "search_space": "textual_content",
  "additional_filters": [
    {
      "field": "drupal_content_type",
      "value": "article"
    }
  ],
  "page": {
    "current_page": 1,
    "results_per_page": 10
  }
}

- A hybrid_vector search within all textual content.
- Limited to the "article" content type.
- Paginated, with 10 results per page.
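
A combined request like this one could be prepared with Python's standard library as sketched below. The base URL is hypothetical, and sending the body as JSON with a `Content-Type: application/json` header is an assumption. The GET method and the `/search` path follow the Endpoint section.

```python
import json
import urllib.request

BASE_URL = "https://example.com/api"  # hypothetical; use your own deployment


def make_search_request(body):
    """Prepare (but do not send) a GET /search request carrying a JSON body."""
    return urllib.request.Request(
        f"{BASE_URL}/search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed content type
        method="GET",
    )


combined = {
    "query": "artificial intelligence applications",
    "search_type": "hybrid_vector",
    "search_space": "textual_content",
    "additional_filters": [
        {"field": "drupal_content_type", "value": "article"}
    ],
    "page": {"current_page": 1, "results_per_page": 10},
}
request = make_search_request(combined)
print(request.get_method(), request.full_url)
# To send it: urllib.request.urlopen(request)
```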

Response

Field	Type	Description
results	array of objects	List of search results, each containing an ID, a relevant chunk of content, and a score.
page	object	Pagination information about the returned results.

Example Response

{
  "results": [
    {
      "doc_id": "d123",
      "content": "This is a sample document about artificial intelligence...",
      "score": 1.0
    },
    {
      "doc_id": "b783",
      "content": "Machine learning is a subset of AI that focuses on data and algorithms...",
      "score": 0.98
    },
    {
      "doc_id": "h142",
      "content": "AI is transforming industries with its applications in various fields...",
      "score": 0.96
    },
    {
      "doc_id": "i980",
      "content": "The future of AI is bright with new advancements and applications...",
      "score": 0.90
    },
    {
      "doc_id": "f645",
      "content": "Deep learning is a powerful tool in the AI toolkit with many applications...",
      "score": 0.89
    }
  ],
  "page": {
    "current_page": 1,
    "results_per_page": 5,
    "total_pages": 3,
    "total_results": 14
  }
}
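
The pagination fields in this example are consistent with a ceiling division: 14 results at 5 per page give 3 pages. The sketch below assumes that relation holds in general, which is not stated explicitly. The helper names are hypothetical.

```python
import math


def expected_total_pages(total_results, results_per_page):
    """Ceiling division, assumed to be how total_pages relates to total_results."""
    return math.ceil(total_results / results_per_page)


def has_next_page(page):
    """True while there are further pages to request."""
    return page["current_page"] < page["total_pages"]


page = {
    "current_page": 1,
    "results_per_page": 5,
    "total_pages": 3,
    "total_results": 14,
}
print(expected_total_pages(page["total_results"], page["results_per_page"]))  # 3
print(has_next_page(page))  # True
```

A client can loop while `has_next_page` is true, incrementing `current_page` in the request body each time.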

With metadata return enabled

{
  "results": [
    {
      "doc_id": "d123",
      "content": "This is a sample document about artificial intelligence...",
      "score": 1.0,
      "metadata": {
        "title": "Artificial Intelligence",
        "description": "A brief overview of artificial intelligence and its applications...",
        "doc_type": "article",
        "url": "https://example.com/article/artificial-intelligence"
      }
    },
    {
      "doc_id": "b783",
      "content": "Machine learning is a subset of AI that focuses on data and algorithms...",
      "score": 0.98,
      "metadata": {
        "title": "Machine Learning",
        "description": "An introduction to machine learning and its applications...",
        "doc_type": "article",
        "url": "https://example.com/article/machine-learning"
      }
    },
    {
      "doc_id": "h142",
      "content": "AI is transforming industries with its applications in various fields...",
      "score": 0.96,
      "metadata": {
        "title": "AI Transforming Industries",
        "description": "A look at how AI is transforming industries with real-world examples...",
        "doc_type": "pdf",
        "url": "https://example.com/article/ai-transforming-industries.pdf"
      }
    },
    {
      "doc_id": "i980",
      "content": "The future of AI is bright with new advancements and applications...",
      "score": 0.90,
      "metadata": {
        "title": "Future of AI",
        "description": "A glimpse into the future of AI and the possibilities it holds...",
        "doc_type": "video",
        "url": "https://example.com/video/future-of-ai.mp4"
      }
    },
    {
      "doc_id": "f645",
      "content": "Deep learning is a powerful tool in the AI toolkit with many applications...",
      "score": 0.89,
      "metadata": {
        "title": "Deep Learning",
        "description": "An in-depth look at deep learning and its applications in AI...",
        "doc_type": "image",
        "url": "https://example.com/article/deep-learning.jpg"
      }
    }
  ],
  "page": {
    "current_page": 1,
    "results_per_page": 5,
    "total_pages": 3,
    "total_results": 14
  }
}
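
When return_metadata is enabled, each result carries enough information to render a result line without a second lookup. The sketch below shows one possible way to do that. `format_result` is a hypothetical helper, and it falls back to the doc_id when no metadata object is present.

```python
def format_result(result):
    """Render one search result as a single display line."""
    score = f'{result["score"]:.2f}'
    meta = result.get("metadata")
    if not meta:
        # No metadata requested: only doc_id and score are available.
        return f'{result["doc_id"]} ({score})'
    return f'{meta["title"]} [{meta["doc_type"]}] {meta["url"]} ({score})'


# One entry taken from the example response above.
sample = {
    "doc_id": "b783",
    "content": "Machine learning is a subset of AI that focuses on data and algorithms...",
    "score": 0.98,
    "metadata": {
        "title": "Machine Learning",
        "description": "An introduction to machine learning and its applications...",
        "doc_type": "article",
        "url": "https://example.com/article/machine-learning",
    },
}
print(format_result(sample))
```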

Error Responses

400 Bad Request: Invalid input data
- Examples:
  - Invalid search type
  - Invalid search space
  - Invalid pagination parameters
404 Not Found: No results found for the given query
500 Internal Server Error: Search engine error or database connection issue
503 Service Unavailable: Search service is temporarily unavailable

Note: Generic API errors like "Endpoint not found" are managed at a level above this API and are not included here.
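
A client might map these status codes as in the sketch below. It makes two assumptions that are not stated above: a successful search returns status 200, and a 404 is best treated as an empty result list rather than a failure. `SearchError` and `results_or_raise` are hypothetical names.

```python
# Messages copied from the error list above.
DOCUMENTED_ERRORS = {
    400: "Invalid input data",
    404: "No results found for the given query",
    500: "Search engine error or database connection issue",
    503: "Search service is temporarily unavailable",
}


class SearchError(Exception):
    """Raised for error statuses other than 'no results'."""

    def __init__(self, status, message):
        super().__init__(f"{status}: {message}")
        self.status = status


def results_or_raise(status, payload):
    """Return the results list, or raise SearchError for error statuses."""
    if status == 200:  # assumed success status
        return payload["results"]
    if status == 404:  # design choice: no results is not an exception
        return []
    raise SearchError(status, DOCUMENTED_ERRORS.get(status, "Unexpected status"))


print(results_or_raise(404, None))  # []
```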

This endpoint allows you to search across your indexed documents using various search types and parameters. It's designed to be flexible and powerful, accommodating a wide range of search requirements.