Search Documents

Searches for documents based on the provided query and parameters.

Endpoint

GET /search

Request Body

| Field | Type | Description |
|---|---|---|
| query | string | The search query/data. Can be a URI, base64 encoded data, or plain text. |
| search_type | string | Specifies the type of search to perform. Default is "hybrid_vector_graph". |
| search_space | string | Defines the search space. Default is "global". |
| search_params | object (optional) | Additional search type/space related parameters. |
| additional_filters | array of objects (optional) | Additional search filters. |
| page | object | Pagination information. |
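
The fields above can be assembled into a request body programmatically. The following is a minimal sketch, not part of the API itself: `build_search_body` is a hypothetical helper, and the 1-based `current_page` default is an assumption (the documented defaults are only those for `search_type`, `search_space`, and `results_per_page`).

```python
import json
from typing import Any, Optional

DEFAULT_SEARCH_TYPE = "hybrid_vector_graph"  # documented default
DEFAULT_SEARCH_SPACE = "global"              # documented default


def build_search_body(
    query: str,
    search_type: str = DEFAULT_SEARCH_TYPE,
    search_space: str = DEFAULT_SEARCH_SPACE,
    search_params: Optional[dict[str, Any]] = None,
    additional_filters: Optional[list[dict[str, str]]] = None,
    current_page: int = 1,       # assumption: pages are numbered from 1
    results_per_page: int = 10,  # documented default
) -> dict[str, Any]:
    """Assemble a request body, omitting optional fields that are unset."""
    body: dict[str, Any] = {
        "query": query,
        "search_type": search_type,
        "search_space": search_space,
        "page": {
            "current_page": current_page,
            "results_per_page": results_per_page,
        },
    }
    if search_params:
        body["search_params"] = search_params
    if additional_filters:
        body["additional_filters"] = additional_filters
    return body


body = build_search_body("artificial intelligence applications")
print(json.dumps(body, indent=2))
```

Spelling out the defaults in the body is a design choice for readability; omitting them should behave the same if the server applies the documented defaults.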

Possible search_type values

| Search Type | Speed | Accuracy | Description | Use Case |
|---|---|---|---|---|
| dense | Fast | Medium | Pure Semantic Search (Dense Vector) | Captures the semantic meaning of the query and documents. |
| sparse | Fast | Low | Keyword Search (Sparse Vector) | Captures the keyword-based relevance of the query and documents. |
| graph | Slow | High | Knowledge Graph Search (Graph) | Captures the relationships between entities and concepts in the documents, providing more factual and contextual information. |
| hybrid_vector | Medium | High | Semantic (Dense) + Keyword (Sparse) Search | Combines the strengths of semantic and keyword-based search. Provides a balance between semantic and keyword information. |
| hybrid_vector_graph | Slow | Highest | Semantic (Dense) + Keyword (Sparse) + Graph Search (Default) | Combines the strengths of semantic, keyword-based, and graph search. Provides the most comprehensive and accurate results. |
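
The speed and accuracy ratings above suggest a simple way to choose a search type: take the most accurate one that fits a speed budget. The sketch below encodes the table as data. `pick_search_type` and the numeric rankings are illustrative assumptions, not something the API provides.

```python
# Rank the qualitative labels from the table so they can be compared.
SPEED_COST = {"Fast": 0, "Medium": 1, "Slow": 2}
ACCURACY_RANK = {"Low": 0, "Medium": 1, "High": 2, "Highest": 3}

# search_type -> (speed, accuracy), copied from the table above.
SEARCH_TYPES = {
    "dense": ("Fast", "Medium"),
    "sparse": ("Fast", "Low"),
    "graph": ("Slow", "High"),
    "hybrid_vector": ("Medium", "High"),
    "hybrid_vector_graph": ("Slow", "Highest"),
}


def pick_search_type(slowest_acceptable: str) -> str:
    """Return the most accurate search type that is no slower than the budget."""
    budget = SPEED_COST[slowest_acceptable]
    candidates = [
        name
        for name, (speed, _accuracy) in SEARCH_TYPES.items()
        if SPEED_COST[speed] <= budget
    ]
    return max(candidates, key=lambda name: ACCURACY_RANK[SEARCH_TYPES[name][1]])


for budget in ("Fast", "Medium", "Slow"):
    print(budget, "->", pick_search_type(budget))
```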

Possible search_space values

| Search Space | Description | Use Case |
|---|---|---|
| global | Complete site-wide search (includes everything) | General search across all content types. |
| textual_content | All textual content | Search only within textual content like articles, blogs, etc. |
| media | Media-only like Documents, Images & anything that goes through media | Search only within media content like images, documents, etc. |
| custom_file_type | Specific media types (e.g., Images, Documents, Videos) | Search within specific media types. Example: Search within documents only |
| custom_file_extension | Specific file extensions | Search within specific file extensions. Example: Search within .docx document files only |
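
Because both search_type and search_space accept only the listed values, a client can check them before sending a request. This is a hedged sketch of such a check; `validate_choice` is a hypothetical helper, and the server remains the authority on what is valid.

```python
VALID_SEARCH_TYPES = frozenset(
    {"dense", "sparse", "graph", "hybrid_vector", "hybrid_vector_graph"}
)
VALID_SEARCH_SPACES = frozenset(
    {"global", "textual_content", "media", "custom_file_type", "custom_file_extension"}
)


def validate_choice(
    search_type: str = "hybrid_vector_graph",
    search_space: str = "global",
) -> list[str]:
    """Return a list of problems; an empty list means both values are known."""
    problems = []
    if search_type not in VALID_SEARCH_TYPES:
        problems.append(f"unknown search_type: {search_type!r}")
    if search_space not in VALID_SEARCH_SPACES:
        problems.append(f"unknown search_space: {search_space!r}")
    return problems


print(validate_choice("graph", "media"))
print(validate_choice("fuzzy", "everywhere"))
```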

search_params object

| Field | Type | Description |
|---|---|---|
| doc_id | string | Limits the search to a specific document ID. Can be used to retrieve related chunks from the same document. |
| return_metadata | boolean | Whether to return metadata along with the search results. Default is false. |

Note:
  • Other implementations for search_params are yet to be added.
  • Setting return_metadata to true returns metadata along with the search results. Metadata includes fields like title, description, doc_type, and URL. This is useful for displaying search results with additional information, but it may slow down the search because it increases the response size.

additional_filters array

Each object in the additional_filters array should have the following structure:

{
  "field": "string",
  "value": "string"
}

These filters provide a way to send additional filtering criteria to the backend search engine. For example:

{
  "field": "drupal_content_type",
  "value": "article"
}
If the metadata field drupal_content_type is present in the documents, this filter will limit the search to only those documents where the drupal_content_type is set to article.

Note:
  • These filters are specific to the search engine implementation and may vary based on the search engine used.
  • You need to add such metadata to your documents to make use of these filters.
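
When filter criteria already live in a dictionary, they can be converted to the field/value structure shown above. A minimal sketch follows; `filters_from_mapping` is a hypothetical helper, and the string-only check mirrors the "string" types in the structure rather than any confirmed server behavior.

```python
def filters_from_mapping(criteria: dict[str, str]) -> list[dict[str, str]]:
    """Turn {"field_name": "value"} pairs into additional_filters objects."""
    filters = []
    for field, value in criteria.items():
        if not isinstance(field, str) or not isinstance(value, str):
            raise TypeError("filter fields and values must both be strings")
        filters.append({"field": field, "value": value})
    return filters


filters = filters_from_mapping({"drupal_content_type": "article"})
print(filters)
```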

page object

| Field | Type | Description |
|---|---|---|
| current_page | int | The current page number of results. |
| results_per_page | int | The number of results to return per page. Default is 10. |

Usage Examples (Request Body)

{
  "query": "artificial intelligence applications"
}
This will perform a basic search across all content types, using the default search type (hybrid_vector_graph) and search space (global).

Different Search Type

{
  "query": "artificial intelligence applications",
  "search_type": "graph"
}
This will perform a search using the graph search type.

Different Search Space

{
  "query": "artificial intelligence applications",
  "search_space": "textual_content"
}
This will perform a search within textual content only.

{
  "query": "artificial intelligence applications",
  "search_space": "media"
}
This will perform a search within media content only. Useful for searching images, documents, etc. in the media library.

Different Search Parameters

{
  "query": "artificial intelligence applications",
  "search_params": {
    "doc_id": "12345"
  }
}
This will perform a search within the document with ID "12345". A document ID is a unique identifier for a document in the search index; it can be a page ID, media ID, or any other unique identifier. Use it to retrieve related chunks from the same document.

Note: The doc_id is the highest priority parameter and will override search_space, search_type, and additional_filters.
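
Since doc_id silently takes precedence, a client may want to flag fields that will have no effect. The sketch below is one way to do that; `overridden_fields` is a hypothetical helper that only inspects the request body and does not reflect how the server implements the override.

```python
from typing import Any

# Fields that the note above says doc_id overrides.
OVERRIDDEN_BY_DOC_ID = ("search_space", "search_type", "additional_filters")


def overridden_fields(body: dict[str, Any]) -> list[str]:
    """List the fields in the body that doc_id makes ineffective."""
    doc_id = (body.get("search_params") or {}).get("doc_id")
    if not doc_id:
        return []
    return [name for name in OVERRIDDEN_BY_DOC_ID if name in body]


body = {
    "query": "artificial intelligence applications",
    "search_type": "graph",
    "search_params": {"doc_id": "12345"},
}
print(overridden_fields(body))
```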

Additional Filters

{
  "query": "artificial intelligence applications",
  "additional_filters": [
    {
      "field": "drupal_content_type",
      "value": "article"
    }
  ]
}
This will perform a search within the "article" content type only.

Combined Options

For more complex searches, you can combine multiple options like so.

{
  "query": "artificial intelligence applications",
  "search_type": "hybrid_vector",
  "search_space": "textual_content",
  "additional_filters": [
    {
      "field": "drupal_content_type",
      "value": "article"
    }
  ],
  "page": {
    "current_page": 1,
    "results_per_page": 10
  }
}
This will perform:
  • A hybrid_vector search within all textual content.
  • Limited to the "article" content type.
  • Paginated, with 10 results per page.
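
A body like the combined example can be attached to a GET request using only the Python standard library. This is a sketch under assumptions: `BASE_URL` is a placeholder, the JSON content type is assumed, and `make_search_request` is a hypothetical helper. Some HTTP clients and proxies drop bodies on GET requests, so it is worth verifying against your deployment.

```python
import json
import urllib.request

BASE_URL = "https://example.com/api"  # placeholder; replace with your server


def make_search_request(body: dict) -> urllib.request.Request:
    """Build (but do not send) a GET /search request carrying a JSON body."""
    return urllib.request.Request(
        url=f"{BASE_URL}/search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="GET",  # explicit, because urllib defaults to POST when data is set
    )


body = {
    "query": "artificial intelligence applications",
    "search_type": "hybrid_vector",
    "search_space": "textual_content",
    "additional_filters": [{"field": "drupal_content_type", "value": "article"}],
    "page": {"current_page": 1, "results_per_page": 10},
}
request = make_search_request(body)
print(request.get_method(), request.full_url)

# To send it against a real server:
# with urllib.request.urlopen(request) as response:
#     payload = json.load(response)
```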

Response

| Field | Type | Description |
|---|---|---|
| results | array of objects | List of search results, each containing an ID, a relevant chunk of content, and a score. |
| page | object | Pagination information about the returned results. |

Example Response

{
  "results": [
    {
      "doc_id": "d123",
      "content": "This is a sample document about artificial intelligence...",
      "score": 1.0
    },
    {
      "doc_id": "b783",
      "content": "Machine learning is a subset of AI that focuses on data and algorithms...",
      "score": 0.98
    },
    {
      "doc_id": "h142",
      "content": "AI is transforming industries with its applications in various fields...",
      "score": 0.96
    },
    {
      "doc_id": "i980",
      "content": "The future of AI is bright with new advancements and applications...",
      "score": 0.90
    },
    {
      "doc_id": "f645",
      "content": "Deep learning is a powerful tool in the AI toolkit with many applications...",
      "score": 0.89
    }
  ],
  "page": {
    "current_page": 1,
    "results_per_page": 5,
    "total_pages": 3,
    "total_results": 14
  }
}
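
The page object in the response is enough to walk through every page of results. The sketch below assumes pages are numbered from 1 and takes the page-fetching function as a parameter, so it makes no claims about transport. `iter_results` and `fake_fetch` are hypothetical names, and `fake_fetch` simulates the 14-result, 5-per-page response shown above.

```python
import math
from typing import Any, Callable, Iterator


def iter_results(
    fetch_page: Callable[[int], dict[str, Any]],
) -> Iterator[dict[str, Any]]:
    """Yield every result, requesting pages until total_pages is reached."""
    page_number = 1  # assumption: pages are numbered from 1
    while True:
        response = fetch_page(page_number)
        yield from response["results"]
        if page_number >= response["page"]["total_pages"]:
            break
        page_number += 1


def fake_fetch(page_number: int, per_page: int = 5, total: int = 14) -> dict[str, Any]:
    """Stand-in for a real request: returns one page of generated results."""
    start = (page_number - 1) * per_page
    stop = min(start + per_page, total)
    return {
        "results": [
            {"doc_id": f"doc{i}", "content": "...", "score": 1.0}
            for i in range(start, stop)
        ],
        "page": {
            "current_page": page_number,
            "results_per_page": per_page,
            "total_pages": math.ceil(total / per_page),
            "total_results": total,
        },
    }


all_results = list(iter_results(fake_fetch))
print(len(all_results))  # 14
```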

With metadata return enabled

{
  "results": [
    {
      "doc_id": "d123",
      "content": "This is a sample document about artificial intelligence...",
      "score": 1.0,
      "metadata": {
        "title": "Artificial Intelligence",
        "description": "A brief overview of artificial intelligence and its applications...",
        "doc_type": "article",
        "url": "https://example.com/article/artificial-intelligence"
      }
    },
    {
      "doc_id": "b783",
      "content": "Machine learning is a subset of AI that focuses on data and algorithms...",
      "score": 0.98,
      "metadata": {
        "title": "Machine Learning",
        "description": "An introduction to machine learning and its applications...",
        "doc_type": "article",
        "url": "https://example.com/article/machine-learning"
      }
    },
    {
      "doc_id": "h142",
      "content": "AI is transforming industries with its applications in various fields...",
      "score": 0.96,
      "metadata": {
        "title": "AI Transforming Industries",
        "description": "A look at how AI is transforming industries with real-world examples...",
        "doc_type": "pdf",
        "url": "https://example.com/article/ai-transforming-industries.pdf"
      }
    },
    {
      "doc_id": "i980",
      "content": "The future of AI is bright with new advancements and applications...",
      "score": 0.90,
      "metadata": {
        "title": "Future of AI",
        "description": "A glimpse into the future of AI and the possibilities it holds...",
        "doc_type": "video",
        "url": "https://example.com/video/future-of-ai.mp4"
      }
    },
    {
      "doc_id": "f645",
      "content": "Deep learning is a powerful tool in the AI toolkit with many applications...",
      "score": 0.89,
      "metadata": {
        "title": "Deep Learning",
        "description": "An in-depth look at deep learning and its applications in AI...",
        "doc_type": "image",
        "url": "https://example.com/article/deep-learning.jpg"
      }
    }
  ],
  "page": {
    "current_page": 1,
    "results_per_page": 5,
    "total_pages": 3,
    "total_results": 14
  }
}
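
For display purposes, each result can be parsed into a small record that tolerates a missing metadata object (the case when return_metadata is false). A minimal sketch follows; `SearchHit`, `parse_hit`, and `display_line` are illustrative names, not part of the API.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class SearchHit:
    doc_id: str
    content: str
    score: float
    title: Optional[str] = None
    url: Optional[str] = None
    doc_type: Optional[str] = None


def parse_hit(raw: dict[str, Any]) -> SearchHit:
    """Convert one entry of "results" into a SearchHit."""
    metadata = raw.get("metadata") or {}  # absent unless return_metadata is true
    return SearchHit(
        doc_id=raw["doc_id"],
        content=raw["content"],
        score=raw["score"],
        title=metadata.get("title"),
        url=metadata.get("url"),
        doc_type=metadata.get("doc_type"),
    )


def display_line(hit: SearchHit) -> str:
    """Format a hit as one line, falling back to doc_id when there is no title."""
    label = hit.title or hit.doc_id
    suffix = f" <{hit.url}>" if hit.url else ""
    return f"[{hit.score:.2f}] {label}{suffix}"


hit = parse_hit({
    "doc_id": "d123",
    "content": "This is a sample document about artificial intelligence...",
    "score": 1.0,
    "metadata": {
        "title": "Artificial Intelligence",
        "doc_type": "article",
        "url": "https://example.com/article/artificial-intelligence",
    },
})
print(display_line(hit))
```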

Error Responses

  • 400 Bad Request: Invalid input data
    • Examples:
      • Invalid search type
      • Invalid search space
      • Invalid pagination parameters
  • 404 Not Found: No results found for the given query
  • 500 Internal Server Error: Search engine error or database connection issue
  • 503 Service Unavailable: Search service is temporarily unavailable

Note: Generic API errors like "Endpoint not found" are managed at a level above this API and are not included here.
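
On the client side, the documented status codes can be mapped to a small set of outcomes. The sketch below makes some assumptions: success is taken to be any 2xx status, 404 is treated as "no results" rather than a failure, and 503 is treated as worth retrying. `Outcome` and `classify_status` are hypothetical names.

```python
from enum import Enum


class Outcome(Enum):
    OK = "ok"
    NO_RESULTS = "no_results"      # 404: the query matched nothing
    BAD_REQUEST = "bad_request"    # 400: fix the request before retrying
    SERVER_ERROR = "server_error"  # 500: search engine or database problem
    RETRY_LATER = "retry_later"    # 503: assumption: safe to retry after a delay


DOCUMENTED_ERRORS = {
    400: Outcome.BAD_REQUEST,
    404: Outcome.NO_RESULTS,
    500: Outcome.SERVER_ERROR,
    503: Outcome.RETRY_LATER,
}


def classify_status(status_code: int) -> Outcome:
    """Map an HTTP status code from this endpoint to a client-side outcome."""
    if 200 <= status_code < 300:  # assumption: success is reported as 2xx
        return Outcome.OK
    if status_code in DOCUMENTED_ERRORS:
        return DOCUMENTED_ERRORS[status_code]
    raise ValueError(f"status {status_code} is not documented for this endpoint")


for code in (200, 400, 404, 500, 503):
    print(code, classify_status(code).value)
```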

This endpoint allows you to search across your indexed documents using various search types and parameters. It's designed to be flexible and powerful, accommodating a wide range of search requirements.