Skip to content

Implement federated multi search #625

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

LukasKalbertodt
Copy link
Contributor

@LukasKalbertodt LukasKalbertodt commented Dec 16, 2024

Pull Request

Related issue

Fixes #609

What does this PR do?

This PR adds types and methods to use the federated multi search API. There are multiple ways to add this to this library, depending on how strictly one wants to type everything. I decided to:

  • Add a new FederatedMultiSearchResponse, which also required a new Client::execute_federated_multi_search_query. The response of federated searches is just very different from normal multi searches, and I didn't think having an enum returned by execute_multi_search_query was a particularly nice design (and it would have been a breaking change).
  • Add a federation: Option<FederationHitInfo> to each SearchResult (which is None for non-federated searches). I don't think it's worth to have a dedicated FederatedSearchResult.
  • Add MultiSearchQuery::with_federation which returns a new FederatedMultiSearchQuery. One could also change MultiSearchQuery to be able to represent federated queries, but I had a slight preference for my solution.

PR checklist

Please check if your PR fulfills the following requirements:

  • Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
  • Have you read the contributing guidelines?
  • Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!

Summary by CodeRabbit

  • New Features

    • Introduced federated multi-search, allowing users to combine results from multiple indexes into a single, merged list.
    • Added support for specifying federation options such as weighting, offsets, limits, and facet merging in multi-search queries.
    • Search results now include federation metadata, providing more context about each hit in federated searches.
  • Documentation

    • Expanded documentation to explain federated search usage and options in multi-search queries.

@LukasKalbertodt LukasKalbertodt force-pushed the implement-federated-multi-search branch 2 times, most recently from 67c4c36 to 4051890 Compare January 14, 2025 09:36
@curquiza curquiza requested a review from irevoire February 10, 2025 22:01
@LukasKalbertodt LukasKalbertodt force-pushed the implement-federated-multi-search branch from 4051890 to 93e52d8 Compare May 19, 2025 08:40
Copy link

coderabbitai bot commented May 19, 2025

"""

Walkthrough

The changes introduce federated multi-search support by adding new query and response types, federation-specific options, and related metadata structures. The client gains an asynchronous method to execute federated multi-search queries. Documentation is updated to explain usage and type requirements for federated queries. No changes to error handling or unrelated functionality are made.

Changes

File(s) Change Summary
src/client.rs Added execute_federated_multi_search_query async method to Client for federated multi-search; expanded documentation for multi_search to explain federated search usage and type requirements.
src/search.rs Introduced federated multi-search support:
- Added FederatedMultiSearchQuery, FederationOptions, FederatedMultiSearchResponse, FederationHitInfo, and QueryFederationOptions structs.
- Enhanced MultiSearchQuery and SearchQuery with federation methods and fields.
- Updated SearchResult to include federation metadata.
- Replaced manual Serialize impl for MultiSearchQuery with derived version.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Client
    participant Server

    User->>Client: Build MultiSearchQuery
    User->>Client: Call with_federation(federation_options)
    User->>Client: Call execute() on FederatedMultiSearchQuery
    Client->>Server: POST /multi-search with federated query
    Server-->>Client: FederatedMultiSearchResponse<T>
    Client-->>User: Merged federated search results
Loading

Assessment against linked issues

Objective Addressed Explanation
Add a new search parameter named federation to the multiSearch method (#609)
Add capability for a new search parameter federationOptions in the search query (#609)
Add tests (#609) No test additions are visible in the provided summary.

Poem

A rabbit leaps through fields of search,
Now federated queries it can perch!
With options and weights, results combine,
Across indexes, the answers align.
🥕✨
New structs and methods—what a feat!
Our search is now truly elite.
"""

Note

⚡️ AI Code Reviews for VS Code, Cursor, Windsurf

CodeRabbit now has a plugin for VS Code, Cursor and Windsurf. This brings AI code reviews directly in the code editor. Each commit is reviewed immediately, finding bugs before the PR is raised. Seamless context handoff to your AI code agent ensures that you can easily incorporate review feedback.
Learn more here.


Note

⚡️ Faster reviews with caching

CodeRabbit now supports caching for code and dependencies, helping speed up reviews. This means quicker feedback, reduced wait times, and a smoother review experience overall. Cached data is encrypted and stored securely. This feature will be automatically enabled for all accounts on May 16th. To opt out, configure Review - Disable Cache at either the organization or repository level. If you prefer to disable all data retention across your organization, simply turn off the Data Retention setting under your Organization Settings.
Enjoy the performance boost—your workflow just got faster.

✨ Finishing Touches
  • 📝 Generate Docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@LukasKalbertodt LukasKalbertodt force-pushed the implement-federated-multi-search branch from 93e52d8 to 9d73cb9 Compare May 19, 2025 08:43
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Inline review comments failed to post. This is likely due to GitHub's limits when posting large numbers of comments. If you are seeing this consistently it is likely a permissions issue. Please check "Moderation" -> "Code review limits" under your organization settings.

Actionable comments posted: 1

🧹 Nitpick comments (7)
src/client.rs (1)

131-144: Reduce duplication by factoring out the common HTTP call

execute_federated_multi_search_query and execute_multi_search_query are identical except for the body/response types. This means we now have two almost-copy-pasted code blocks that have to stay in sync (endpoint path, status code, error handling, etc.).

A tiny generic helper would remove this repetition and the chance that one of the two branches silently drifts:

+    async fn post_multi_search<B, R>(&self, body: &B) -> Result<R, Error>
+    where
+        Http: HttpClient,
+        B: Serialize + ?Sized,
+        R: DeserializeOwned,
+    {
+        self.http_client
+            .request::<(), &B, R>(
+                &format!("{}/multi-search", &self.host),
+                Method::Post { body, query: () },
+                200,
+            )
+            .await
+    }
+
     pub async fn execute_federated_multi_search_query<
         T: 'static + DeserializeOwned + Send + Sync,
     >(
         &self,
         body: &FederatedMultiSearchQuery<'_, '_, Http>,
     ) -> Result<FederatedMultiSearchResponse<T>, Error> {
-        self.http_client
-            .request::<(), &FederatedMultiSearchQuery<Http>, FederatedMultiSearchResponse<T>>(
-                &format!("{}/multi-search", &self.host),
-                Method::Post { body, query: () },
-                200,
-            )
-            .await
+        self.post_multi_search(body).await
     }

This keeps one single source-of-truth for the HTTP details and scales if more multi-search flavours appear later.

src/search.rs (6)

368-377: Consider documenting federation-specific weight semantics

QueryFederationOptions::weight is the only supported option for now, but its valid range/behaviour (e.g. 0–1? negative allowed?) isn’t obvious from the type alone. A short doc comment and, if applicable, #[serde(skip_serializing_if = "is_zero")] would improve ergonomics.


617-624: Guard against accidental use outside a federated query

with_federation_options can be called on any SearchQuery, including ones that will later be executed as single-index searches – the backend will then choke on the unexpected federation field.

To make misuse impossible you could:

  1. Gate the method behind a feature flag on FederatedMultiSearchQuery only, or
  2. Make it pub(crate) and funnel all access through MultiSearchQuery::with_federation.

Either way the public surface stays self-consistent.


666-676: Ownership ergonomics: return &mut self in with_federation

Consuming self forces a temporary variable:

let resp = client.multi_search()
    .with_search_query(q1)
    .with_search_query(q2)
    .with_federation(fed_opts)   // returns FederatedMultiSearchQuery
    .execute::<Foo>()
    .await?;

Returning FederatedMultiSearchQuery by value is fine, but to stay symmetrical with the other builder methods you could implement with_federation_mut(&mut self, …) -> &mut Self as sugar.
Not critical, just API nicety.


690-698: Missing #[serde(bound(serialize = \"\"))] on the whole struct

Like MultiSearchQuery, FederatedMultiSearchQuery can end up with an unnecessary Http: Serialize bound when the derive macro kicks in. Only the queries field is patched – adding the same attribute to the struct ensures consistency:

-#[derive(Debug, Serialize, Clone)]
+#[derive(Debug, Serialize, Clone)]
+#[serde(bound(serialize = ""))]
 pub struct FederatedMultiSearchQuery …

702-713: Nit: use Vec<String> for facets_by_index values?

facets_by_index is HashMap<String, Vec<String>>, but facet names are usually &str and never repeated, so HashSet or Vec is fine. If order doesn’t matter a set avoids duplicates.

Not blocking – purely optional.


716-725: Propagate lifetimes correctly in execute

The method returns Result<FederatedMultiSearchResponse<T>, Error> by borrowing self for 'a, but the response doesn’t reference self. You can shorten the borrow to &self like the non-federated variant:

-    pub async fn execute<T: 'static + DeserializeOwned + Send + Sync>(
-        &'a self,
+    pub async fn execute<T: 'static + DeserializeOwned + Send + Sync>(
+        &self,
     ) -> Result<FederatedMultiSearchResponse<T>, Error> {

No semantic change, but simpler signature.

🛑 Comments failed to post (1)
src/search.rs (1)

69-72: ⚠️ Potential issue

weighted_ranking_score is treated as mandatory – will panic on missing data

The Meilisearch docs currently don’t guarantee that _federation.weighted_ranking_score is always present (the comment below even hints at that).
Deserialising into a non-Option<f32> will make the whole response fail if that single field is omitted in a minor server update.

-    pub weighted_ranking_score: f32,
+    /// Score contributed by the federation weighting.  
+    /// Making this optional avoids hard-fail deserialisation if the server omits it.
+    #[serde(default)]
+    pub weighted_ranking_score: Option<f32>,

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In src/search.rs around lines 69 to 72, the field weighted_ranking_score is
currently defined as a non-optional f32, which will cause a panic if the field
is missing in the response. Change the type of weighted_ranking_score to
Option<f32> to safely handle cases where this field is absent, preventing
deserialization failures and panics.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🔭 Outside diff range comments (1)
src/search.rs (1)

636-1406: 🛠️ Refactor suggestion

Consider adding tests for the federated search functionality.

The code has good test coverage for other search features, but there are no tests for the new federated search functionality. Adding tests would help verify that everything works as expected and provide examples of how to use the new features.

Would you like me to help draft test cases for the federated search functionality?

🧹 Nitpick comments (1)
src/search.rs (1)

751-759: Fix typo in TODO comment and consider field optionality.

There's a typo in the comment on line 757 ("TOOD" should be "TODO"). Also, if weighted_ranking_score might be optional as the comment suggests, consider using Option<f32> for the field type.

-    // TOOD: not mentioned in the docs, is that optional?
-    pub weighted_ranking_score: f32,
+    // TODO: not mentioned in the docs, is that optional?
+    pub weighted_ranking_score: Option<f32>,
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge Base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between 93e52d8 and 9d73cb9.

📒 Files selected for processing (2)
  • src/client.rs (2 hunks)
  • src/search.rs (8 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/client.rs
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: integration-tests
🔇 Additional comments (9)
src/search.rs (9)

69-71: LGTM: Clear field addition for federation metadata.

This new field enables storing federation-related information on search results, correctly following the pattern used for other metadata fields with proper serde attribute and documentation.


368-376: LGTM: Federation options field and struct look good.

The federation options field is properly marked as crate-internal with appropriate serialization handling, and the QueryFederationOptions struct is well-structured with the needed weight field.


409-409: LGTM: Default initialization for federation_options.

Properly initializing the new field as None in the constructor.


617-624: LGTM: Builder method for federation options.

This follows the established builder pattern consistently with other methods in the class.


636-646: LGTM: Serialization configuration for MultiSearchQuery.

The custom serde bound is correctly used here as explained in the previous review comment.


667-676: LGTM: Transformation method from MultiSearchQuery to FederatedMultiSearchQuery.

This provides a clean way to transform a regular multi-search query into a federated one, following the Builder pattern approach.


690-698: LGTM: FederatedMultiSearchQuery struct definition.

The structure properly includes all necessary fields with correct serialization configuration.


700-713: LGTM: FederationOptions struct with clear documentation.

The struct provides all the necessary options for federation with a helpful link to the documentation.


715-725: LGTM: Execute method implementation for FederatedMultiSearchQuery.

This implementation correctly delegates to the client's method for executing federated searches.

Comment on lines +727 to +749
/// Returned by federated multi search.
#[derive(Debug, Deserialize, Clone)]
#[serde(rename_all = "camelCase")]
pub struct FederatedMultiSearchResponse<T> {
/// Merged results of the query.
pub hits: Vec<SearchResult<T>>,

// TODO: are offset, limit and estimated_total_hits really non-optional? In
// my tests they are always returned, but that's not a proof.
/// Number of documents skipped.
pub offset: usize,
/// Number of results returned.
pub limit: usize,
/// Estimated total number of matches.
pub estimated_total_hits: usize,

/// Distribution of the given facets.
pub facet_distribution: Option<HashMap<String, HashMap<String, usize>>>,
/// facet stats of the numerical facets requested in the `facet` search parameter.
pub facet_stats: Option<HashMap<String, FacetStats>>,
/// Processing time of the query.
pub processing_time_ms: usize,
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

Address TODO comment about field optionality.

There's an open question about whether offset, limit, and estimated_total_hits fields should be optional. Consider investigating this further to ensure the API is consistent and robust.


🏁 Script executed:

#!/bin/bash
# Check Meilisearch API documentation and actual responses to determine if these fields are always present

# Search the codebase for clues about these fields' optionality in other response structures
echo "Checking other response structures for field optionality patterns:"
grep -r "pub offset:" --include="*.rs" .
grep -r "pub limit:" --include="*.rs" .
grep -r "pub estimated_total_hits:" --include="*.rs" .

# Look for any test files that might verify these fields
echo -e "\nSearching for tests that might verify these fields:"
grep -r "offset" --include="*test*.rs" .
grep -r "limit" --include="*test*.rs" .
grep -r "estimated_total_hits" --include="*test*.rs" .

Length of output: 1546


Align optionality of offset, limit, and estimated_total_hits across response structs

We currently see both plain and Option<usize> definitions for these fields in several places—this should be standardized based on the API spec:

• src/documents.rs
pub offset: u32 vs pub offset: Option<usize>
pub limit: u32 vs pub limit: Option<usize>
• src/indexes.rs
– same pattern for offset and limit
• src/key.rs
– same pattern for offset and limit
• src/search.rs
– other search structs use pub offset: Option<usize>, pub limit: Option<usize>, and pub estimated_total_hits: Option<usize>

Action items:

  1. Check the Meilisearch API docs (or real responses) to confirm whether these fields are always returned for federated multi‐search.
  2. If guaranteed, change them to non‐optional (usize or u32) everywhere and remove the TODO.
  3. If they can be omitted, update this struct to use Option<usize>, remove the TODO, and add/update tests accordingly.
🤖 Prompt for AI Agents
In src/search.rs around lines 727 to 749, the fields offset, limit, and
estimated_total_hits are currently non-optional but there is uncertainty if they
are always present in federated multi-search responses. Investigate the
Meilisearch API documentation or actual API responses to confirm if these fields
are guaranteed. If always present, keep them as non-optional usize and remove
the TODO comment. If they can be omitted, change these fields to Option<usize>,
remove the TODO, and update or add tests to reflect this optionality for
consistency with other search response structs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[v1.10.0] Federated search
2 participants