
Add hybrid vector + full-text search#75

Merged
ianmacartney merged 11 commits into get-convex:main from richardsolomou:hybrid-text-search
Feb 10, 2026

Conversation

@richardsolomou
Contributor

@richardsolomou richardsolomou commented Jan 30, 2026

Summary

  • Extend the search action with optional full-text search that combines vector and text search results using Reciprocal Rank Fusion (hybridRank)
  • Add textSearch, textWeight, and vectorWeight options to the client SearchOptions type
  • Default searchableText to chunk text content so hybrid search works out of the box for new documents
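
For readers unfamiliar with Reciprocal Rank Fusion, the merge step can be sketched roughly as follows. This is not the package's `hybridRank` implementation: the function name `rrfMerge`, the smoothing constant `k = 10`, and the chunk ids are assumptions made for illustration.

```typescript
// Hypothetical weighted Reciprocal Rank Fusion; not the package's hybridRank.
// Each input list is ordered best-first. An item at 0-based rank r in list i
// contributes weights[i] / (k + r) to that item's fused score.
function rrfMerge(
  rankedLists: string[][],
  weights: number[],
  k: number = 10, // assumed smoothing constant
): string[] {
  const scores: Record<string, number> = {};
  rankedLists.forEach((list, i) => {
    const weight = weights[i] ?? 1;
    list.forEach((id, rank) => {
      scores[id] = (scores[id] ?? 0) + weight / (k + rank);
    });
  });
  // Highest fused score first. Each id appears once, so results that occur
  // in both lists are deduplicated.
  return Object.keys(scores).sort(
    (a, b) => (scores[b] ?? 0) - (scores[a] ?? 0),
  );
}

// Hypothetical chunk ids: "b" appears in both result lists.
const vectorHits = ["a", "b", "c"];
const textHits = ["b", "d"];
const merged = rrfMerge([vectorHits, textHits], [1, 1]);
```

With equal weights, `"b"` ranks first because it collects a contribution from both lists. The fused scores depend only on positions, not on the original similarity or text-match scores, which is why score semantics change when text search is enabled.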

Changes

  • src/component/search.ts: Add textSearch internal query using the searchableText search index; extend search action with hybrid path (vector + text → RRF merge); scope text search to namespace; apply user filters (OR semantics) to text search matching vector search behavior; guard against empty merge results
  • src/component/chunks.ts: Extract shared buildRanges helper; add getRangesOfChunkIds and getChunkIdsByEmbeddingIds internal queries
  • src/client/index.ts: Add textSearch/textWeight/vectorWeight to SearchOptions; pass textQuery through in RAG.search(); default searchableText in createChunkArgsBatch
  • src/client/hybridRank.ts: Fix typo "Recriprocal" → "Reciprocal"
  • src/component/schema.ts: Remove TODO comment
  • src/component/search.test.ts: Add 9 tests covering text search query, namespace scoping, filters, chunk ID lookups, hybrid search with textQuery, deduplication, vector-only path, and weight parameters
  • example/: Add hybrid search toggle to example app, passing textSearch option through search and askQuestion actions

Backwards compatibility

  • All new args are optional — existing code is unaffected
  • Existing chunks without searchableText won't appear in text search but still work for vector search
  • New chunks automatically get searchableText populated
  • Return types unchanged; score semantics change only when textSearch is enabled (scores become position-based via RRF)
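
The defaulting rule behind the second and third points can be pictured with a small sketch. The `ChunkInput` and `StoredChunk` types and the `withSearchableText` helper are hypothetical; they only mirror the behavior described for `createChunkArgsBatch` and are not the package's code.

```typescript
// Hypothetical shapes; the real chunk arguments carry more fields.
interface ChunkInput {
  text: string;
  searchableText?: string;
}

interface StoredChunk {
  text: string;
  searchableText: string;
}

// Keep an explicit searchableText if the caller supplied one; otherwise fall
// back to the chunk text so the chunk is covered by the full-text index.
function withSearchableText(chunk: ChunkInput): StoredChunk {
  return {
    text: chunk.text,
    searchableText: chunk.searchableText ?? chunk.text,
  };
}

const defaulted = withSearchableText({ text: "Convex supports vector search." });
const explicit = withSearchableText({
  text: "<p>Convex supports vector search.</p>",
  searchableText: "Convex supports vector search.",
});
```

Chunks stored before this change have no `searchableText` at all, so only the vector path can return them until they are re-added.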

Test plan

  • Existing tests pass (npm test)
  • Test vector-only search still works identically (no textSearch param)
  • Test hybrid search: add documents, search with textSearch: true
  • Test that text search is scoped to namespace
  • Test that user filters are applied to text search
  • Test deduplication when results appear in both vector and text results
  • Test with textWeight/vectorWeight to confirm ranking changes
  • Test getChunkIdsByEmbeddingIds and getRangesOfChunkIds helpers

Summary by CodeRabbit

  • New Features

    • Hybrid search modes (Vector, Text, Hybrid) with a UI selector, separate scope selector (general/category/file), and propagation to search and Q&A flows.
    • Tunable text/vector weight controls and improved result grouping/context ranges.
  • Bug Fixes

    • Fixed ranking algorithm name in docs.
  • Tests

    • Added comprehensive hybrid search tests for ranking, deduplication, weighting, and namespace filtering.
  • Chores

    • Expanded search options and public surface to support new modes and weights.

Extend the search action with optional text search that combines vector
and full-text search results using Reciprocal Rank Fusion (hybridRank).

- Add textSearch internal query using the searchableText search index
- Add getRangesOfChunkIds and getChunkIdsByEmbeddingIds queries
- Extract shared buildRanges helper from getRangesOfChunks
- Add textSearch, textWeight, vectorWeight options to client API
- Default searchableText to chunk content in createChunkArgsBatch
@coderabbitai

coderabbitai bot commented Jan 30, 2026

Walkthrough

Adds a selectable search mode (vector|text|hybrid) across UI, client, and server; threads searchType and weighting parameters through RAG search calls; implements text-only, vector-only, and hybrid ranking paths; centralizes range-building; and adds hybrid search tests.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Example app: example/convex/example.ts, example/src/Example.tsx, example/src/components/SearchInterface.tsx | Added searchType (optional) to Convex action args and UI; introduced searchScope in UI while keeping internal searchType mode; pass searchType into server actions and provide setters/props to SearchInterface. |
| Client API & types: src/shared.ts, src/client/index.ts | Added vSearchType validator and SearchType type; extended SearchOptions with searchType, textWeight, vectorWeight; attach searchableText to chunk creation and forward textQuery/weights/searchType to server. |
| Server search logic: src/component/search.ts, src/client/hybridRank.ts | Implemented vector-only, text-only, and hybrid search flows (weights, textQuery, embedding/dimension checks), added internal handlers textSearch and textAndRanges, and fixed RRF doc comment. |
| Chunk & range management: src/component/chunks.ts | Extracted and exported buildRanges helper; refactored getRangesOfChunks to use it (centralized range-building and deduplication). |
| Tests: src/component/search.test.ts | Added extensive hybrid search tests (text, vector, hybrid, namespace/filters, weighting, dedupe) and imported internal endpoints for text paths. |
| Minor: src/component/schema.ts | Removed a TODO comment related to text search (no behavioral change). |

Sequence Diagram(s)

sequenceDiagram
    participant UI as Client UI
    participant ClientLib as Client Library
    participant SearchAPI as Search Action
    participant TextPath as Text Search Worker
    participant VectorPath as Vector Search Worker
    participant RRF as Reciprocal Rank Fusion
    participant DB as Database

    UI->>ClientLib: search(query, searchType, textWeight?, vectorWeight?)
    ClientLib->>SearchAPI: convex.action.search(payload with searchType/textQuery/weights)

    alt searchType == "text" or "hybrid"
        SearchAPI->>TextPath: textSearch(textQuery, filters)
        TextPath->>DB: full-text lookup
        DB-->>TextPath: text results (chunk ids + scores)
    end

    alt searchType == "vector" or "hybrid"
        SearchAPI->>VectorPath: vectorSearch(embedding/dimension)
        VectorPath->>DB: similarity lookup
        DB-->>VectorPath: vector results (chunk ids + scores)
    end

    alt searchType == "hybrid"
        VectorPath-->>RRF: vector results
        TextPath-->>RRF: text results
        RRF->>RRF: merge using weights (textWeight/vectorWeight) -> ranked ids
        RRF->>SearchAPI: merged ranked results
    else single-path
        VectorPath-->>SearchAPI: vector results
        TextPath-->>SearchAPI: text results
    end

    SearchAPI->>DB: buildRanges / getRangesOfChunkIds(final ids)
    DB-->>SearchAPI: chunk contents & metadata
    SearchAPI-->>ClientLib: final results
    ClientLib-->>UI: render results

Estimated Code Review Effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐇 I hop through queries, soft and spry,

Vector, text, or hybrid I try,
I blend the scores, nibble weights with care,
Ranges stitched, results laid bare,
A joyful twitch — search made spry.

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 11.11% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'Add hybrid vector + full-text search' accurately summarizes the main objective of the pull request, which adds hybrid search functionality combining vector and full-text search capabilities. |


Pass the textSearch option through all search and askQuestion actions.
Add a toggle in the advanced options panel to enable hybrid search.
- Fix typo: "Recriprocal" → "Reciprocal" in hybridRank
- Apply user filters (OR semantics) to text search, matching vector search behavior
- Document that scores become position-based when hybrid search is enabled
- Use proper Id<"chunks"> types instead of string casts in hybrid merge
… path

- Add explicit namespaceId filter in text search when user filters are present
- Guard against empty merge results in hybrid path
- Hoist vector search above the branch point to avoid duplication
Use Doc<"chunks"> instead of inline type in textSearch toResults.
Add 9 tests covering textSearch query, namespace scoping, filters,
getChunkIdsByEmbeddingIds, getRangesOfChunkIds, hybrid search with
textQuery, deduplication, vector-only path, and weight parameters.
@richardsolomou richardsolomou marked this pull request as ready for review January 31, 2026 21:08
@pkg-pr-new

pkg-pr-new bot commented Feb 2, 2026

npm i https://pkg.pr.new/get-convex/rag/@convex-dev/rag@75

commit: 65fa6bc

Member

@ianmacartney ianmacartney left a comment

Looking really good!
Would love, in addition to the feedback below, for you to try out the pkg-pr-new link and test it with your app to make sure it behaves well for you.
Big ask here is around reducing the number of queries in search, which might affect the factoring of various pieces of code.

My bandwidth is limited, so if I don't get back to you soon, apologies! Hoping to get more folks on the team to help (we're hiring!)

…eries

- Replace `textSearch?: boolean` with `searchType?: "vector" | "text" | "hybrid"`
  to support text-only search mode (no embedding needed)
- Combine 3 separate queries in hybrid path into single `textAndRanges` query
  (vectorSearch remains separate as it requires action context)
- Export shared `vSearchType` validator and `SearchType` type from shared.ts
- Update example app to use shared types instead of inline unions

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@src/client/index.ts`:
- Around line 394-427: The current handling permits array queries to silently
fall back to vector-only even when searchType === "text", which results in no
text query and empty results; update the logic so that if searchType === "text"
and Array.isArray(args.query) you throw a descriptive error (referencing
searchType and args.query) instead of only warning, while preserving the
existing warning-and-fallback behavior for hybrid/hybrid-compatible cases
(searchType === "hybrid" or vector). Ensure the change is applied around the
existing variables/flow: searchType, needsEmbedding, needsTextQuery, args.query,
the embed() call and textQuery assignment so that embedding still occurs for
array queries when allowed and textQuery remains undefined only when legitimate.
🧹 Nitpick comments (1)
src/component/chunks.ts (1)

314-444: Avoid repeated entry reads in buildRanges.

You already load all entries up front; reusing them avoids extra DB gets per chunk.

♻️ Suggested refactor
-  const entries = (
-    await Promise.all(
-      Array.from(
-        new Set(chunks.filter((c) => c !== null).map((c) => c.entryId)),
-      ).map((id) => ctx.db.get(id)),
-    )
-  )
-    .filter((d) => d !== null)
-    .map(publicEntry);
+  const entryDocs = (
+    await Promise.all(
+      Array.from(
+        new Set(chunks.filter((c) => c !== null).map((c) => c.entryId)),
+      ).map((id) => ctx.db.get(id)),
+    )
+  ).filter((d): d is Doc<"entries"> => d !== null);
+  const entries = entryDocs.map(publicEntry);
+  const entryDocById = new Map(entryDocs.map((d) => [d._id, d]));
@@
-    const entry = await ctx.db.get(entryId);
-    assert(entry, `Entry ${entryId} not found`);
+    const entry = entryDocById.get(entryId);
+    assert(entry, `Entry ${entryId} not found`);
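
The idea behind this suggestion, loading each distinct entry once and indexing it by id, can be shown in isolation. The sketch below uses a hypothetical in-memory store with a read counter instead of `ctx.db`; all names and data are illustrative, and a real database read would be asynchronous.

```typescript
// Hypothetical document shapes and in-memory store that counts reads.
interface EntryDoc {
  _id: string;
  title: string;
}
interface ChunkDoc {
  entryId: string;
  text: string;
}

const entryTable: Record<string, EntryDoc> = {
  e1: { _id: "e1", title: "Guide" },
  e2: { _id: "e2", title: "FAQ" },
};
let readCount = 0;
function getEntry(id: string): EntryDoc | null {
  readCount += 1;
  return entryTable[id] ?? null;
}

const chunkDocs: ChunkDoc[] = [
  { entryId: "e1", text: "intro" },
  { entryId: "e1", text: "setup" },
  { entryId: "e2", text: "pricing" },
];

// Load each distinct entry once, then index the loaded docs by id.
const uniqueEntryIds = chunkDocs
  .map((c) => c.entryId)
  .filter((id, i, all) => all.indexOf(id) === i);
const entryDocById = new Map<string, EntryDoc>();
uniqueEntryIds.forEach((id) => {
  const doc = getEntry(id);
  if (doc !== null) entryDocById.set(id, doc);
});

// Per-chunk lookups now hit the map instead of the store.
const chunkTitles = chunkDocs.map((c) => {
  const entry = entryDocById.get(c.entryId);
  if (!entry) throw new Error(`Entry ${c.entryId} not found`);
  return entry.title;
});
```

Here three chunks trigger two store reads instead of five (two up front plus one per chunk), and the saving grows with the number of chunks per entry.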

@richardsolomou
Contributor Author

@ianmacartney All requested changes resolved. I've been using the pkg-pr-new link for a few days now and it's been super stable. Could you approve a new build of the package so I can test with the latest changes too please?

Member

@ianmacartney ianmacartney left a comment

really close!
Only changes necessary are formatting

  namespace: v.string(),
- embedding: v.array(v.number()),
+ embedding: v.optional(v.array(v.number())),
+ dimension: v.optional(v.number()),
Member

Ah yeah, this is an interesting nuance. I wonder if finding the namespace should happen higher up, with a namespaceId passed in, since doing a query from the parent isn't any more expensive than doing a query first thing in this action. The encapsulation is nice, but exposing that dimension is part of the namespace identifier is a bit funky. We can keep it like this for now, and in the future we could add a separate API that accepts a namespaceId instead of a namespace.

- Rename local SearchType to SearchScope in example to avoid confusion
  with the RAG package's SearchType
- Always pass searchType verbatim instead of conditionally omitting it
- Throw error when neither embedding nor textQuery is provided
- Add explicit vSearchType validator to server search action args
- Run prettier formatting
@richardsolomou
Contributor Author

@ianmacartney should all be sorted out now 🤞

@ianmacartney
Member

FYI I'll be in the backcountry for a few days so may not get to this until next week.
The package build should work for you in the meantime

Member

@ianmacartney ianmacartney left a comment

Actually I'll just push this out now

@ianmacartney ianmacartney merged commit 13bbef1 into get-convex:main Feb 10, 2026
2 checks passed
@ianmacartney
Member

0.7.1

@ianmacartney
Member

thanks again!
