feat(redis): Advanced pre-filter for document metadata #9240

tishun · 2025-10-20T16:57:30Z

First off, massive apologies to @Dag7. We both seem to have started working on essentially the same issue, and we only realised it when we had to merge his changes from #8963.

Goal

The purpose of this change is to align the langchainjs implementation with that of langchain, specifically with regard to using Redis as a vector store. The two solutions have drifted significantly, in some places to the point of being incompatible with each other (one app using langchainjs and another using langchain might have trouble saving to the same Redis store).

Solution

Some of these were addressed by @Dag7 already, but we wanted to provide a more complete implementation, including:

API abstractions that make it easier to build up custom queries, possibly addressing problems such as "How to use RedisVectorStoreFilterType?" #5010
not exposing driver-specific (node-redis) models as part of the langchainjs API (potentially allowing for a driver change if required)
custom filters that do not rely on a user-provided schema; instead, the langchainjs driver infers the schema from the provided metadata
an attempt to stay backwards compatible with the old implementation, instead of introducing a separate new one
integration tests extended with many scenarios that were missing
UUIDs for generating keys, similar to langchain
etc.
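
To make the first point more concrete, here is a minimal sketch of what a driver-independent filter abstraction could look like. The names used here (`Filter`, `tag`, `numericRange`, `and`, `or`, `toQuery`) are hypothetical and are not the API introduced by this PR; only the RediSearch query syntax that the sketch emits is real.

```typescript
// Hypothetical filter model; none of these names come from the PR itself.
type Filter =
  | { kind: "tag"; field: string; values: string[] }
  | { kind: "numeric"; field: string; min: number; max: number }
  | { kind: "and"; parts: Filter[] }
  | { kind: "or"; parts: Filter[] };

const tag = (field: string, ...values: string[]): Filter => ({ kind: "tag", field, values });
const numericRange = (field: string, min: number, max: number): Filter => ({
  kind: "numeric",
  field,
  min,
  max,
});
const and = (...parts: Filter[]): Filter => ({ kind: "and", parts });
const or = (...parts: Filter[]): Filter => ({ kind: "or", parts });

// Backslash-escape every character that is not a letter, digit or underscore,
// so punctuation and spaces inside a TAG value are matched literally.
const escapeTag = (value: string): string => value.replace(/[^A-Za-z0-9_]/g, "\\$&");

// Render the filter tree as a RediSearch query string.
function toQuery(filter: Filter): string {
  switch (filter.kind) {
    case "tag":
      return `@${filter.field}:{${filter.values.map(escapeTag).join(" | ")}}`;
    case "numeric":
      return `@${filter.field}:[${filter.min} ${filter.max}]`;
    case "and":
      // In RediSearch, terms separated by a space are intersected.
      return `(${filter.parts.map(toQuery).join(" ")})`;
    case "or":
      return `(${filter.parts.map(toQuery).join(" | ")})`;
  }
}

const query = toQuery(and(tag("category", "news", "blog"), numericRange("year", 2020, 2025)));
// query === "(@category:{news | blog} @year:[2020 2025])"
```

A string built this way can then serve as the pre-filter part of a KNN query, for example `${query}=>[KNN 5 @vector $BLOB AS score]`, where the field name `vector` and the parameter name `BLOB` are placeholders.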

Please let us know if there is something we can do to improve this solution.

IMHO it would be very good to have this in the 1.0 release; otherwise we would have to change the contract again after releasing the solution provided by @Dag7.

changeset-bot · 2025-10-20T16:57:34Z

⚠️ No Changeset found

Latest commit: 121d7a0

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


Dag7 · 2025-10-20T20:21:48Z

Hey @tishun 👋
No worries at all 😅

My main goal with this change was to keep it fully backward compatible — for example, I intentionally avoided forcing UUIDs, since many existing users already rely on predictable or custom keys. Forcing UUIDs could break compatibility or make adoption slower for those users.

I really like your approach to making the implementation driver-agnostic — that direction makes a lot of sense. We’ll just need to make sure we add some form of adapter layer since different clients (like ioredis vs node-redis) behave differently in a few areas (pipelines, return types, etc.).

Regarding the schema, I decided to have it explicitly defined at index creation time. That way, the index and the schema are always aligned, and we can validate metadata on insert. If we infer the schema automatically from the first batch of documents, we might not capture all possible fields (since not all are required), which leads to a tricky situation:

The index would be created based on an incomplete schema
Later inserts could include new fields or types, and we would have to decide whether to re-index, ignore, or error out, all of which have trade-offs.

So defining the schema first felt like the safest and most predictable approach.
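
As an illustration of the "validate metadata on insert" idea, here is a minimal sketch. The schema shape (`MetadataSchema`, `FieldSpec`) and the function `validateMetadata` are hypothetical and only show the kind of check an explicit schema makes possible; they are not the types used in either PR.

```typescript
// Hypothetical schema model: field name -> expected kind and whether it is required.
type FieldKind = "tag" | "text" | "numeric";

interface FieldSpec {
  kind: FieldKind;
  required?: boolean;
}

type MetadataSchema = Record<string, FieldSpec>;

// Returns a list of human-readable problems; an empty list means the metadata is valid.
function validateMetadata(
  schema: MetadataSchema,
  metadata: Record<string, unknown>
): string[] {
  const errors: string[] = [];

  // Check every declared field for presence and type.
  for (const [name, spec] of Object.entries(schema)) {
    const value = metadata[name];
    if (value === undefined || value === null) {
      if (spec.required) errors.push(`missing required field "${name}"`);
      continue;
    }
    const expected = spec.kind === "numeric" ? "number" : "string";
    if (typeof value !== expected) {
      errors.push(`field "${name}" should be a ${expected}, got ${typeof value}`);
    }
  }

  // Flag fields that the index does not know about.
  for (const name of Object.keys(metadata)) {
    if (!Object.prototype.hasOwnProperty.call(schema, name)) {
      errors.push(`field "${name}" is not in the schema`);
    }
  }
  return errors;
}
```

Whether unknown fields should be rejected, ignored, or trigger a re-index is exactly the trade-off described above; this sketch simply reports them.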

Let me know what you think.

tishun · 2025-10-22T11:09:53Z

> My main goal with this change was to keep it fully backward compatible — for example, I intentionally avoided forcing UUIDs, since many existing users already rely on predictable or custom keys. Forcing UUIDs could break compatibility or make adoption slower for those users.

Agreed, backwards compatibility (and predictability) will be sacrificed if we choose to use UUIDs. We would, however, gain compatibility with the Python implementation. To be fair, the two are not really incompatible and collisions are highly unlikely even now, but strictly speaking they generate keys in two different ways. One could argue that this is also a benefit, since it lets us identify which driver added which vectors, so I am not at all adamant about this change. If you recommend that we revert to the old implementation, I see no problem with doing that (specifically in that regard).
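
One possible middle ground can be sketched as follows: keep caller-supplied keys untouched and generate a UUID only when no key is given. The helper `buildKeys` and its parameters are hypothetical, and the sketch uses `randomUUID` from `node:crypto` as a stand-in for the `uuid` package that this PR adds.

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical helper: custom keys win, UUID-based keys are only a fallback.
function buildKeys(prefix: string, count: number, customKeys?: string[]): string[] {
  if (customKeys && customKeys.length !== count) {
    throw new Error(`expected ${count} keys, got ${customKeys.length}`);
  }
  return Array.from({ length: count }, (_, i) =>
    // Assumption: custom keys are used verbatim, generated keys get the index prefix.
    customKeys ? customKeys[i] : `${prefix}${randomUUID()}`
  );
}
```

With this shape, existing users who rely on predictable keys are unaffected, while documents added without keys get identifiers in the same style as the Python implementation.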

> I really like your approach to making the implementation driver-agnostic — that direction makes a lot of sense. We’ll just need to make sure we add some form of adapter layer since different clients (like ioredis vs node-redis) behave differently in a few areas (pipelines, return types, etc.).

True, and this is also why I stopped short of a complete solution. The change becomes quite large, and the benefit is not entirely visible right now (Redis intends to support node-redis primarily, and it is also the better option in terms of functionality, quality and performance). This part of my change was more of a good practice: should a migration from one driver to another become necessary, its impact would be smaller.
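
To illustrate the "return types" part of the adapter concern, here is a minimal sketch that normalises two search reply shapes into one common result. All type and function names are hypothetical. The structured shape is modelled on what node-redis returns from `ft.search`, and the raw shape on the plain `FT.SEARCH` array reply that a client without search helpers would hand back; both are stated as assumptions and should be checked against the client versions in use.

```typescript
// Common, driver-independent result shape (hypothetical).
interface SearchHit {
  id: string;
  fields: Record<string, string>;
}

interface SearchResult {
  total: number;
  hits: SearchHit[];
}

// Assumed structured reply: { total, documents: [{ id, value }] }.
interface StructuredReply {
  total: number;
  documents: { id: string; value: Record<string, string> }[];
}

// Assumed raw reply: [total, id1, [field, value, ...], id2, [field, value, ...], ...].
type RawReply = [number, ...(string | string[])[]];

function fromStructured(reply: StructuredReply): SearchResult {
  return {
    total: reply.total,
    hits: reply.documents.map((d) => ({ id: d.id, fields: { ...d.value } })),
  };
}

function fromRaw(reply: RawReply): SearchResult {
  const [total, ...rest] = reply;
  const hits: SearchHit[] = [];
  // Entries come in pairs: a document id followed by a flat field/value array.
  for (let i = 0; i + 1 < rest.length; i += 2) {
    const id = rest[i] as string;
    const flat = rest[i + 1] as string[];
    const fields: Record<string, string> = {};
    for (let j = 0; j + 1 < flat.length; j += 2) {
      fields[flat[j]] = flat[j + 1];
    }
    hits.push({ id, fields });
  }
  return { total, hits };
}
```

The vector store would then depend only on `SearchResult`, and each driver would need just one small conversion function.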

> So defining the schema first felt like the safest and most predictable approach.

Completely reasonable. Using a custom schema is definitely the more stable solution, and I assume most users would do that. Inferring the schema is a lazier approach, but still a valid one if used correctly. I think the two serve different use cases:

a simple approach, where the metadata is the same for all documents and can safely be inferred by the driver, which makes usage much simpler
an advanced use case, perhaps the one most users would choose, where the metadata schema is defined by the user and the driver follows these definitions strictly
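
The simple approach can be sketched roughly as follows. The function `inferSchema` and the type-mapping rules are assumptions made for illustration (numbers become numeric fields, strings without whitespace become tags, other strings become text); they are not necessarily the rules this PR implements.

```typescript
type InferredKind = "numeric" | "tag" | "text";

// Infer a field schema from a batch of metadata objects.
// The first non-null value seen for a field decides its kind.
function inferSchema(batch: Record<string, unknown>[]): Record<string, InferredKind> {
  const schema: Record<string, InferredKind> = {};
  for (const metadata of batch) {
    for (const [name, value] of Object.entries(metadata)) {
      const known = Object.prototype.hasOwnProperty.call(schema, name);
      if (known || value === undefined || value === null) continue;
      if (typeof value === "number") {
        schema[name] = "numeric";
      } else if (Array.isArray(value) || typeof value === "boolean") {
        schema[name] = "tag";
      } else if (typeof value === "string") {
        // Assumed heuristic: multi-word strings are indexed as full text.
        schema[name] = /\s/.test(value) ? "text" : "tag";
      }
      // Other value types (for example nested objects) are skipped in this sketch.
    }
  }
  return schema;
}

const fromFirstDocOnly = inferSchema([{ category: "news", year: 2024 }]);
const fromWholeBatch = inferSchema([
  { category: "news", year: 2024 },
  { category: "blog", summary: "a short summary" },
]);
// fromFirstDocOnly has no "summary" field, fromWholeBatch does.
```

The difference between the two results shows the limitation raised earlier in the thread: a field that does not appear in the inspected documents is missing from the inferred schema.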

BTW I am not sure it was apparent from my description, but currently both modes are available:

legacy filter or legacy metadata field (in the existing vector storage) results in legacy metadata handling
missing custom schema results in a custom metadata schema being inferred from the first batch of documents
existing custom metadata schema results in it being applied with priority

This is also, for the most part, how the Python implementation of langchain works.
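
The three cases listed above can be sketched as a small decision function. The names (`MetadataMode`, `ModeInputs`, `resolveMode`) are hypothetical, and the precedence of legacy handling over a custom schema is an assumption; the list only states that an existing custom schema takes priority over inference.

```typescript
type MetadataMode = "legacy" | "custom-schema" | "inferred-schema";

interface ModeInputs {
  usesLegacyFilter: boolean; // the caller passed an old-style filter
  indexHasLegacyMetadataField: boolean; // the existing store keeps metadata the old way
  customSchema?: Record<string, unknown>; // user-defined metadata schema, if any
}

function resolveMode(inputs: ModeInputs): MetadataMode {
  // Assumption: legacy signals are checked first so existing stores keep working.
  if (inputs.usesLegacyFilter || inputs.indexHasLegacyMetadataField) return "legacy";
  // A user-provided schema is applied with priority over inference.
  if (inputs.customSchema !== undefined) return "custom-schema";
  // Otherwise the schema is inferred from the first batch of documents.
  return "inferred-schema";
}
```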

Does that make sense?

vchomakov · 2025-10-24T15:49:48Z

libs/providers/langchain-redis/package.json

    "format:check": "prettier --config .prettierrc --check \"src\""
  },
  "dependencies": {
+    "uuid": "^10.0.0",


Did you run pnpm install after this change? It would probably update pnpm-lock.yaml, which should also be added to the PR.

feat(redis): Advanced pre-filter for document metadata

fc28f56

tishun mentioned this pull request Oct 20, 2025

release(*): 1.0.0 #9226

Merged

Merge issue, reverting this change

121d7a0

vchomakov reviewed Oct 24, 2025
