-
Notifications
You must be signed in to change notification settings - Fork 2.8k
feat(redis): Advanced pre-filter for document metadata #9240
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
feat(redis): Advanced pre-filter for document metadata #9240
Conversation
|
|
Hey @tishun 👋 My main goal with this change was to keep it fully backward compatible — for example, I intentionally avoided forcing UUIDs, since many existing users already rely on predictable or custom keys. Forcing UUIDs could break compatibility or make adoption slower for those users. I really like your approach to making the implementation driver-agnostic — that direction makes a lot of sense. We’ll just need to make sure we add some form of adapter layer since different clients (like ioredis vs node-redis) behave differently in a few areas (pipelines, return types, etc.). Regarding the schema, I decided to have it explicitly defined at index creation time. That way, the index and the schema are always aligned, and we can validate metadata on insert. If we infer the schema automatically from the first batch of documents, we might not capture all possible fields (since not all are required), which leads to a tricky situation:
So defining the schema first felt like the safest and most predictable approach. let me know what you think |
Agreed, backwards compatibility (and predictability) will be sacrificed if we choose to use UUIDs. We would gain - however - compatibility with the python implementation. To be fair they are not really incompatible and collisions are highly unlikely even now, but strictly speaking they will generate keys in two different ways. One could argue that this is also a benefit, allowing us to identify which driver added which vectors, so I am not at all adamant on this change. If you recommend that we revert to the old implementation I see no problem for us to do that (specifically in that regard).
True, and this is also why I stopped short at a complete solution. The change becomes quite large; and the benefit right now is not entirely visible (Redis intends to support node-redis primarily and in terms of functionality, quality and performance it is also better). This part of my change was more "good-practise" and would allow (should this becomes necessary) to have a smaller impact if we migrate from one driver to another.
Completely reasonable. Using a custom schema is definitely the more stable solution; and I assume most users would do that. Inferring the schema would be a generally lazier (but still valid - if used correctly) approach. I think they both serve different use cases:
BTW I am not sure it was apparent from my description, but currently both modes are available:
This is also - mostly - how the Python implementation of langchain works. Does that make sense? |
| "format:check": "prettier --config .prettierrc --check \"src\"" | ||
| }, | ||
| "dependencies": { | ||
| "uuid": "^10.0.0", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you run pnpm install after this change? It would probably update the pnpm-lock.yaml that should also be added to the PR.
First off - massive apologies to @Dag7 - we seem to both started to work on generally the same issue and we only realised that when we had to merge his changes in #8963.
Goal
The purpose of this change is to align the implementation of langchainjs with that of langchain, specifically in regards to using Redis as a vector store. There are some major drifts in both solutions, some of which make them incompatible with each other (one app using langchainjs and one langchain might have trouble saving to the same Redis store).
Solution
Some of these were addressed by @Dag7 already, but we wanted to provide a more complete implementation, including:
Please let us know if there is something we can do to improve this solution.
IMHO it would be very good to have it in the 1.0 release, otherwise we would have to change the contract again after releasing the solution provided by @Dag7