-
Notifications
You must be signed in to change notification settings - Fork 144
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Migrate derived source from filter to mask #2612
Migrate derived source from filter to mask #2612
Conversation
05a55c1
to
0c0536f
Compare
Evidence bwc test works: https://github.com/opensearch-project/k-NN/actions/runs/13933569801/job/38996400312?pr=2612. Will fix now |
6659ee5
to
7493a93
Compare
...org/opensearch/knn/index/codec/backward_codecs/KNN9120Codec/KNN9120DerivedSourceReaders.java
Show resolved
Hide resolved
src/main/java/org/opensearch/knn/index/codec/derivedsource/DerivedMapHelper.java
Outdated
Show resolved
Hide resolved
7493a93
to
293ea53
Compare
src/main/java/org/opensearch/knn/index/codec/KNN10010Codec/KNN10010Codec.java
Show resolved
Hide resolved
...va/org/opensearch/knn/index/codec/KNN10010Codec/KNN10010DerivedSourceStoredFieldsReader.java
Show resolved
Hide resolved
...va/org/opensearch/knn/index/codec/KNN10010Codec/KNN10010DerivedSourceStoredFieldsWriter.java
Outdated
Show resolved
Hide resolved
...va/org/opensearch/knn/index/codec/KNN10010Codec/KNN10010DerivedSourceStoredFieldsWriter.java
Show resolved
Hide resolved
...va/org/opensearch/knn/index/codec/KNN10010Codec/KNN10010DerivedSourceStoredFieldsWriter.java
Outdated
Show resolved
Hide resolved
...rch/knn/index/codec/backward_codecs/KNN9120Codec/KNN9120DerivedSourceStoredFieldsReader.java
Show resolved
Hide resolved
...ensearch/knn/index/codec/backward_codecs/KNN9120Codec/RootPerFieldDerivedVectorInjector.java
Show resolved
Hide resolved
...a/org/opensearch/knn/index/codec/derivedsource/AbstractPerFieldDerivedVectorTransformer.java
Outdated
Show resolved
Hide resolved
.../java/org/opensearch/knn/index/codec/derivedsource/RootPerFieldDerivedVectorTransformer.java
Outdated
Show resolved
Hide resolved
.../java/org/opensearch/knn/index/codec/derivedsource/RootPerFieldDerivedVectorTransformer.java
Show resolved
Hide resolved
Hi @jmazanec15 , just throwing an idea regarding to get the first child doc id for nested case. Lemma 1. The DFS visit order at the last doc, which is the top level doc having
|
@0ctopus13prime - I think thats a pretty interesting idea. I worry that it might run into issues when there are arrays that are not necessarily related to nested documents. However, I think it makes sense to follow up in #2626 |
293ea53
to
6b65f36
Compare
Discovered #2625 - will address this in a follow up PR. @shatejas and @0ctopus13prime could I get a follow up review? |
Tests failing because common utils needs to upgrade to beta1 - see opensearch-project/common-utils#808 |
...va/org/opensearch/knn/index/codec/KNN10010Codec/KNN10010DerivedSourceStoredFieldsReader.java
Show resolved
Hide resolved
6b65f36
to
89a725e
Compare
Migrates derived source functionality from filter to mask based approach. Moves old read functionality and related classes to backwards codecs KNN9120... Removes old write as no longer necessary. In order to support bwc, we add custom functionality in the stored fields writer merge logic to fall back to base, non-optimized merge if it detects older readers in the merge state. This is needed because for these segments, we need to rebuild the source and then apply filter to migrate to new write format. Signed-off-by: John Mazanec <[email protected]>
89a725e
to
a117d38
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Once it passes CI, will approve it.
Thank you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Can approve once CI passes
Description
Migrates derived source functionality from filter to mask based approach in a bwc way. The change can be summed up as instead of remove vector fields from source, it replaces them with a smaller representation (mask). When we need to add them back, we transform the source map to replace the masks with the actual vectors. This makes handling nested docs much easier.
For backwards compatibility, this PR moves old read functionality and related classes to backwardscodecs/KNN9120Codec package and removes old write as no longer necessary. On merge, we add custom functionality in the stored fields writer merge logic to fall back to base, non-optimized merge if it detects older readers in the merge state. For this, we will reconstruct the source document with the reader and then apply the mask again on top of it to remove the vectors. This ensures that segments are migrated to the mask approach. To verify this, I added several backwards compatibility tests.
There are 40+ file changes, but most of them are just moving files to backwards codec.
Adding a few more test cases today.
Related Issues
Resolves #2377
Check List
--signoff
.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.