-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Revise strategy for opening an index for reading #14607
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
rahulgoswami
wants to merge
16
commits into
apache:main
Choose a base branch
from
rahulgoswami:update-created-version-major
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
+62
−22
Open
Changes from all commits
Commits
Show all changes
16 commits
Select commit
Hold shift + click to select a range
2ae700d
Changes for API to allow updating indexCreatedVersionMajor for Segmen…
rahulgoswami 60461d2
Merge branch 'main' into update-created-version-major
rahulgoswami 128b13a
better naming and comments
rahulgoswami 01820b6
Merge branch 'main' into update-created-version-major
rahulgoswami 8975898
made changes more generic
rahulgoswami ff5bde9
look at version of individual segments to make decision to open index…
rahulgoswami 10303e8
Merge branch '14607_revise_index_open_strategy' into update-created-v…
rahulgoswami 43b0efc
Merge branch 'main' into update-created-version-major
rahulgoswami f9b7ec2
revert changes to overrite indexCreatedVersionMajor in IndexWriter an…
rahulgoswami 8015f2e
changes.txt
rahulgoswami 96c278c
backwards compatibility tests
rahulgoswami 841e7bb
realign for better readability and simplify logic
rahulgoswami ba9ab0c
rename variable and add comment
rahulgoswami b789067
Merge branch 'main' into update-created-version-major
rahulgoswami 150e82b
Merge branch 'main' into update-created-version-major
rahulgoswami c84c526
1)Push entire commit after previous reversion for clean diff 2)Handle…
rahulgoswami File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
readCodec is private in this class so we're free to change it. Let's make it throw
IndexFormatTooOldExceptiondirectly; then there is no need to catch and rethrow.Also -- it is still possible that the user actually did forget to load lucene-backward-codecs.jar and the version of the index they are trying to load actually is a supported one.
I think what we need to do is to add an explicit check on the index version somewhere so we can distinguish between a supported codec that failed to load and an unsupported (too old) codec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm thinking more about this, maybe we are unable to know what version the old index is if we cannot read the segmentinfos file and the best we can do is issue an exception like we did before? At least we should acknowledge that the problem might be caused by missing jars on the classpath rather than a very old index - we can't tell for sure
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm this is tricky. As such, we are able to read the SegmentInfos file and extract the indexCreatedVersionMajor which is how it was done previously. However if we base our decision on it, it would defeat the purpose of this PR.
The idea is that if the reindexing is done in a way as described in my talk and also in the discussion on this PR, then we could end up in a scenario where indexCreatedVersionMajor is pretty old, but all segments have still been reindexed with supported codec. And we should be able to open such an index.
I agree that the best course of action might be to retain the existing behavior of throwing IllegalArgumentException outlining the two possibilities for failure (backward-codecs-jar not being present or index is too old). Unless I can think of anything better, I guess I will just change TestAncientIndicesCompatibility instead.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Attempting to solve in the readCodec() private method as below. Reasonable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So ... I am pretty sure that
Codec.availableCodecsis simply listing what is available on the classpath? I think the best we can do is attempt to read the segmentinfos file, and if we fail, give a fairly generic exception message.You seemed to be saying your system would be able to read an index for which we don't have any backward codec available ("As such, we are able to read the SegmentInfos file and extract the indexCreatedVersionMajor which is how it was done previously"), but how can that be?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess with your change we can still read indexes that were originally created with 8x and 9x (which would not have been possible before) as long as their segments have been rewritten by 10x or 11x, but if the index was created by 7x or earlier we would not.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not the index, but looks like the nightly test failure suggests we can at least read the 'indexCreatedVersion' from SegmentInfos (aka 'segments_*' file) written in 7x without the codec present. If you see there are only two places at which readCommit() throws IndexFormatTooOldException. The previous nightly failure occurred because the below check was removed
which led to checking individual segments in parseSegmentInfos() (which is what we intend). I think it is in reading the segment level metadata/data where it needs the codec (
Codec codec = readCodec(input)which is where the failure occurred)?Which also means even the original implementation assumes that we'll always be able to read the "indexCreatedVersion" from SegmentInfos once we make it past the first check in readCommit() for too old index (
if (magic != CodecUtil.CODEC_MAGIC))Exception from nightly for reference:
Exactly
Going by the above explanation, I think at least for index created by 7x, we will be able to read, provided the segments have been rewritten by 10x or 11x.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried something different, changing
to
since we can't read 7.4 anyway, why do we pretend to do so here?
and this makes the test pass up until version 8.0 indexes, when it fails in a new way I haven't understood yet (the below assertion fails for 8.0):
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems to me we still need a "min supported major" version check for the index, but it will be for a different version than the per-segment check (ie 8 instead of 10).