2 changes: 2 additions & 0 deletions lucene/CHANGES.txt
@@ -131,6 +131,8 @@ Other
* GITHUB#14761: Use more Comparators for PriorityQueue implementations. (Simon Cooper)
* GITHUB#14817: Refactor some complex uses of PriorityQueue to use Comparators. (Simon Cooper)

+ * GITHUB#14607: Index open performs version check on each segment, ignores indexCreatedVersionMajor (Rahul Goswami)

======================= Lucene 10.4.0 =======================

API Changes
@@ -864,7 +864,7 @@ public void testFailOpenOldIndex() throws IOException {
assertTrue(
ex.getMessage()
.contains(
- "This Lucene version only supports indexes created with major version "
+ "This Lucene version only supports indexes with major version "
+ Version.LATEST.major
+ " or later"));
// now open with allowed min version
80 changes: 59 additions & 21 deletions lucene/core/src/java/org/apache/lucene/index/SegmentInfos.java
@@ -346,30 +346,12 @@ public static final SegmentInfos readCommit(
input);
}

- if (indexCreatedVersion < minSupportedMajorVersion) {
-   throw new IndexFormatTooOldException(
-       input,
-       "Index created with Lucene "
-           + indexCreatedVersion
-           + ".x is not supported by Lucene "
-           + Version.LATEST
-           + ". This Lucene version only supports indexes created with major version "
-           + minSupportedMajorVersion
-           + " or later (found: "
-           + indexCreatedVersion
-           + ", minimum: "
-           + minSupportedMajorVersion
-           + "). To resolve this issue: (1) Re-index your data using Lucene "
-           + Version.LATEST.major
-           + ".x, or (2) Use an older Lucene version that supports your index format.");
- }

SegmentInfos infos = new SegmentInfos(indexCreatedVersion);
infos.id = id;
infos.generation = generation;
infos.lastGeneration = generation;
infos.luceneVersion = luceneVersion;
- parseSegmentInfos(directory, input, infos, format);
+ parseSegmentInfos(directory, input, infos, format, minSupportedMajorVersion);
return infos;

} catch (Throwable t) {
@@ -385,7 +367,12 @@ public static final SegmentInfos readCommit(
}

private static void parseSegmentInfos(
- Directory directory, DataInput input, SegmentInfos infos, int format) throws IOException {
+ Directory directory,
+ DataInput input,
+ SegmentInfos infos,
+ int format,
+ int minSupportedMajorVersion)
+ throws IOException {
infos.version = CodecUtil.readBELong(input);
// System.out.println("READ sis version=" + infos.version);
infos.counter = input.readVLong();
@@ -402,11 +389,38 @@ private static void parseSegmentInfos(
}

long totalDocs = 0;

for (int seg = 0; seg < numSegments; seg++) {
String segName = input.readString();
byte[] segmentID = new byte[StringHelper.ID_LENGTH];
input.readBytes(segmentID, 0, segmentID.length);
- Codec codec = readCodec(input);
+ Codec codec = null;
+ try {
+   codec = readCodec(input);
Contributor

readCodec is private in this class so we're free to change it. Let's make it throw IndexFormatTooOldException directly; then there is no need to catch and rethrow.

Also -- it is still possible that the user actually did forget to load lucene-backward-codecs.jar and the version of the index they are trying to load actually is a supported one.

I think what we need to do is to add an explicit check on the index version somewhere so we can distinguish between a supported codec that failed to load and an unsupported (too old) codec.

Contributor

Hmm, thinking more about this: maybe we cannot know what version the old index is if we cannot read the segmentinfos file, and the best we can do is throw an exception like we did before? At least we should acknowledge that the problem might be caused by missing jars on the classpath rather than a very old index; we can't tell for sure.
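
To make the suggestion above concrete, here is a minimal, self-contained sketch of a lookup failure whose message names both possible causes. It does not use Lucene: `CodecLookupSketch`, its toy `Map` registry, and the `ExampleCodec` name are hypothetical stand-ins for illustration only, and the message wording is an assumption rather than what the PR ships.

```java
import java.util.Map;

/** Hypothetical stand-in for codec lookup; this is not Lucene's Codec.forName. */
final class CodecLookupSketch {

  /** Looks up a codec by name in a toy registry; on failure the message names both causes. */
  static String lookup(Map<String, String> registry, String name) {
    String codec = registry.get(name);
    if (codec == null) {
      // We cannot tell which cause applies, so the message mentions both.
      throw new IllegalArgumentException(
          "Could not load codec '"
              + name
              + "'. Either lucene-backward-codecs.jar is missing from the classpath,"
              + " or the index is too old for this Lucene version.");
    }
    return codec;
  }

  public static void main(String[] args) {
    // "ExampleCodec" is a placeholder name, not a real Lucene codec.
    Map<String, String> registry = Map.of("ExampleCodec", "impl");
    try {
      lookup(registry, "Lucene70");
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```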

Contributor Author

Hmm, this is tricky. We are in fact able to read the SegmentInfos file and extract indexCreatedVersionMajor, which is how the check was done previously. However, basing our decision on it would defeat the purpose of this PR.

The idea is that if reindexing is done as described in my talk and in the discussion on this PR, we could end up with an index whose indexCreatedVersionMajor is quite old but whose segments have all been rewritten with a supported codec. We should be able to open such an index.

I agree that the best course of action might be to retain the existing behavior of throwing IllegalArgumentException with a message outlining the two possible causes of failure (lucene-backward-codecs.jar not being present, or the index being too old). Unless I can think of anything better, I guess I will just change TestAncientIndicesCompatibility instead.
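
The scenario described above can be sketched without Lucene as follows. This is only an illustration of the per-segment rule in this PR (reject when a segment's minimum version is missing or below the supported major); `SegmentVersionSketch` and `canOpen` are hypothetical names, and segments are reduced to a list of major-version numbers.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical model of the per-segment version check; not a Lucene class. */
final class SegmentVersionSketch {

  /**
   * Returns true when every segment records a minimum version whose major is at least
   * minSupportedMajor. indexCreatedMajor is accepted but deliberately ignored, mirroring
   * the intent that an old creation version alone should not block opening the index.
   */
  static boolean canOpen(
      int indexCreatedMajor, List<Integer> segmentMinMajors, int minSupportedMajor) {
    for (Integer segmentMajor : segmentMinMajors) {
      // A null entry models a segment that did not record its minimum version.
      if (segmentMajor == null || segmentMajor < minSupportedMajor) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    // Index created by 8.x, but all segments rewritten by 10.x or 11.x.
    System.out.println(canOpen(8, List.of(10, 10, 11), 10));
    // Index created by 10.x that still contains a 9.x segment.
    System.out.println(canOpen(10, List.of(10, 9), 10));
    // A segment with no recorded minimum version.
    System.out.println(canOpen(10, Arrays.asList(10, null), 10));
  }
}
```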

Contributor Author @rahulgoswami, Oct 29, 2025

Attempting to solve this in the private readCodec() method as below. Reasonable?

    catch (IllegalArgumentException e) {
      for (String codecName : Codec.availableCodecs()) {
        // Other backward codecs are present on the classpath, so if this were a
        // supported index version we should have found its codec there too,
        // i.e. this is likely an old index.
        if (<condition to detect whether codecName belongs to org.apache.lucene.backward_codecs>) {
          throw new IndexFormatTooOldException(input, message);
        }
      }

      // older logic
      if (name.startsWith("Lucene")) {
        throw new IllegalArgumentException(
            "Could not load codec '"
                + name
                + "'. Did you forget to add lucene-backward-codecs.jar?",
            e);
      }
      throw e;
    }

Contributor

So ... I am pretty sure that Codec.availableCodecs is simply listing what is available on the classpath? I think the best we can do is attempt to read the segmentinfos file, and if we fail, give a fairly generic exception message.

You seemed to be saying your system would be able to read an index for which we don't have any backward codec available ("As such, we are able to read the SegmentInfos file and extract the indexCreatedVersionMajor which is how it was done previously"), but how can that be?

Contributor

I guess with your change we can still read indexes that were originally created with 8x and 9x (which would not have been possible before) as long as their segments have been rewritten by 10x or 11x, but if the index was created by 7x or earlier we would not.

Contributor Author @rahulgoswami, Oct 30, 2025

> You seemed to be saying your system would be able to read an index for which we don't have any backward codec available ("As such, we are able to read the SegmentInfos file and extract the indexCreatedVersionMajor which is how it was done previously"), but how can that be?

Not the index itself, but the nightly test failure suggests that we can at least read indexCreatedVersion from a SegmentInfos file (i.e. the 'segments_*' file) written by 7x without the codec present. Note that there are only two places where readCommit() throws IndexFormatTooOldException. The previous nightly failure occurred because the check below was removed:

if (indexCreatedVersion < minSupportedMajorVersion) {...}

which led to checking individual segments in parseSegmentInfos() (which is what we intend). I think it is reading the segment-level metadata/data that needs the codec (Codec codec = readCodec(input), which is where the failure occurred)?
This also means that even the original implementation assumes we can always read indexCreatedVersion from SegmentInfos once we make it past the first too-old-index check in readCommit() (if (magic != CodecUtil.CODEC_MAGIC)).

Exception from nightly for reference:

    java.lang.IllegalArgumentException: Could not load codec 'Lucene70'. Did you forget to add lucene-backward-codecs.jar?
        at __randomizedtesting.SeedInfo.seed([14FBA2DCF615D67D:EF0090FF468E7121]:0)
        at [email protected]/org.apache.lucene.index.SegmentInfos.readCodec(SegmentInfos.java:532)
        at [email protected]/org.apache.lucene.index.SegmentInfos.parseSegmentInfos(SegmentInfos.java:397)
        at [email protected]/org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:354)

> I guess with your change we can still read indexes that were originally created with 8x and 9x (which would not have been possible before) as long as their segments have been rewritten by 10x or 11x

Exactly.

> but if the index was created by 7x or earlier we would not.

Going by the above explanation, I think that at least an index created by 7x will be readable, provided its segments have been rewritten by 10x or 11x.

Contributor

I tried something different, changing

      format = CodecUtil.checkHeaderNoMagic(input, "segments", VERSION_74, VERSION_CURRENT);

to

      format = CodecUtil.checkHeaderNoMagic(input, "segments", VERSION_86, VERSION_CURRENT);

Since we can't read 7.4 anyway, why do we pretend to here?

This makes the test pass up to the version 8.0 indexes, where it fails in a new way I haven't understood yet (the assertion below fails for 8.0):

     if (version.startsWith("8.") || version.startsWith("9.")) {
       assertTrue(indexStatus.clean);
     }
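
For readers unfamiliar with the header check being adjusted, the following is a rough, Lucene-free sketch of a min/max format range check. The names and the numeric format values are placeholders chosen for illustration (they are not the real `VERSION_74` / `VERSION_86` constants), and `IllegalStateException` stands in for Lucene's too-old / too-new exceptions.

```java
/** Hypothetical model of a header format range check; not Lucene's CodecUtil. */
final class HeaderVersionSketch {

  // Placeholder format numbers for illustration only.
  static final int OLDER_FORMAT = 9;
  static final int RAISED_MIN_FORMAT = 10;
  static final int CURRENT_FORMAT = 10;

  /** Returns the format version if it lies within [min, max], otherwise throws. */
  static int checkVersion(int actual, int min, int max) {
    if (actual < min) {
      throw new IllegalStateException(
          "format too old: " + actual + " (supported: " + min + " to " + max + ")");
    }
    if (actual > max) {
      throw new IllegalStateException(
          "format too new: " + actual + " (supported: " + min + " to " + max + ")");
    }
    return actual;
  }

  public static void main(String[] args) {
    // With the lower minimum, an older header is accepted at this stage.
    System.out.println(checkVersion(OLDER_FORMAT, OLDER_FORMAT, CURRENT_FORMAT));
    // With the raised minimum, the same header is rejected up front.
    try {
      checkVersion(OLDER_FORMAT, RAISED_MIN_FORMAT, CURRENT_FORMAT);
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Raising the minimum moves the rejection of very old commits to the header check, before any codec lookup is attempted.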

Contributor

It seems to me we still need a "min supported major" version check for the index, but it will be for a different version than the per-segment check (i.e. 8 instead of 10).
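
One way to picture this two-threshold idea is the sketch below. It is not the PR's implementation: `TwoTierVersionSketch`, the `Verdict` enum, and both threshold parameters are hypothetical, and the values 8 and 10 are taken only from the example in the comment above.

```java
import java.util.List;

/** Hypothetical model of an index-level check plus a stricter per-segment check. */
final class TwoTierVersionSketch {

  enum Verdict {
    OK,
    INDEX_TOO_OLD,
    SEGMENT_TOO_OLD
  }

  static Verdict check(
      int indexCreatedMajor,
      List<Integer> segmentMinMajors,
      int minIndexCreatedMajor,
      int minSegmentMajor) {
    // Coarse check: the commit itself must come from a creation version we can still parse.
    if (indexCreatedMajor < minIndexCreatedMajor) {
      return Verdict.INDEX_TOO_OLD;
    }
    // Strict check: every segment must have been written by a supported major version.
    for (Integer segmentMajor : segmentMinMajors) {
      if (segmentMajor == null || segmentMajor < minSegmentMajor) {
        return Verdict.SEGMENT_TOO_OLD;
      }
    }
    return Verdict.OK;
  }

  public static void main(String[] args) {
    System.out.println(check(9, List.of(10, 10), 8, 10)); // old creation version, rewritten segments
    System.out.println(check(7, List.of(10), 8, 10)); // creation version below the index threshold
    System.out.println(check(8, List.of(9), 8, 10)); // a segment below the segment threshold
  }
}
```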

+ } catch (IllegalArgumentException e) {
+   if (e.getMessage() != null && e.getMessage().contains("Could not load codec")) {
+     // maybe we tried loading an old default codec which isn't present in backward-codecs
+     // anymore.
+     // aka index is too old
+     throw new IndexFormatTooOldException(
+         input,
+         "Index has segments derived from Lucene version "
+             + infos.indexCreatedVersionMajor
+             + ".x and is not supported by Lucene "
+             + Version.LATEST
+             + ". This Lucene version only supports indexes with major version "
+             + minSupportedMajorVersion
+             + " or later (found: "
+             + infos.indexCreatedVersionMajor
+             + ", minimum supported: "
+             + minSupportedMajorVersion
+             + "). To resolve this issue re-index your data using Lucene "
+             + minSupportedMajorVersion
+             + ".x or later.");
+   } else {
+     throw e;
+   }
+ }
SegmentInfo info =
codec.segmentInfoFormat().read(directory, segName, segmentID, IOContext.READONCE);
info.setCodec(codec);
@@ -495,6 +509,30 @@ private static void parseSegmentInfos(
+ infos.indexCreatedVersionMajor,
input);
}

+ int createdOrSegmentMinVersion =
+     info.getMinVersion() == null
+         ? infos.indexCreatedVersionMajor
+         : info.getMinVersion().major;
+
+ // version >=7 are expected to record minVersion
+ if (info.getMinVersion() == null || info.getMinVersion().major < minSupportedMajorVersion) {
+   throw new IndexFormatTooOldException(
+       input,
+       "Index has segments derived from Lucene version "
+           + createdOrSegmentMinVersion
+           + ".x and is not supported by Lucene "
+           + Version.LATEST
+           + ". This Lucene version only supports indexes with major version "
+           + minSupportedMajorVersion
+           + " or later (found: "
+           + createdOrSegmentMinVersion
+           + ", minimum supported: "
+           + minSupportedMajorVersion
+           + "). To resolve this issue re-index your data using Lucene "
+           + minSupportedMajorVersion
+           + ".x or later.");
+ }
}

infos.userData = input.readMapOfStrings();