@expani expani commented Apr 16, 2025

Description

Upgrading to Lucene 10.2.1
https://lucene.apache.org/core/10_2_1/changes/Changes.html

Performance Testing Areas

  • Generate a snapshot build and test search-heavy workloads such as Big5 across multiple runs
  • Ensure there are no new indexing regressions, such as the force merge time regression seen with Lucene 10.1.0

Contributor

❌ Gradle check result for 5e74113: FAILURE
❌ Gradle check result for 93c4e0c: FAILURE
❌ Gradle check result for 9a34fb8: FAILURE
❌ Gradle check result for 84276e4: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@harshavamsi
Contributor

@expani
Contributor Author

expani commented Apr 16, 2025

@harshavamsi Could you merge the constant scorer change first, with some context on why it helps? I can rebase this PR once it is merged and then fix the remaining Lucene 10.2.0 upgrade issues.

I want to focus on the test failures in this PR, like this one:

REPRODUCE WITH: ./gradlew ':plugins:analysis-icu:test' --tests "org.opensearch.index.analysis.IcuTokenizerFactoryTests.testIcuCustomizeRuleFile" -Dtests.seed=5147BC3990E890F4 -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=kw -Dtests.timezone=Pacific/Gambier -Druntime.java=21

IcuTokenizerFactoryTests > testIcuCustomizeRuleFile FAILED
    java.lang.ExceptionInInitializerError
        at __randomizedtesting.SeedInfo.seed([5147BC3990E890F4:91E71A7FBEA22BE7]:0)
        at org.opensearch.index.analysis.AnalysisRegistry.buildMapping(AnalysisRegistry.java:541)
        at org.opensearch.index.analysis.AnalysisRegistry.buildTokenFilterFactories(AnalysisRegistry.java:338)
        at org.opensearch.index.analysis.AnalysisRegistry.build(AnalysisRegistry.java:241)
        at org.opensearch.test.OpenSearchTestCase.createTestAnalysis(OpenSearchTestCase.java:1767)
        at org.opensearch.test.OpenSearchTestCase.createTestAnalysis(OpenSearchTestCase.java:1755)
        at org.opensearch.index.analysis.IcuTokenizerFactoryTests.createTestAnalysis(IcuTokenizerFactoryTests.java:129)
        at org.opensearch.index.analysis.IcuTokenizerFactoryTests.testIcuCustomizeRuleFile(IcuTokenizerFactoryTests.java:67)

        Caused by:
        com.ibm.icu.util.ICUUncheckedIOException: java.io.IOException: ICU data file error: Header authentication failed, please check if you have a valid ICU data file; data format 4e726d32, format version 5.0.0.0
            at app//com.ibm.icu.impl.Normalizer2Impl.load(Normalizer2Impl.java:506)
            at app//com.ibm.icu.impl.Norm2AllModes$1.createInstance(Norm2AllModes.java:354)
            at app//com.ibm.icu.impl.Norm2AllModes$1.createInstance(Norm2AllModes.java:347)
            at app//com.ibm.icu.impl.SoftCache.getInstance(SoftCache.java:69)
            at app//com.ibm.icu.impl.Norm2AllModes.getInstance(Norm2AllModes.java:344)
            at app//com.ibm.icu.text.Normalizer2.getInstance(Normalizer2.java:219)
            at app//org.opensearch.index.analysis.IcuFoldingTokenFilterFactory.<clinit>(IcuFoldingTokenFilterFactory.java:57)
            ... 7 more

            Caused by:
            java.io.IOException: ICU data file error: Header authentication failed, please check if you have a valid ICU data file; data format 4e726d32, format version 5.0.0.0
                at com.ibm.icu.impl.ICUBinary.readHeader(ICUBinary.java:606)
                at com.ibm.icu.impl.ICUBinary.readHeaderAndDataVersion(ICUBinary.java:557)
                at com.ibm.icu.impl.Normalizer2Impl.load(Normalizer2Impl.java:453)
                ... 13 more
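
The `data format 4e726d32` in the error above is the four-byte format tag from the ICU data file header, printed as hex. The sketch below decodes such a tag to ASCII; the class and method names are hypothetical and not part of ICU4J. It shows that the rejected file is a Normalizer2 (`Nrm2`) data file. One plausible reading, consistent with the stack trace through `Normalizer2Impl.load`, is that the icu4j version on the classpath does not accept format version 5 of that file.

```java
// Hypothetical helper for reading ICU error messages; not part of ICU4J.
public class IcuDataFormatTag {

    /** Decodes an 8-hex-digit ICU data format tag into its 4 ASCII characters. */
    public static String decode(String hex) {
        if (hex.length() != 8) {
            throw new IllegalArgumentException("expected 8 hex digits, got: " + hex);
        }
        StringBuilder sb = new StringBuilder(4);
        // Each pair of hex digits is one byte of the tag.
        for (int i = 0; i < hex.length(); i += 2) {
            sb.append((char) Integer.parseInt(hex.substring(i, i + 2), 16));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Tag copied from the exception message above.
        System.out.println(decode("4e726d32")); // prints "Nrm2"
    }
}
```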

@harshavamsi
Contributor

sounds good, i'll add some context

Contributor

❌ Gradle check result for 48d60bc: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@expani
Contributor Author

expani commented Apr 16, 2025

The failure is due to the known flaky test #15806, which fails with different seeds:

./gradlew ':server:internalClusterTest' --tests "org.opensearch.snapshots.DedicatedClusterSnapshotRestoreIT.testSnapshotWithStuckNode" -Dtests.seed=A77C55DF82AC524C -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=ann -Dtests.timezone=W-SU -Druntime.java=21

I tried the same command on current mainline and it fails there as well, so I don't think it's related to the Lucene 10.2.0 upgrade.

@andrross andrross force-pushed the lucene_10_2_0_upgrade branch from d7b39c5 to 9d8ce0b on May 15, 2025 19:42
Contributor

✅ Gradle check result for 9d8ce0b: SUCCESS

@andrross
Member

@expani What do you think? Should we merge this?

@getsaurabh02
Member

getsaurabh02 commented May 19, 2025

@andrross @expani @harshavamsi Can we please merge this and follow up on the regressions separately, with a plan for 3.1?

@asimmahmood1
Contributor

@andrross @expani is OOO. I'm OK to merge this. I can help track any performance regressions, with the goal of getting them fixed by the 3.1 release.

@andrross
Member

I've got some commits in this PR, so I'd like to get another maintainer review. @mch2 @msfroh Can one of you take a look?

@mch2 mch2 merged commit 370cd8c into opensearch-project:main May 19, 2025
29 of 30 checks passed
tandonks pushed a commit to tandonks/OpenSearch that referenced this pull request Jun 1, 2025
* Upgrade lucene to version 10.2.0

Signed-off-by: expani <[email protected]>

* Removed usage of non public constructor for DocIdSetBuilder

Signed-off-by: expani <[email protected]>

* Increment version and fixed another compilation error

Signed-off-by: expani <[email protected]>

* Updating license sha for lucene 10.2.0

Signed-off-by: expani <[email protected]>

* Upgraded icu4j in conjunction with Lucene 10.2.0

Signed-off-by: expani <[email protected]>

* update sha for icu4j

Signed-off-by: expani <[email protected]>

* Update to 10.2.1

Signed-off-by: Andrew Ross <[email protected]>

* Add changelog entry

Signed-off-by: Andrew Ross <[email protected]>

* Updated test based on Lucene-14561

Signed-off-by: expani <[email protected]>

* Updated test based on Lucene-14561

Signed-off-by: expani <[email protected]>

* Updated test based on Lucene-14561

Signed-off-by: expani <[email protected]>

* Updated test based on Lucene-14561

Signed-off-by: expani <[email protected]>

* Delegating nextDoc to advance as previous assumption doesn't hold with current ConstantScoreSupplier

Signed-off-by: expani <[email protected]>

* Implemented cost function

Signed-off-by: expani <[email protected]>

---------

Signed-off-by: expani <[email protected]>
Signed-off-by: Andrew Ross <[email protected]>
Co-authored-by: Andrew Ross <[email protected]>
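
The last two commits in the list above mention delegating `nextDoc` to `advance` and implementing a cost function. The general pattern can be sketched as follows. This is an illustration only, not the code from this PR: `SimpleDocIterator` is a hypothetical stand-in that mirrors the shape of Lucene's `DocIdSetIterator` (`docID`, `nextDoc`, `advance`, `cost`, `NO_MORE_DOCS`) so the example runs without Lucene on the classpath, and `SortedArrayIterator` is an assumed example implementation over a sorted array of doc IDs.

```java
import java.util.Arrays;

public class NextDocViaAdvance {

    /** Hypothetical stand-in mirroring the shape of Lucene's DocIdSetIterator. */
    public abstract static class SimpleDocIterator {
        public static final int NO_MORE_DOCS = Integer.MAX_VALUE;
        public abstract int docID();
        public abstract int nextDoc();
        public abstract int advance(int target);
        public abstract long cost();
    }

    /** Assumed example: iterates over a sorted array of unique doc IDs. */
    public static final class SortedArrayIterator extends SimpleDocIterator {
        private final int[] docs;
        private int doc = -1; // -1 means "not positioned yet"

        public SortedArrayIterator(int[] sortedDocs) {
            this.docs = sortedDocs;
        }

        @Override
        public int docID() {
            return doc;
        }

        @Override
        public int nextDoc() {
            // Guard against overflow of doc + 1 once the iterator is exhausted.
            if (doc == NO_MORE_DOCS) {
                return NO_MORE_DOCS;
            }
            // No separate stepping logic: "next" is "first doc >= current + 1".
            return advance(doc + 1);
        }

        @Override
        public int advance(int target) {
            int idx = Arrays.binarySearch(docs, target);
            if (idx < 0) {
                idx = -idx - 1; // insertion point: first element > target
            }
            doc = idx < docs.length ? docs[idx] : NO_MORE_DOCS;
            return doc;
        }

        @Override
        public long cost() {
            // Upper bound on the number of docs this iterator can return.
            return docs.length;
        }
    }

    public static void main(String[] args) {
        SortedArrayIterator it = new SortedArrayIterator(new int[] {2, 5, 9});
        System.out.println(it.nextDoc());  // first doc
        System.out.println(it.advance(6)); // first doc >= 6
        System.out.println(it.nextDoc() == SimpleDocIterator.NO_MORE_DOCS);
        System.out.println(it.cost());
    }
}
```

Routing `nextDoc` through `advance` keeps a single code path for positioning, which avoids relying on assumptions about how the two methods interleave.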
neuenfeldttj added a commit to neuenfeldttj/OpenSearch that referenced this pull request Jun 26, 2025
neuenfeldttj pushed a commit to neuenfeldttj/OpenSearch that referenced this pull request Jun 26, 2025