Releases · apache/lucene

06 Oct 14:37

benwtrent

releases/lucene/10.3.1

51190f3

10.3.1 Latest

Bug fixes

Fix the Impact returned from Lucene103PostingsReader when frequencies are not indexed.

What's Changed

[10.3] Fix returned Impacts when frequencies are not indexed by @iverase in #15263

Full Changelog: releases/lucene/10.3.0...releases/lucene/10.3.1

Contributors

iverase

28 Sep 00:23

jainankitk

releases/lucene/9.12.3

f965e93

9.12.3

Bug fixes

Support for JDK 24+
Reduce sharedArenaMaxPermits from 1024 to 64
Use READONCE to read segment infos
Fix rare spin-loop in RefCountedSharedArena
Ensure vector queries handle advanceShallow correctly
Fix failure due to hole bridge being coplanar with polygon edge

13 Sep 20:35

vigyasharma

releases/lucene/10.3.0

e287128

10.3.0

Lucene 10.3 brings major performance improvements.

Lexical search is now vectorized to make better use of SIMD instructions, more efficient memory access patterns and CPU pipelining, and to amortize the cost of virtual function calls. Lucene's nightly benchmarks report a 40% speedup compared with Lucene 10.2 when computing the top-100 hits by score for disjunctive and conjunctive queries.

Vector search now better parallelizes fetching vectors into the CPU cache. Lucene's nightly benchmarks report a 15%-20% speedup compared with Lucene 10.2.

The terms dictionary is about 30% faster than in Lucene 10.2 on primary-key lookups, according to Lucene's nightly benchmarks. This should speed up workloads that depend on terms dictionary lookup performance, including primary-key lookups, indexing operations that specify an ID, and TermInSet queries.

New Features

Supports reranking with late interaction model multi-vectors, full precision vector similarity scores, or any provided DoubleValuesSource, enabling improved ranking of search results.
Adds MultiIndexMergeScheduler, a multi-tenant wrapper that allows multiple instances to share a common merge scheduler.

API Changes

Adds an API to fetch the size of off-heap memory required by a KNN field. This size can help determine the memory needed for optimal search performance, which can suffer greatly from page faults when not enough memory is available.
RandomVectorScorer now supports a bulk scoring interface.
LeafReader#searchNearestVectors now accepts an AcceptDocs instance instead of a Bits instance to identify document IDs to filter.
Collectors can now take advantage of pre-aggregated data to speed up faceting using LeafCollector#collectRange.

Improvements and Optimizations

Adds optimistic KNN search to vector queries. This addresses a major issue where results could be inconsistent because of race conditions in the shared queue previously used for multi-segment search.
Faster vector search on HNSW graphs through GroupVarInt encoding.
Searcher managers now support 'Adaptive Refresh', which lets users control which commit points they refresh on. This helps segment-replicated systems handle large replication payloads gracefully.
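
The notes do not describe how Lucene lays out its GroupVarInt data. As a rough illustration of the general group-varint idea only, the Java sketch below packs four ints behind a single header byte that stores each value's byte length (1 to 4) in 2 bits, so a decoder learns all four lengths from one read instead of testing a continuation bit on every byte. The class and method names are hypothetical, and Lucene's actual encoding may differ in its details.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical sketch of generic group-varint coding for groups of four ints. */
final class GroupVarIntSketch {
    private GroupVarIntSketch() {}

    /** Number of bytes (1..4) needed to hold v, treated as an unsigned 32-bit value. */
    private static int numBytes(int v) {
        int bits = 32 - Integer.numberOfLeadingZeros(v | 1); // at least 1 bit, even for 0
        return (bits + 7) >>> 3;
    }

    /** Encodes exactly four ints: one header byte, then 1-4 little-endian bytes per value. */
    static byte[] encode(int[] group) {
        if (group.length != 4) {
            throw new IllegalArgumentException("group must hold exactly 4 ints");
        }
        int header = 0;
        for (int i = 0; i < 4; i++) {
            header |= (numBytes(group[i]) - 1) << (i * 2); // 2 bits of length per value
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(header);
        for (int v : group) {
            for (int b = 0; b < numBytes(v); b++) {
                out.write(v >>> (8 * b)); // write(int) keeps only the low 8 bits
            }
        }
        return out.toByteArray();
    }

    /** Decodes a group written by encode, starting at offset 0. */
    static int[] decode(byte[] in) {
        int header = in[0] & 0xFF;
        int[] group = new int[4];
        int pos = 1;
        for (int i = 0; i < 4; i++) {
            int len = ((header >>> (i * 2)) & 0x3) + 1; // all lengths come from one byte
            int v = 0;
            for (int b = 0; b < len; b++) {
                v |= (in[pos++] & 0xFF) << (8 * b);
            }
            group[i] = v;
        }
        return group;
    }
}
```

For example, encoding {1, 300, 70000, 0} takes 8 bytes: one header byte plus 1 + 2 + 3 + 1 value bytes.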

Runtime Behavior Changes and Bug Fixes

The default ReadAdvice has been changed from RANDOM to NORMAL. MMapDirectory will no longer set any specific read advice out-of-the-box.
RefCountedSharedArena.DEFAULT_MAX_PERMITS has been reduced to 64. This release also fixes an infinite loop that occurred when RefCountedSharedArena's underlying Arena#close failed due to concurrent use of segments.
Segment infos are now read with READONCE, which fixes mmap leaks on segment info files. Several other resource leaks are fixed as well.

20 Jun 16:19

ChrisHegarty

releases/lucene/9.12.2

5a26234

9.12.2

Bug fixes

Reduce NeighborArray on-heap memory during HNSW graph building
Fix IndexSortSortedNumericDocValuesRangeQuery for int sort
ValueSource.fromDoubleValuesSource(dvs).getSortField() would throw errors when used if the DoubleValuesSource needed scores
Disable connectedComponents logic in HNSW graph building.

20 Jun 20:51

ChrisHegarty

releases/lucene/10.2.2

279eb7a

10.2.2

Bug fixes

Reduce NeighborArray on-heap memory during HNSW graph building
Fix IndexSortSortedNumericDocValuesRangeQuery for int sort
ValueSource.fromDoubleValuesSource(dvs).getSortField() would throw errors when used if the DoubleValuesSource needed scores

01 May 13:21

ChrisHegarty

releases/lucene/10.2.1

1b2451b

10.2.1

This patch release contains bug fixes that are highlighted below.

Fix DISIDocIdStream::count so that it does not try to count beyond max.
Correct the TermOrdValComparator competitive iterator so that it forces sparse field iteration to be at least at the scoring window baseline when doing intoBitSet.
Provide better impacts for fields indexed with IndexOptions.DOCS
Fixed lead cost computations for bulk scorers of conjunctive queries that mix MUST and FILTER clauses, and disjunctive queries that configure a minimum number of matching SHOULD clauses.

10 Apr 10:17

iverase

releases/lucene/10.2.0

f624336

10.2.0

Lucene 10.2 includes major search-time performance improvements for a wide variety of queries. This is most notably due to:

Improved storage format of doc IDs in BKD trees for faster decoding.
More vectorization when processing PointRangeQuerys and non-scoring BooleanQuerys.
Encoding of dense blocks of postings lists as bit sets instead of FOR-delta. This change also saves a bit of storage.
Merging matches of dense conjunctive clauses using bitwise ANDs. This especially helps on postings blocks that are encoded as bit sets.
Implementing the ACORN-1 algorithm for pre-filtered vector searches.
Searches that don't require scores and match many docs should generally see good speedups, depending on how expensive the Collector is. Compared with Lucene 10.1.0, Lucene's nightly benchmarks report the following speedups when counting the hits of these queries:
* Disjunctions of term queries: 77% to 4x faster
* Conjunctions of term queries: 38% to 5x faster
* Filtered disjunctions of term queries: 2.5x to 4x faster
* Filtered PointRangeQuery: 3.5x faster
And the following speedup when computing top-100 hits:
* Pre-filtered vector search: 3.5x faster
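
The bit-set encoding of dense postings blocks and the bitwise-AND merging of dense conjunctive clauses mentioned above can be illustrated with a minimal, self-contained Java sketch. It assumes a block's doc IDs are relative offsets in [0, blockSize) and ANDs two bit sets word by word, 64 doc IDs per long. It is not Lucene's postings format or conjunction code, and the class name is hypothetical.

```java
import java.util.Arrays;

/** Hypothetical sketch: intersecting two dense doc-ID blocks stored as bit sets. */
final class BitSetConjunctionSketch {
    private BitSetConjunctionSketch() {}

    /** Encodes doc IDs, each in [0, blockSize), as a bit set of 64-bit words. */
    static long[] toBits(int[] docs, int blockSize) {
        long[] words = new long[(blockSize + 63) >>> 6];
        for (int doc : docs) {
            if (doc < 0 || doc >= blockSize) {
                throw new IllegalArgumentException("doc out of block range: " + doc);
            }
            words[doc >>> 6] |= 1L << doc; // Java masks the shift count to doc & 63
        }
        return words;
    }

    /** ANDs two equally sized bit sets and decodes the doc IDs present in both. */
    static int[] intersect(long[] a, long[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("bit sets must have the same length");
        }
        int[] out = new int[64 * a.length];
        int n = 0;
        for (int w = 0; w < a.length; w++) {
            long word = a[w] & b[w]; // one AND tests 64 candidate docs at once
            while (word != 0) {
                out[n++] = (w << 6) + Long.numberOfTrailingZeros(word);
                word &= word - 1; // clear the lowest set bit
            }
        }
        return Arrays.copyOf(out, n);
    }
}
```

Intersecting {1, 5, 64, 100} with {5, 64, 99, 100} in a 128-doc block yields {5, 64, 100}. Because each AND covers 64 docs, this style of merge plausibly benefits most when both clauses are dense, which matches the notes' remark about blocks already encoded as bit sets.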

Changes in Runtime Behavior

TieredMergePolicy's default floor segment size was increased from 2MB to 16MB. This is expected to result in slightly slower indexing and about 10 fewer segments per index for applications that flush frequently. This in turn should help speed up queries that have a high per-segment overhead, such as multi-term queries, point queries and vector search.

New Features

Added TopDocs#rrf to combine multiple TopDocs instances using reciprocal rank fusion.
Added SeededKnnVectorQuery, an optimization to KnnVectorQuery that allows selecting better entry points for vector search using a seed Query.
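
Reciprocal rank fusion scores each document by summing 1 / (k + rank) over every input ranking it appears in, with ranks starting at 1. The Java sketch below applies that formula to plain lists of doc IDs. It does not reproduce the signature or tie-breaking of Lucene's TopDocs#rrf, and the class name is hypothetical; k = 60 is the value commonly used in the RRF literature.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical standalone sketch of reciprocal rank fusion over doc-ID rankings. */
final class RrfSketch {
    private RrfSketch() {}

    /** Fuses ranked lists of doc IDs (best first) and returns the top topN doc IDs. */
    static List<Integer> fuse(List<List<Integer>> rankings, int k, int topN) {
        Map<Integer, Double> scores = new HashMap<>();
        for (List<Integer> ranking : rankings) {
            for (int i = 0; i < ranking.size(); i++) {
                int rank = i + 1; // ranks are 1-based
                scores.merge(ranking.get(i), 1.0 / (k + rank), Double::sum);
            }
        }
        List<Integer> docs = new ArrayList<>(scores.keySet());
        // Highest fused score first; ties broken by smaller doc ID for determinism.
        docs.sort((a, b) -> {
            int cmp = Double.compare(scores.get(b), scores.get(a));
            return cmp != 0 ? cmp : Integer.compare(a, b);
        });
        return new ArrayList<>(docs.subList(0, Math.min(topN, docs.size())));
    }
}
```

With k = 60, fusing [1, 2, 3] and [3, 1, 4] ranks doc 1 first (1/61 + 1/62), then doc 3 (1/63 + 1/61), then doc 2 (1/62). Only ranks are used, so the inputs' raw scores never need to be on comparable scales.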

Improvements

RegexpQuery support for Unicode case-insensitive characters and ranges.

Optimizations

Java 24 vector API support
Efficiency improvements to Automaton and RegExp
Faster merging of HNSW graphs, which translated into a 25% indexing speedup in Lucene's nightly benchmarks.
Conjunctive queries can now skip applying clauses when they have long runs of matching docs, a case which is not uncommon when an index sort is configured.
Reduce heap usage during BKD tree merges.

20 Dec 20:32

javanna

releases/lucene/10.1.0

8849540

10.1.0

New Features

Add IndexInput::isLoaded to determine whether the contents of an input are resident in physical memory
FeatureField now supports storing term vectors.

Improvements

TieredMergePolicy now allows merging up to maxMergeAtOnce segments for merges below the floor segment size, even if maxMergeAtOnce is greater than segmentsPerTier. This makes it more efficient to configure TieredMergePolicy to merge segments aggressively by configuring a high value of floorSegmentSize (e.g. 64MB), a low value of segmentsPerTier (e.g. 4) and a high value of maxMergeAtOnce (e.g. 32).

Optimizations

Many speedups to top-k query evaluation, in particular: top-level disjunctions, filtered disjunctions, conjunctions, DisjunctionMaxQuery.
Speedup to exhaustive evaluation of conjunctive queries by vectorizing the intersection of postings lists.
Reduced contention for top-k query evaluation when IndexSearcher is configured with an executor.

13 Dec 11:23

ChrisHegarty

releases/lucene/9.12.1

7a97a05

9.12.1

Improvements

Allow easier configuration of the Panama vectorization provider with newer Java versions. Set the org.apache.lucene.vectorization.upperJavaFeatureVersion system property to increase the set of Java versions that Panama vectorization will provide optimized implementations for.

Bug fixes

Fixed a backwards compatibility bug that caused sparse KNN indices (where not all documents have a vector) written with 9.0.0 to silently (with no exception) return terrible recall when searched by any 9.x release
Improve Tessellator logic when two holes share the same vertex with the polygon, which was causing failures on valid polygons.
Fix a backwards compatibility bug that caused 9.12.0 to incorrectly throw IllegalStateException when opening an IndexReader on an index created with quantized (int4, int7, int8) KNN vectors using Lucene99HnswScalarQuantizedVectorsFormat.

14 Oct 13:02

javanna

releases/lucene/10.0.0

eadc07c

10.0.0

System requirements

Lucene 10.0 requires JDK 21 or newer

API changes

KNN vector values now have a random-access API.
Deprecated APIs have been removed and a number of other API changes have been made. Please consult the migration guide for an extensive list of changes and the actions needed to migrate to 10.0.

New Features

A new IndexInput#prefetch API has been added, allowing query evaluation logic to let the Directory know about regions of data that are about to be read. This helps perform I/O concurrently under the hood. MMapDirectory implements this API using the madvise system call and the MADV_WILLNEED flag on Linux and Mac OS.
Lucene now supports sparse indexing on doc values via FieldType#setDocValuesSkipIndexType. The sparse index will record the minimum and maximum values per block of doc IDs. Used in conjunction with index sorting to cluster similar documents together, this allows for very space-efficient and CPU-efficient filtering.
Search concurrency is now decoupled from the index geometry, so that an index can be searched using any number of threads, regardless of its number of segments.
K-means clustering on vectors
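
The idea behind the sparse doc-values index can be sketched in plain Java: record the minimum and maximum value for each block of doc IDs, then let a range filter skip blocks that cannot match and accept blocks that match entirely without reading their values. This is a hypothetical in-memory illustration that assumes every doc has exactly one value; it is not Lucene's on-disk skip index or the FieldType#setDocValuesSkipIndexType API. It also suggests why index sorting helps: sorted values give each block a narrow [min, max] range.

```java
/** Hypothetical in-memory sketch of a per-block min/max skip index over doc values. */
final class SkipIndexSketch {
    private final long[] values; // values[doc] is the (single) doc-values entry for doc
    private final int blockSize;
    private final long[] mins;
    private final long[] maxs;

    SkipIndexSketch(long[] values, int blockSize) {
        this.values = values;
        this.blockSize = blockSize;
        int numBlocks = (values.length + blockSize - 1) / blockSize;
        mins = new long[numBlocks];
        maxs = new long[numBlocks];
        for (int blk = 0; blk < numBlocks; blk++) {
            long min = Long.MAX_VALUE;
            long max = Long.MIN_VALUE;
            int end = Math.min(values.length, (blk + 1) * blockSize);
            for (int doc = blk * blockSize; doc < end; doc++) {
                min = Math.min(min, values[doc]);
                max = Math.max(max, values[doc]);
            }
            mins[blk] = min;
            maxs[blk] = max;
        }
    }

    /**
     * Counts docs with lo <= value <= hi. Returns {matchCount, blocksScanned},
     * where blocksScanned is how many blocks needed per-document checks.
     */
    long[] countInRange(long lo, long hi) {
        long count = 0;
        long blocksScanned = 0;
        for (int blk = 0; blk < mins.length; blk++) {
            int start = blk * blockSize;
            int end = Math.min(values.length, start + blockSize);
            if (maxs[blk] < lo || mins[blk] > hi) {
                continue; // no value in this block can fall inside [lo, hi]
            }
            if (mins[blk] >= lo && maxs[blk] <= hi) {
                count += end - start; // every doc in this block matches
                continue;
            }
            blocksScanned++; // partial overlap: check each doc individually
            for (int doc = start; doc < end; doc++) {
                if (values[doc] >= lo && values[doc] <= hi) {
                    count++;
                }
            }
        }
        return new long[] {count, blocksScanned};
    }
}
```

With sorted values 0 to 15 in blocks of 4, the range [5, 10] matches 6 docs while only two blocks need per-document checks; the other two are skipped from their min/max alone.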

Improvements

Lucene now opens files with the MADV_RANDOM advice by default on Linux and Mac OS. This improves efficiency for indexes that exceed the size of the page cache, but can make it slower to load indexes into the page cache. To revert to MADV_NORMAL as the default read advice, pass -Dorg.apache.lucene.store.defaultReadAdvice=NORMAL as a JVM startup flag.
Snowball dictionaries have been upgraded, resulting in improved tokenization. This may require reindexing to ensure consistency of search results with pre-10.0 indexes.
The expressions module now uses MethodHandles and Dynamic Class-File Constants (JEP 309) in combination with hidden classes (JEP 371) to implement strict, type-safe calls to external functions. This makes it easier to extend expressions with custom functions in a secure way, because runtime linking of custom functions is no longer the responsibility of the expressions scripting engine. In addition, the hidden classes created by the expressions engine no longer suffer from global classloader locks.

... plus a multitude of helpful bug fixes!
