Conversation

Contributor

@parkertimmins parkertimmins commented Oct 24, 2025

Add compression for binary doc values using Zstd and blocks with a variable number of values.

Block-wise LZ4 compression for binary doc values was previously added to Lucene in LUCENE-9211. This was subsequently removed in LUCENE-9378 due to query performance issues.

We investigated adding the original Lucene implementation to ES in #112416 and #105301. That approach stores a constant number of values per block (specifically 32). This is nice because it makes it easy to map a given value index (e.g. docId for dense values) to the block containing it with blockId = docId / 32. Unfortunately, if values are very large we cannot reduce the number of values per block, and (de)compressing a block could cause an OOM. And because this is a concern, we have to keep the number of values per block lower than ideal.

This PR instead stores a variable number of documents per block. It stores a minimum of 1 document per block and stops adding values once the size of a block exceeds a threshold. Like the previous version, it stores an array of addresses for the start of each block. Additionally, it stores a parallel array with the value index at the start of each block. When looking up a given value index, if it is not in the current block, we binary search the array of block start value indexes to find the blockId containing the value, then look up the address of that block.
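As an illustration of that lookup, here is a minimal sketch; the array names are illustrative, not the PR's actual fields:

import java.util.Arrays;

// blockFirstValueIndex[i] = value index of the first value stored in block i (ascending)
// blockAddress[i]         = file offset at which block i starts
static long addressForValue(long valueIndex, long[] blockFirstValueIndex, long[] blockAddress) {
    int pos = Arrays.binarySearch(blockFirstValueIndex, valueIndex);
    // Exact hit: valueIndex is the first value of block pos. Otherwise binarySearch returns
    // -(insertionPoint) - 1, and the containing block is the one just before the insertion point.
    int blockId = pos >= 0 ? pos : -pos - 2;
    return blockAddress[blockId];
}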

boolean success = false;
try {
    tempOutput = dir.createTempOutput(data.getName(), suffix, context);
    CodecUtil.writeHeader(
Contributor Author

Does it make sense to add the header/footer and then check the checksum, given that we are immediately using and deleting the temp file?
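For context, the temp-file pattern in question usually looks roughly like the sketch below; the codec and file names are illustrative, not this PR's exact code:

import java.io.IOException;
import org.apache.lucene.codecs.CodecUtil;
import org.apache.lucene.store.ChecksumIndexInput;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.IOContext;
import org.apache.lucene.store.IndexOutput;

static void roundTripTempFile(Directory dir, String prefix, IOContext context) throws IOException {
    String tempName;
    try (IndexOutput tempOutput = dir.createTempOutput(prefix, "tmp", context)) {
        tempName = tempOutput.getName();
        CodecUtil.writeHeader(tempOutput, "TempCodec", 0);
        // ... write the intermediate data ...
        CodecUtil.writeFooter(tempOutput);
    }
    try (ChecksumIndexInput tempInput = dir.openChecksumInput(tempName)) {
        CodecUtil.checkHeader(tempInput, "TempCodec", 0, 0);
        // ... consume the data ...
        CodecUtil.checkFooter(tempInput); // verifies the checksum of the whole file
    }
    dir.deleteFile(tempName); // the temp file only lives for this round trip
}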

@parkertimmins parkertimmins changed the title from "Add binary doc value compression with variably doc count blocks" to "Add binary doc value compression with variable doc count blocks" on Oct 25, 2025
Member

@martijnvg martijnvg left a comment

Good stuff Parker! I did a first review round.

Additionally, I think we should get real bwc test coverage. I think we can get that by adding a bwc Java integration test for the wildcard field type, similar to MatchOnlyTextRollingUpgradeIT or TextRollingUpgradeIT.

Contributor

@Kubik42 Kubik42 left a comment

Nice! I've left more questions than comments:

    success = true;
} finally {
    if (success == false) {
        IOUtils.closeWhileHandlingException(this); // self-close because constructor caller can't
Contributor

this should be tested

}

public void doAddCompressedBinary(FieldInfo field, DocValuesProducer valuesProducer) throws IOException {
    try (CompressedBinaryBlockWriter blockWriter = new CompressedBinaryBlockWriter()) {
Contributor

[nit] could use some comments here, explaining what each chunk of code does

Member

@dnhatn dnhatn left a comment

I've left some comments, but this looks great. Thanks Parker!

    numDocsInCurrentBlock = uncompressedBlockLength = 0;
}

void compressOffsets(DataOutput output, int numDocsInCurrentBlock) throws IOException {
Member

Should we encode the lengths using GroupVIntUtil#writeGroupVInts instead? I'm not sure TSDBDocValuesEncoder is suitable for encoding these offsets. Also, always padding to 128 offsets may be wasteful.
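For context, a rough sketch of what that could look like, assuming DataOutput#writeGroupVInts and GroupVIntUtil#readGroupVInts exist with roughly these signatures in the bundled Lucene version; this is illustrative, not this PR's code:

import java.io.IOException;
import org.apache.lucene.store.DataInput;
import org.apache.lucene.store.DataOutput;
import org.apache.lucene.util.GroupVIntUtil;

// lengths[0..count) holds the per-value lengths for one block
static void writeLengths(DataOutput output, long[] lengths, int count) throws IOException {
    output.writeVInt(count);                // write the real count instead of padding to 128
    output.writeGroupVInts(lengths, count); // assumed entry point that delegates to GroupVIntUtil
}

static long[] readLengths(DataInput input) throws IOException {
    int count = input.readVInt();
    long[] lengths = new long[count];
    GroupVIntUtil.readGroupVInts(input, lengths, count); // assumed signature
    return lengths;
}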

Contributor Author

Good point, especially after limiting the number of docs per block to 1024 the padding could be a concern. Sounds good, I'll give this a try 👍

Contributor Author

Hmm, so I'm seeing a slow-down with readGroupVInts on some benchmark queries. Mostly small decreases that could be noise, but some in the 25-40% range that are concerning. I'd think that GroupVIntUtil would be quite fast. Is there possibly something I'm missing in the decompression code that could speed it up? I'm currently benchmarking with uncompressed offsets to get a baseline for offset (de)compression.

void compress(byte[] data, int uncompressedLength, DataOutput output) throws IOException {
    ByteBuffer inputBuffer = ByteBuffer.wrap(data, 0, uncompressedLength);
    ByteBuffersDataInput input = new ByteBuffersDataInput(List.of(inputBuffer));
    compressor.compress(input, output);
Member

Should we use Zstd from NativeAccess directly to avoid copying data to an intermediate buffer before the native buffer?

Contributor Author

My only concern is that currently this uses Lucene's Compressor/CompressionMode, which will make it easy to add other compressors. On the other hand, as we previously discussed, it might make sense to use LZ4 to partially decompress blocks. If that is the case, we may not want to use the Compressor interface ... though I'm actually not sure either way.

Anyway, I split a hacky version of this off here, and will benchmark it to see if it's worth doing.

Contributor Author

I ran some benchmarks on the above hacky version and got some weird results. Some of the queries got a nice throughput increase. The weird part is that the Store Size increased by an amount that was not reflected in the output of disk_usage. There must be a bug in my version that is causing this.

To keep this PR small(er), what do you think about updating to using NativeAccess directly in a separate PR?

    }
}

void compress(byte[] data, int uncompressedLength, DataOutput output) throws IOException {
Member

If we use Zstd directly, should we also handle cases where compression does not reduce storage and store the raw bytes instead?

Contributor Author

I like the idea of not compressing if it doesn't help. This would still apply with non-direct Zstd, right? I guess for non-direct Zstd we'd need a separate output buffer to check the length before sending to output.

I discussed this with Martijn and he suggested adding a signal byte now, which says whether or not the data is compressed. It would always be set to true for now, but will support false once we add direct Zstd and enable this optimization. What do you think?
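A minimal sketch of that signal byte, assuming the block has already been compressed into a separate scratch buffer so the two lengths can be compared; names and layout are illustrative, not this PR's format:

import java.io.IOException;
import org.apache.lucene.store.DataOutput;

// One flag byte per block: 1 = payload is Zstd-compressed, 0 = raw bytes stored as-is.
static void writeBlock(byte[] raw, int rawLength, byte[] compressed, int compressedLength, DataOutput output) throws IOException {
    if (compressedLength < rawLength) {
        output.writeByte((byte) 1);
        output.writeVInt(compressedLength);
        output.writeBytes(compressed, 0, compressedLength);
    } else {
        output.writeByte((byte) 0);
        output.writeBytes(raw, 0, rawLength);
    }
}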

Member

@dnhatn dnhatn left a comment

@parkertimmins Thanks for the extra experiment. As discussed offline, we may need a follow-up after this PR, but the current state looks great.

Member

@martijnvg martijnvg left a comment

Thanks Parker! LGTM.

@parkertimmins parkertimmins added the >non-issue and test-release (Trigger CI checks against release build) labels and removed the release highlight and >feature labels on Nov 18, 2025
@parkertimmins
Contributor Author

A few tests are still failing with test-release, but they are all unrelated to this change:

  • org.elasticsearch.index.mapper.vectors.DenseVectorFieldMapperTests.testKnnQuantizedFlatVectorsFormat
  • org.elasticsearch.xpack.inference.integration.SemanticTextIndexOptionsIT.testValidateIndexOptionsWithBasicLicense (same cause as the checkPart1 failure on this PR)
  • org.elasticsearch.xpack.esql.plugin.IndexResolutionIT.testSubqueryResolution (also failing on this PR)

All logsdb tests are passing, including the rolling upgrade bwc tests. Since that is the case, I'll go ahead with the merge.

@parkertimmins parkertimmins merged commit 15709dd into elastic:main Nov 19, 2025
33 of 38 checks passed
parkertimmins added a commit that referenced this pull request Nov 25, 2025
Binary doc value compression was added behind a feature flag in #137139.
This PR removes the feature flag to enable the feature.
ncordon pushed a commit to ncordon/elasticsearch that referenced this pull request Nov 26, 2025
…tic#137139)

Add compression for binary doc values using Zstd and blocks with a variable number of values.

Block-wise LZ4 compression for binary doc values was previously added to Lucene in LUCENE-9211. This was subsequently removed in LUCENE-9378 due to query performance issues. We investigated adding the original Lucene implementation to ES in elastic#112416 and elastic#105301. That previous approach used a constant number of values per block (specifically 32). This is nice because it makes it easy to map a given value index (e.g. docId for dense values) to the block containing it with blockId = docId / 32. Unfortunately, if values are very large we cannot reduce the number of values per block, and (de)compressing a block could cause an OOM. And because this is a concern, we have to keep the number of values per block lower than ideal.

This PR instead stores a variable number of documents per block. It stores a minimum of 1 document per block and stops adding values once the size of a block exceeds a threshold. Like the previous version, it stores an array of addresses for the start of each block. Additionally, it stores a parallel array with the value index at the start of each block. When looking up a given value index, if it is not in the current block, we binary search the array of block start value indexes to find the blockId containing the value, then look up the address of that block.
ncordon pushed a commit to ncordon/elasticsearch that referenced this pull request Nov 26, 2025
…38524)

Binary doc value compression was added behind a feature flag in elastic#137139.
This PR removes the feature flag to enable the feature.

Labels

>non-issue, :StorageEngine/Mapping (The storage related side of mappings), Team:StorageEngine, test-release (Trigger CI checks against release build), v9.3.0
