Commits
73 commits
74880a0
Copy binary compression from LUCENE-9211
parkertimmins Oct 23, 2025
a973713
Initial version of block withs variable number values
parkertimmins Oct 23, 2025
3fc95dc
Fix issue with index output unclosed
parkertimmins Oct 23, 2025
c302cc2
Changes docRanges to single limit per block, plus start of 0
parkertimmins Oct 23, 2025
99748c8
Factor block address and block doc offset to accumulator class
parkertimmins Oct 23, 2025
fa2ea11
Rename offset accumulator
parkertimmins Oct 24, 2025
b67dd58
Change lz4 to zstd
parkertimmins Oct 24, 2025
638dbbc
Fix direct monotonic reader size
parkertimmins Oct 24, 2025
fdf3428
Fix docRangeLen bug, use for non-logsdb wildcards
parkertimmins Oct 24, 2025
36b3e10
Change offset encoding from zstd to numeric
parkertimmins Oct 24, 2025
eeded36
[CI] Auto commit changes from spotless
Oct 24, 2025
2d8e6dc
Fix missing compression in es819 format
parkertimmins Oct 25, 2025
efa270f
Merge branch 'main' into parker/compressed-binary-doc-values
parkertimmins Oct 25, 2025
c4d67e5
Store offsets rather than lengths
parkertimmins Oct 25, 2025
06a2035
[CI] Auto commit changes from spotless
Oct 25, 2025
7ccb18d
Remove forbidden APIs
parkertimmins Oct 25, 2025
a57e0d4
[CI] Auto commit changes from spotless
Oct 25, 2025
f156e55
Binary search to find block containing docNum
parkertimmins Oct 27, 2025
91e5842
[CI] Auto commit changes from spotless
Oct 27, 2025
401a041
do not mmap temp offset files
parkertimmins Oct 27, 2025
ad55bc3
feedback
parkertimmins Oct 27, 2025
4d4e153
[CI] Auto commit changes from spotless
Oct 27, 2025
f1ff182
Move zstd (de)compressor to separate class
parkertimmins Oct 27, 2025
9d2f237
Combine doAddCompressedBinary and doAddUncompressedBinary
parkertimmins Oct 27, 2025
2269f9c
[CI] Auto commit changes from spotless
Oct 27, 2025
1c4e9dc
feedback
parkertimmins Oct 28, 2025
3ddb649
Add WildcardRollingUpgradeIT
parkertimmins Oct 28, 2025
dbcd1c6
need new compressor/decompressor for new block writer
parkertimmins Oct 29, 2025
5537d8c
[CI] Auto commit changes from spotless
Oct 29, 2025
d7fce75
Cleanup binaryWriter interface
parkertimmins Oct 29, 2025
bb8361c
Revert "[CI] Auto commit changes from spotless"
parkertimmins Oct 29, 2025
aa3d44f
Revert "Add WildcardRollingUpgradeIT"
parkertimmins Oct 29, 2025
2c1f143
[CI] Auto commit changes from spotless
Oct 29, 2025
636c150
Update code lookup to support other compressors
parkertimmins Oct 29, 2025
09898ff
feedback
parkertimmins Oct 29, 2025
8b8b50b
Update bwc tests
parkertimmins Oct 29, 2025
8a82c23
cleanup
parkertimmins Oct 29, 2025
cef255f
Merge branch 'main' into parker/compressed-binary-doc-values
parkertimmins Oct 29, 2025
718ffc6
Fix test broken from merge
parkertimmins Oct 29, 2025
ebda5b0
Update docs/changelog/137139.yaml
parkertimmins Oct 30, 2025
9fc23f1
Move block address and doc_range accumulators into BlockMetadataAccum…
parkertimmins Oct 30, 2025
49e5425
Merge branch 'main' into parker/compressed-binary-doc-values
parkertimmins Oct 30, 2025
80525bf
Unit tests that require multiple doc value blocks
parkertimmins Oct 31, 2025
b1d4b17
Test values near the size of a block
parkertimmins Oct 31, 2025
e332619
Self close BlockMetadataAcc if throw during construction
parkertimmins Oct 31, 2025
60ebfaa
Merge branch 'main' into parker/compressed-binary-doc-values
parkertimmins Nov 3, 2025
1209e78
Update tsdb doc_values bwc test to mention version 1
parkertimmins Nov 3, 2025
80c14a3
Update docs/changelog/137139.yaml
parkertimmins Nov 3, 2025
602c203
Disable compression for geo_shape type
parkertimmins Nov 4, 2025
d6293d9
Test that wildcard uses ES819 docs encoding and geo_shape does not
parkertimmins Nov 4, 2025
982386e
[CI] Auto commit changes from spotless
Nov 4, 2025
e61b8c2
Add feature flag for binary dv compression
parkertimmins Nov 6, 2025
a225b98
Merge branch 'main' into parker/compressed-binary-doc-values
parkertimmins Nov 7, 2025
f6fd5bd
Merge branch 'main' into parker/compressed-binary-doc-values
parkertimmins Nov 7, 2025
5fe2c80
Add block count threshold in addition to size threshold
parkertimmins Nov 7, 2025
51b21ae
[CI] Auto commit changes from spotless
Nov 7, 2025
07eeb5a
Add test for very small binary values
parkertimmins Nov 7, 2025
d56d12f
Merge branch 'main' into parker/compressed-binary-doc-values
parkertimmins Nov 14, 2025
980df97
Use groupVarInt instead of TSDB encoder
parkertimmins Nov 14, 2025
21a98ac
Dont test bulk loading if compressed, as not implemented
parkertimmins Nov 14, 2025
2239732
[CI] Auto commit changes from spotless
Nov 14, 2025
15823e8
Fix broken merge
parkertimmins Nov 14, 2025
25dcb56
Merge branch 'main' into parker/compressed-binary-doc-values
parkertimmins Nov 14, 2025
200e14c
Revert to using TSDBDocValueEncoder for offsets
parkertimmins Nov 15, 2025
5ca24b4
Better naming and minor optmization
parkertimmins Nov 15, 2025
7f8fa16
Dont need to grow offsets array
parkertimmins Nov 15, 2025
91c23ee
And back to GroupedVarInt, this time with better delta decoding
parkertimmins Nov 17, 2025
92c8050
Add header to control whether block is compressed or uncompressed
parkertimmins Nov 17, 2025
016352a
Handle isCompressed in ES819DocValuesProducer, add bwc tests
parkertimmins Nov 17, 2025
8a2af81
Merge branch 'main' into parker/compressed-binary-doc-values
parkertimmins Nov 17, 2025
026406b
[CI] Auto commit changes from spotless
Nov 17, 2025
d27bb8b
Skip bulk loading tests if compressed
parkertimmins Nov 17, 2025
db68af6
review feedback
parkertimmins Nov 18, 2025
@@ -127,12 +127,22 @@ public KnnVectorsFormat getKnnVectorsFormatForField(String field) {
     }

     public DocValuesFormat getDocValuesFormatForField(String field) {
-        if (useTSDBDocValuesFormat(field)) {
+        if (useTSDBDocValuesFormat(field) || isBinaryDocValueField(field)) {
             return tsdbDocValuesFormat;
         }
         return docValuesFormat;
     }

+    boolean isBinaryDocValueField(final String field) {
+        if (mapperService != null) {
+            Mapper mapper = mapperService.mappingLookup().getMapper(field);
+            if (mapper != null && "wildcard".equals(mapper.typeName())) {
+                return true;
+            }
+        }
+        return false;
+    }
+
     boolean useTSDBDocValuesFormat(final String field) {
         if (excludeFields(field)) {
             return false;
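The hunk above routes any field mapped as `wildcard` to the TSDB doc values format, even when `useTSDBDocValuesFormat` returns false. The following is a minimal sketch of that selection logic only. The `Format` enum, the `fieldTypes` map, and the `tsdbEnabled` flag are hypothetical stand-ins for the two `DocValuesFormat` instances, `mapperService.mappingLookup()`, and the real `useTSDBDocValuesFormat` checks, none of which are reproduced here.

```java
import java.util.Map;

public class FormatSelectionSketch {
    /** Hypothetical stand-in for the two DocValuesFormat instances in the diff. */
    public enum Format { TSDB, DEFAULT }

    // Hypothetical stand-in for the mapping lookup: field name -> mapper type name.
    private final Map<String, String> fieldTypes;
    // Hypothetical stand-in for whatever useTSDBDocValuesFormat really checks.
    private final boolean tsdbEnabled;

    public FormatSelectionSketch(Map<String, String> fieldTypes, boolean tsdbEnabled) {
        this.fieldTypes = fieldTypes;
        this.tsdbEnabled = tsdbEnabled;
    }

    /** True only for fields mapped as "wildcard"; unmapped fields return false. */
    public boolean isBinaryDocValueField(String field) {
        return "wildcard".equals(fieldTypes.get(field));
    }

    public boolean useTSDBDocValuesFormat(String field) {
        return tsdbEnabled;
    }

    /** Mirrors the changed condition: either check selects the TSDB format. */
    public Format getDocValuesFormatForField(String field) {
        if (useTSDBDocValuesFormat(field) || isBinaryDocValueField(field)) {
            return Format.TSDB;
        }
        return Format.DEFAULT;
    }

    public static void main(String[] args) {
        FormatSelectionSketch sketch = new FormatSelectionSketch(
            Map.of("message", "wildcard", "host", "keyword"),
            false
        );
        System.out.println("message -> " + sketch.getDocValuesFormatForField("message"));
        System.out.println("host    -> " + sketch.getDocValuesFormatForField("host"));
    }
}
```

In this sketch a non-TSDB index still gets the TSDB format for its `wildcard` field, which matches the commit message "use for non-logsdb wildcards".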
@@ -0,0 +1,30 @@
/*
* Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
* or more contributor license agreements. Licensed under the "Elastic License
* 2.0", the "GNU Affero General Public License v3.0 only", and the "Server Side
* Public License v 1"; you may not use this file except in compliance with, at
* your election, the "Elastic License 2.0", the "GNU Affero General Public
* License v3.0 only", or the "Server Side Public License, v 1".
*/

package org.elasticsearch.index.codec.tsdb;

public enum BinaryDVCompressionMode {

    NO_COMPRESS((byte) 0),
    COMPRESSED_WITH_ZSTD((byte) 1);

    public final byte code;

    BinaryDVCompressionMode(byte code) {
        this.code = code;
    }

    public static BinaryDVCompressionMode fromMode(byte mode) {
        return switch (mode) {
            case 0 -> NO_COMPRESS;
            case 1 -> COMPRESSED_WITH_ZSTD;
            default -> throw new IllegalStateException("unknown compression mode [" + mode + "]");
        };
    }
}
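The enum above maps each compression mode to a one-byte code and back. As a rough illustration of how such a code could travel with the data, the sketch below writes the code as a one-byte prefix and reads it back. The `Mode` enum is a local copy made so the example compiles on its own, and the `[mode byte][payload]` layout and the `writeBlock`/`readMode` helpers are assumptions for illustration, not the format used by the PR.

```java
import java.nio.ByteBuffer;

public class CompressionModeSketch {
    /** Local copy of the enum from the diff, so this sketch is self-contained. */
    public enum Mode {
        NO_COMPRESS((byte) 0),
        COMPRESSED_WITH_ZSTD((byte) 1);

        public final byte code;

        Mode(byte code) {
            this.code = code;
        }

        public static Mode fromMode(byte mode) {
            switch (mode) {
                case 0:
                    return NO_COMPRESS;
                case 1:
                    return COMPRESSED_WITH_ZSTD;
                default:
                    throw new IllegalStateException("unknown compression mode [" + mode + "]");
            }
        }
    }

    /** Hypothetical layout: one mode byte followed by the (already encoded) payload. */
    public static byte[] writeBlock(Mode mode, byte[] payload) {
        ByteBuffer buf = ByteBuffer.allocate(1 + payload.length);
        buf.put(mode.code).put(payload);
        return buf.array();
    }

    /** Reads the mode back from the first byte; unknown codes fail loudly. */
    public static Mode readMode(byte[] block) {
        return Mode.fromMode(block[0]);
    }

    public static void main(String[] args) {
        byte[] block = writeBlock(Mode.COMPRESSED_WITH_ZSTD, new byte[] { 1, 2, 3 });
        System.out.println("mode = " + readMode(block));
    }
}
```

Throwing on an unknown code, as `fromMode` does, means a reader that meets a mode written by a newer version fails immediately instead of misinterpreting the payload.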
@@ -0,0 +1,111 @@
/*
* Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
* or more contributor license agreements. Licensed under the "Elastic License
* 2.0", the "GNU Affero General Public License v3.0 only", and the "Server Side
* Public License v 1"; you may not use this file except in compliance with, at
* your election, the "Elastic License 2.0", the "GNU Affero General Public
* License v3.0 only", or the "Server Side Public License, v 1".
*/

package org.elasticsearch.index.codec.tsdb.es819;

import org.apache.lucene.codecs.CodecUtil;
import org.apache.lucene.store.ChecksumIndexInput;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.IOContext;
import org.apache.lucene.store.IndexOutput;
import org.apache.lucene.util.IOUtils;
import org.apache.lucene.util.packed.DirectMonotonicWriter;

import java.io.Closeable;
import java.io.IOException;

/**
 * Like OffsetsAccumulator, builds offsets and stores them in a DirectMonotonicWriter, but writes to a temp file
 * first rather than directly to the DirectMonotonicWriter, because the number of values is not known up front.
 * If the number of values is known, prefer OffsetsWriter.
 */
final class DelayedOffsetAccumulator implements Closeable {
    private final Directory dir;
    private final long startOffset;

    private int numValues = 0;
    private final IndexOutput tempOutput;
    private final String suffix;

    DelayedOffsetAccumulator(
        Directory dir,
        IOContext context,
        IndexOutput data,
        String suffix,
        long startOffset
    ) throws IOException {
        this.dir = dir;
        this.startOffset = startOffset;
        this.suffix = suffix;

        boolean success = false;
        try {
            tempOutput = dir.createTempOutput(data.getName(), suffix, context);
            CodecUtil.writeHeader(

[Review comment, PR author] Does it make sense to add the header/footer and then check the checksum, given that we immediately use and delete the temp file?

                tempOutput,
                ES819TSDBDocValuesFormat.META_CODEC + suffix,
                ES819TSDBDocValuesFormat.VERSION_CURRENT
            );
            success = true;
        }
        finally {
            if (success == false) {
                IOUtils.closeWhileHandlingException(this); // self-close because constructor caller can't

[Review comment, contributor] This should be tested.

            }
        }
    }

    public void addDoc(long delta) throws IOException {
        tempOutput.writeVLong(delta);
        numValues++;
    }

    public void build(IndexOutput meta, IndexOutput data) throws IOException {
        CodecUtil.writeFooter(tempOutput);
        IOUtils.close(tempOutput);

        // write the offsets info to the meta file by reading from temp file
        try (ChecksumIndexInput tempInput = dir.openChecksumInput(tempOutput.getName());) {
            CodecUtil.checkHeader(
                tempInput,
                ES819TSDBDocValuesFormat.META_CODEC + suffix,
                ES819TSDBDocValuesFormat.VERSION_CURRENT,
                ES819TSDBDocValuesFormat.VERSION_CURRENT
            );
            Throwable priorE = null;
            try {
                final DirectMonotonicWriter writer = DirectMonotonicWriter.getInstance(
                    meta,
                    data,
                    numValues + 1,
                    ES819TSDBDocValuesFormat.DIRECT_MONOTONIC_BLOCK_SHIFT
                );

                long offset = startOffset;
                writer.add(offset);
                for (int i = 0; i < numValues; ++i) {
                    offset += tempInput.readVLong();
                    writer.add(offset);
                }
                writer.finish();
            } catch (Throwable e) {
                priorE = e;
            } finally {
                CodecUtil.checkFooter(tempInput, priorE);
            }
        }
    }

    @Override
    public void close() throws IOException {
        if (tempOutput != null) {
            IOUtils.close(tempOutput, () -> dir.deleteFile(tempOutput.getName()));
        }
    }
}
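The class above buffers per-document deltas as variable-length longs, then replays them once the count is known to emit `numValues + 1` cumulative offsets starting at `startOffset`. The sketch below shows that two-phase idea without Lucene. It is an assumption-laden illustration: an in-memory `ByteArrayOutputStream` stands in for the temp file, a plain `long[]` stands in for `DirectMonotonicWriter`, the header/footer checksum steps are omitted, and the class name `DelayedOffsetsSketch` is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

public class DelayedOffsetsSketch {
    // In-memory stand-in for the temp file used by the real class.
    private final ByteArrayOutputStream temp = new ByteArrayOutputStream();
    private final long startOffset;
    private int numValues = 0;

    public DelayedOffsetsSketch(long startOffset) {
        this.startOffset = startOffset;
    }

    /** Phase 1: append one non-negative delta as a variable-length long. */
    public void addDoc(long delta) {
        if (delta < 0) {
            throw new IllegalArgumentException("delta must be >= 0, got " + delta);
        }
        // 7 payload bits per byte; the high bit marks "more bytes follow".
        while ((delta & ~0x7FL) != 0) {
            temp.write((int) ((delta & 0x7F) | 0x80));
            delta >>>= 7;
        }
        temp.write((int) delta);
        numValues++;
    }

    /** Phase 2: replay the deltas and return numValues + 1 cumulative offsets. */
    public long[] build() {
        ByteArrayInputStream in = new ByteArrayInputStream(temp.toByteArray());
        long[] offsets = new long[numValues + 1];
        long offset = startOffset;
        offsets[0] = offset;
        for (int i = 0; i < numValues; i++) {
            offset += readVLong(in);
            offsets[i + 1] = offset;
        }
        return offsets;
    }

    private static long readVLong(ByteArrayInputStream in) {
        long value = 0;
        int shift = 0;
        int b;
        do {
            b = in.read();
            if (b < 0) {
                throw new IllegalStateException("unexpected end of buffered deltas");
            }
            value |= (long) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return value;
    }

    public static void main(String[] args) {
        DelayedOffsetsSketch acc = new DelayedOffsetsSketch(10);
        acc.addDoc(5);
        acc.addDoc(0);
        acc.addDoc(300);
        System.out.println(java.util.Arrays.toString(acc.build()));
    }
}
```

Buffering first is what lets the final writer be sized exactly, since a writer like `DirectMonotonicWriter` needs the value count when it is created.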