
@Pulkitg64
Contributor

Description

Add fallback support to Lucene104ScalarQuantizedVectorsFormat.getFloatVectorValues() for when no full-precision vectors are present. We added this support to Lucene99ScalarQuantizedVectorsFormat as part of an earlier PR, but it was missed in the new vector codec. This PR adds that support back.
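The fallback idea can be sketched as follows: if full-precision vectors exist, return them; otherwise return a view that dequantizes each stored code on demand and reuses the result for repeated reads of the same ordinal (similar in spirit to the `lastOrd` check in the reviewed code). This is a minimal sketch under stated assumptions: `FloatVectorSource`, `withFallback`, and the `dequantizeOrd` callback are hypothetical names, not Lucene APIs.

```java
import java.util.function.IntFunction;

// Illustrative sketch only: FloatVectorSource and withFallback are hypothetical
// names, not part of Lucene's API.
public class FallbackSketch {

  /** Hypothetical stand-in for random access to float vectors by ordinal. */
  public interface FloatVectorSource {
    int size();

    float[] vectorValue(int ord);
  }

  /**
   * Returns the raw vectors when they exist; otherwise returns a view that
   * dequantizes the stored code for an ordinal only when it is requested.
   */
  public static FloatVectorSource withFallback(
      FloatVectorSource raw, int quantizedCount, IntFunction<float[]> dequantizeOrd) {
    if (raw != null) {
      return raw; // full-precision vectors are present, so use them directly
    }
    return new FloatVectorSource() {
      private int lastOrd = -1; // ordinal of the most recently dequantized vector
      private float[] last;

      @Override
      public int size() {
        return quantizedCount;
      }

      @Override
      public float[] vectorValue(int ord) {
        if (ord != lastOrd) {
          last = dequantizeOrd.apply(ord); // approximate reconstruction of the vector
          lastOrd = ord;
        }
        return last;
      }
    };
  }
}
```

The returned values are only an approximation of the original vectors, since quantization is lossy.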

Pulkit Gupta added 2 commits November 11, 2025 16:05
@github-actions github-actions bot added this to the 10.4.0 milestone Nov 11, 2025
Member

@benwtrent benwtrent left a comment

Looking good! I am sorry we missed this with the new format. Thank you for taking care of it.

(Ben Trent)

* GITHUB#15415: Add fallback support to Lucene104ScalarQuantizedVectorsFormat getFloatVectorValues when there are
no full-precision vectors present
Member

Please add your name for posterity :)

Contributor Author

Oh, I missed it. Fixed in the next revision.

}

OffHeapScalarQuantizedFloatVectorValues(
boolean isQuerySide,
Member

We should never allow this "querySide" thing. Even as it is now, it wouldn't work.

Comment on lines 138 to 149
// unpack bytes
switch (encoding) {
  case PACKED_NIBBLE ->
      OffHeapScalarQuantizedVectorValues.unpackNibbles(byteValue, unpackedByteVectorValue);
  case SINGLE_BIT_QUERY_NIBBLE ->
      OptimizedScalarQuantizer.unpackBinary(byteValue, unpackedByteVectorValue);
  case UNSIGNED_BYTE, SEVEN_BIT -> {
    deQuantize(byteValue, vectorValue, encoding.getBits(), correctiveValues, centroid);
    lastOrd = targetOrd;
    return vectorValue;
  }
}
Member

If this were "query side", it wouldn't work. Consequently, I think this query-side thing should go away.

I think this piece is great if we always assume document-quantized bits.
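For the PACKED_NIBBLE case, the packed document bytes must be expanded to one 4-bit code per dimension before they can be dequantized. The sketch below shows one way to do that, assuming an even dimension count and a layout where byte `i` holds dimension `i` in its high nibble and dimension `i + packed.length` in its low nibble. That layout is an assumption for illustration; the real layout used by `OffHeapScalarQuantizedVectorValues.unpackNibbles` may differ, and `NibbleSketch` is a hypothetical class.

```java
// Hypothetical sketch: assumes byte i stores dimension i in its high nibble and
// dimension i + packed.length in its low nibble. Lucene's actual layout may differ.
public class NibbleSketch {

  /** Packs 4-bit codes (values 0..15, even count) two per byte. */
  public static byte[] packNibbles(byte[] codes) {
    int half = codes.length / 2;
    byte[] packed = new byte[half];
    for (int i = 0; i < half; i++) {
      packed[i] = (byte) (((codes[i] & 0x0F) << 4) | (codes[half + i] & 0x0F));
    }
    return packed;
  }

  /** Expands each packed byte back into two 4-bit codes. */
  public static void unpackNibbles(byte[] packed, byte[] unpacked) {
    if (unpacked.length != packed.length * 2) {
      throw new IllegalArgumentException("unpacked must be twice as long as packed");
    }
    for (int i = 0; i < packed.length; i++) {
      int b = Byte.toUnsignedInt(packed[i]); // avoid sign extension of the high bit
      unpacked[i] = (byte) (b >>> 4); // high nibble
      unpacked[packed.length + i] = (byte) (b & 0x0F); // low nibble
    }
  }
}
```

After unpacking, the codes can go through the same dequantization path as the UNSIGNED_BYTE and SEVEN_BIT cases.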

byte[] quantized,
float[] dequantized,
byte bits,
float[] correctiveValues,
Member

Maybe pass the lower and upper values explicitly instead of this array?

byte[] quantized,
float[] dequantized,
byte bits,
float[] correctiveValues,
Member

Let's name these lowerInterval and upperInterval.

Contributor Author

Renamed in next revision.
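With the bounds passed explicitly as lowerInterval and upperInterval, dequantization can be sketched as a plain affine mapping: each code is scaled by a step of `(upperInterval - lowerInterval) / (2^bits - 1)`, shifted by the lower bound, and the centroid is added back. This is a minimal sketch assuming that usual affine scalar quantization scheme; it is not copied from the PR, and `DequantizeSketch` is a hypothetical class.

```java
// Hypothetical sketch of affine scalar dequantization with explicit interval bounds.
// Assumes codes were produced by mapping [lowerInterval, upperInterval] onto 0..2^bits-1
// after subtracting the centroid.
public class DequantizeSketch {

  public static void deQuantize(
      byte[] quantized,
      float[] dequantized,
      byte bits,
      float lowerInterval,
      float upperInterval,
      float[] centroid) {
    int maxCode = (1 << bits) - 1; // e.g. 15 for 4 bits, 127 for 7 bits
    float step = (upperInterval - lowerInterval) / maxCode;
    for (int i = 0; i < quantized.length; i++) {
      // Treat codes as unsigned so 8-bit codes above 127 stay non-negative.
      int code = Byte.toUnsignedInt(quantized[i]);
      dequantized[i] = lowerInterval + code * step + centroid[i];
    }
  }
}
```

Named float parameters make the call site self-documenting, whereas a `correctiveValues` array relies on readers knowing which index holds which bound.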

Member

@benwtrent benwtrent left a comment

Good stuff! Thank you

@Pulkitg64
Contributor Author

Thanks @benwtrent for the quick review!
