@oaganesh (Contributor) commented May 4, 2025

Description

Writes the segment profiler state to a file and profiles the different compression standards for benchmark testing.

Related Issues

Implements #2687

Check List

  • [✔️] New functionality includes testing.
  • [✔️] New functionality has been documented.
  • [✔️] API changes companion pull request created.
  • [✔️] Commits are signed per the DCO using --signoff.
  • [✔️] Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following the Developer Certificate of Origin and signing off your commits, please see the repository's contributing guide.

segmentWriteState.segmentInfo.name,
segmentWriteState.segmentSuffix,
KNNConstants.QUANTIZATION_STATE_FILE_SUFFIX
// public KNN990QuantizationStateWriter(SegmentWriteState segmentWriteState) throws IOException {
Contributor

We shouldn't leave commented code in the PR.

// output = segmentWriteState.directory.createOutput(quantizationStateFileName, segmentWriteState.context);
// }

public KNN990QuantizationStateWriter(SegmentWriteState segmentWriteState, String fileSuffix) throws IOException {
Contributor

Should we be using the quantization state writer for this? If we're writing a different segment state, we should probably have a separate writer. But this is OK for POC purposes.

Contributor Author

That's a good idea. When I tried to add a separate writer, it required a lot of further refactoring because the other classes implement this one.

this(segmentWriteState, KNNConstants.QUANTIZATION_STATE_FILE_SUFFIX);
}

// public KNN990QuantizationStateWriter(SegmentWriteState segmentWriteState) throws IOException {
Contributor

Let's not leave commented code here.

);
final QuantizationState quantizationState = train(field.getFieldInfo(), knnVectorValuesSupplier, totalLiveDocs);
SegmentProfilerState.profileVectors(knnVectorValuesSupplier);
profile(field.getFieldInfo(), knnVectorValuesSupplier, totalLiveDocs);
Contributor

Are we correct in assuming that knnVectorValuesSupplier supplies the uncompressed vectors, i.e., that profiling happens before compression?
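
To make the intended ordering concrete, here is a minimal sketch of profiling raw float vectors before any quantization is applied. It does not answer whether knnVectorValuesSupplier actually yields uncompressed vectors; that still needs to be verified against the PR. `DimensionStats` and `profile` are hypothetical names chosen for illustration, not the PR's SegmentProfilerState API, and the sketch assumes all vectors have the same dimension.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical per-dimension statistics; not the PR's SegmentProfilerState. */
final class DimensionStats {
    final float[] min;
    final float[] max;
    final double[] mean;

    DimensionStats(float[] min, float[] max, double[] mean) {
        this.min = min;
        this.max = max;
        this.mean = mean;
    }

    /** Computes min, max, and mean per dimension over raw (unquantized) float vectors. */
    static DimensionStats profile(List<float[]> rawVectors) {
        if (rawVectors.isEmpty()) {
            throw new IllegalArgumentException("no vectors to profile");
        }
        int dims = rawVectors.get(0).length;
        float[] min = new float[dims];
        float[] max = new float[dims];
        double[] sum = new double[dims];
        Arrays.fill(min, Float.POSITIVE_INFINITY);
        Arrays.fill(max, Float.NEGATIVE_INFINITY);
        for (float[] v : rawVectors) {
            for (int d = 0; d < dims; d++) {
                min[d] = Math.min(min[d], v[d]);
                max[d] = Math.max(max[d], v[d]);
                sum[d] += v[d];
            }
        }
        double[] mean = new double[dims];
        for (int d = 0; d < dims; d++) {
            mean[d] = sum[d] / rawVectors.size();
        }
        return new DimensionStats(min, max, mean);
    }
}
```

If profiling ran on quantized values instead, the min/max range would reflect the quantizer's output levels rather than the original data, which is why the ordering matters for the benchmark.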

private void initSegmentStateWriterIfNecessary() throws IOException {
if (segmentStateWriter == null) {
segmentStateWriter = new KNN990QuantizationStateWriter(segmentWriteState, KNNConstants.SEGMENT_PROFILE_STATE_FILE_SUFFIX);
segmentStateWriter.writeHeader(segmentWriteState);
Contributor

If we're going with this approach, we're going to have to refactor the QuantizationStateWriter to support generic file writing.
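
One possible shape for such a refactor is a writer that takes the file suffix as a parameter and writes opaque, length-prefixed state blobs. The sketch below is only an illustration under assumptions: `SegmentStateFileWriter` and its methods are hypothetical names, plain java.io stands in for Lucene's output classes, and the file-name format and on-disk layout are assumed rather than taken from the PR.

```java
import java.io.Closeable;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical generic per-segment state writer; java.io stands in for Lucene's output classes. */
final class SegmentStateFileWriter implements Closeable {
    private final Path file;
    private final DataOutputStream out;

    SegmentStateFileWriter(Path directory, String segmentName, String segmentSuffix, String fileSuffix)
        throws IOException {
        // Assumed naming: segment name, optional "_" + segment suffix, then "." + file suffix.
        String base = segmentSuffix.isEmpty() ? segmentName : segmentName + "_" + segmentSuffix;
        this.file = directory.resolve(base + "." + fileSuffix);
        this.out = new DataOutputStream(Files.newOutputStream(file));
    }

    /** Returns the path of the file being written. */
    Path file() {
        return file;
    }

    /** Writes one state blob as: field number, byte length, raw bytes. */
    void writeState(int fieldNumber, byte[] stateBytes) throws IOException {
        out.writeInt(fieldNumber);
        out.writeInt(stateBytes.length);
        out.write(stateBytes);
    }

    @Override
    public void close() throws IOException {
        out.close();
    }
}
```

With this shape, the quantization state and the segment profile state would differ only in the suffix passed to the constructor and in how each state serializes itself to bytes.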


@Override
public void writeTo(StreamOutput streamOutput) throws IOException {
streamOutput.writeString(shardId);
Contributor

I don't think we need this override. Did we confirm that it's even getting called in the API path?
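
If the override does turn out to be needed, a round-trip test is one way to check that the write and read sides agree. The following is a rough sketch only: `ShardProfileResult` is a hypothetical class, and `DataOutputStream`/`DataInputStream` stand in for OpenSearch's `StreamOutput`/`StreamInput` so the example runs without any dependencies.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

/** Hypothetical shard-level result; java.io streams stand in for StreamOutput/StreamInput. */
final class ShardProfileResult {
    final String shardId;

    ShardProfileResult(String shardId) {
        this.shardId = shardId;
    }

    /** Deserializing constructor: reads fields in the same order writeTo wrote them. */
    ShardProfileResult(DataInputStream in) throws IOException {
        this.shardId = in.readUTF();
    }

    /** Serializes this object's fields; the order here defines the wire format. */
    void writeTo(DataOutputStream out) throws IOException {
        out.writeUTF(shardId);
    }

    /** Serializes and deserializes in memory, as a wire-format unit test might. */
    static ShardProfileResult roundTrip(ShardProfileResult original) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            original.writeTo(out);
        }
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
            return new ShardProfileResult(in);
        }
    }
}
```

A breakpoint or log line inside the real writeTo during an API call would answer the second question directly; the round trip only shows that the format is self-consistent.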

@Getter
public class KNNProfileRequest extends BroadcastRequest<KNNProfileRequest> {

private String index;
Contributor

NIT: final

}

byte[] stateBytes = readStateBytes(input, position, length);
return SegmentProfilerState.fromBytes(stateBytes);
Contributor

Overall this logic looks OK to me, but we can probably clean it up later.
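
As a reference point for that cleanup, here is a small sketch of a symmetric toBytes/fromBytes pair with a basic length check. `ProfilerStateSketch` and its layout (an element count followed by float values) are assumptions made for illustration; the PR's SegmentProfilerState may serialize differently.

```java
import java.nio.ByteBuffer;

/** Hypothetical profiler state holding per-dimension means; the PR's layout may differ. */
final class ProfilerStateSketch {
    final float[] means;

    ProfilerStateSketch(float[] means) {
        this.means = means;
    }

    /** Layout: int count, then count floats (big-endian, ByteBuffer's default). */
    byte[] toBytes() {
        ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES + means.length * Float.BYTES);
        buf.putInt(means.length);
        for (float m : means) {
            buf.putFloat(m);
        }
        return buf.array();
    }

    /** Reads the layout written by toBytes and rejects input whose length does not match the count. */
    static ProfilerStateSketch fromBytes(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        int count = buf.getInt();
        if (count < 0 || buf.remaining() != (long) count * Float.BYTES) {
            throw new IllegalArgumentException("corrupt profiler state");
        }
        float[] means = new float[count];
        for (int i = 0; i < count; i++) {
            means[i] = buf.getFloat();
        }
        return new ProfilerStateSketch(means);
    }
}
```

Validating the length up front turns a truncated or mis-positioned read into a clear error instead of silently wrong statistics.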

@oaganesh closed this Jun 17, 2025