Add Big Endian Support for Float32 in BinaryVectorWriter.WriteToBytes<T>() #1682
base: main
…<T>() Signed-off-by: Medha Tiwari <[email protected]>
5cd9ca1 to a4384e3
Signed-off-by: Medha Tiwari <[email protected]>
a4384e3 to 2c2cae1
Hi @BorisDog, if everything is fine, can this be merged?
Hi @BorisDog, just following up to check if there's any update on this PR. Please let me know if any further changes are needed.
```csharp
if ((vectorDataBytes.Span.Length & 3) != 0)
{
    throw new FormatException("Data length of binary vector of type Float32 must be a multiple of 4 bytes.");
}

if (BitConverter.IsLittleEndian)
if (typeof(TItem) != typeof(float))
```
No need for this check, as validation is already done in `ValidateItemType`.
```csharp
int count = vectorDataBytes.Length / 4; // 4 bytes per float
float[] floatArray = new float[count];

for (int i = 0; i < count; i++)
```
A few thoughts here:
- We can avoid the loop in the Little Endian case, so it's better to keep the old code there.
- The Big Endian loop can probably be extracted to a helper method: `float[] ToFloatArrayBigEndian(ReadOnlySpan<byte>)`
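A minimal sketch of the suggested helper, assuming (as a later review comment points out) that the BSON float32 payload is always little-endian, so on a big-endian host the helper has to swap each 4-byte group. The method name comes from the comment above; the class name `BinaryVectorDecodingSketch` and the body are hypothetical, not the driver's code. It relies only on `BitConverter.ToSingle(byte[], int)` and `Array.Reverse`, so it does not need `BinaryPrimitives.ReadSingle*`; on older TFMs it assumes `ReadOnlySpan<byte>` comes from the System.Memory package.

```csharp
using System;

public static class BinaryVectorDecodingSketch
{
    // Hypothetical helper: decodes a little-endian (BSON) float32 payload into host floats.
    // Named for the big-endian host path it is meant for, but correct on either host.
    public static float[] ToFloatArrayBigEndian(ReadOnlySpan<byte> data)
    {
        if ((data.Length & 3) != 0)
        {
            throw new FormatException("Data length of binary vector of type Float32 must be a multiple of 4 bytes.");
        }

        var result = new float[data.Length / 4];
        var scratch = new byte[4];
        for (var i = 0; i < result.Length; i++)
        {
            // Copy the i-th 4-byte group, which is stored little-endian.
            data.Slice(i * 4, 4).CopyTo(scratch);
            if (!BitConverter.IsLittleEndian)
            {
                // Big-endian host: reverse so BitConverter sees native byte order.
                Array.Reverse(scratch);
            }
            result[i] = BitConverter.ToSingle(scratch, 0);
        }
        return result;
    }
}
```

Because the swap is guarded by `BitConverter.IsLittleEndian`, the helper also returns correct values on a little-endian host, although there the existing `MemoryMarshal` path avoids the per-element copy.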
```csharp
// Each float32 is 4 bytes. So to extract the i-th float, we slice 4 bytes from offset i * 4. Use little-endian or big-endian decoding based on platform.
floatArray[i] = BitConverter.IsLittleEndian
    ? MemoryMarshal.Read<float>(vectorDataBytes.Span.Slice(i * 4, 4)) // fast, unaligned read on little endian
    : BinaryPrimitives.ReadSingleBigEndian(vectorDataBytes.Span.Slice(i * 4, 4)); // correctly reassemble 4 bytes as big-endian float
```
`BinaryPrimitives.ReadSingleBigEndian` is not available on older TFMs, so this won't compile.
```csharp
break;
```
No need for all the empty-line changes.
```csharp
resultBytes = new byte[2 + vectorDataBytes.Length];
resultBytes[0] = (byte)binaryVectorDataType;
resultBytes[1] = padding;
vectorDataBytes.CopyTo(resultBytes.AsSpan(2));
```
Can the code that handles the Little Endian case be reused here?
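One way this reuse could look, sketched under assumptions: only the payload source branches on host byte order, while the header bytes and the final copy are shared. `Float32VectorWriterSketch`, `WriteFloat32Vector`, `WithHeader` and `SwapToLittleEndian` are hypothetical names; the `[data type][padding][payload]` layout mirrors the snippet above. On a little-endian host `MemoryMarshal.AsBytes` reuses the float memory directly; the big-endian branch reverses each float's bytes so the output stays little-endian, which is what BSON expects according to the later review comment.

```csharp
using System;
using System.Runtime.InteropServices;

public static class Float32VectorWriterSketch
{
    // Hypothetical writer: produces [dataType][padding][little-endian float32 payload].
    public static byte[] WriteFloat32Vector(ReadOnlySpan<float> values, byte dataType, byte padding)
    {
        // Only the payload source depends on host byte order; the framing code is shared.
        return BitConverter.IsLittleEndian
            ? WithHeader(dataType, padding, MemoryMarshal.AsBytes(values)) // float memory is already little-endian
            : WithHeader(dataType, padding, SwapToLittleEndian(values));   // big-endian host needs a swap
    }

    // Shared framing code, same shape as the reviewed snippet.
    private static byte[] WithHeader(byte dataType, byte padding, ReadOnlySpan<byte> payload)
    {
        var resultBytes = new byte[2 + payload.Length];
        resultBytes[0] = dataType;
        resultBytes[1] = padding;
        payload.CopyTo(resultBytes.AsSpan(2));
        return resultBytes;
    }

    // Called on big-endian hosts only: reverse each float's native bytes into little-endian order.
    private static byte[] SwapToLittleEndian(ReadOnlySpan<float> values)
    {
        var bytes = new byte[values.Length * sizeof(float)];
        for (var i = 0; i < values.Length; i++)
        {
            var floatBytes = BitConverter.GetBytes(values[i]); // native (big-endian) order
            Array.Reverse(floatBytes);
            Buffer.BlockCopy(floatBytes, 0, bytes, i * sizeof(float), sizeof(float));
        }
        return bytes;
    }
}
```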
```csharp
foreach (var value in floatSpan)
{
    // Each float is 4 bytes - write in Big Endian format
    BinaryPrimitives.WriteSingleBigEndian(floatOutput, value);
```
Same issue with older TFMs.
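A possible workaround for the write side, assuming the driver's older TFMs get `MemoryMarshal.Cast` and the integer `BinaryPrimitives` methods from the System.Memory package (worth confirming against the project's target list): reinterpret each float as its 32-bit IEEE 754 bit pattern and write that integer little-endian. `Float32LittleEndianWriterSketch` and `WriteFloatsLittleEndian` are hypothetical names. The sketch deliberately writes little-endian rather than big-endian, following the later review point that BSON stores float32 data little-endian.

```csharp
using System;
using System.Buffers.Binary;
using System.Runtime.InteropServices;

public static class Float32LittleEndianWriterSketch
{
    // Hypothetical helper: writes floats to destination in little-endian (BSON) order
    // without BinaryPrimitives.WriteSingle*, which older TFMs lack.
    public static void WriteFloatsLittleEndian(ReadOnlySpan<float> values, Span<byte> destination)
    {
        if (destination.Length < values.Length * sizeof(float))
        {
            throw new ArgumentException("Destination is too small.", nameof(destination));
        }

        // Reinterpret each float's IEEE 754 bits as an int (no numeric conversion happens).
        ReadOnlySpan<int> bits = MemoryMarshal.Cast<float, int>(values);
        for (var i = 0; i < bits.Length; i++)
        {
            // Swaps bytes on big-endian hosts; a plain store on little-endian hosts.
            BinaryPrimitives.WriteInt32LittleEndian(destination.Slice(i * 4, 4), bits[i]);
        }
    }
}
```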
```csharp
switch (binaryVectorDataType)
{
    case BinaryVectorDataType.Float32:
        int length = vectorData.Length * sizeof(float);
```
minor: Prefer `var` over the concrete type.
```csharp
// Each float32 is 4 bytes. So to extract the i-th float, we slice 4 bytes from offset i * 4. Use little-endian or big-endian decoding based on platform.
floatArray[i] = BitConverter.IsLittleEndian
    ? MemoryMarshal.Read<float>(vectorDataBytes.Span.Slice(i * 4, 4)) // fast, unaligned read on little endian
    : BinaryPrimitives.ReadSingleBigEndian(vectorDataBytes.Span.Slice(i * 4, 4)); // correctly reassemble 4 bytes as big-endian float
```
I think we need to read little endian here. BSON is always little endian, so we need to swap the bytes when reading on a big endian machine.
If I am not mistaken, if we call `ReadSingleBigEndian` on a big endian machine, the bytes would not be swapped.
The same logic applies to Write as well.
In any case, we need to validate this solution against a real server, for example by comparing the binary vector values read from an Atlas deployment with the actual values.
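The byte-order point can be checked without a big-endian machine, because `BinaryPrimitives.ReadInt32BigEndian` and `ReadInt32LittleEndian` describe the byte order of the data, not of the host, and return the same value everywhere. The sketch below (`EndiannessDemo`, `ReadBits` and `Decode` are hypothetical names) decodes the little-endian bytes of `1.0f` both ways. `BitConverter.Int32BitsToSingle` needs .NET Core 2.0 or later and is used here only to show the resulting value.

```csharp
using System;
using System.Buffers.Binary;

public static class EndiannessDemo
{
    // 1.0f has the IEEE 754 bit pattern 0x3F800000; BSON stores it little-endian.
    public static readonly byte[] OneAsBsonBytes = { 0x00, 0x00, 0x80, 0x3F };

    // readAsBigEndian describes the byte order assumed for the data, not the host CPU;
    // both BinaryPrimitives calls return the same result on every machine.
    public static int ReadBits(ReadOnlySpan<byte> fourBytes, bool readAsBigEndian) =>
        readAsBigEndian
            ? BinaryPrimitives.ReadInt32BigEndian(fourBytes)
            : BinaryPrimitives.ReadInt32LittleEndian(fourBytes);

    // Reinterprets the 32 bits as a float (.NET Core 2.0+).
    public static float Decode(ReadOnlySpan<byte> fourBytes, bool readAsBigEndian) =>
        BitConverter.Int32BitsToSingle(ReadBits(fourBytes, readAsBigEndian));
}
```

Reading the bytes as big-endian yields the bit pattern `0x0000803F`, a tiny subnormal value instead of `1.0f`; that is the mismatch a big-endian read of BSON float32 data produces on any host. This only illustrates the byte-order issue and does not replace the end-to-end check against an Atlas deployment.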
Description
This PR adds Big Endian support for System.Single (Float32) to the BinaryVectorWriter.WriteToBytes() method.
Background
While running the MongoDB.Bson.Tests test suite on a Big Endian (s390x) system, we encountered 34 consistent test failures within the BinaryVectorSerializerTests class.
Each failure was caused by a System.NotSupportedException indicating that binary vector data of float32 type is not yet supported on Big Endian architectures.
Exception Observed
Sample Failing Tests
Some of the test cases that failed due to this limitation include:
BinaryVectorSerializerTests.BinaryVectorSerializer_should_deserialize_bson_vector<Float32>
BinaryVectorSerializerTests.BinaryVectorSerializer_should_serialize_bson_vector<Float32>
BinaryVectorSerializerTests.ArrayAsBinaryVectorSerializer_should_deserialize_bson_vector<Float32>
BinaryVectorSerializerTests.ArrayAsBinaryVectorSerializer_should_serialize_bson_vector<Float32>
BinaryVectorSerializerTests.MemoryAsBinaryVectorSerializer_should_serialize_bson_vector<Float32>
BinaryVectorSerializerTests.MemoryAsBinaryVectorSerializer_should_deserialize_bson_vector<Float32>
BinaryVectorSerializerTests.ReadOnlyMemoryAsBinaryVectorSerializer_should_serialize_bson_vector<Float32>
BinaryVectorSerializerTests.ReadOnlyMemoryAsBinaryVectorSerializer_should_deserialize_bson_vector<Float32>
Why This Fix Is Necessary
This limitation prevented the test suite from passing on Big Endian platforms such as s390x. Adding float32 serialization support on Big Endian platforms:
Enables consistent behavior across architectures
Completes existing deserialization support added earlier in BinaryVectorReader.cs
Changes Introduced
Added a Big Endian branch to BinaryVectorWriter.WriteToBytes() for T == float.
Used BinaryPrimitives.WriteSingleBigEndian() to write the bytes in the correct order.
Left the existing Little Endian logic untouched to preserve behavior.
cc: @giritrivedi