
Add Big Endian Support for Float32 in BinaryVectorWriter.WriteToBytes<T>() #1682


Open: wants to merge 2 commits into main


@medhatiwari (Contributor) commented May 5, 2025

Description

This PR adds Big Endian support for System.Single (Float32) to the BinaryVectorWriter.WriteToBytes() method.

Background

While running the MongoDB.Bson.Tests test suite on a Big Endian (s390x) system, we observed 34 reproducible test failures in the BinaryVectorSerializerTests class.
Each failure was caused by a System.NotSupportedException stating that binary vector data of float32 type is not yet supported on Big Endian architectures.

Exception Observed

System.NotSupportedException: Binary vector data is not supported on Big Endian architecture yet.

Sample Failing Tests

Some of the test cases that failed due to this limitation include:

BinaryVectorSerializerTests.BinaryVectorSerializer_should_deserialize_bson_vector<Float32>

BinaryVectorSerializerTests.BinaryVectorSerializer_should_serialize_bson_vector<Float32>

BinaryVectorSerializerTests.ArrayAsBinaryVectorSerializer_should_deserialize_bson_vector<Float32>

BinaryVectorSerializerTests.ArrayAsBinaryVectorSerializer_should_serialize_bson_vector<Float32>

BinaryVectorSerializerTests.MemoryAsBinaryVectorSerializer_should_serialize_bson_vector<Float32>

BinaryVectorSerializerTests.MemoryAsBinaryVectorSerializer_should_deserialize_bson_vector<Float32>

BinaryVectorSerializerTests.ReadOnlyMemoryAsBinaryVectorSerializer_should_serialize_bson_vector<Float32>

BinaryVectorSerializerTests.ReadOnlyMemoryAsBinaryVectorSerializer_should_deserialize_bson_vector<Float32>

Why This Fix Is Necessary

This limitation prevented the test suite from passing on Big Endian platforms such as s390x. Adding support for float32 serialization in Big Endian format:

Enables consistent behavior across architectures

Completes existing deserialization support added earlier in BinaryVectorReader.cs

Changes Introduced

Added Big Endian branch to BinaryVectorWriter.WriteToBytes() for T == float.

Used BinaryPrimitives.WriteSingleBigEndian() to write bytes in the correct order.

Left existing Little Endian logic untouched to preserve behavior.

cc: @giritrivedi

@medhatiwari medhatiwari requested a review from a team as a code owner May 5, 2025 10:53
@medhatiwari medhatiwari requested review from rstam and removed request for a team May 5, 2025 10:53
@BorisDog BorisDog requested review from BorisDog and removed request for rstam May 5, 2025 20:14
@medhatiwari medhatiwari force-pushed the binaryvectorsupport branch from 5cd9ca1 to a4384e3 Compare May 6, 2025 09:00
Signed-off-by: Medha Tiwari <[email protected]>
@medhatiwari medhatiwari force-pushed the binaryvectorsupport branch from a4384e3 to 2c2cae1 Compare May 6, 2025 09:01
@medhatiwari (Contributor, Author)

Hi @BorisDog, if everything is fine, can this be merged?

@medhatiwari (Contributor, Author)

Hi @BorisDog, just following up to check if there's any update on this PR. Please let me know if any further changes are needed.

if ((vectorDataBytes.Span.Length & 3) != 0)
{
    throw new FormatException("Data length of binary vector of type Float32 must be a multiple of 4 bytes.");
}

if (BitConverter.IsLittleEndian)
if (typeof(TItem) != typeof(float))
Contributor (review comment):

No need for this check, as this validation is already done in ValidateItemType.

int count = vectorDataBytes.Length / 4; // 4 bytes per float
float[] floatArray = new float[count];

for (int i = 0; i < count; i++)
Contributor (review comment):

A few thoughts here:

  1. We can avoid the loop in the little-endian case, so it's better to keep the old code there.
  2. The big-endian loop can probably be extracted into a helper method: float[] ToFloatArrayBigEndian(ReadOnlySpan<byte>)
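A minimal sketch of the two suggestions combined, under stated assumptions: on little-endian hosts a single bulk `MemoryMarshal.Cast` copy stands in for the unchanged old code (which is not shown in this thread), and the big-endian path lives in a separate helper with the proposed `ToFloatArrayBigEndian` name. Following the later review point that BSON is always little-endian, the helper reverses each 4-byte group instead of calling `ReadSingleBigEndian`, and it uses only `BitConverter`/`Array` APIs so it should not depend on the newer `BinaryPrimitives` float methods. `VectorReaderSketch` and `ToFloatArray` are hypothetical names, not driver APIs.

```csharp
using System;
using System.Runtime.InteropServices;

static class VectorReaderSketch
{
    // Hypothetical entry point: decodes BSON float32 vector data (little-endian on the wire).
    public static float[] ToFloatArray(ReadOnlySpan<byte> data)
    {
        if ((data.Length & 3) != 0)
        {
            throw new FormatException("Data length of binary vector of type Float32 must be a multiple of 4 bytes.");
        }

        // Little-endian host: the wire layout already matches the native layout,
        // so one bulk copy replaces the per-element loop.
        return BitConverter.IsLittleEndian
            ? MemoryMarshal.Cast<byte, float>(data).ToArray()
            : ToFloatArrayBigEndian(data);
    }

    // Helper name from the review suggestion; only called on big-endian hosts.
    // Each little-endian 4-byte group is reversed into native (big-endian) order
    // before BitConverter.ToSingle interprets it.
    private static float[] ToFloatArrayBigEndian(ReadOnlySpan<byte> data)
    {
        var result = new float[data.Length / 4];
        var scratch = new byte[4];
        for (var i = 0; i < result.Length; i++)
        {
            data.Slice(i * 4, 4).CopyTo(scratch);
            Array.Reverse(scratch);
            result[i] = BitConverter.ToSingle(scratch, 0);
        }
        return result;
    }
}
```

Reusing one scratch buffer keeps the big-endian path to a single extra allocation per call. On .NET Framework and .NET Standard 2.0, `ReadOnlySpan<T>` and `MemoryMarshal` come from the System.Memory package.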

// Each float32 is 4 bytes. So to extract the i-th float, we slice 4 bytes from offset i * 4. Use little-endian or big-endian decoding based on platform.
floatArray[i] = BitConverter.IsLittleEndian
? MemoryMarshal.Read<float>(vectorDataBytes.Span.Slice(i * 4, 4)) // fast, unaligned read on little endian
: BinaryPrimitives.ReadSingleBigEndian(vectorDataBytes.Span.Slice(i * 4, 4)); // correctly reassemble 4 bytes as big-endian float
Contributor (review comment):

BinaryPrimitives.ReadSingleBigEndian is not available on older TFMs, so this won't compile.
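The `BinaryPrimitives` float readers (`ReadSingleBigEndian`, `ReadSingleLittleEndian`) only exist on newer target frameworks (.NET 5 and later). One TFM-portable alternative, sketched on the assumption that only long-standing `BitConverter` and `Array` APIs may be used: copy the four bytes, reverse them on big-endian hosts, and convert with `BitConverter.ToSingle`. It reads little-endian because BSON stores float32 values little-endian. `PortableFloatReader.ReadSingleLittleEndian` is a hypothetical helper, not an existing driver or BCL method.

```csharp
using System;

static class PortableFloatReader
{
    // Hypothetical helper: reads one little-endian float32 starting at buffer[offset],
    // using APIs that also exist on .NET Framework and .NET Standard 2.0.
    public static float ReadSingleLittleEndian(byte[] buffer, int offset)
    {
        var scratch = new byte[4];
        Array.Copy(buffer, offset, scratch, 0, 4); // throws if fewer than 4 bytes remain

        if (!BitConverter.IsLittleEndian)
        {
            // Big-endian host: flip the little-endian wire bytes into native order.
            Array.Reverse(scratch);
        }

        // BitConverter.ToSingle always interprets bytes in the host's native order.
        return BitConverter.ToSingle(scratch, 0);
    }
}
```

This allocates one small array per value; a hot-path version would likely reuse a single scratch buffer across values.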

break;

Contributor (review comment):

No need for all the empty-line changes.

resultBytes = new byte[2 + vectorDataBytes.Length];
resultBytes[0] = (byte)binaryVectorDataType;
resultBytes[1] = padding;
vectorDataBytes.CopyTo(resultBytes.AsSpan(2));
Contributor (review comment):

Can the code that handles the little-endian case be reused here?
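One way the reuse could look, as a sketch rather than the driver's actual code: the header bytes written in the hunk above (data type, then padding, then the payload) are shared by both branches, and only the float-to-bytes step depends on host endianness. `VectorWriterSketch.WriteFloat32Vector` and its parameters are hypothetical. The big-endian branch emits little-endian bytes, following the review point that BSON is always little-endian.

```csharp
using System;
using System.Runtime.InteropServices;

static class VectorWriterSketch
{
    // Hypothetical writer: [dataType][padding][float32 payload, little-endian].
    public static byte[] WriteFloat32Vector(byte dataType, byte padding, ReadOnlySpan<float> values)
    {
        // Shared header code for both endianness branches.
        var result = new byte[2 + values.Length * sizeof(float)];
        result[0] = dataType;
        result[1] = padding;
        var payload = result.AsSpan(2);

        if (BitConverter.IsLittleEndian)
        {
            // Native layout already matches BSON: one bulk copy.
            MemoryMarshal.AsBytes(values).CopyTo(payload);
        }
        else
        {
            // Big-endian host: write each float's bytes in little-endian order.
            for (var i = 0; i < values.Length; i++)
            {
                var bytes = BitConverter.GetBytes(values[i]); // native (big-endian) order
                Array.Reverse(bytes);
                bytes.AsSpan().CopyTo(payload.Slice(i * 4, 4));
            }
        }

        return result;
    }
}
```

Keeping the header logic in one place means the endianness branch cannot drift from the little-endian layout on the header side.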

foreach (var value in floatSpan)
{
    // Each float is 4 bytes - write in Big Endian format
    BinaryPrimitives.WriteSingleBigEndian(floatOutput, value);
Contributor (review comment):

Same issue with older TFMs.
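The write side has the same portable workaround as the read side, sketched here under the assumption that only `BitConverter` and `Array` APIs are available: take the native bytes from `BitConverter.GetBytes`, reverse them on big-endian hosts, and copy them out. `PortableFloatWriter.WriteSingleLittleEndian` is a hypothetical helper, and it writes little-endian because that is the BSON wire order.

```csharp
using System;

static class PortableFloatWriter
{
    // Hypothetical helper: writes value as 4 little-endian bytes at destination[offset],
    // using APIs that also exist on .NET Framework and .NET Standard 2.0.
    public static void WriteSingleLittleEndian(byte[] destination, int offset, float value)
    {
        var bytes = BitConverter.GetBytes(value); // host's native byte order

        if (!BitConverter.IsLittleEndian)
        {
            // Big-endian host: reverse so the most significant byte ends up last.
            Array.Reverse(bytes);
        }

        Array.Copy(bytes, 0, destination, offset, 4); // throws if the destination is too short
    }
}
```

Because the helper takes an explicit offset, a caller looping over a span of floats advances the write position by 4 bytes per value.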

switch (binaryVectorDataType)
{
    case BinaryVectorDataType.Float32:
        int length = vectorData.Length * sizeof(float);
Contributor (review comment):

Minor: prefer var over the concrete type.

// Each float32 is 4 bytes. So to extract the i-th float, we slice 4 bytes from offset i * 4. Use little-endian or big-endian decoding based on platform.
floatArray[i] = BitConverter.IsLittleEndian
? MemoryMarshal.Read<float>(vectorDataBytes.Span.Slice(i * 4, 4)) // fast, unaligned read on little endian
: BinaryPrimitives.ReadSingleBigEndian(vectorDataBytes.Span.Slice(i * 4, 4)); // correctly reassemble 4 bytes as big-endian float
Contributor (review comment):

I think we need to read little endian here. BSON is always little endian, so the bytes must be swapped when reading on a big-endian machine.
If I am not mistaken, calling ReadSingleBigEndian on a big-endian machine does not swap the bytes, so the little-endian BSON data would be misinterpreted.

The same logic applies to writing.

In any case, we need to validate this solution against a real server, for example by comparing the binary vector values read from an Atlas deployment with the actual values.
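The byte-order point can be checked without an s390x machine, because the explicit-endian `BinaryPrimitives` readers return the same result on every host. A small sketch, assuming .NET 5 or later (where `ReadSingleLittleEndian` and `ReadSingleBigEndian` exist); `BsonFloat32Check` and its members are hypothetical names. BSON encodes `1.0f` (IEEE 754 bits `0x3F800000`) as `00 00 80 3F`: reading those bytes as little-endian returns 1.0, while reading them as big-endian yields the bit pattern `0x0000803F`, a tiny subnormal, which is what the big-endian branch under review would return on s390x.

```csharp
using System;
using System.Buffers.Binary;

static class BsonFloat32Check
{
    // 1.0f as BSON stores it: little-endian bytes of 0x3F800000.
    public static readonly byte[] EncodedOne = { 0x00, 0x00, 0x80, 0x3F };

    // Interprets the bytes as little-endian on any host, matching the BSON layout.
    public static float ReadAsBson(ReadOnlySpan<byte> bytes) =>
        BinaryPrimitives.ReadSingleLittleEndian(bytes);

    // Interprets the same bytes as big-endian on any host (bits 0x0000803F).
    // On a big-endian machine this performs no swap, which is the problem raised above.
    public static float ReadAsBigEndian(ReadOnlySpan<byte> bytes) =>
        BinaryPrimitives.ReadSingleBigEndian(bytes);
}
```

This only checks byte order against a known encoding; it does not replace the end-to-end comparison against values from a real deployment.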


2 participants