
Fix UBSAN errors when converting types using from_json #5207

Open
ChrisCoxArt wants to merge 13 commits into nlohmann:develop from ChrisCoxArt:develop

Conversation

@ChrisCoxArt

@ChrisCoxArt ChrisCoxArt commented Jun 9, 2026


Fixes #5206

from_json.hpp

  • Clamp ranges when converting from internal to external types.
  • Flush NaN and Inf to 0 when converting float to int, preserve when converting to another float.
  • Use compile time tests to remove the clamping when not necessary.
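As an illustration of the three points above, here is a minimal sketch of a clamping conversion. It is not the code from this PR: the names `clamp_from_i64` and `clamp_from_double` are hypothetical, the source types are assumed to be `std::int64_t` and `double`, and C++17 `if constexpr` is used for the compile-time test even though the library itself may target an older standard.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>
#include <type_traits>

// Hypothetical helper: convert a 64-bit signed integer to integer type To,
// clamping to To's range. The range checks are compiled out when To can
// already hold every std::int64_t value.
template <typename To>
To clamp_from_i64(std::int64_t v)
{
    static_assert(std::is_integral<To>::value, "To must be an integer type");
    using lim = std::numeric_limits<To>;
    if constexpr (std::is_signed<To>::value && sizeof(To) >= sizeof(std::int64_t)) {
        // Every source value fits: no runtime comparison is generated.
        return static_cast<To>(v);
    } else if constexpr (std::is_unsigned<To>::value) {
        if (v < 0) return 0;
        if (static_cast<std::uint64_t>(v) > static_cast<std::uint64_t>(lim::max())) {
            return lim::max();
        }
        return static_cast<To>(v);
    } else {
        // Signed target narrower than 64 bits.
        if (v < static_cast<std::int64_t>(lim::min())) return lim::min();
        if (v > static_cast<std::int64_t>(lim::max())) return lim::max();
        return static_cast<To>(v);
    }
}

// Hypothetical helper: convert a double to integer type To. NaN and +-Inf
// are flushed to 0; finite out-of-range values are clamped.
template <typename To>
To clamp_from_double(double v)
{
    static_assert(std::is_integral<To>::value, "To must be an integer type");
    using lim = std::numeric_limits<To>;
    if (!std::isfinite(v)) return 0;
    // Comparing in double keeps the later cast inside To's range, even when
    // lim::max() is rounded up on conversion to double.
    if (v <= static_cast<double>(lim::min())) return lim::min();
    if (v >= static_cast<double>(lim::max())) return lim::max();
    return static_cast<To>(v);
}
```

With these helpers, `clamp_from_i64<std::int8_t>(300)` saturates at the target's maximum, while `clamp_from_i64<std::int64_t>(x)` compiles down to a plain copy.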

unit-conversions.cpp

  • Test range problems (overflow and underflow) for integer and float types.
  • Test +-Infinity and NaN conversions.

Checklist

  • The changes are described in detail, both the what and why.
  • If applicable, an existing issue is referenced.
  • The Code coverage remained at 100%. A test case for every new line of code.
  • If applicable, the documentation is updated.
    I'm not sure about this - this change doesn't really fit the current number_handling.md documentation.
  • The source code is amalgamated by running make amalgamate.
  • Make sure all ctests are run and passing.

Read the Contribution Guidelines for detailed information.

Note

DCO does not appear to know about digital signatures - it should be removed or replaced with smarter tests.

and flush NaN and Inf to zero before trying to convert
and types that support Inf support NaN, until someone creates a new FP standard type
and add compile time checks for ranges to reduce compares needed at runtime
long double testing is inconsistent elsewhere - but add it and make sure it gets tested
@gregmarr

Contributor

There was a previous discussion of this about 10 years ago in #288. I don't know if his position has changed since then, but at the time it was that if you are extracting to a smaller data type than what is being used internally to hold the data value, then it is your responsibility to perform the range checking, not the library's, and thus people that extract it to the proper type don't pay the cost of range checking.

@ChrisCoxArt

Author

Again:
You literally can't do the range checking after the fact, unless you extract everything to the exact type used inside the library, then do all the work yourself... at which point, why is the library creating errors and more work for everyone instead of doing the right thing? Yes, it is 100% the library's responsibility to not cause undefined behavior errors.

@gregmarr

Contributor

> You literally can't do the range checking after the fact, unless you extract everything to the exact type used inside the library, then do all the work yourself...

Yes, that is exactly how one does range checking when one is going to fit an unknown value from a larger datatype into a smaller datatype.

> at which point, why is the library creating errors and more work for everyone instead of doing the right thing?

The library could limit the extraction function to just the internal types, but it does more as a convenience for the user. If the user cares about fitting larger values into smaller types, then it is the user's responsibility to do the checking to make sure it fits.

In general, the library follows the "don't pay for what you don't use" philosophy. If you know that you don't need to do range checking, then you can call the function that doesn't do range checking. If you do need to do range checking because you absolutely need a smaller datatype than what is used internally, do the range checking yourself before putting the data in the smaller datatype.

> Yes, it is 100% the library's responsibility to not cause undefined behavior errors.

It is not causing them. If you are 100% certain that the value will fit in the smaller datatype, then you can use the function that allows you to extract it into the smaller datatype. If you are not 100% certain, extract it as the native datatype and do the range checking yourself.

I am not the maintainer, just a long time user, and in general that has been the philosophy of the library. As the maintainer, @nlohmann is definitely able to make changes, but if those changes can cause behavior differences, then it is less likely that they will be accepted.
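For readers following the thread, the caller-side approach described in this comment could look roughly like the sketch below. The helper name `checked_narrow` is hypothetical and not part of the library, and the wide value is assumed to be a `std::int64_t` (the library's default signed integer type), for example one obtained via `j.get<std::int64_t>()`. Only signed targets are handled to keep the comparison simple.

```cpp
#include <cstdint>
#include <limits>
#include <optional>
#include <type_traits>

// Hypothetical caller-side helper: narrow a 64-bit signed value to a smaller
// signed integer type, reporting failure instead of silently truncating.
template <typename To>
std::optional<To> checked_narrow(std::int64_t wide)
{
    static_assert(std::is_integral<To>::value && std::is_signed<To>::value,
                  "sketch handles signed integer targets only");
    static_assert(sizeof(To) <= sizeof(std::int64_t), "target must not be wider");
    if (wide < std::numeric_limits<To>::min() ||
        wide > std::numeric_limits<To>::max()) {
        return std::nullopt;  // value does not fit; the caller decides what to do
    }
    return static_cast<To>(wide);
}
```

Returning `std::optional` leaves the policy (reject, clamp, or substitute a default) to each call site, which is the flexibility this comment argues for.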

@ChrisCoxArt

Author

Wow. Your (almost Lewis Carroll like) "logic" belongs in a museum somewhere.

Yes, the nlohmann json code is 100% causing the undefined behavior errors because of type conversions without range testing/clamping. Fixing that requires changes to the library code, or re-implementing chunks of the library code by every user of the library. Yes, fixing the errors in the library will cause behavior changes - because it will no longer cause errors, or return random unexpected values.

This is JSON we're talking about - you can never be 100% certain of the data range unless you just wrote that data and only immediately consume your own internal data. As soon as storage is involved -- data can be changed, and ranges are undefined. As soon as third party data is involved -- ranges are undefined. As soon as end-users are involved -- ranges are undefined. If you assume that the data is always in the range you created, then you have created attack vectors and vulnerabilities in your code -- which is why tools like UBSan exist to expose those errors and vulnerabilities.

Pushing responsibility for the errors in the library onto the users of the library is irresponsible, at best.

I certainly hope that the maintainer has a better grasp on the world than you seem to be conveying.

@gregmarr

Contributor

The fix is very simple, extract using the internal datatypes, and then you don't have to worry about introducing undefined behavior using the library functions. What you might want for range checking behavior isn't necessarily the same as someone else wants for range checking behavior, and it can vary from element to element.

> This is JSON we're talking about - you can never be 100% certain of the data range unless you just wrote that data and only immediately consume your own internal data.

And yet there are users of this library that are 100% certain of the data that they are reading, or they ALREADY do their own range checking, and thus for performance reasons absolutely do not want the extra range checking.

@ChrisCoxArt

Author

No, pushing the fix onto users of the library means re-implementing parts of the library to do the right thing. Why should the user have to re-implement parts of the library? Why can't the library do the right thing in the first place? Why ship a library with known UB errors? Can users trust the library code if maintainers think that UB shouldn't be fixed and should require third party code to replace parts of the library?

Your "logic" is beyond broken, and you are defending bad code and bad practices.

@gregmarr

Contributor

Please define "the right thing" for every possible set of circumstances experienced by every user of this library.

@ChrisCoxArt

Author

You can start with "don't generate errors from UBSan", and "don't return random unexpected values".

Please stop defending broken code, and start working on how to fix it.

@nlohmann

Owner

I stand with #288. The library has a well-defined and documented behavior of storing data. When you decide to ask for conversions like get<int>(), it is not the library's duty to check if this conversion makes sense - this is also documented.

Also, it's documented how the library is handling number types. There is no surprise, and we have functions to query the stored type. This means there is some caution and work needed by clients when numbers are read from the library - but all this is possible by calling public library functions.

Finally, I would like to mention that the UBSAN errors mentioned here are not caused by the library, but by client code making the wrong assumptions. It's basically an example of casting double to int via a static_cast and complaining about cases where this conversion does not make sense.
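To make the `static_cast` comparison concrete: in C++, converting a floating-point value to an integer type is undefined behavior when the truncated value cannot be represented in the target type, which is what UBSan reports. A minimal sketch of guarding such a cast follows. The names `fits_in_int` and `to_int_or` are hypothetical, and the bounds are exact under the assumption that `int` is at most 32 bits wide (on wider `int` the check is merely conservative).

```cpp
#include <limits>

// Hypothetical helper: true when static_cast<int>(d) is well defined, i.e.
// when d truncated toward zero lies within [INT_MIN, INT_MAX].
inline bool fits_in_int(double d)
{
    // NaN fails both comparisons, so it is rejected along with +-Inf.
    return d > static_cast<double>(std::numeric_limits<int>::min()) - 1.0
        && d < static_cast<double>(std::numeric_limits<int>::max()) + 1.0;
}

// Hypothetical helper: cast only when it is safe, otherwise use a fallback.
inline int to_int_or(double d, int fallback)
{
    return fits_in_int(d) ? static_cast<int>(d) : fallback;
}
```

Whether the fallback should be a clamp, zero, or an error is exactly the policy question debated in this thread; the sketch leaves it to the caller.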

@ChrisCoxArt

Author

I strongly disagree. The library is causing the UBSan errors, because the library exposes functionality that has undefined behavior and fails to take precautions against that undefined behavior.
This is 100% a bug in the json library code.
This is 100% a security vulnerability in the library code that affects most clients (returning random values based on byte aliasing when returning smaller types).
Expecting all clients to re-implement library functionality just to avoid UB and security problems is extremely irresponsible.



Development

Successfully merging this pull request may close these issues.

from_json - Type conversions without range clamping cause UBSan warnings

3 participants