You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
This refactor comes after a belated realization that our numeric
conversion routines (principally from_string<>) are "locale-dependent"
via their use of strtod, as well as atof() scattered through the code.
In particular, this means that when software using this code is running
with a locale in which the decimal mark is something other than '.'
(e.g., many European locales that use ',' for decimal mark), these
functions will incorrectly parse text floating point numbers that use
the standard period as decimal mark. For example, if the locale is
"fr_FR", atof("123.45") will return 123.0, rather than the correct
123.45.
At the core of our fix is the introduction of Strutil::strtof/strtod
which by default is "locale-independent" (i.e. always uses '.') and also
has varieties that take a std::locale& explicitly. The implementation
uses strtod_l/strtof_l (nonstandard, but available on Linux, OSX, BSD,
and a differently-named but equivalent function on Windows), with a
gross but simple fallback for any straggler platforms (MINGW?) that
copies the string and replaces the '.' with the native locale's mark.
This is pretty inexpensive: on Linux, the locale-independent version of
our new Strutil::strtof has the same cost as the system locale-dependent
atof/strtof, and the explicit locale version is only about 20% slower.
On OSX, atof is remarkably fast, so our version that uses strtof_l is
about 2x slower, but that seems acceptable given that it gives us all
the control we want, and in comparison, another strategy involving
istringstream was 10x slower. Unless this is shown to be an important
bottleneck, I'm rejecting the alternative of embedding a full custom
strtod implementation as being unnecessarily inviting maintenance costs
and potential bugs. The approach we chose is minimally invasive and
relies on system libraries for the heavy lifting.
Additionally:
* Added a new stof(), stoi(), stoul() to replace the clunkier template
syntax of from_string<>. The stof(), like our new strtof(), has a
locale-independent as well as an explicit-local versions.
* A handful of places that still had old calls to std atof and strtod
were changed to Strutil::stof.
* Strutil::format and ustring::format (and our copy of tinyformat
underneath) were touched up so that the versions of format() that return
strings will use the "C" locale, versions added that return strings and
use an explicitly-passed locale, and the versions that append their
results to existing streams continue to operate as before by honoring
the existing locale conventions of the streams they are using.
* Several places where we assembled strings using std::ostringstream,
I forced the stream to use classic "C" locale in cases where it was
likely to have any floating-point data output.
* For maketx and oiiotool, globally set the locale to classic. In these
cases, we control the whole app, so this is safe to do.
TIP going forward: For any I/O that must be persistent (files saved or
read), or always be formatted identically regardless of the computer
being used, always take care to use locale-independent (i.e. classic "C"
locale) formatting functions and also making sure to initialize any
std::ostringstream with stream.imbue(std::locale::classic()). For
ephemeral I/O or UI elements that you want to display correctly
internationalized for the user's country, then use the versions with
explicit locale std::locale() and use the default initialization of
output streams.
As an aside, I do wish that C/C++ had adopted the convention of default
using the classic (globally uniform) locale for all functions that don't
take an explicit locale, and leave the explicit locale functions just
for programs that are intentionally trying to do country-by-country
localization. This is all just a huge mess, and I hate that our library
can have subtle I/O bugs when used by someone in another country because
of a particular global locale setting that we have no control over and
can't modify simply without potentially breaking the surrounding app.
0 commit comments