
Locale-dependent serialization of integer types produces invalid XML #2

@cgri

Affected version: XSD 4.2.0 (C++/Tree mapping)
Affected files: xsd/cxx/tree/serialization/int.hxx
(and likely the analogous headers for long, short,
byte, unsigned-int, unsigned-long, integer, etc.)

Summary

The serialization operators for built-in integer types use a
std::basic_ostringstream without calling imbue(std::locale::classic()).
As a result, when the global C++ locale uses digit grouping (e.g. de_DE,
where 42000 is formatted as "42.000"), the generated XML contains
group-separator characters in numeric fields. This violates the
XML Schema specification, which requires xs:int, xs:long, xs:integer
etc. to be serialized as a plain digit sequence with no grouping.
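The lexical rule referred to above (an optional sign followed by one or more decimal digits) can be checked mechanically. The following is a minimal sketch; the function name is_integer_lexical is hypothetical and not part of XSD, and it checks only the character-level form, not the value range of xs:int or xs:long.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of XSD): returns true if s has the
// lexical form of an XML Schema integer, i.e. an optional '+' or '-'
// followed by at least one decimal digit and nothing else.
// Range limits of xs:int, xs:long etc. are deliberately not checked.
bool is_integer_lexical(const std::string& s) {
    std::size_t i = 0;
    if (i < s.size() && (s[i] == '+' || s[i] == '-'))
        ++i;                      // skip the optional sign
    if (i == s.size())
        return false;             // empty input or a sign with no digits
    for (; i < s.size(); ++i) {
        if (s[i] < '0' || s[i] > '9')
            return false;         // rejects '.', ',', spaces, etc.
    }
    return true;
}
```

With this check, "42000" is accepted while the grouped form "42.000" is rejected, which is why a validating consumer may refuse the generated document.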

Reproduction

Minimal schema:

<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="value" type="xs:int"/>
</xs:schema>

Minimal program:

#include <iostream>
#include <locale>
#include <sstream>
#include "schema.hxx"

int main() {
    std::locale::global(std::locale("de_DE.UTF-8"));

    value_t v(42000);
    xml_schema::namespace_infomap map;

    std::ostringstream os;
    os.imbue(std::locale::classic()); // does not help
    value_(os, v, map);

    std::cout << os.str() << std::endl;
}

Expected output:
42000

Actual output:
42.000

The imbue() call on the user-supplied stream has no effect because the
serialization functions internally construct their own ostringstream
which inherits the global locale at construction time.
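The mechanism can be shown without XSD at all. The sketch below is an illustration under assumptions: write_int is a hypothetical stand-in for a generated serializer that formats through a private stream, and dot_grouping is a custom facet used in place of de_DE so the example does not depend on installed system locales.

```cpp
#include <locale>
#include <ostream>
#include <sstream>
#include <string>

// Stand-in for a locale such as de_DE: groups digits in threes with '.'.
struct dot_grouping : std::numpunct<char> {
    char do_thousands_sep() const override { return '.'; }
    std::string do_grouping() const override { return "\3"; }
};

// Hypothetical stand-in for a serializer that formats via its own stream.
void write_int(std::ostream& out, int v) {
    std::ostringstream tmp;   // copies the global C++ locale here
    tmp << v;                 // formatted with tmp's locale, not out's
    out << tmp.str();         // string insertion applies no number formatting
}

// Returns what ends up in a caller-owned stream that was imbued with
// the classic locale before the call.
std::string demo_user_imbue_is_ignored() {
    // Install the grouping locale globally, remembering the previous one.
    std::locale previous =
        std::locale::global(std::locale(std::locale::classic(), new dot_grouping));

    std::ostringstream os;
    os.imbue(std::locale::classic());  // affects os only, not tmp above
    write_int(os, 42000);

    std::locale::global(previous);     // restore the global locale
    return os.str();
}
```

Calling demo_user_imbue_is_ignored() yields the grouped form even though the caller's stream uses the classic locale, because the digits are produced by the inner stream.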

Root cause

In xsd/cxx/tree/serialization/int.hxx, the insertion operators
construct a basic_ostringstream without imbuing the classic locale
before writing the value. Compare with xsd/cxx/tree/serialization/element.hxx,
where the insert() helper does call imbue(std::locale::classic())
correctly. The fix in element.hxx appears to have been intentional;
the integer-type headers seem to have been overlooked.

Suggested fix

Add os.imbue(std::locale::classic()) immediately after constructing
the basic_ostringstream in each of the affected operator<< overloads,
analogous to the existing fix in element.hxx. The same fix should be
applied consistently across all numeric-type serialization headers
(int.hxx, long.hxx, short.hxx, byte.hxx, the unsigned variants, and
integer.hxx / non-negative-integer.hxx etc.).
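As a rough illustration of the suggested pattern (not the actual XSD code), the helper below formats a number through a temporary stream that is imbued with the classic locale before anything is written. The name to_lexical is hypothetical. The two overloads are an assumption about what byte-sized types would need: inserting signed char or unsigned char into a stream prints a character, so they are widened to int first.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper (not part of XSD): formats a numeric value
// independently of the global C++ locale.
template <typename T>
std::string to_lexical(T value) {
    std::ostringstream os;
    os.imbue(std::locale::classic());  // must happen before the insertion
    os << value;
    return os.str();
}

// Character-sized integers would otherwise be inserted as characters,
// so widen them to int before formatting.
inline std::string to_lexical(signed char value) {
    return to_lexical<int>(value);
}

inline std::string to_lexical(unsigned char value) {
    return to_lexical<int>(value);
}
```

Because the locale is set on the temporary stream itself, the result does not depend on whatever std::locale::global() call the host application may have made.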

Workaround context

This is particularly painful for users who ship XSD-generated code as
part of a library or DLL, where modifying the global C++ locale is not
acceptable (it would affect the host application's behaviour). Per-thread
locale mechanisms (uselocale on POSIX, _configthreadlocale on Windows)
change the C locale of the calling thread, whereas a newly constructed
std::basic_ostringstream copies the global C++ locale, so they do not
reliably help across standard library implementations. A library-side
fix is the only robust solution.

Environment

  • XSD: 4.2.0
  • Compiler: MSVC 2022
  • OS: Windows 11

Happy to provide a patch if helpful.
