Description
Summary
lxml.etree.XMLSyntaxError is raised in httpops.py when logging HTTP responses (or requests) whose XML contains non-ASCII UTF-8 characters (e.g. accented letters in attribute values).
Root Cause
In _log_response (and _log_request), the code converts response.content (bytes) to a printable string via repr(), then feeds it back into ET.fromstring():
rawtext = repr(response.content)[2:-1] # bytes → escaped str ("héllo" → "h\\xc3\\xa9llo")
# ...substitutions for \r, \n, \t...
tree = ET.fromstring(rawtext.encode()) # str → bytes with literal backslashes → invalid XML
repr() turns non-ASCII bytes into Python escape sequences (\xc3\xa9). Re-encoding that string does not restore the original UTF-8 bytes; it produces bytes containing literal backslash-x sequences. lxml then fails because those sequences are not valid XML:
lxml.etree.XMLSyntaxError: error parsing attribute name, line 46, column 14
The same pattern exists in _log_request (repr(request.body)[1:-1]).
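The byte-level mismatch can be checked without lxml. A minimal sketch using only the standard library; the sample string is illustrative and not taken from a real OSLC response:

```python
# Illustrative input (assumption): any UTF-8 bytes holding a non-ASCII character.
content = "héllo".encode("utf-8")        # b'h\xc3\xa9llo', 6 bytes

# The logging code's approach: repr() the bytes and strip the b'...' wrapper.
rawtext = repr(content)[2:-1]            # 'h\\xc3\\xa9llo', 12 characters

# Re-encoding keeps the backslashes; it does not undo the escaping.
round_tripped = rawtext.encode()

print(len(content), len(round_tripped))  # 6 12
print(round_tripped == content)          # False
```

Each non-ASCII byte becomes four ASCII characters (backslash, x, two hex digits), so the parser is handed different bytes from the ones the server sent.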
Affected Code
httpops.py: _log_response (response body logging)
httpops.py: _log_request (request body logging)
Suggested Fix
Parse XML from the original bytes (response.content / request.body) instead of the repr()-then-encode() round-trip. Wrap in try/except so malformed content doesn't crash logging:
# _log_response: before
tree = ET.fromstring(rawtext.encode())
ET.indent(tree, space=" ")
rawtext = ET.tostring(tree).decode()
# _log_response: after
try:
    tree = ET.fromstring(response.content)
    ET.indent(tree, space=" ")
    rawtext = ET.tostring(tree, encoding='unicode')
except Exception:
    pass # keep rawtext as-is
# _log_request: before
tree = ET.fromstring(rawtext)
ET.indent(tree, space=" ")
rawtext = ET.tostring(tree)
# _log_request: after
try:
    body_bytes = request.body if isinstance(request.body, bytes) else request.body.encode('utf-8')
    tree = ET.fromstring(body_bytes)
    ET.indent(tree, space=" ")
    rawtext = ET.tostring(tree, encoding='unicode')
except Exception:
    pass # keep rawtext as-is
Reproduction
Any OSLC/DNG response containing non-ASCII UTF-8 characters in XML attribute values (e.g. requirement titles with accented characters) will trigger the crash during TRACE-level logging.
Example
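from lxml import etree as ET  # assumed alias, matching the lxml.etree error above
content = '<?xml version="1.0"?><root name="café"/>'.encode('utf-8')
rawtext = repr(content)[2:-1]
# rawtext = '<?xml version="1.0"?><root name="caf\\xc3\\xa9"/>'
# rawtext.encode() != content: the parser sees a literal \xc3\xa9, not "é"
ET.fromstring(rawtext.encode()) # corrupted input; XMLSyntaxError reported on real payloads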
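For comparison, the same kind of input parses cleanly when the original bytes go straight to the parser, as the suggested fix does. A rough sketch under stated assumptions: `pretty_xml_for_log` is a hypothetical helper, not a function in httpops.py, and the standard library's `xml.etree.ElementTree` stands in for lxml so the snippet runs without extra dependencies (`ET.indent` needs Python 3.9+).

```python
import xml.etree.ElementTree as ET  # stand-in for lxml.etree (assumption)


def pretty_xml_for_log(body):
    """Return an indented XML string for logging, or plain text on failure.

    Hypothetical helper: accepts bytes (like response.content) or
    str (like request.body).
    """
    raw = body if isinstance(body, bytes) else body.encode("utf-8")
    try:
        tree = ET.fromstring(raw)  # parse the original bytes, no repr() step
        ET.indent(tree, space="  ")
        return ET.tostring(tree, encoding="unicode")
    except Exception:
        # Logging should never break the request; fall back to readable text.
        return raw.decode("utf-8", errors="replace")


content = '<?xml version="1.0"?><root name="café"><child/></root>'.encode("utf-8")
print(pretty_xml_for_log(content))      # indented XML, accent preserved
print(pretty_xml_for_log(b"<not-xml"))  # malformed input comes back unchanged
```

Returning a string from both branches keeps the caller free of its own try/except; whether the real logging functions should share one helper is a design choice for the maintainers.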