-
Notifications
You must be signed in to change notification settings - Fork 63
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
frontend: parse every error message to utf-8, ignore non-parsable chars #3651
base: main
Are you sure you want to change the base?
Conversation
059d40a
to
e338e9e
Compare
Pull Request validationFailed🔴 Review - Missing review from a member (2 required) Success🟢 CI - All checks have passed |
e338e9e
to
3329aaf
Compare
I updated the parsing - it doesn't store the bytes, but it rather parses the string right away so we don't work with bytes. I also updated the commit message/PR description, PTAL |
If user sends to API some weird non-utf-8 character and we then try to log this, the logging freaks out with `logging error` (see [1]). [2] fixes this, but only for one endpoint and one type of user input, but there are many ways how user can do this. On the other hand, [2] caused bad formating and now we got messages like `b'Project a/b does not exist'`. This patch tries to parse every error message we want to store, and maybe later log it, to utf-8 and store it as a string. If some non-utf-8 character appears in the message output, its byte won't be pares and it will be stored/printed as the string representation of that byte. Example: User calls https://copr.fedorainfracloud.org/coprs/g/some<nasty_characters>x02string the string value in CoprHttpException.message will be some\x01\x02string where the \x01\x02 is example of the <nasty_characters> [1] - fedora-copr#3072 [2] - fedora-copr#3076 Fix fedora-copr#3644
043086c
to
4c8b3ec
Compare
Strangely enough, when I use code from
The error on the page is:
When I simply revert the code from PR #3076, I get:
and not the original #3072. So either we can simply revert it, or I need to find a better reproducer for the original issue. |
aha, it is true that utf-8 characters like "…" work now, you are right. I've been researching why this is, and I've concluded that we must have had the locale set to US.US-ASCII or something similar on the old F39 machines at the time. Now we have US.UTF-8. That's probably why this error used to occur, because logging uses by default io.text_encoding, which defaults to Python's UTF-8 mode and returns either locale or UTF-8 if the locale is not set. In that case it was trying to log to ASCII instead of UTF-8. In any case, it just reduced the set of characters on which this error occurs. Non-UTF-8 characters (such as UTF-16) will fall with the same error. So there are two possibilities:
what do you think about it? |
The old reproducer was: b'Project t0xic0der/nvid\xe2\x80\xa6 does not exist.' That is in utf:
This doesn't fail the logger on F41 (current prod). It is ugly, yes, the logging code still dumps the message using the But at least it doesn't try to encode elpisis into ASCII.
Yes, please, revert. I tested with str=$(dd if=/dev/urandom bs=8 count=1) and I got:
We should be safe; no tracebacks. But if we miss something, we'll at least get the reproducer. |
I've concluded that we must have had the locale set to US.US-ASCII or something similar on the old F39 machines at the time. Now we have US.UTF-8. That's probably why this error used to occur, because logging uses by default io.text_encoding, which defaults to Python's UTF-8 mode and returns either locale or UTF-8 if the locale is not set. In that case it was trying to log to ASCII instead of UTF-8.
this will still mean that we got logging error when someone sends e.g. utf32 only characters |
If user sends to API some weird non-utf-8 character and we then
try to log this, the logging freaks out with
logging error
(see [1]).[2] fixes this, but only for one endpoint and one type of user input,
but there are many ways how user can do this. On the other hand, [2]
caused bad formating and now we got messages like
b'Project a/b does not exist'
.This patch tries to parse every error message we want to store, and maybe
later log it, to utf-8 and store it as a string. If some non-utf-8
character appears in the message output, its byte won't be pares and it
will be stored/printed as the string representation of that byte.
Example:
User calls https://copr.fedorainfracloud.org/coprs/g/some<nasty_characters>string
the string value in CoprHttpException.message will be some\x01\x02string
where the \x01\x02 is example of the <nasty_characters>
[1] - #3072
[2] - #3076
Fix #3644