[bitreq] Check utf-8 while deserializing JSON body#486
[bitreq] Check utf-8 while deserializing JSON body#486tnull merged 2 commits intorust-bitcoin:masterfrom
Conversation
|
This came up while me and @tnull were working on migrating to bitreq in lightningdevkit/vss-client#56 |
| T: serde::de::Deserialize<'a>, | ||
| { | ||
| match serde_json::from_str(self.as_str()?) { | ||
| match serde_json::from_slice(self.as_bytes()) { |
There was a problem hiding this comment.
Do we really prefer the serde error over the utf8 error? I'm skeptical the claimed perf difference matters (notably when deserializing a string in the json serde skips utf8 validation if you call from_str whereas has to do it if you call from_slice). I don't have a strong feeling on the error but the commit should explain why we prefer the swap.
There was a problem hiding this comment.
Thanks Matt see below, I have expanded the commit message.
I'm skeptical the claimed perf difference matters
The claim is that a reduction in cache misses leads to non-trivial performance gains.
We still do the same amount of work on the data.
While deserializing, `serde_json::from_slice` validates utf-8 as needed. So instead of making two passes on the response body, one to validate utf-8, and another to deserialize the object, we let `serde_json::from_slice` check utf-8 as needed during deserialization. Making a single pass over large response bodies reduces the number of cache misses, and hence decreases the cycles taken to fully deserialize such responses. `Response::json` now returns `Error::SerdeJsonError` instead of `Error::InvalidUtf8InBody` if invalid utf-8 is found during deserialization. For this error case, the `Error::SerdeJsonError` inner type `serde_json::error::Error` is of category `serde_json::error::Category::Syntax`. Other JSON syntax errors are also assigned to this category. We accept this loss of information given the performance gain described above.
We avoid creating a `String` only to immediately convert it back to its inner `Vec<u8>`. This also mirrors the `serde_json::from_slice` call made when parsing a `Response` body as JSON.
a3ae780 to
2c3bbf2
Compare
TheBlueMatt
left a comment
There was a problem hiding this comment.
Mmm, fair enough.
ACK 2c3bbf2
and