All node response parsers assume utf8 encoding #825
I don't think octet-stream is appropriate, because the default encoding varies by MIME type; e.g. JSON is utf8 by default. To make things worse, some MIME specs are out of touch with reality (e.g. latin1 for text/*, IIRC), and in the browser world anything but utf8 is asking for trouble. So I think it should be limited to using an encoding that is explicitly specified, and utf8 otherwise. PR welcome.
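The rule described above ("explicitly specified encoding, utf8 otherwise") could be sketched like this. The helper name `pickEncoding` and the exact regex are illustrative, not part of superagent's API:

```javascript
// Hypothetical helper: choose a decoding for a response body.
// Honor a charset explicitly declared in the Content-Type header;
// otherwise fall back to utf8, as suggested above.
function pickEncoding(contentType) {
  const match = /;\s*charset=([^;\s]+)/i.exec(contentType || '');
  return match ? match[1].toLowerCase() : 'utf8';
}

console.log(pickEncoding('application/json'));             // no charset declared
console.log(pickEncoding('text/plain; charset=UTF-16LE')); // explicit charset
```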
I've moved on from the lib, so I might not be the best person for a PR, but here's an example of why assuming utf8 is a bad idea.

```js
// superagent (I couldn't get res.body to populate no matter what I did, but my point remains valid)
var Encoding = require('encoding-japanese');

require('superagent').get('http://i1.theportalwiki.net/img/6/6b/Portal2_japanese.txt').end(function(err, res) {
  console.log('as-is; incorrect output', res.text);
});

require('superagent').get('http://i1.theportalwiki.net/img/6/6b/Portal2_japanese.txt').end(function(err, res) {
  console.log('correct conversion; incorrect output', Encoding.convert(res.text, { from: 'UTF16', to: 'UNICODE', type: 'string' }));
});

require('superagent').get('http://i1.theportalwiki.net/img/6/6b/Portal2_japanese.txt').end(function(err, res) {
  console.log("this shouldn't work and it doesn't", Encoding.convert(res.text, { from: 'UTF8', to: 'UNICODE', type: 'string' }));
});
```

```js
// wreck returns a Buffer, which is the correct way to store data of unknown encoding, i.e. everything on the web
var Encoding = require('encoding-japanese');

require('wreck').get('http://i1.theportalwiki.net/img/6/6b/Portal2_japanese.txt', function(err, res, payload) {
  console.log('correct conversion; correct output', Encoding.convert(payload, { from: 'UTF16', to: 'UNICODE', type: 'string' }));
});

require('wreck').get('http://i1.theportalwiki.net/img/6/6b/Portal2_japanese.txt', function(err, res, payload) {
  console.log('equiv to what superagent does; incorrect output', payload.toString('utf8'));
});
```

There is no reason to setEncoding on the response stream. You can just concat buffers (as the image parser does) and then call toString('utf8') on the resulting buffer, if that's what you really want to do. And FWIW, JSON.parse works on Buffers:

```js
var buf = new Buffer('{"blah":"i am a buffer"}');
JSON.parse(buf);
```
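The buffer-first approach described in that comment can be sketched as follows (using the non-deprecated `Buffer.from` rather than `new Buffer`):

```javascript
// Sketch of the suggested approach: accumulate raw chunks as Buffers,
// concatenate, and only decide how to decode at the very end,
// instead of calling res.setEncoding('utf8') up front.
const chunks = [Buffer.from('{"blah":'), Buffer.from('"i am a buffer"}')];
const buf = Buffer.concat(chunks);

// JSON.parse stringifies its argument, and Buffer#toString defaults
// to utf8, so passing a Buffer directly works:
const obj = JSON.parse(buf);
console.log(obj.blah); // "i am a buffer"

// If you do want a string, decode explicitly, picking the encoding yourself:
console.log(buf.toString('utf8'));
```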
I see that, in general, there could be different approaches to handling files without any encoding specified, but the example URL you've mentioned clearly specifies that it's a UTF-8 file.
So handling it as UTF-8 is perfectly correct in that case. If that file is not really a UTF-8 plain-text file, it's the fault of the server for sending the wrong information.
Yes, you're correct. And I can't come up with a way to retrieve that page using superagent and get correct output. Any ideas? Any chance this behavior could get a mention in the docs?
I'm going to leave this bug open to add support for … As for working around the incorrect server response for that URL, I don't think it's possible in the node version of superagent. In the browser version you could set …
I noticed all parsers assume utf8.
e.g. text parser:
It seems like they all should get the encoding from the response content-type header and fall back to binary (application/octet-stream) if none exists.
Can PR, if desired.
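The fix proposed in this issue could look roughly like the sketch below. The function name `textParser` and the `(res, done)` callback shape are illustrative assumptions, not superagent's exact parser API; the point is buffering raw chunks and consulting the Content-Type header before decoding:

```javascript
// Minimal sketch of a charset-aware text parser. `res` is assumed to be
// an http.IncomingMessage-like stream with a `headers` object; the
// parser/callback signature is hypothetical.
function textParser(res, done) {
  const chunks = [];
  res.on('data', (chunk) => chunks.push(chunk));
  res.on('end', () => {
    const contentType = res.headers['content-type'] || '';
    const match = /charset=([^;\s]+)/i.exec(contentType);
    const buf = Buffer.concat(chunks);
    // Decode only when a charset was explicitly declared; otherwise hand
    // back the raw Buffer (the octet-stream-style fallback suggested above).
    // Note: Buffer#toString only supports Node's built-in encodings.
    done(null, match ? buf.toString(match[1]) : buf);
  });
}
```

With this shape, a response declaring `charset=utf8` yields a decoded string, while a response with no charset yields the untouched Buffer for the caller to decode.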