Description
I discovered today that non-ASCII UTF-8 characters do not survive a round trip through the IDE's save-to-file and load-from-file functions. They are silently and irreversibly corrupted. Although the Espruino may not have full Unicode support, I think Unicode in code comments at least ought to be considered legitimate, and the IDE will mangle those comments irreparably if they include characters from other scripts or languages.
Specifically, in Firefox, both save and load corrupt the characters. In Chrome/Chromium, UTF-8 files appear to load correctly but are corrupted on save.
For Chrome/Chromium, I believe ticalc-travis@af96174 would fix the save issue (disclaimer: only tested briefly). The problem is that the save code converts the text to a Blob by instantiating a Uint8Array and writing each element individually in a loop using String.charCodeAt. charCodeAt returns a UTF-16 code unit, which for non-ASCII text may be well above 255, and storing such a value in a Uint8Array keeps only its low 8 bits, so the character is mangled. Passing the string itself directly to the Blob constructor, which encodes string parts as UTF-8, should preserve the text.
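To illustrate the failure mode described above, here is a minimal sketch rather than the IDE's actual code. The function names `bytesViaCharCodeAt` and `bytesViaUtf8` and the sample string are made up for this example. It compares the per-character `charCodeAt` loop with a real UTF-8 encoding via `TextEncoder`, which produces the same bytes a Blob built directly from the string would contain.

```javascript
// Hypothetical helpers for illustration only; not taken from the IDE source.

// Mimics the problematic approach: one Uint8Array slot per UTF-16 code unit.
function bytesViaCharCodeAt(text) {
  const bytes = new Uint8Array(text.length);
  for (let i = 0; i < text.length; i++) {
    // charCodeAt can return up to 65535; Uint8Array keeps only the low 8 bits.
    bytes[i] = text.charCodeAt(i);
  }
  return bytes;
}

// Encodes the string as UTF-8, as the Blob constructor does for string parts.
function bytesViaUtf8(text) {
  return new TextEncoder().encode(text);
}

const sample = "// π ≈ 3.14";
const lossy = bytesViaCharCodeAt(sample);
const utf8 = bytesViaUtf8(sample);

// Decoding the UTF-8 bytes gives back the original text.
const utf8RoundTripOk = new TextDecoder("utf-8").decode(utf8) === sample;

// Reading the truncated bytes back one byte per character does not.
const lossyText = Array.from(lossy, (b) => String.fromCharCode(b)).join("");
const lossyRoundTripOk = lossyText === sample;

console.log("UTF-8 round trip preserved:", utf8RoundTripOk);
console.log("charCodeAt round trip preserved:", lossyRoundTripOk);
```

In the browser, the equivalent of the second helper would be `new Blob([text], { type: "text/plain" })`, which avoids the manual loop entirely.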
For Firefox/non-Chrome, there is a separate code path for save/load which, as far as I can tell, calls into code in the EspruinoTools repo, specifically fileOpenDialog and fileSaveDialog. This code has the same issue described above, but there's a comment in fileOpenDialog that suggests it's deliberately trying to avoid UTF-8 interpretation. I'm not sure why, as I can't imagine how arbitrarily clipping Unicode code points to a 0…255 range would be useful. Am I missing some context? At any rate, since that code lives in a separate repository, I'm not sure how to live-test changes to it with the IDE.