Tokenizer panics when given weird unicode #70
Labels:
- area: lex (Issues which affect the lexing and tokenizing of code)
- issue: bug (Bug report)
- points: 1 (Simple or straightforward changes to the code)
- priority: low (Consider higher priority issues first)
Example:
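A hypothetical input that hits this case (the project's actual syntax isn't shown here; any token beginning with a character that is neither a letter nor one of the hard-coded symbols, such as an emoji, will do):

```
x = 🦀
```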
This actually causes a panic in the tokenizer, because I didn't write it to return a `Result`, and there's a bunch of cases in `next()` which check for various properties of the `peek`ed character: there's a (kind of hardcoded) check, `char_is_symbol`, but we also check `peek.is_letter()`, which is defined in the `unicode_categories` crate. The default case, where the panic is hit, happens if a token begins with a unicode character that is not in the letter category and doesn't meet any of the other conditions.

The tokenizer should be changed to return a `Result` type, as here it should give a basic indication of "this symbol should not be here". This could include lexer errors for other forms of whitespace.
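A minimal sketch of that change, assuming an iterator-style Rust lexer. The `Lexer`, `Token`, and `LexError` names are illustrative, not the project's actual types, and std's `is_alphabetic()` stands in for the `unicode_categories` `is_letter()` check to keep the sketch self-contained:

```rust
#[derive(Debug)]
enum Token {
    Ident(String),
    Symbol(char),
}

struct LexError {
    ch: char,
}

struct Lexer<'a> {
    chars: std::iter::Peekable<std::str::Chars<'a>>,
}

impl<'a> Lexer<'a> {
    fn new(src: &'a str) -> Self {
        Lexer { chars: src.chars().peekable() }
    }

    fn next_token(&mut self) -> Option<Result<Token, LexError>> {
        // Skip plain ASCII whitespace; exotic whitespace falls through
        // to the error case below, per the suggestion above.
        while let Some(&c) = self.chars.peek() {
            if c.is_ascii_whitespace() {
                self.chars.next();
            } else {
                break;
            }
        }
        let &peek = self.chars.peek()?;
        if peek.is_alphabetic() {
            // Consume an identifier; stand-in for the is_letter() path.
            let mut ident = String::new();
            while let Some(&c) = self.chars.peek() {
                if !c.is_alphabetic() {
                    break;
                }
                ident.push(c);
                self.chars.next();
            }
            Some(Ok(Token::Ident(ident)))
        } else if char_is_symbol(peek) {
            self.chars.next();
            Some(Ok(Token::Symbol(peek)))
        } else {
            // Former panic site: surface the bad character as an Err instead.
            self.chars.next();
            Some(Err(LexError { ch: peek }))
        }
    }
}

// Stand-in for the hardcoded char_is_symbol check mentioned above.
fn char_is_symbol(c: char) -> bool {
    "+-*/=<>(){}".contains(c)
}

fn main() {
    let mut lex = Lexer::new("x = 🦀");
    while let Some(tok) = lex.next_token() {
        match tok {
            Ok(t) => println!("token: {:?}", t),
            Err(e) => println!("lex error: unexpected character {:?}", e.ch),
        }
    }
}
```

Returning `Option<Result<Token, LexError>>` keeps the iterator shape (`None` for end of input) while letting the former default arm report the offending character instead of panicking.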