PostgreSQL Tokenization: Fix unexpected characters after question mark being silently ignored #2129
Description of Issue
Upon encountering `?`, the tokenizer first consumes the character and peeks ahead to match any of the following: `|`, `&`, `-`, `#`. If none of those symbols is present, it calls `consume_and_return(chars, Token::Question)`, which consumes an additional character but only returns a `Token::Question`. This is also reflected in `tokenize_with_location`, where the relevant `Token::Question` will have a span of 2 characters.
Reproducing the Issue
Both of the tests sketched below fail on the current main branch.
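A minimal sketch of two such tests, assuming the crate's public `Tokenizer` API (`PostgreSqlDialect`, `tokenize`, `tokenize_with_location`); the `span.start`/`span.end` locations with 1-based `column` fields, and the exact token produced for the trailing character, are assumptions that may differ across sqlparser versions:

```rust
#[cfg(test)]
mod question_mark_tests {
    use sqlparser::dialect::PostgreSqlDialect;
    use sqlparser::tokenizer::{Token, Tokenizer};

    #[test]
    fn question_does_not_swallow_next_char() {
        let dialect = PostgreSqlDialect {};
        let tokens = Tokenizer::new(&dialect, "?a").tokenize().unwrap();
        // On the buggy main branch the 'a' is consumed together with
        // the '?' and silently dropped, leaving only [Question].
        assert_eq!(tokens[0], Token::Question);
        assert!(tokens.len() > 1, "character after '?' was swallowed");
    }

    #[test]
    fn question_spans_one_character() {
        let dialect = PostgreSqlDialect {};
        let tokens = Tokenizer::new(&dialect, "?")
            .tokenize_with_location()
            .unwrap();
        // On the buggy main branch the Question token spans 2 characters.
        let width = tokens[0].span.end.column - tokens[0].span.start.column;
        assert_eq!(width, 1);
    }
}
```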
The Proposed Fix
The PR replaces the call to `self.consume_and_return(chars, Token::Question)` with `Ok(Some(Token::Question))`, so the additional character is no longer consumed.
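For context, a sketch of the affected match arm after the change; the two-character token names (`QuestionPipe`, `QuestionAnd`) and the elided `-`/`#` arms are approximations, not the crate's exact code:

```rust
'?' => {
    chars.next(); // consume the '?' itself
    match chars.peek() {
        Some('|') => self.consume_and_return(chars, Token::QuestionPipe),
        Some('&') => self.consume_and_return(chars, Token::QuestionAnd),
        // arms for '-' and '#' elided
        // Before: self.consume_and_return(chars, Token::Question),
        // which consumed one extra character beyond the '?'.
        _ => Ok(Some(Token::Question)),
    }
}
```

Since the `?` has already been consumed before peeking, returning the token directly leaves the iterator positioned at the next character instead of discarding it.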
Additional considerations
As far as I am aware, `Token::Question` is not a valid PostgreSQL token, and the best course of action might be to explicitly not support it.