Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle directives in trailing trivia #75595

Draft
wants to merge 13 commits into
base: main
Choose a base branch
from

Conversation

jjonescz
Copy link
Member

Fixes #75583.

@dotnet-issue-labeler dotnet-issue-labeler bot added Area-Compilers untriaged Issues and PRs which have not yet been triaged by a lead labels Oct 23, 2024
@jjonescz jjonescz marked this pull request as ready for review October 23, 2024 11:22
@jjonescz jjonescz requested a review from a team as a code owner October 23, 2024 11:22
@AlekseyTs
Copy link
Contributor

AlekseyTs commented Oct 23, 2024

Done with review pass (commit 1) #Closed

@jjonescz jjonescz marked this pull request as draft October 23, 2024 15:40
@jjonescz jjonescz removed the request for review from CyrusNajmabadi October 23, 2024 15:40
@jjonescz jjonescz marked this pull request as ready for review October 29, 2024 12:17
@jjonescz jjonescz requested a review from a team as a code owner October 29, 2024 12:17
@AlekseyTs
Copy link
Contributor

AlekseyTs commented Oct 29, 2024

Done with review pass (commit 5) #Closed

@CyrusNajmabadi
Copy link
Member

It looks like we're still ending up with a directive in trailing trivia. Is that correct?

@jjonescz
Copy link
Member Author

It looks like we're still ending up with a directive in trailing trivia. Is that correct?

Yes, I chose to parse these trailing directives as BadDirectiveTrivia. Is that a problem?

I first tried to attach a skipped trivia onto a hash token and put that into the trailing trivia. But that failed some checks because hash token is not a trivia. Which makes sense. BadDirectiveTrivia is a trivia, so I used that instead. But if you have other syntax tree shape in mind for these scenarios, I'm open to changing this.

@AlekseyTs
Copy link
Contributor

AlekseyTs commented Oct 29, 2024

But that failed some checks because hash token is not a trivia.

With this approach the hash token should be a skipped trivia as well, i.e. not a token

#Closed

@CyrusNajmabadi
Copy link
Member

Yes, I chose to parse these trailing directives as BadDirectiveTrivia. Is that a problem?

My preference (but I could be convinced otherwise) is that we never have any sort of directive trivia as trailing trivia.

That is then very easy to remember and have invariants around.

If you guys feel otherwise, I'm happy to discuss and compromise on that. However, given that it's never legal for the language, and adds complexity in all the upper layers to consider this case, my preference instead is to make these to skipped tokens.

If you do think allowing BadDirective as trailing is preferable to skipped tokens, I'm ok with that.

Lmk what you all think! :-)

// (1,9): error CS1002: ; expected
// if (#if)
TestBase.Diagnostic(ErrorCode.ERR_SemicolonExpected, "").WithLocation(1, 9),
// (1,9): error CS1027: #endif directive expected
Copy link
Contributor

@AlekseyTs AlekseyTs Oct 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// (1,9): error CS1027: #endif directive expected

This error feels unexpected. There are similar unexpected changes to other tests in commit 8. #Closed

Copy link
Member Author

@jjonescz jjonescz Oct 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the same set of diagnostics that would be reported without this PR.

Before commit 8, we missed some diagnostics because we parsed the directive as BadDirective. In commit 8, we parse it again as normal #if directive, just put it into skipped tokens afterwards, but the diagnostics are preserved.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In commit 8, we parse it again as normal #if directive, just put it into skipped tokens afterwards

Calling the parse helper is an implementation detail. The tree reflects how we parsed the directive and we don't have the directive in the tree. Effectively, that means that the tokens were simply skipped (the mechanism used to skip them is not important), not treated as a directive. Therefore, it is unexpected and undesired (in my opinion) to produce any diagnostics that might imply otherwise.

but the diagnostics are the same.

We change parsing result, diagnostics should change accordingly, I think.

@AlekseyTs
Copy link
Contributor

Done with review pass (commit 8)

// Directives cannot be in trailing trivia - parse as skipped tokens trivia instead.

var directiveBuilder = new SyntaxListBuilder(10);
this.LexDirectiveAndExcludedTrivia(afterFirstToken, afterNonWhitespaceOnLine: true, ref directiveBuilder);
Copy link
Contributor

@AlekseyTs AlekseyTs Oct 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this.LexDirectiveAndExcludedTrivia(afterFirstToken, afterNonWhitespaceOnLine: true, ref directiveBuilder);

Is it possible that this process is going to accumulate some side information which we might not want to accumulate? #Closed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It accumulates diagnostics attached to the tokens, but those seem useful, I think we want to accumulate those.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It accumulates diagnostics attached to the tokens

It looks like we somehow get diagnostics about missing end constructs for directives that we ignore here. Is that diagnostic reported here? Otherwise, this method has side effects beyond just tokens that it returns. I do no think we want those side effects.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be fine to report a regular unexpected token error, skip just the # token, and continue parsing as if it wasn't there. What we are trying to do right now feels unnecessary complicated to me.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be fine to report a regular unexpected token error, skip just the # token, and continue parsing as if it wasn't there. What we are trying to do right now feels unnecessary complicated to me.

Sounds good, I will do that, thanks.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be fine to report a regular unexpected token error, skip just the # token, and continue parsing as if it wasn't there. What we are trying to do right now feels unnecessary complicated to me.

@AlekseyTs I have implemented this. The disadvantage of this approach is that it makes error recovery considerably worse.

To summarize the options:

  • Skip # token, continue parsing as if it wasn't there. Worse error recovery than before.
  • Parse the trailing directives into skipped tokens. Should we parse the directive as normal, including disabled text, error reporting, then just convert the tokens to skipped tokens? If not, how should we parse?
  • Parse as BadDirective. A bit worse error recovery because we parse as a generic directive. Technically, it's still a directive in trailing trivia (so doesn't hold the invariant).
  • Allow directives in trailing trivia in error scenarios. Very simple implementation, just remove the assert and everything continues working (error recovery, IDE refactoring features). We still have the invariant that there are no directives in trailing trivia in success scenarios. (Currently, this seems as the best option to me.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can still parse it like before, but represent in the tree however you want. This is very common in the parser.

My main preference is simply that we do not have this structured trivia as trailing trivia.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@CyrusNajmabadi would the current state of the PR work for you? We skip the invalid #, report an error for it, and continue parsing as if it wasn't there.

You can still parse it like before, but represent in the tree however you want. This is very common in the parser.

If it's so common, is there some helper for it? I found SyntaxParser.AddSkippedSyntax. I don't know how I would use it here directly because I don't have any SyntaxToken to attach the SkippedTokensTrivia onto.

Also Aleksey previously didn't like the fact we are forwarding the diagnostics from parsing the directive onto the skipped tokens. Although it seems to me that's how it's supposed to work because the SyntaxParser.AddSkippedSyntax explicitly does that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know how I would use it here directly because I don't have any SyntaxToken to attach the SkippedTokensTrivia onto.

Aren't you producing the token? So you would attach to the token that you're just about to make :)

If it's so common

By common, i mean the concept of: have the parser take steps to determine what is going on, but then determine how best to fit that into the syntactic model.

In other words, just because you parsed out a directive, doesn't mean you need to place it in the tree as a directive.

Also Aleksey #75595 (comment) the fact we are forwarding the diagnostics from parsing the directive onto the skipped tokens

Yeah. I don't have strong feelings/thoughts about that. My general point is simply that i would prefer teh final syntactic model not have directives within trailing trivia.

--

Note: it might be easier for us to chat. But i'll try to share my thoughts clearly here. If you need more clarity on my views, please feel free to ask :)

--

The lexer/parser is effectively a complex program that can execute arbitrary logic to try to decide what the user code means and how it should represent it. So, to me, it's fine for it to do things like actually parse out directives wherever it finds them, to have good error recovery. However, that execution logic is orthogonal to the decisions it then makes in terms of final syntactic tree representation (including diagnostics).

So, personally, i think it would be fine to:

  1. parse out as a directive, for the best recovery experience.
  2. take the results of that parsing, and re-represent them as a skipped token when placing in the tree. Note: for this i would make this extremely simple. Literally take the entire contents of the parsed directive, and just make a single '.BadToken' whose contents are teh entirety of the text of the entire directive.

The end representation then never has a directive in trailing trivia. Nor does it expose that we produced special directive diagnostics. All teh user sees is a message saying # is not expected there. And all the tree has is a skipped token representing that entire chunk that we don't feel is valid there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area-Compilers untriaged Issues and PRs which have not yet been triaged by a lead VSCode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Assert in SyntaxNode.ContainsDirective
4 participants