Skip to content

PHP 8.4 | Add tokenization of asymmetric visibility #871

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 12 commits into
base: master
Choose a base branch
from

Conversation

DanielEScherzer
Copy link
Contributor

@DanielEScherzer DanielEScherzer commented Mar 11, 2025

Description

Set up tokenization of asymmetric visibility modifiers:

  • Add tokens T_PUBLIC_SET, T_PROTECTED_SET, and T_PRIVATE_SET if not already defined
  • Consider them scope modifiers (PHP_CodeSniffer\Util\Tokens::$scopeModifiers)
  • Mark them as context sensitive (PHP_CodeSniffer\Util\Tokens::$contextSensitiveKeywords)
  • Store their known lengths (PHP_CodeSniffer\Tokenizers\PHP::$knownLengths)
  • Backfill tokenization for older versions of PHP

Suggested changelog entry

PHP 8.4 | Support tokenization of asymmetric visibility modifiers

Related issues/external references

Related to #851

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
    • This change is only breaking for integrators, not for external standards or end-users.
  • Documentation improvement

PR checklist

  • I have checked there is no other PR open for the same change.
  • I have read the Contribution Guidelines.
  • I grant the project the right to include and distribute the code under the BSD-3-Clause license (and I have the right to grant these rights).
  • I have added tests to cover my changes.
  • I have verified that the code complies with the projects coding standards.
  • [Required for new sniffs] I have added XML documentation for the sniff.

Copy link
Member

@jrfnl jrfnl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DanielEScherzer Awesome work on this !

I have left a few comments and questions in-line, but generally speaking, this is looking great!

One additional question: how did you determine where in the (huge) PHP::tokenize() method, the polyfill should go ?
It is currently between the tokenization blocks for T_ENUM_CASE and PHP 8.0 namespaced names handling.
Is there a particular reason for that ?

Comment on lines +745 to +752
T_PRIVATE_SET => T_PRIVATE_SET,
T_PROTECTED => T_PROTECTED,
T_PROTECTED_SET => T_PROTECTED_SET,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure this is correct... Should these "keywords" ever be changed into T_STRING depending on their context.... ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure, but it looked like all of the other keywords were there, so I added them - should I remove?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Except, while these are tokens, these are not keywords.

This list is primarily for use with the PHP Tokenizer correction of context sensitive keyword tokenization. What that means is as follows:

class Foreach {
    function exit() {}
}

Foreach and exit in the above code sample should be (re-)tokenized to T_STRING and should not be tokenized as T_FOREACH or T_EXIT (which is the PHP native tokenization in parse-error tolerant mode).

As the T_*_SET tokens are combined tokens, they cannot be used in those places anyway - class Private(set) would be a parse error -, so they are irrelevant for the context sensitive keyword list. Same as the cast tokens are not part of this list.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But when I remove the tokens from here, the ContextSensitiveKeywordsTest.php starts failing saying that things were T_STRING rather than the correct T_*_SET tokens

@DanielEScherzer
Copy link
Contributor Author

One additional question: how did you determine where in the (huge) PHP::tokenize() method, the polyfill should go ? It is currently between the tokenization blocks for T_ENUM_CASE and PHP 8.0 namespaced names handling. Is there a particular reason for that ?

No idea, sorry

@DanielEScherzer
Copy link
Contributor Author

@jrfnl my understanding is that this is waiting for re-review, just want to check that you aren't waiting for me to do something and have us both just wait

@jrfnl
Copy link
Member

jrfnl commented Mar 28, 2025

@jrfnl my understanding is that this is waiting for re-review, just want to check that you aren't waiting for me to do something and have us both just wait

Sorry, yes, been busy with some other things. Let me have a look now.

Copy link
Member

@jrfnl jrfnl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DanielEScherzer Thanks for the updates you've made.

While looking at this PR again - and in particularly, looking at the changes made to the predefined Tokens::$* properties, I realized more changes are needed. In particular related to the correct handling by the tokenizer of |, & and parenthesis in union, intersection and DNF types. See #907 as a loosely related example.
And while with the current changes, I believe the tokenization of ? in nullable types in combination with asym visibility will be handled correctly, this will need to be safeguarded by tests. See #908 for another loosely related example.

I've also looked at the tests again.
I would prefer for the "valid" asym visibility tests and the "invalid" tests to be split into two test methods.
I believe the testAsymmetricVisibility() method should, in that case, not need the $testContent parameter.

As for the "invalid" tests method - I think it might be worth it to do a bit more strenuous testing on the other tokens after the visibility keyword not being retokenized incorrectly.
I.e. test a token sequence for those cases.

Also regarding the "invalid" tests - I still miss tests with (valid) PHP code, which is not asym visibility, like the below:

class Foo {
    function protected() {}
    function public(Set $setter) {}
}

Those tests can go either in the new test file or in the test file for ContextSensitiveKeywords, but safeguarding that this valid code is not negatively affected is important to me (and yes, your implementation should not cause any problem, but this is about making sure that potential future changes do not break this).

Comment on lines +745 to +752
T_PRIVATE_SET => T_PRIVATE_SET,
T_PROTECTED => T_PROTECTED,
T_PROTECTED_SET => T_PROTECTED_SET,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Except, while these are tokens, these are not keywords.

This list is primarily for use with the PHP Tokenizer correction of context sensitive keyword tokenization. What that means is as follows:

class Foreach {
    function exit() {}
}

Foreach and exit in the above code sample should be (re-)tokenized to T_STRING and should not be tokenized as T_FOREACH or T_EXIT (which is the PHP native tokenization in parse-error tolerant mode).

As the T_*_SET tokens are combined tokens, they cannot be used in those places anyway - class Private(set) would be a parse error -, so they are irrelevant for the context sensitive keyword list. Same as the cast tokens are not part of this list.

@DanielEScherzer
Copy link
Contributor Author

@DanielEScherzer Thanks for the updates you've made.

While looking at this PR again - and in particularly, looking at the changes made to the predefined Tokens::$* properties, I realized more changes are needed. In particular related to the correct handling by the tokenizer of |, & and parenthesis in union, intersection and DNF types. See #907 as a loosely related example. And while with the current changes, I believe the tokenization of ? in nullable types in combination with asym visibility will be handled correctly, this will need to be safeguarded by tests. See #908 for another loosely related example.

Done for matching 907, after 908 is merged I'll add test cases there

I've also looked at the tests again. I would prefer for the "valid" asym visibility tests and the "invalid" tests to be split into two test methods. I believe the testAsymmetricVisibility() method should, in that case, not need the $testContent parameter.

I split them, and kept $testContent to assert that the token content is correct

class Foo {
    function protected() {}
    function public(Set $setter) {}
}

Done

@jrfnl
Copy link
Member

jrfnl commented Apr 1, 2025

Done for matching 907, after 908 is merged I'll add test cases there

@DanielEScherzer I've merged both #907 as well as #908 now, so you should be able to make that update now.

@DanielEScherzer
Copy link
Contributor Author

Failure is from HTTP 405 error when uploading code coverage, which I think is unrelated to the actual changes

@jrfnl
Copy link
Member

jrfnl commented Apr 6, 2025

Failure is from HTTP 405 error when uploading code coverage, which I think is unrelated to the actual changes

💯 Coveralls is doing some disruptive maintenance: https://status.coveralls.io/ (also the reason my own recent PRs are all marked as failing). I will retrigger the relevant builds once the maintenance has finished.

@DanielEScherzer
Copy link
Contributor Author

Done for matching 907, after 908 is merged I'll add test cases there

@DanielEScherzer I've merged both #907 as well as #908 now, so you should be able to make that update now.

This was done, but I can't remove the Status: awaiting feedback tag myself

@jrfnl
Copy link
Member

jrfnl commented Apr 9, 2025

Thanks @DanielEScherzer. I'm hoping to take another look over the next few days, but what with the PRs for 4.0 being lined up (see #924), I may delay this PR until after next week (though it would still go into 3.x). Please bear with me. It is on my radar though.

@jrfnl
Copy link
Member

jrfnl commented Apr 9, 2025

P.S.: I see there is another merge conflict. Sorry about that. PR #948 also touches that same sniff, so you may want to hold off resolving the conflict until that PR has been merged.

@DanielEScherzer
Copy link
Contributor Author

P.S.: I see there is another merge conflict. Sorry about that. PR #948 also touches that same sniff, so you may want to hold off resolving the conflict until that PR has been merged.

Done the holding off, and done the subsequent resolving of the conflict

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants