Tokenizer: attribute retokenization on PHP < 8.0 may remove tokens/content #1279

@jrfnl

Describe the bug

Given the following code sample, which is a parse error because the attribute's closing bracket is missing:

#[AttributeName(10)
function hasUnfinishedAttribute() {}

On PHP >= 8.0, this will tokenize as follows:

Ptr | Ln | Col  | Cond | ( #) | Token Type                 | [len]: Content
-------------------------------------------------------------------------
  0 | L1 | C  1 | CC 0 | ( 0) | T_OPEN_TAG                 | [  5]: <?php

  1 | L2 | C  1 | CC 0 | ( 0) | T_WHITESPACE               | [  0]:

  2 | L3 | C  1 | CC 0 | ( 0) | T_ATTRIBUTE                | [  2]: #[
  3 | L3 | C  3 | CC 0 | ( 0) | T_STRING                   | [ 13]: AttributeName
  4 | L3 | C 16 | CC 0 | ( 0) | T_OPEN_PARENTHESIS         | [  1]: (
  5 | L3 | C 17 | CC 0 | ( 1) | T_LNUMBER                  | [  2]: 10
  6 | L3 | C 19 | CC 0 | ( 0) | T_CLOSE_PARENTHESIS        | [  1]: )
  7 | L3 | C 20 | CC 0 | ( 0) | T_WHITESPACE               | [  0]:

  8 | L4 | C  1 | CC 0 | ( 0) | T_FUNCTION                 | [  8]: function
  9 | L4 | C  9 | CC 0 | ( 0) | T_WHITESPACE               | [  1]: ⸱
 10 | L4 | C 10 | CC 0 | ( 0) | T_STRING                   | [ 22]: hasUnfinishedAttribute
 11 | L4 | C 32 | CC 0 | ( 0) | T_OPEN_PARENTHESIS         | [  1]: (
 12 | L4 | C 33 | CC 0 | ( 0) | T_CLOSE_PARENTHESIS        | [  1]: )
 13 | L4 | C 34 | CC 0 | ( 0) | T_WHITESPACE               | [  1]: ⸱
 14 | L4 | C 35 | CC 0 | ( 0) | T_OPEN_CURLY_BRACKET       | [  1]: {
 15 | L4 | C 36 | CC 0 | ( 0) | T_CLOSE_CURLY_BRACKET      | [  1]: }
 16 | L4 | C 37 | CC 0 | ( 0) | T_WHITESPACE               | [  0]:

... while on PHP < 8.0, this will tokenize like this:

Ptr | Ln | Col  | Cond | ( #) | Token Type                 | [len]: Content
-------------------------------------------------------------------------
  0 | L1 | C  1 | CC 0 | ( 0) | T_OPEN_TAG                 | [  5]: <?php

  1 | L2 | C  1 | CC 0 | ( 0) | T_WHITESPACE               | [  0]:

  2 | L3 | C  1 | CC 0 | ( 0) | T_ATTRIBUTE                | [  2]: #[
  3 | L3 | C  3 | CC 0 | ( 0) | T_STRING                   | [  8]: function
  4 | L3 | C 11 | CC 0 | ( 0) | T_WHITESPACE               | [  1]: ⸱
  5 | L3 | C 12 | CC 0 | ( 0) | T_STRING                   | [ 22]: hasUnfinishedAttribute
  6 | L3 | C 34 | CC 0 | ( 0) | T_OPEN_PARENTHESIS         | [  1]: (
  7 | L3 | C 35 | CC 0 | ( 0) | T_CLOSE_PARENTHESIS        | [  1]: )
  8 | L3 | C 36 | CC 0 | ( 0) | T_WHITESPACE               | [  1]: ⸱
  9 | L3 | C 37 | CC 0 | ( 0) | T_OPEN_CURLY_BRACKET       | [  1]: {
 10 | L3 | C 38 | CC 0 | ( 0) | T_CLOSE_CURLY_BRACKET      | [  1]: }
 11 | L3 | C 39 | CC 0 | ( 0) | T_WHITESPACE               | [  0]:

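Note that the PHP < 8.0 token stream is missing the tokens for the AttributeName(10) part of the file, as well as the line break that follows it. As a result, the function keyword is reported on line 3 and is tokenized as T_STRING instead of T_FUNCTION.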

To reproduce

Steps to reproduce the behavior:

  1. Create a file called test.php with the code sample above...
  2. Run phpcs test.php --standard=PHPCSDebug twice: once with PHP 7.4 and once with PHP 8.4
  3. Compare the output of both runs with the token streams posted above
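
The steps above can be sketched as a small shell script. This is a minimal sketch, not part of the original report: the `<?php` opening tag and the blank line after it are inferred from the token streams (the attribute sits on line 3, the function on line 4), and the script assumes that `phpcs` is on the PATH and that the PHPCSDebug standard (shipped with PHPCSDevTools) is installed and registered.

```shell
#!/bin/sh
# Recreate the sample file. The "<?php" line and the blank line below it are
# inferred from the token streams in this report, not copied from the sample.
cat > test.php <<'EOF'
<?php

#[AttributeName(10)
function hasUnfinishedAttribute() {}
EOF

# Dump the token stream with whichever PHP version is active on this PATH.
# Assumption: phpcs and the PHPCSDebug standard are installed.
if command -v phpcs >/dev/null 2>&1; then
    php --version | head -n 1
    phpcs test.php --standard=PHPCSDebug || echo "phpcs exited with status $?" >&2
else
    echo "phpcs not found; skipping the token dump" >&2
fi
```

Run the script once with PHP 7.4 active and once with PHP 8.4 active, then compare the two token dumps. How the PHP version is switched depends on the local setup and is left out here.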

Expected behavior

For the token stream to be consistent across PHP versions and for the token stream to include all file content.

Versions (please complete the following information)

Operating System: not relevant
PHP version: PHP < 8.0
PHP_CodeSniffer version: both 3.x and 4.x
Standard: not relevant
Install type: not relevant

Additional context

Even though this is a parse error, the PHPCS Tokenizer should IMO never remove actual file content from the token stream.

This also makes it a serious enough bug to warrant still allowing the fix into PHPCS 3.x.

Please confirm

  • I have searched the issue list and am not opening a duplicate issue.
  • I have read the Contribution Guidelines and this is not a support question.
  • I confirm that this bug is a bug in PHP_CodeSniffer and not in one of the external standards.
  • I have verified the issue still exists in the 4.x branch of PHP_CodeSniffer.
