Exponential time required to format nested conditional code #52
Here's a more realistic case that causes a file to be re-parsed a cool 524,288 times {$ifdef verXXX}
{$else}
{$ifdef verXXX}
{$else}
{$ifdef verXXX}
{$else}
{$ifdef verXXX}
{$else}
{$ifdef verXXX}
{$else}
{$ifdef verXXX}
{$else}
{$ifdef verXXX}
{$else}
{$ifdef verXXX}
{$else}
{$ifdef verXXX}
{$else}
{$ifdef verXXX}
{$else}
{$ifdef verXXX}
{$else}
{$ifdef verXXX}
{$else}
{$ifdef verXXX}
{$else}
{$ifdef verXXX}
{$else}
{$ifdef verXXX}
{$else}
{$ifdef verXXX}
{$else}
{$ifdef verXXX}
{$else}
{$ifdef verXXX}
{$else}
{$ifdef verXXX}
{$else}
{$endif}
{$endif}
{$endif}
{$endif}
{$endif}
{$endif}
{$endif}
{$endif}
{$endif}
{$endif}
{$endif}
{$endif}
{$endif}
{$endif}
{$endif}
{$endif}
{$endif}
{$endif}
{$endif} |
@jgardn3r and I discussed a possible rewrite of the parser that could handle highly nested conditional code more efficiently. The idea is to use a kind of state machine that maintains a stack of states representing the current position in the structure of the source. The machine would process one token at a time and decide what to do with it based only on the current stack of states.

Where this shines is conditional directives. Because the parsing state is decoupled from the code that does the parsing, it becomes trivial to copy the current state before exploring one of several parsing paths. The only hard part is knowing when to 'return' from a branch of conditional parsing. It isn't sufficient to return once the end of the conditional branch is reached, because the parsing of what comes after the conditional code can depend on what was inside it. One solution is to return from the branch only once a logical line is completed at the same level as the start of the conditional branch.

Unfortunately, implementing something like this would mean an almost complete rewrite of the parser. This is a hard sell, not only because the parser is already quite complex, but also because there are simpler mitigations for this performance problem that involve skipping the complex conditional sections.
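The state-machine idea can be illustrated with a toy sketch. It assumes a made-up token grammar where only `(`/`)` and `begin`/`end` change the nesting level and `;` ends a logical line; `ParserState`, `fork`, `feed` and `explore_branch` are hypothetical names, not the project's API. The point it shows is that forking is a plain copy, and that a branch keeps reading past `{$endif}` until a logical line completes at the level where the branch started.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

OPENERS = {"(", "begin"}  # toy grammar (assumption): tokens that push a level
CLOSERS = {")", "end"}    # ...and tokens that pop it


@dataclass
class ParserState:
    """Hypothetical parser state: a stack of open constructs plus the tokens
    of the logical line being built. It is plain data, so forking it before
    exploring a conditional branch is just a copy."""
    stack: List[str] = field(default_factory=list)
    line: List[str] = field(default_factory=list)

    def fork(self) -> "ParserState":
        return ParserState(list(self.stack), list(self.line))

    def at_line_start(self, level: int) -> bool:
        # No partial line pending, and back at the given nesting level.
        return not self.line and len(self.stack) == level

    def feed(self, token: str) -> Optional[str]:
        """Process one token based only on the current state; return the
        text of the logical line if this token completed one."""
        if token in OPENERS:
            self.stack.append(token)
        elif token in CLOSERS:
            self.stack.pop()
        self.line.append(token)
        if token == ";":
            text = " ".join(self.line)
            self.line.clear()
            return text
        return None


def explore_branch(state: ParserState, branch: List[str],
                   after: List[str]) -> Tuple[List[str], int]:
    """Parse one conditional branch on a copy of `state`, then keep reading
    the code after {$endif} until a logical line completes at the level where
    the branch started. Returns the completed lines and the number of
    `after` tokens that were consumed."""
    s = state.fork()  # the caller's state is never mutated
    start_level = len(s.stack)
    lines: List[str] = []
    for tok in branch:
        text = s.feed(tok)
        if text is not None:
            lines.append(text)
    consumed = 0
    # Stopping at {$endif} is not enough: the code after the conditional
    # may still belong to a line that this branch started.
    while not s.at_line_start(start_level) and consumed < len(after):
        text = s.feed(after[consumed])
        consumed += 1
        if text is not None:
            lines.append(text)
    return lines, consumed


# Hypothetical input: {$ifdef X} Foo(1, {$else} Bar( {$endif} 2); Baz;
after = ["2", ")", ";", "Baz", ";"]
main = ParserState()
print(explore_branch(main, ["Foo", "(", "1", ","], after))  # (['Foo ( 1 , 2 ) ;'], 3)
print(explore_branch(main, ["Bar", "("], after))            # (['Bar ( 2 ) ;'], 3)
```

Both branches run past `{$endif}` to finish `2);` and stop before `Baz;`, so in this toy the unconditional code from that point on would only need to be parsed once. How the branches are merged back into a single path is left out of the sketch.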
@jgardn3r and I discussed an alternative solution to this problem that doesn't involve completely rewriting the parser.

Current Approach

The current approach is based on exploring the conditional code in 'levels'. Mathematically, the set of all conditional directive paths we currently explore could be represented as

For example, in the following code:

```pascal
// width 3 at level 0
{$if A} {$elseif B} {$elseif C} {$endif}
// width 1 at level 0
{$if A}
  // width 2 at level 1
  {$if A}
  {$else}
    // width 3 at level 2
    {$if A} {$elseif B} {$elseif C} {$endif}
  {$endif}
{$endif}
```
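One reconstruction of the 'levels' model that agrees with the width comments in this example and with the 524,288 re-parses quoted for the 19-level case (an assumption, not the original notation): if each pass picks one branch index per nesting level, and $w_\ell$ is the width of the widest conditional at level $\ell$ (with $L$ the deepest level), then the explored paths and the number of passes are

```math
\mathcal{P} = \{1,\dots,w_0\} \times \{1,\dots,w_1\} \times \cdots \times \{1,\dots,w_L\},
\qquad
|\mathcal{P}| = \prod_{\ell=0}^{L} w_\ell
```

Under this reading, the example above gives $3 \times 2 \times 3 = 18$ passes, and the 19-level `{$ifdef}`/`{$else}` case gives $2^{19} = 524{,}288$.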
Problem

The problem with this approach is that it creates many duplicate paths through the file, because it enumerates all the possibilities inside branches that weren't even taken.

Alternative Approach

Instead of exploring the conditional code in 'levels', we could explore the branches more directly by keeping track of which branches have been fully explored. In pseudo-code:
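The following is a minimal Python sketch of one way to read this idea, written for this discussion rather than taken from the thread. It only counts passes; it does not parse. Assumptions: a conditional is modelled as a list of branches, each holding the conditionals nested directly inside it (`Cond`, `level_passes`, `branch_passes` and `nested_ifdef_else` are hypothetical names); `level_passes` uses the product-of-widest-conditional-per-level reading of the 'levels' model; and in `branch_passes` each pass takes, in every conditional it reaches, the first branch that is not yet fully explored, where a branch counts as fully explored once every conditional nested inside it is.

```python
from dataclasses import dataclass
from math import prod
from typing import List, Set, Tuple


@dataclass(eq=False)  # eq=False keeps identity hashing, so Cond can go in sets
class Cond:
    """Hypothetical model of one {$if}...{$endif}: each branch holds the
    conditionals nested directly inside it."""
    branches: List[List["Cond"]]


def level_passes(conds: List[Cond]) -> int:
    """Passes under the assumed 'levels' model: product of the width of the
    widest conditional at each nesting level."""
    widths: List[int] = []

    def walk(cs: List[Cond], level: int) -> None:
        for c in cs:
            if len(widths) <= level:
                widths.append(0)
            widths[level] = max(widths[level], len(c.branches))
            for branch in c.branches:
                walk(branch, level + 1)

    walk(conds, 0)
    return prod(widths)  # prod([]) == 1: no conditionals, one pass


def branch_passes(conds: List[Cond]) -> int:
    """Passes when explored branches are tracked explicitly."""
    explored: Set[Tuple[Cond, int]] = set()

    def done(c: Cond) -> bool:
        return all((c, i) in explored for i in range(len(c.branches)))

    def walk(cs: List[Cond]) -> None:
        for c in cs:
            # First unexplored branch; once all are explored, any will do.
            i = next((j for j in range(len(c.branches))
                      if (c, j) not in explored), 0)
            walk(c.branches[i])
            if all(done(n) for n in c.branches[i]):
                explored.add((c, i))

    passes = 0
    while True:  # every file needs at least one pass
        walk(conds)
        passes += 1
        if all(done(c) for c in conds):
            return passes


def nested_ifdef_else(depth: int) -> List[Cond]:
    """{$ifdef}{$else}<next level>{$endif}, nested `depth` levels deep."""
    inner: List[Cond] = []
    for _ in range(depth):
        inner = [Cond([[], inner])]
    return inner


# The 'Current Approach' example: width 3, then width 1 -> 2 -> 3.
example = [
    Cond([[], [], []]),
    Cond([[Cond([[], [Cond([[], [], []])]])]]),
]
print(level_passes(example), branch_passes(example))  # 18 4
deep = nested_ifdef_else(19)
print(level_passes(deep), branch_passes(deep))        # 524288 20
```

Under these assumptions, the 19-level case drops from 524,288 passes to 20: one pass per `{$ifdef}` branch plus one for the innermost `{$else}`, because branches that were never taken are no longer multiplied against each other.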
Currently there's a performance issue with parsing files that contain deeply nested conditional directives.
The approach the parser takes right now is to reprocess the entire file for each conditional branch path.
For example, in the following code
4 passes are performed:
In isolation, this makes sense: it lets the parser explore all of the available code and generate logical lines equivalent to what a compiler could see after 'preprocessing'.
The problem is that all of the surrounding code in the file also has to be parsed over and over, even if it is unconditional.
A more efficient parser implementation could restrict itself to re-parsing only the parts of the code that are affected by the conditionals. This is easier said than done, but it would be important for formatting large files with nested conditional directives.
A pathological case that the current parser will effectively never finish parsing