Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exponential time required to format nested conditional code #52

Open
zaneduffield opened this issue Apr 11, 2024 · 3 comments · May be fixed by #231
Open

Exponential time required to format nested conditional code #52

zaneduffield opened this issue Apr 11, 2024 · 3 comments · May be fixed by #231

Comments

@zaneduffield
Copy link
Collaborator

Currently there's a performance issue with parsing files that contain deeply nested conditional directives.

The approach the parser takes right now is to reprocess the entire file for each conditional branch path.
For example, in the following code

{$ifdef A}
Foo({$ifdef B} B {$else} NB {$endif});
{$else}
Bar;
{$endif}

{$ifdef A2}
Foo2({$ifdef B2} B2 {$else} NB2 {$endif});
{$else}
Bar2;
{$endif}

4 passes are performed:

  • level 0: branch 0 & level 1: branch 0
    • Foo(B);
      Foo2(B2);
  • level 0: branch 0 & level 1: branch 1
    • Foo(NB);
      Foo2(NB2);
  • level 0: branch 1 & level 1: branch 0
    • Bar;
      Bar2;
  • level 0: branch 1 & level 1: branch 1
    • Bar;
      Bar2;

In isolation, this makes sense. It lets the parser explore all of the available code, and generate logical lines that are equivalent to what the a compiler could see after 'preprocessing'.

The problem is that all of the surrounding code in the file also has to be parsed over and over, even if it is unconditional.

A more efficient parser implementation could restrict itself to re-parsing only the parts of the code that are affected by the conditionals. This is easier said than done, but would be important for formatting large files with nested conditional directives.


A pathological case that the current parser will effectively never finish parsing

Foo(
  {$ifdef A}
    {$ifdef B}
      {$ifdef C}
        {$ifdef D}
          {$ifdef E}
            {$ifdef F}
            {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif}
            {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif}
            {$endif}
          {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif}
          {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif}
          {$endif}
        {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif}
        {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif}
        {$endif}
      {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif}
      {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif}
      {$endif}
    {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif}
    {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif}
    {$endif}
  {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif}
  {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif} {$elseif}
  {$endif}
)

@zaneduffield
Copy link
Collaborator Author

Here's a more realistic case that causes a file to be re-parsed a cool 524,288 times

{$ifdef verXXX}
{$else}
  {$ifdef verXXX}
  {$else}
    {$ifdef verXXX}
    {$else}
      {$ifdef verXXX}
      {$else}
        {$ifdef verXXX}
        {$else}
          {$ifdef verXXX}
          {$else}
            {$ifdef verXXX}
            {$else}
              {$ifdef verXXX}
              {$else}
                {$ifdef verXXX}
                {$else}
                  {$ifdef verXXX}
                  {$else}
                    {$ifdef verXXX}
                    {$else}
                      {$ifdef verXXX}
                      {$else}
                        {$ifdef verXXX}
                        {$else}
                          {$ifdef verXXX}
                          {$else}
                            {$ifdef verXXX}
                            {$else}
                              {$ifdef verXXX}
                              {$else}
                                {$ifdef verXXX}
                                {$else}
                                  {$ifdef verXXX}
                                  {$else}
                                    {$ifdef verXXX}
                                    {$else}
                                    {$endif}
                                  {$endif}
                                {$endif}
                              {$endif}
                            {$endif}
                          {$endif}
                        {$endif}
                      {$endif}
                    {$endif}
                  {$endif}
                {$endif}
              {$endif}
            {$endif}
          {$endif}
        {$endif}
      {$endif}
    {$endif}
  {$endif}
{$endif}

@zaneduffield
Copy link
Collaborator Author

@jgardn3r and I discussed a possible rewrite of the parser that could more efficiently handle highly nested conditional code.

The idea is to use a kind of state machine, where a stack of states is maintained to represent the current position in the structure of the source. The machine would process one token at a time, and decide what to do with it based only on the current stack of states.

Where this shines is when it comes to conditional directives. By decoupling the state of the parsing from the code that does the parsing, it becomes trivial to make a copy of the current state before exploring one of many parsing paths.

The only hard thing is to know when to 'return' from the branch of conditional parsing; it isn't sufficient to return once the end of the conditional branch is reached, because the parsing of what comes after the conditional code can depend on what was inside the conditional code. One solution to this is to only return from the branch when a logical line is completed at the same level as the start of the conditional branch.

Unfortunately, implementing something like this would be an almost complete rewrite of the parser. This is a hard sell, not only because the parser is already quite complex, but also because there are simpler mitigations to this performance problem involving skipping the complex conditional sections.

@zaneduffield
Copy link
Collaborator Author

@jgardn3r and I discussed an alternative solution to this problem that doesn't involve completely rewriting the parser.

Current Approach

The current approach is based on exploring the conditional code in 'levels'.
It calculates the maximum branching factor at each conditional directive depth and then enumerates the cartesian product of the ranges of branching factors at each depth.

Mathematically, the set of all conditional directive paths we currently explore could be represented as
$$P = \prod_{depth=0}^{maxdepth} { 1, 2, \dots, maxwidth_{\text{depth}} }$$

For example, in the following code

// width 3 at level 0
{$if A} {$elseif B} {$elseif C} {$endif} 
// width 1 at level 0
{$if A} 
  // width 2 at level 1
  {$if A}
  {$else}
    // width 3 at level 2
    {$if A} {$elseif B} {$elseif C} {$endif}
  {$endif}
{$endif}

The maxdepth is 2, and maxwidth_0 is 3, maxwidth_1 is 2, maxwidth_2 is 3. The paths explored are

1 1 1
1 1 2
1 1 3
1 2 1
1 2 2
1 2 3
2 1 1
2 1 2
2 1 3
2 2 1
2 2 2
2 2 3

Problem

The problem with this approach is that it creates many duplicate paths through the file, because it will enumerate all the possibilities inside of a branch that wasn't even taken.

Alternative Approach

Instead of exploring the conditional code in 'levels', we could explore the branches more directly by keeping track of which branches have been fully explored.

In pseudo-code

  • while exists(branch not fully explored)
    • create a new 'pass' of the file
    • pass:
    • take tokens to the next conditional directive, or mark the current branch (if any) as fully explored and return
    • 'take' the first branch of the directive that hasn't yet been fully explored (or the first if all have been explored)
    • recurse into pass:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant