Skip to content

local-cse in O3 introduces side-effecting across structurally isolated sub-trees thus prevent const prop & DCE #7470

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
xuruiyang2002 opened this issue Apr 9, 2025 · 2 comments

Comments

@xuruiyang2002
Copy link
Contributor

xuruiyang2002 commented Apr 9, 2025

Given the following code:

(module
  (import "External" "external_function" (func $external_function))
  (func $_start (param $0 i64) (param $1 i32) (param $2 i32) (param $3 i32)
    (local $4 i32) (local $5 i32)
    i32.const 0
    call $foo
    drop
  )
  (func $foo (param $0 i32) (result i32)
    (local $1 i32) (local $2 i32) (local $4 i32) (local $5 i32) (local $scratch i32) (local $3 i64) (local $6 i64)
      i32.const 77986
      i32.load16_u
      i32.const 16
      local.tee $2
      i32.shl
      local.get $2
      i32.shr_s
      i32.const 65533
      i32.eq
      i32.const 77986
      i32.load16_u
      i32.const 16
      i32.shl
      i32.and
      if (result i32)  ;; label = @4
        call $external_function
        i32.const 0
      else
        i32.const 1
      end
    )
  (memory $0 258 258)
  (export "_start" (func $_start)))

wasm-opt (d0d970c) eliminates the dead br_if body by -all -O2 but can not do that by -all -O3.

Analysis

Similar but different to #7440, this time wasm-opt in O3 introduces side-effect in more complex structure by local-cse:

The figure below depicts how the local.cse transforms the input:

Image

It introduces the local.tee for less code size, however, it also affects the constant propagation and further blocks the dead code elimination.

Different from #7440, whose solution is simply enhancing the optimization of the binaryOp regardless that the side effects of its children, this issues is more complex because the local-cse does CSE in different depth and sub-tree, is there any code logic which can check or deal with this issue?

@kripken
Copy link
Member

kripken commented Apr 9, 2025

We do track the bits of locals, but I guess that isn't enough here - we need to see that the shl and shr_s combine to form a 16-bit sign-extend (after that, comparison to 65533 is always false, as the sign bit of a 16-bit number can't be set without the higher bits as well).

We could in theory track "partial sign-extends". Or, more generally, we could "look through" locals to their source, maybe in some optimize-instructions-propagate that would parallel precompute-propagate (in that it looks through locals, using a LocalGraph; perhaps we could do it concretely by expanding "getFallthrough", that is, we'd look not just at fallthrough values in the expression itself, but that arrive via locals). Another option could be to "expand" local-cse results like this, knowing we can re-optimize them later if we fail to find a pattern for them.

None of those options is simple, and I tend to think the benefit is low, so I'd say this is low priority.

@xuruiyang2002
Copy link
Contributor Author

Understood, and I agree. It’s might be better to leave this for later.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants