evalPrimOp is a bottle-neck due to linear search #8
Comments
Hi, see: |
Wow, that was fast! And it also resulted in a great speedup, now my reduced benchmark case takes 35s instead of 55s. Great work! Another perhaps superior suggestion might be to intern all primop names and use an IntMap/Array, but the profile suggests that |
I think I closed prematurely; after all, the changes aren't on master yet. But I think f5e6806 fixes the issue.
While the implementation of, e.g., `...ByteArray.evalPrimOp` is rather direct and elegant at the moment, it yields rather slow code. That is because GHC doesn't do the "obvious" trie-based match optimisation, which means we end up with a linear chain of comparisons in Core.
I took a profile (on `nofib`'s `bernoulli`, if that matters) and `...ByteArray.evalPrimOp` takes about 10% of time and allocation. Here is an excerpt of the profile.

I think a bit of focus on optimising `evalPrimOp` may well speed up the interpreter by 50%. One way to do so would perhaps be to use a `HashMap` or Trie to do the lookup.

Why optimise anyway? Because at the moment a single run of NoFib's `bernoulli` benchmark takes about half an hour, when the compiled program takes just 0.1s. That's quite a deal breaker for an exhaustive benchmark run of all 11* benchmarks.
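To make the lookup suggestion concrete, here is a minimal sketch of the two shapes, not the interpreter's actual code. `Value`, `PrimOpImpl`, `evalPrimOpLinear`, `primOpTable` and `evalPrimOpTable` are hypothetical names made up for illustration, and the three primops are just examples. It uses `Data.Map.Strict` from `containers` as a stand-in so it compiles with a plain GHC install; a `HashMap` from `unordered-containers` would presumably plug in the same way.

```haskell
import qualified Data.Map.Strict as Map
import Data.Map.Strict (Map)

-- Hypothetical stand-in for the interpreter's value type.
data Value = IntV Int
  deriving (Eq, Show)

-- Hypothetical type of a primop implementation: Nothing means "bad arguments".
type PrimOpImpl = [Value] -> Maybe Value

-- Shape 1: match directly on the primop name. As described above, this
-- style ends up as a chain of comparisons tried one after another.
evalPrimOpLinear :: String -> [Value] -> Maybe Value
evalPrimOpLinear name args = case (name, args) of
  ("+#", [IntV a, IntV b]) -> Just (IntV (a + b))
  ("-#", [IntV a, IntV b]) -> Just (IntV (a - b))
  ("*#", [IntV a, IntV b]) -> Just (IntV (a * b))
  _                        -> Nothing

-- Shape 2: build the name -> implementation table once (it is a top-level
-- constant), then do a logarithmic lookup per call.
primOpTable :: Map String PrimOpImpl
primOpTable = Map.fromList
  [ ("+#", binInt (+))
  , ("-#", binInt (-))
  , ("*#", binInt (*))
  ]
  where
    -- Lift a binary Int function to a primop on exactly two IntV arguments.
    binInt f [IntV a, IntV b] = Just (IntV (f a b))
    binInt _ _                = Nothing

evalPrimOpTable :: String -> [Value] -> Maybe Value
evalPrimOpTable name args = do
  impl <- Map.lookup name primOpTable  -- unknown primop gives Nothing
  impl args
```

In GHCi, `evalPrimOpTable "+#" [IntV 2, IntV 3]` and `evalPrimOpLinear "+#" [IntV 2, IntV 3]` should agree. Whether the table version actually wins would need to be checked against the profile, since key comparison on `String` is not free either.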
benchmark takes about half an hour when the compiled program takes just 0.1s. That's quite a deal breaker for an exhaustive benchmark run of all 11* benchmarks.The text was updated successfully, but these errors were encountered: