
Track max level more lazily #119

Open
wants to merge 20 commits into base: lh/anew-dev-2

Conversation

@Tortar (Collaborator) commented Mar 28, 2025

This mysteriously fails at the moment, but it should be faster.

github-actions bot commented Mar 28, 2025

Benchmark Results

benchmark | d02d122 | 65059e1 | d02d122 / 65059e1
TTFX excluding time to load | 6.15 ± 0 ms | 6.12 ± 0 ms | 0.927,0.99,1
code size in bytes | 1.33e+04 ± 0 h | 1.36e+04 ± 0 h | 0.981
code size in lines | 461 ± 0 h | 466 ± 0 h | 0.989
code size in syntax nodes | 3.46e+03 ± 0 h | 3.5e+03 ± 0 h | 0.99
constructor n=100 σ=0.1 | 5.88 ± 0.14 μs | 5.91 ± 0.16 μs | 1.05,1,0.995
constructor n=100 σ=1.0 | 6.34 ± 0.14 μs | 6.39 ± 0.19 μs | 1,0.944,0.994
constructor n=100 σ=10.0 | 6.75 ± 0.18 μs | 6.76 ± 0.17 μs | 0.983,1,0.999
constructor n=100 σ=100.0 | 6.35 ± 0.48 μs | 6.38 ± 0.66 μs | 0.966,1.01,0.995
constructor n=1000 σ=0.1 | 0.0423 ± 0.0011 ms | 0.0424 ± 0.0012 ms | 0.993,1,0.997
constructor n=1000 σ=1.0 | 0.0447 ± 0.0013 ms | 0.0454 ± 0.0017 ms | 0.996,1.01,0.985
constructor n=1000 σ=10.0 | 0.0512 ± 0.00091 ms | 0.0514 ± 0.001 ms | 1,1.01,0.995
constructor n=1000 σ=100.0 | 0.0587 ± 0.00077 ms | 0.059 ± 0.00089 ms | 0.995,0.994,0.995
constructor n=10000 σ=0.1 | 0.422 ± 0.015 ms | 0.427 ± 0.02 ms | 1,0.997,0.99
constructor n=10000 σ=1.0 | 0.43 ± 0.017 ms | 0.436 ± 0.023 ms | 1.04,0.984,0.987
constructor n=10000 σ=10.0 | 0.457 ± 0.023 ms | 0.946 ± 0.55 ms | 0.985,0.998,0.483
constructor n=10000 σ=100.0 | 0.577 ± 0.024 ms | 0.586 ± 0.027 ms | 1,1,0.984
delete ∘ rand n=100 σ=0.1 | 4.41 ± 0.19 μs | 4.6 ± 0.21 μs | 0.972,0.976,0.959
delete ∘ rand n=100 σ=1.0 | 4.69 ± 0.19 μs | 4.85 ± 0.2 μs | 0.985,0.981,0.967
delete ∘ rand n=100 σ=10.0 | 4.89 ± 0.21 μs | 5.08 ± 0.27 μs | 0.974,0.976,0.963
delete ∘ rand n=100 σ=100.0 | 8.35 ± 0.53 μs | 8.49 ± 0.54 μs | 0.988,0.983,0.983
delete ∘ rand n=1000 σ=0.1 | 0.0437 ± 0.00072 ms | 0.0456 ± 0.00084 ms | 0.976,0.972,0.957
delete ∘ rand n=1000 σ=1.0 | 0.0477 ± 0.00093 ms | 0.0493 ± 0.0016 ms | 0.989,0.977,0.967
delete ∘ rand n=1000 σ=10.0 | 0.0488 ± 0.00083 ms | 0.0505 ± 0.00084 ms | 0.989,0.978,0.967
delete ∘ rand n=1000 σ=100.0 | 0.0565 ± 0.0011 ms | 0.0581 ± 0.0011 ms | 0.984,0.98,0.973
delete ∘ rand n=10000 σ=0.1 | 0.477 ± 0.011 ms | 0.501 ± 0.015 ms | 0.969,0.972,0.953
delete ∘ rand n=10000 σ=1.0 | 0.509 ± 0.011 ms | 0.549 ± 0.18 ms | 0.961,0.97,0.926
delete ∘ rand n=10000 σ=10.0 | 0.517 ± 0.012 ms | 0.533 ± 0.014 ms | 0.984,0.982,0.969
delete ∘ rand n=10000 σ=100.0 | 0.516 ± 0.016 ms | 0.535 ± 0.013 ms | 0.976,0.986,0.963
empty constructor | 1.91 ± 0.11 μs | 1.93 ± 0.12 μs | 0.941,0.98,0.994
intermixed_h n=100 σ=0.1 | 11.3 ± 0.84 μs | 11.4 ± 0.85 μs | 0.967,0.972,0.99
intermixed_h n=100 σ=1.0 | 11.7 ± 0.99 μs | 11.8 ± 1.1 μs | 1.01,0.98,0.987
intermixed_h n=100 σ=10.0 | 11.5 ± 1.8 μs | 11.7 ± 1.7 μs | 0.986,0.963,0.983
intermixed_h n=100 σ=100.0 | 12.8 ± 1.6 μs | 12.8 ± 1.4 μs | 0.982,0.979,0.995
intermixed_h n=1000 σ=0.1 | 0.107 ± 0.0093 ms | 0.108 ± 0.009 ms | 1,0.981,0.99
intermixed_h n=1000 σ=1.0 | 0.111 ± 0.0086 ms | 0.111 ± 0.0087 ms | 1.01,0.985,0.995
intermixed_h n=1000 σ=10.0 | 0.104 ± 0.0092 ms | 0.111 ± 0.011 ms | 0.999,0.966,0.936
intermixed_h n=1000 σ=100.0 | 0.114 ± 0.013 ms | 0.117 ± 0.014 ms | 0.964,0.963,0.979
intermixed_h n=10000 σ=0.1 | 1.15 ± 0.13 ms | 1.19 ± 0.27 ms | 0.992,0.974,0.967
intermixed_h n=10000 σ=1.0 | 1.15 ± 0.19 ms | 1.24 ± 0.22 ms | 1.01,0.974,0.928
intermixed_h n=10000 σ=10.0 | 1.09 ± 0.15 ms | 1.14 ± 0.15 ms | 1.01,0.953,0.952
intermixed_h n=10000 σ=100.0 | 1.16 ± 0.17 ms | 1.15 ± 0.16 ms | 0.975,0.948,1.01
pathological 1 | 0.0456 ± 0.0002 μs | 0.0434 ± 0.00018 μs | 1.04,1.04,1.05
pathological 1′ | 0.202 ± 0.0014 μs | 0.199 ± 0.0014 μs | 1.01,0.998,1.01
pathological 2 | 0.0627 ± 0.00026 μs | 0.0468 ± 0.00063 μs | 1.38,1.45,1.34
pathological 2′ | 0.181 ± 0.0034 μs | 0.177 ± 0.0013 μs | 0.958,1.04,1.02
pathological 3 | 17.6 ± 0.33 ns | 17.6 ± 0.25 ns | 0.994,0.994,1
pathological 4 | 0.062 ± 0.00032 μs | 0.0434 ± 0.00015 μs | 1.43,1.44,1.43
pathological 4′ | 0.242 ± 0.0016 μs | 0.216 ± 0.0015 μs | 1.02,1.1,1.12
pathological 5a | 0.0445 ± 0.00023 μs | 0.0435 ± 0.00015 μs | 1.02
pathological 5b | 0.0443 ± 0.00026 μs | 0.0438 ± 0.00014 μs | 1.02,0.957,1.01
pathological 5b′ | 0.442 ± 0.0042 μs | 0.446 ± 0.0046 μs | 0.984,1,0.991
sample n=100 σ=0.1 | 25.8 ± 0.68 ns | 27.2 ± 0.65 ns | 0.985,0.98,0.951
sample n=100 σ=1.0 | 30 ± 2.2 ns | 31.5 ± 2.4 ns | 1,0.987,0.954
sample n=100 σ=10.0 | 20.3 ± 5.1 ns | 20.8 ± 5.1 ns | 0.998,1,0.978
sample n=100 σ=100.0 | 17.5 ± 4.2 ns | 18.4 ± 4.5 ns | 0.976,1,0.953
sample n=1000 σ=0.1 | 23.6 ± 4.5 ns | 24.7 ± 6.9 ns | 1.03,1.01,0.955
sample n=1000 σ=1.0 | 0.0332 ± 0.0023 μs | 0.0344 ± 0.0026 μs | 1,0.99,0.963
sample n=1000 σ=10.0 | 21.2 ± 5.5 ns | 21.4 ± 5.2 ns | 0.996,0.997,0.991
sample n=1000 σ=100.0 | 17.8 ± 4.3 ns | 18.3 ± 4.4 ns | 0.985,0.979,0.974
sample n=10000 σ=0.1 | 0.0321 ± 0.0011 μs | 0.0341 ± 0.001 μs | 0.978,0.964,0.942
sample n=10000 σ=1.0 | 0.035 ± 0.0029 μs | 0.0366 ± 0.0012 μs | 0.983,0.981,0.956
sample n=10000 σ=10.0 | 22.3 ± 6 ns | 22.5 ± 6.4 ns | 1.03,0.997,0.995
sample n=10000 σ=100.0 | 18.1 ± 4.4 ns | 18.4 ± 4.4 ns | 0.999,1.05,0.987
summarysize n=100 σ=0.1 | 1.13e+05 ± 0 h | 1.13e+05 ± 0 h | 1
summarysize n=100 σ=1.0 | 1.13e+05 ± 0 h | 1.13e+05 ± 0 h | 1
summarysize n=100 σ=10.0 | 1.13e+05 ± 0 h | 1.13e+05 ± 0 h | 1
summarysize n=100 σ=100.0 | 1.13e+05 ± 0 h | 1.13e+05 ± 0 h | 1
summarysize n=1000 σ=0.1 | 1.42e+05 ± 0 h | 1.42e+05 ± 0 h | 1
summarysize n=1000 σ=1.0 | 1.42e+05 ± 0 h | 1.42e+05 ± 0 h | 1
summarysize n=1000 σ=10.0 | 1.42e+05 ± 0 h | 1.42e+05 ± 0 h | 1
summarysize n=1000 σ=100.0 | 1.42e+05 ± 0 h | 1.42e+05 ± 0 h | 1
summarysize n=10000 σ=0.1 | 1e+06 ± 0 h | 1e+06 ± 0 h | 1
summarysize n=10000 σ=1.0 | 1e+06 ± 0 h | 1e+06 ± 0 h | 1
summarysize n=10000 σ=10.0 | 1e+06 ± 0 h | 1e+06 ± 0 h | 1
summarysize n=10000 σ=100.0 | 1e+06 ± 0 h | 1e+06 ± 0 h | 1
update ∘ rand n=100 σ=0.1 | 0.0857 ± 0.0022 μs | 0.0837 ± 0.0027 μs | 0.978,1.01,1.02
update ∘ rand n=100 σ=1.0 | 0.0913 ± 0.0025 μs | 0.0904 ± 0.0032 μs | 0.978,1.02,1.01
update ∘ rand n=100 σ=10.0 | 0.101 ± 0.0036 μs | 0.0994 ± 0.004 μs | 0.969,1.01,1.01
update ∘ rand n=100 σ=100.0 | 0.213 ± 0.021 μs | 0.214 ± 0.021 μs | 0.985,1,0.995
update ∘ rand n=1000 σ=0.1 | 0.0864 ± 0.0022 μs | 0.0844 ± 0.0027 μs | 0.982,1.01,1.02
update ∘ rand n=1000 σ=1.0 | 0.0924 ± 0.0028 μs | 0.091 ± 0.004 μs | 0.983,1.01,1.02
update ∘ rand n=1000 σ=10.0 | 0.0987 ± 0.0018 μs | 0.0972 ± 0.0026 μs | 0.977,1.02,1.02
update ∘ rand n=1000 σ=100.0 | 0.205 ± 0.0087 μs | 0.208 ± 0.011 μs | 0.976,0.989,0.986
update ∘ rand n=10000 σ=0.1 | 0.0931 ± 0.0023 μs | 0.0927 ± 0.0015 μs | 0.977,0.995,1
update ∘ rand n=10000 σ=1.0 | 0.0961 ± 0.00071 μs | 0.0957 ± 0.0011 μs | 0.99,0.981,1
update ∘ rand n=10000 σ=10.0 | 0.0972 ± 0.001 μs | 0.0978 ± 0.0042 μs | 0.983,1.01,0.994
update ∘ rand n=10000 σ=100.0 | 0.179 ± 0.0021 μs | 0.182 ± 0.0027 μs | 0.975,0.989,0.987
time_to_load | 0.0725 ± 0.00055 s | 0.0724 ± 0.00062 s | 0.996,0.99,1

Benchmark Plots

A plot of the benchmark results has been uploaded as an artifact to the workflow run for this PR.
Go to "Actions"->"Benchmark a pull request"->[the most recent run]->"Artifacts" (at the bottom).

@LilithHafner (Owner)

Lazy is not always faster. In fact, it almost always comes with some overhead (i.e. making something lazy makes it slower when the work still needs to get done). In this case the overhead is a branch on every call to rand to check whether the lazy work needs to be done again.

Looking at our worst cases of 2′, 4′, and 5b′, this work already needs to get done, so this is a slight regression. The best-case improvement for this laziness is hit by pathological 2 and pathological 4 (repeated long-distance shifts in m[2] with no rand calls in between), and even there it is only a 45% improvement.

Sampling runtime is slightly up (though that could be just noise).
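For illustration, a minimal Julia sketch of the trade-off being described (hypothetical names, not the package's actual internals): mutations only mark a cached maximum level as stale, and every sampling call then pays a branch to decide whether the maximum must be recomputed.

mutable struct LazyLevels
    counts::Vector{Int}  # number of stored elements at each level
    max_level::Int       # cached index of the highest nonempty level
    stale::Bool          # true if max_level may be out of date
end

# Deletions only flag the cache instead of eagerly scanning for the new maximum.
function remove_at!(l::LazyLevels, level::Int)
    l.counts[level] -= 1
    level == l.max_level && (l.stale = true)
    return l
end

# Every sample pays a branch to check whether the lazy work needs to be redone.
function current_max_level(l::LazyLevels)
    if l.stale
        m = findlast(>(0), l.counts)
        l.max_level = m === nothing ? 0 : m
        l.stale = false
    end
    return l.max_level
end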

@Tortar (Collaborator, Author) commented Mar 28, 2025

I agree, but I changed some parts and now the pathological cases all seem a bit faster or equal, so to me it seems worth it (even though delete + rand seems 2-3% slower), because I would most like to bring down our pathological cases. But there is a bug whose origin I don't really understand (which could even somewhat influence performance). Do you have any ideas? (edit: solved)

(actually, the pathological benchmarks seem a bit flaky)

@Tortar (Collaborator, Author) commented Mar 28, 2025

OK, solved the bug.

Everything seems as fast or faster than before except delete + rand, which is 2% slower. I think the trade-off is good and this PR is therefore mergeable.

@Tortar (Collaborator, Author) commented Mar 30, 2025

Hey @LilithHafner, can you give your opinion on this? It should be fine in my opinion because we don't have measurable regressions on any pathological case, and with this change the time for the simplest cases approaches that of the main branch.

@LilithHafner (Owner) left a comment

This still benchmarks as a (slight) regression in our worst case, and I'm not particularly concerned about the best cases.

Could you share a plausible use case that has a significant improvement?

I guess I'm just not that enthusiastic about adding complexity to the code and to the invariants unless there's a compelling performance improvement.

Comment on lines +4 to +5
function verify_weights(w::DynamicDiscreteSamplers.Weights)
m = w.m
@LilithHafner (Owner):

This change seems unrelated, and I'd rather not make it because it will make throwing verify(m) calls into src internals harder, which will reduce its utility in debugging.
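For illustration, a rough sketch of the concern (placeholder types; only verify and verify_weights correspond to names in the diff and comment above):

struct RawState end                       # stand-in for the internal state `m`
struct WrappedWeights
    m::RawState                           # stand-in for DynamicDiscreteSamplers.Weights
end

verify(m::RawState) = true                # placeholder for the real invariant checks

# A wrapper-only entry point is convenient for the test suite...
verify_weights(w::WrappedWeights) = verify(w.m)

# ...whereas a method on the raw state can be dropped straight into src internals
# while debugging, e.g. right after a mutation: verify(m)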
