
Track max level more lazily #119

Open
wants to merge 20 commits into base: lh/anew-dev-2

Conversation

@Tortar (Collaborator) commented Mar 28, 2025

This mysteriously fails at the moment, but it should be faster.

github-actions bot commented Mar 28, 2025

Benchmark Results

benchmark | d02d122 | 65059e1 | d02d122 / 65059e1
TTFX excluding time to load | 6.15 ± 0 ms | 6.12 ± 0 ms | 0.927,0.99,1
code size in bytes | 1.33e+04 ± 0 h | 1.36e+04 ± 0 h | 0.981
code size in lines | 461 ± 0 h | 466 ± 0 h | 0.989
code size in syntax nodes | 3.46e+03 ± 0 h | 3.5e+03 ± 0 h | 0.99
constructor n=100 σ=0.1 | 5.88 ± 0.14 μs | 5.91 ± 0.16 μs | 1.05,1,0.995
constructor n=100 σ=1.0 | 6.34 ± 0.14 μs | 6.39 ± 0.19 μs | 1,0.944,0.994
constructor n=100 σ=10.0 | 6.75 ± 0.18 μs | 6.76 ± 0.17 μs | 0.983,1,0.999
constructor n=100 σ=100.0 | 6.35 ± 0.48 μs | 6.38 ± 0.66 μs | 0.966,1.01,0.995
constructor n=1000 σ=0.1 | 0.0423 ± 0.0011 ms | 0.0424 ± 0.0012 ms | 0.993,1,0.997
constructor n=1000 σ=1.0 | 0.0447 ± 0.0013 ms | 0.0454 ± 0.0017 ms | 0.996,1.01,0.985
constructor n=1000 σ=10.0 | 0.0512 ± 0.00091 ms | 0.0514 ± 0.001 ms | 1,1.01,0.995
constructor n=1000 σ=100.0 | 0.0587 ± 0.00077 ms | 0.059 ± 0.00089 ms | 0.995,0.994,0.995
constructor n=10000 σ=0.1 | 0.422 ± 0.015 ms | 0.427 ± 0.02 ms | 1,0.997,0.99
constructor n=10000 σ=1.0 | 0.43 ± 0.017 ms | 0.436 ± 0.023 ms | 1.04,0.984,0.987
constructor n=10000 σ=10.0 | 0.457 ± 0.023 ms | 0.946 ± 0.55 ms | 0.985,0.998,0.483
constructor n=10000 σ=100.0 | 0.577 ± 0.024 ms | 0.586 ± 0.027 ms | 1,1,0.984
delete ∘ rand n=100 σ=0.1 | 4.41 ± 0.19 μs | 4.6 ± 0.21 μs | 0.972,0.976,0.959
delete ∘ rand n=100 σ=1.0 | 4.69 ± 0.19 μs | 4.85 ± 0.2 μs | 0.985,0.981,0.967
delete ∘ rand n=100 σ=10.0 | 4.89 ± 0.21 μs | 5.08 ± 0.27 μs | 0.974,0.976,0.963
delete ∘ rand n=100 σ=100.0 | 8.35 ± 0.53 μs | 8.49 ± 0.54 μs | 0.988,0.983,0.983
delete ∘ rand n=1000 σ=0.1 | 0.0437 ± 0.00072 ms | 0.0456 ± 0.00084 ms | 0.976,0.972,0.957
delete ∘ rand n=1000 σ=1.0 | 0.0477 ± 0.00093 ms | 0.0493 ± 0.0016 ms | 0.989,0.977,0.967
delete ∘ rand n=1000 σ=10.0 | 0.0488 ± 0.00083 ms | 0.0505 ± 0.00084 ms | 0.989,0.978,0.967
delete ∘ rand n=1000 σ=100.0 | 0.0565 ± 0.0011 ms | 0.0581 ± 0.0011 ms | 0.984,0.98,0.973
delete ∘ rand n=10000 σ=0.1 | 0.477 ± 0.011 ms | 0.501 ± 0.015 ms | 0.969,0.972,0.953
delete ∘ rand n=10000 σ=1.0 | 0.509 ± 0.011 ms | 0.549 ± 0.18 ms | 0.961,0.97,0.926
delete ∘ rand n=10000 σ=10.0 | 0.517 ± 0.012 ms | 0.533 ± 0.014 ms | 0.984,0.982,0.969
delete ∘ rand n=10000 σ=100.0 | 0.516 ± 0.016 ms | 0.535 ± 0.013 ms | 0.976,0.986,0.963
empty constructor | 1.91 ± 0.11 μs | 1.93 ± 0.12 μs | 0.941,0.98,0.994
intermixed_h n=100 σ=0.1 | 11.3 ± 0.84 μs | 11.4 ± 0.85 μs | 0.967,0.972,0.99
intermixed_h n=100 σ=1.0 | 11.7 ± 0.99 μs | 11.8 ± 1.1 μs | 1.01,0.98,0.987
intermixed_h n=100 σ=10.0 | 11.5 ± 1.8 μs | 11.7 ± 1.7 μs | 0.986,0.963,0.983
intermixed_h n=100 σ=100.0 | 12.8 ± 1.6 μs | 12.8 ± 1.4 μs | 0.982,0.979,0.995
intermixed_h n=1000 σ=0.1 | 0.107 ± 0.0093 ms | 0.108 ± 0.009 ms | 1,0.981,0.99
intermixed_h n=1000 σ=1.0 | 0.111 ± 0.0086 ms | 0.111 ± 0.0087 ms | 1.01,0.985,0.995
intermixed_h n=1000 σ=10.0 | 0.104 ± 0.0092 ms | 0.111 ± 0.011 ms | 0.999,0.966,0.936
intermixed_h n=1000 σ=100.0 | 0.114 ± 0.013 ms | 0.117 ± 0.014 ms | 0.964,0.963,0.979
intermixed_h n=10000 σ=0.1 | 1.15 ± 0.13 ms | 1.19 ± 0.27 ms | 0.992,0.974,0.967
intermixed_h n=10000 σ=1.0 | 1.15 ± 0.19 ms | 1.24 ± 0.22 ms | 1.01,0.974,0.928
intermixed_h n=10000 σ=10.0 | 1.09 ± 0.15 ms | 1.14 ± 0.15 ms | 1.01,0.953,0.952
intermixed_h n=10000 σ=100.0 | 1.16 ± 0.17 ms | 1.15 ± 0.16 ms | 0.975,0.948,1.01
pathological 1 | 0.0456 ± 0.0002 μs | 0.0434 ± 0.00018 μs | 1.04,1.04,1.05
pathological 1′ | 0.202 ± 0.0014 μs | 0.199 ± 0.0014 μs | 1.01,0.998,1.01
pathological 2 | 0.0627 ± 0.00026 μs | 0.0468 ± 0.00063 μs | 1.38,1.45,1.34
pathological 2′ | 0.181 ± 0.0034 μs | 0.177 ± 0.0013 μs | 0.958,1.04,1.02
pathological 3 | 17.6 ± 0.33 ns | 17.6 ± 0.25 ns | 0.994,0.994,1
pathological 4 | 0.062 ± 0.00032 μs | 0.0434 ± 0.00015 μs | 1.43,1.44,1.43
pathological 4′ | 0.242 ± 0.0016 μs | 0.216 ± 0.0015 μs | 1.02,1.1,1.12
pathological 5a | 0.0445 ± 0.00023 μs | 0.0435 ± 0.00015 μs | 1.02
pathological 5b | 0.0443 ± 0.00026 μs | 0.0438 ± 0.00014 μs | 1.02,0.957,1.01
pathological 5b′ | 0.442 ± 0.0042 μs | 0.446 ± 0.0046 μs | 0.984,1,0.991
sample n=100 σ=0.1 | 25.8 ± 0.68 ns | 27.2 ± 0.65 ns | 0.985,0.98,0.951
sample n=100 σ=1.0 | 30 ± 2.2 ns | 31.5 ± 2.4 ns | 1,0.987,0.954
sample n=100 σ=10.0 | 20.3 ± 5.1 ns | 20.8 ± 5.1 ns | 0.998,1,0.978
sample n=100 σ=100.0 | 17.5 ± 4.2 ns | 18.4 ± 4.5 ns | 0.976,1,0.953
sample n=1000 σ=0.1 | 23.6 ± 4.5 ns | 24.7 ± 6.9 ns | 1.03,1.01,0.955
sample n=1000 σ=1.0 | 0.0332 ± 0.0023 μs | 0.0344 ± 0.0026 μs | 1,0.99,0.963
sample n=1000 σ=10.0 | 21.2 ± 5.5 ns | 21.4 ± 5.2 ns | 0.996,0.997,0.991
sample n=1000 σ=100.0 | 17.8 ± 4.3 ns | 18.3 ± 4.4 ns | 0.985,0.979,0.974
sample n=10000 σ=0.1 | 0.0321 ± 0.0011 μs | 0.0341 ± 0.001 μs | 0.978,0.964,0.942
sample n=10000 σ=1.0 | 0.035 ± 0.0029 μs | 0.0366 ± 0.0012 μs | 0.983,0.981,0.956
sample n=10000 σ=10.0 | 22.3 ± 6 ns | 22.5 ± 6.4 ns | 1.03,0.997,0.995
sample n=10000 σ=100.0 | 18.1 ± 4.4 ns | 18.4 ± 4.4 ns | 0.999,1.05,0.987
summarysize n=100 σ=0.1 | 1.13e+05 ± 0 h | 1.13e+05 ± 0 h | 1
summarysize n=100 σ=1.0 | 1.13e+05 ± 0 h | 1.13e+05 ± 0 h | 1
summarysize n=100 σ=10.0 | 1.13e+05 ± 0 h | 1.13e+05 ± 0 h | 1
summarysize n=100 σ=100.0 | 1.13e+05 ± 0 h | 1.13e+05 ± 0 h | 1
summarysize n=1000 σ=0.1 | 1.42e+05 ± 0 h | 1.42e+05 ± 0 h | 1
summarysize n=1000 σ=1.0 | 1.42e+05 ± 0 h | 1.42e+05 ± 0 h | 1
summarysize n=1000 σ=10.0 | 1.42e+05 ± 0 h | 1.42e+05 ± 0 h | 1
summarysize n=1000 σ=100.0 | 1.42e+05 ± 0 h | 1.42e+05 ± 0 h | 1
summarysize n=10000 σ=0.1 | 1e+06 ± 0 h | 1e+06 ± 0 h | 1
summarysize n=10000 σ=1.0 | 1e+06 ± 0 h | 1e+06 ± 0 h | 1
summarysize n=10000 σ=10.0 | 1e+06 ± 0 h | 1e+06 ± 0 h | 1
summarysize n=10000 σ=100.0 | 1e+06 ± 0 h | 1e+06 ± 0 h | 1
update ∘ rand n=100 σ=0.1 | 0.0857 ± 0.0022 μs | 0.0837 ± 0.0027 μs | 0.978,1.01,1.02
update ∘ rand n=100 σ=1.0 | 0.0913 ± 0.0025 μs | 0.0904 ± 0.0032 μs | 0.978,1.02,1.01
update ∘ rand n=100 σ=10.0 | 0.101 ± 0.0036 μs | 0.0994 ± 0.004 μs | 0.969,1.01,1.01
update ∘ rand n=100 σ=100.0 | 0.213 ± 0.021 μs | 0.214 ± 0.021 μs | 0.985,1,0.995
update ∘ rand n=1000 σ=0.1 | 0.0864 ± 0.0022 μs | 0.0844 ± 0.0027 μs | 0.982,1.01,1.02
update ∘ rand n=1000 σ=1.0 | 0.0924 ± 0.0028 μs | 0.091 ± 0.004 μs | 0.983,1.01,1.02
update ∘ rand n=1000 σ=10.0 | 0.0987 ± 0.0018 μs | 0.0972 ± 0.0026 μs | 0.977,1.02,1.02
update ∘ rand n=1000 σ=100.0 | 0.205 ± 0.0087 μs | 0.208 ± 0.011 μs | 0.976,0.989,0.986
update ∘ rand n=10000 σ=0.1 | 0.0931 ± 0.0023 μs | 0.0927 ± 0.0015 μs | 0.977,0.995,1
update ∘ rand n=10000 σ=1.0 | 0.0961 ± 0.00071 μs | 0.0957 ± 0.0011 μs | 0.99,0.981,1
update ∘ rand n=10000 σ=10.0 | 0.0972 ± 0.001 μs | 0.0978 ± 0.0042 μs | 0.983,1.01,0.994
update ∘ rand n=10000 σ=100.0 | 0.179 ± 0.0021 μs | 0.182 ± 0.0027 μs | 0.975,0.989,0.987
time_to_load | 0.0725 ± 0.00055 s | 0.0724 ± 0.00062 s | 0.996,0.99,1

Benchmark Plots

A plot of the benchmark results has been uploaded as an artifact to the workflow run for this PR.
Go to "Actions"->"Benchmark a pull request"->[the most recent run]->"Artifacts" (at the bottom).

@LilithHafner (Owner)

Lazy is not always faster. In fact, it almost always comes with some overhead (i.e. making something lazy makes it slower when the work still needs to get done). In this case the overhead is a branch on every call to rand to check whether the lazy work needs to be done again.

Looking at our worst cases of 2′, 4′, and 5b′, this work already needs to get done, so this is a slight regression. The best-case improvement for this laziness is hit by pathological 2 and pathological 4 (repeated long-distance shifts in m[2] with no rand calls in between), and even there it is only a 45% improvement.

Sampling runtime is slightly up (though that could be just noise).
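For illustration, a minimal Julia sketch of the trade-off being described (hypothetical names, not the package's actual internals): mutations only mark a cached maximum level as stale, and every sampling call then pays a branch to decide whether the maximum must be recomputed.

mutable struct LazyLevels
    counts::Vector{Int}  # number of stored elements at each level
    max_level::Int       # cached index of the highest nonempty level
    stale::Bool          # true if max_level may be out of date
end

# Deletions only flag the cache instead of eagerly scanning for the new maximum.
function remove_at!(l::LazyLevels, level::Int)
    l.counts[level] -= 1
    level == l.max_level && (l.stale = true)
    return l
end

# Every sample pays a branch to check whether the lazy work needs to be redone.
function current_max_level(l::LazyLevels)
    if l.stale
        m = findlast(>(0), l.counts)
        l.max_level = m === nothing ? 0 : m
        l.stale = false
    end
    return l.max_level
end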

@Tortar (Collaborator, Author) commented Mar 28, 2025

I agree, but I changed some parts and now the pathological cases all seem a bit faster or equal, so to me it seems worth it (even though delete + rand seems 2-3% slower), because I would most like to bring down our pathological cases. But there is a bug whose origin I don't really understand (which could even somewhat influence performance). Do you have any ideas? (edit: solved)

(actually, the pathological benchmarks seem a bit flaky)

@Tortar (Collaborator, Author) commented Mar 28, 2025

OK, solved the bug.

Everything seems as fast or faster than before except delete + rand, which is 2% slower. I think the trade-off is good and this PR is therefore mergeable.

@Tortar (Collaborator, Author) commented Mar 30, 2025

Hey @LilithHafner, can you give your opinion on this? It should be fine in my opinion because we don't have measurable regressions on any pathological case, and with this change the time for the simplest cases approaches that of the main branch.

@LilithHafner (Owner) left a comment

This still benchmarks as a (slight) regression in our worst case, and I'm not particularly concerned about the best cases.

Could you share a plausible use case that has a significant improvement?

I guess I'm just not that enthusiastic about adding complexity to the code and to the invariants unless there's a compelling performance improvement.

Comment on lines +4 to +5
function verify_weights(w::DynamicDiscreteSamplers.Weights)
m = w.m
@LilithHafner (Owner):

This change seems unrelated, and I'd rather not make it because it will make throwing verify(m) calls into src internals harder, which will reduce its utility in debugging.
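For illustration, a rough sketch of the concern (placeholder types; only verify and verify_weights correspond to names in the diff and comment above):

struct RawState end                       # stand-in for the internal state `m`
struct WrappedWeights
    m::RawState                           # stand-in for DynamicDiscreteSamplers.Weights
end

verify(m::RawState) = true                # placeholder for the real invariant checks

# A wrapper-only entry point is convenient for the test suite...
verify_weights(w::WrappedWeights) = verify(w.m)

# ...whereas a method on the raw state can be dropped straight into src internals
# while debugging, e.g. right after a mutation: verify(m)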
