-
-
Notifications
You must be signed in to change notification settings - Fork 30.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-126795: Increase the JIT threshold from 16 to 4096 #126816
base: main
Are you sure you want to change the base?
Conversation
@@ -0,0 +1 @@ | |||
Increase the threshold for JIT code warmup. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to include some mention of the performance improvements we've seen with the new threshold?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with Savannah.
@@ -75,20 +75,19 @@ def loop(): | |||
self.assertEqual(opt.get_count(), 0) | |||
with clear_executors(loop): | |||
loop() | |||
# Subtract because optimizer doesn't kick in sooner | |||
self.assertEqual(opt.get_count(), 1000 - TIER2_THRESHOLD) | |||
self.assertEqual(opt.get_count(), 1001) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might be missing something. Why did we remove the piece about the TIER2_THRESHOLD
?
I guess, more generally, how did you decide when to add/remove the threshold from a test? It seems we're wholesale replacing hardcoded values in tests but also adding/removing the threshold when basic arithmetic is used. Was this just trial and error, or is there some convention I'm not picking up on?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC, we were iterating 1000 times, 1000 - TIER2_THRESHOLD
of which happened after reaching the threshold.
By that doesn't work if TIER2_THRESHOLD > 1000
, so now we iterate 1000 + TIER2_THRESHOLD
times, 1000 of which happen after reaching the threshold.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Earlier logic assumed a low threshold (< 1000). With the new threshold this logic needs to be adjusted to keep the test working. Now it's generic enough, that the threshold could be any (positive) number.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few comments on how we name the consts, but looks good in principle.
x0 = x1 = x2 = x3 = x4 = x5 = x6 = x7 = x8 = x9 = 42 | ||
y0 = y1 = y2 = y3 = y4 = y5 = y6 = y7 = y8 = y9 = 42 | ||
z0 = z1 = z2 = z3 = z4 = z5 = z6 = z7 = z8 = z9 = 42 | ||
a0 = a1 = a2 = a3 = a4 = a5 = a6 = a7 = a8 = a9 = {TIER2_THRESHOLD} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think those 42s are anything to do with thresholds, just a Douglas Adams reference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
42 is always the right answer :)
w0 = w1 = w2 = w3 = w4 = w5 = w6 = w7 = w8 = w9 = {TIER2_THRESHOLD} | ||
x0 = x1 = x2 = x3 = x4 = x5 = x6 = x7 = x8 = x9 = {TIER2_THRESHOLD} | ||
y0 = y1 = y2 = y3 = y4 = y5 = y6 = y7 = y8 = y9 = {TIER2_THRESHOLD} | ||
z0 = z1 = z2 = z3 = z4 = z5 = z6 = z7 = z8 = z9 = {TIER2_THRESHOLD} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
z9
does need to exceed the threshold though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't we add an assert somewhere to check this condition? if z9 is bigger than the threshold the test fails.
@@ -1390,13 +1392,13 @@ def test_guard_type_version_not_removed(self): | |||
|
|||
def thing(a): | |||
x = 0 | |||
for i in range(100): | |||
for i in range(TIER2_THRESHOLD + 100): | |||
x += a.attr | |||
# for the first 90 iterations we set the attribute on this dummy function which shouldn't |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment needs updating
@@ -2222,7 +2222,7 @@ module_exec(PyObject *module) | |||
} | |||
|
|||
if (PyModule_Add(module, "TIER2_THRESHOLD", | |||
PyLong_FromLong(JUMP_BACKWARD_INITIAL_VALUE)) < 0) { | |||
PyLong_FromLong(JUMP_BACKWARD_INITIAL_VALUE + 1)) < 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather not have the +1
here.
Could you add another const EXCEEDS_TIER2_THRESHOLD = TIER2_THRESHOLD + 1
?
In the tests above, use EXCEEDS_TIER2_THRESHOLD
where we want to exceed the threshold, and in the places where we need the actual threshold, and you currently have TIER2_THRESHOLD - 1
, use TIER2_THRESHOLD
In theory, the tests should continue to work if EXCEEDS_TIER2_THRESHOLD = TIER2_THRESHOLD + 5
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The thing is, the threshold to tier up is 4096 hits. We just have JUMP_BACKWARD_INITIAL_VALUE
set to 4095 because we "count" the final zero.
This makes sense for me when looking at the tests, personally. It's intuitive that something that loops TIER2_THRESHOLD
times would tier up, and something that loops TIER2_THRESHOLD - 1
wouldn't.
When you're done making the requested changes, leave the comment: |
Results across platforms. Looks like the memory savings are more pronounced on AArch64 macOS and performance impact is more pronounced on AArch64 Linux. Windows seems to benefit less overall (we don't measure memory on Windows, though):
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is more a question: are we going to make this threshold changeable at runtime?
@@ -75,20 +75,19 @@ def loop(): | |||
self.assertEqual(opt.get_count(), 0) | |||
with clear_executors(loop): | |||
loop() | |||
# Subtract because optimizer doesn't kick in sooner | |||
self.assertEqual(opt.get_count(), 1000 - TIER2_THRESHOLD) | |||
self.assertEqual(opt.get_count(), 1001) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Earlier logic assumed a low threshold (< 1000). With the new threshold this logic needs to be adjusted to keep the test working. Now it's generic enough, that the threshold could be any (positive) number.
x0 = x1 = x2 = x3 = x4 = x5 = x6 = x7 = x8 = x9 = 42 | ||
y0 = y1 = y2 = y3 = y4 = y5 = y6 = y7 = y8 = y9 = 42 | ||
z0 = z1 = z2 = z3 = z4 = z5 = z6 = z7 = z8 = z9 = 42 | ||
a0 = a1 = a2 = a3 = a4 = a5 = a6 = a7 = a8 = a9 = {TIER2_THRESHOLD} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
42 is always the right answer :)
w0 = w1 = w2 = w3 = w4 = w5 = w6 = w7 = w8 = w9 = {TIER2_THRESHOLD} | ||
x0 = x1 = x2 = x3 = x4 = x5 = x6 = x7 = x8 = x9 = {TIER2_THRESHOLD} | ||
y0 = y1 = y2 = y3 = y4 = y5 = y6 = y7 = y8 = y9 = {TIER2_THRESHOLD} | ||
z0 = z1 = z2 = z3 = z4 = z5 = z6 = z7 = z8 = z9 = {TIER2_THRESHOLD} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't we add an assert somewhere to check this condition? if z9 is bigger than the threshold the test fails.
@@ -0,0 +1 @@ | |||
Increase the threshold for JIT code warmup. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with Savannah.
Anyway, great stuff! These results are incredible! |
The core change itself is simple, and results in 2.1% speed improvement and a 3.6% memory improvement for JIT builds. The bulk of this PR is just modifying most of the tests in
test_capi.test_opt
to remove assumptions about the warmup threshold.