-
Notifications
You must be signed in to change notification settings - Fork 7.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve performance of array_find() etc #18157
Conversation
This avoids destruction logic for the common case, avoids some copy, and adds an optimization hint. On an intel i7 1185G7: ``` Benchmark 1: ./sapi/cli/php x.php Time (mean ± σ): 543.7 ms ± 3.8 ms [User: 538.9 ms, System: 4.4 ms] Range (min … max): 538.4 ms … 552.9 ms 10 runs Benchmark 2: ./sapi/cli/php_old x.php Time (mean ± σ): 583.0 ms ± 4.2 ms [User: 578.4 ms, System: 3.4 ms] Range (min … max): 579.3 ms … 593.9 ms 10 runs Summary ./sapi/cli/php x.php ran 1.07 ± 0.01 times faster than ./sapi/cli/php_old x.php ``` On an intel i7 4790: ``` Benchmark 1: ./sapi/cli/php x.php Time (mean ± σ): 828.6 ms ± 4.8 ms [User: 824.4 ms, System: 1.6 ms] Range (min … max): 822.8 ms … 839.0 ms 10 runs Benchmark 2: ./sapi/cli/php_old x.php Time (mean ± σ): 940.1 ms ± 26.4 ms [User: 934.4 ms, System: 2.5 ms] Range (min … max): 918.0 ms … 981.1 ms 10 runs Summary ./sapi/cli/php x.php ran 1.13 ± 0.03 times faster than ./sapi/cli/php_old x.php ```
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, except for the two remarks.
$ grep -m1 'model name' /proc/cpuinfo
model name : 13th Gen Intel(R) Core(TM) i7-1365U
$ cat test.php
<?php
$array = range(1, 10000);
$result = 0;
for ($i = 0; $i < 5000; $i++) {
$result += array_find($array, static function ($item) {
return $item === 5000;
});
}
var_dump($result);
$ taskset -c 0 hyperfine -L php php-before,php-after '/tmp/bench/{php} test.php'
Benchmark 1: /tmp/bench/php-before test.php
Time (mean ± σ): 597.8 ms ± 4.6 ms [User: 594.0 ms, System: 2.8 ms]
Range (min … max): 592.4 ms … 606.0 ms 10 runs
Benchmark 2: /tmp/bench/php-after test.php
Time (mean ± σ): 568.2 ms ± 9.5 ms [User: 564.4 ms, System: 2.7 ms]
Range (min … max): 560.9 ms … 591.3 ms 10 runs
Summary
/tmp/bench/php-after test.php ran
1.05 ± 0.02 times faster than /tmp/bench/php-before test.php
ext/standard/array.c
Outdated
bool retval_true; | ||
switch (Z_TYPE(retval)) { | ||
case IS_TRUE: | ||
retval_true = !negate_condition; | ||
break; | ||
case IS_FALSE: | ||
retval_true = negate_condition; | ||
break; | ||
default: | ||
retval_true = zend_is_true(&retval) ^ negate_condition; | ||
zval_ptr_dtor(&retval); | ||
break; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can probably also apply to array_filter()
. Perhaps this logic can be moved into some force-inlined:
bool is_true(zval *retval, bool negate_condition) {
switch (Z_TYPE(retval)) {
case IS_TRUE:
return !negate_condition;
case IS_FALSE:
return negate_condition;
default:
bool retval_true = zend_is_true(&retval) ^ negate_condition;
zval_ptr_dtor(&retval);
return retval_true;
}
}
that is then used for array_find and array_filter.
@@ -6620,23 +6620,32 @@ static enum php_array_find_result php_array_find(const HashTable *array, zend_fc | |||
if (!str_key) { | |||
ZVAL_LONG(&args[1], num_key); | |||
} else { | |||
ZEND_ASSUME(!HT_IS_PACKED(array)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could benefit from a short explanatory comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack. It's also not clear to me how this improves performance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM in general. I'm also wondering whether Z_PARAM_FUNC()
may benefit from optimizations for \Closure
. zend_is_callable_ex()
also finds the first user call frame, even though it's never used in zend_get_callable_name_ex()
in the IS_OBJECT
path. But this is ofc something that can be considered separately.
I was also wondering if it may be possible to store zend_fcall_info_cache
statically for each function, and allow it to be re-used between calls through a polymorphic cache. This should work for immutable values living in shm, at least, given that they are guaranteed not to relocate. However, most callables are likely passed as \Closure
s nowadays, so this may not matter much anymore.
@@ -6620,23 +6620,32 @@ static enum php_array_find_result php_array_find(const HashTable *array, zend_fc | |||
if (!str_key) { | |||
ZVAL_LONG(&args[1], num_key); | |||
} else { | |||
ZEND_ASSUME(!HT_IS_PACKED(array)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack. It's also not clear to me how this improves performance.
It probably does. See PR description in: symfony/symfony#60000. I planned to have a look at the performance differences between |
For reference, performance numbers after the latest changes with the extra function:
No idea why there even is a 70ms difference in the new baseline … |
@TimWolla And if you add |
…re` to a `callable` parameter With `Closure`s existing since 5.3 and first class callables since 8.1, nowadays the most common type of callable likely is a `Closure`. Type checking a `Closure` argument for a `callable` parameter however does quite a bit of work, when a simple CE check would suffice. We thus do this. For: <?php function closure_(\Closure $f) { $f('abc'); } function callable_(callable $f) { $f('abc'); } $c = strrev(...); for ($i = 0; $i < 10000000; $i++) { closure_($c); } with the call in the loop appropriately replaced and running on a Intel(R) Core(TM) i7-1365U, I get the following timings: Benchmark 1: /tmp/bench/before callable.php Time (mean ± σ): 368.9 ms ± 5.2 ms [User: 365.0 ms, System: 3.6 ms] Range (min … max): 359.8 ms … 374.5 ms 10 runs Benchmark 2: /tmp/bench/before closure.php Time (mean ± σ): 283.3 ms ± 6.0 ms [User: 279.6 ms, System: 3.4 ms] Range (min … max): 274.1 ms … 293.2 ms 10 runs Benchmark 3: /tmp/bench/after callable.php Time (mean ± σ): 279.9 ms ± 10.1 ms [User: 276.3 ms, System: 3.4 ms] Range (min … max): 269.6 ms … 301.5 ms 10 runs Benchmark 4: /tmp/bench/after closure.php Time (mean ± σ): 283.4 ms ± 2.3 ms [User: 279.5 ms, System: 3.6 ms] Range (min … max): 279.7 ms … 286.6 ms 10 runs Summary /tmp/bench/after callable.php ran 1.01 ± 0.04 times faster than /tmp/bench/before closure.php 1.01 ± 0.04 times faster than /tmp/bench/after closure.php 1.32 ± 0.05 times faster than /tmp/bench/before callable.php The “standard” `array_find()` micro-benchmark of: <?php $array = range(1, 10000); $result = 0; for ($i = 0; $i < 5000; $i++) { $result += array_find($array, static function ($item) { return $item === 5000; }); } var_dump($result); Results in: $ hyperfine -L version before,after '/tmp/bench/{version} native.php' Benchmark 1: /tmp/bench/before native.php Time (mean ± σ): 637.3 ms ± 6.4 ms [User: 632.4 ms, System: 3.9 ms] Range (min … max): 626.8 ms … 644.9 ms 10 runs Benchmark 2: /tmp/bench/after native.php Time (mean ± σ): 572.0 ms ± 3.8 ms [User: 567.8 ms, System: 3.8 ms] Range (min … max): 567.0 ms … 580.1 ms 10 runs Summary /tmp/bench/after native.php ran 1.11 ± 0.01 times faster than /tmp/bench/before native.php see php#18157 (comment)
This avoids destruction logic for the common case, avoids some copy, and adds an optimization hint.
On an intel i7 1185G7:
On an intel i7 4790: