Slightly improve performance of zend_copy_extra_args #18146
Draft
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This patch aims to improve the performance when a callback needs
zend_copy_extra_args
. This turns out to be common with some array functions like array_walk and the array_find family of functions. In these cases, the callback is often a short function and often only takes a single argument. Therefore,zend_copy_extra_args
takes measurable time in the profile. Looking at VTune reveals that my system stalls on memory loads for op_array and the argument count. By passing op_array as an argument, we eliminate the load for op_array and due to GCC's inter-procedural analysis it also can use the already-loaded argument counts.The following synthetic benchmark (courtesy of Tim) improves about 11% in run time performance:
Hyperfine stats (on an i7-1185G7) for this benchmark:
On an intel i7-4790 I get about a 5% +-1% performance improvement. Ilija measured a improvement of around 7% +-2% on his intel i7-12800H. For neither of my systems I measured a noticeable difference in bench.php, micro_bench.php or Symfony demo. This means we do not see a regression for these other benchmarks.
For reference, this is the resulting hyperfine benchmark on the i7-1185G7 for Symfony demo:
To further confirm no regressions take place, valgrind instruction count for 500 runs on php-cgi on Symfony demo:
Before patch: 393,938,765
After patch: 393,950,974
So there is a noticeable increase, but it makes no effect on the run time at least on my system.
It is possible to get rid of this increase by applying the NOIPA attribute to
zend_copy_extra_args
, which makes the attached benchmark only a few percent faster, but keep the instruction count for Symfony the same.Looking at the effect on the assembly of zend_init_func_execute_data. We see on the regular path of execution one small change (besides instruction reordering), resulting in an extra instruction.
This is around the code that compares the argument count with EX_NUM_ARGS().
Before patch:
After patch:
Where previously 0x20(%rbx) was compared directly with %r9d, it is now stored in a register %esi so that it can be reused without reloading in zend_copy_extra_args (caused by inter-procedural analysis). Still, there's the same number of memory loads, just now via an extra move.
There is some changes to the code that calls zend_copy_extra_args, where some memory loads happen prior to the call.
The memory loads at the start of zend_copy_extra_args have been eliminated however.