Skip to content

zend_execute: Streamline typechecks in zend_check_type_slow() if an object is given #18277

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

TimWolla
Copy link
Member

@TimWolla TimWolla commented Apr 8, 2025

With Closures existing since 5.3 and first class callables since 8.1, nowadays the most common type of callable likely is a Closure. Type checking a Closure argument for a callable parameter however does quite a bit of work, when a simple CE check would suffice. We thus do this.

For:

<?php

function closure_(\Closure $f)
{
	$f('abc');
}

function callable_(callable $f)
{
	$f('abc');
}

$c = strrev(...);
for ($i = 0; $i < 10000000; $i++) {
	closure_($c);
}

with the call in the loop appropriately replaced and running on a Intel(R) Core(TM) i7-1365U, I get the following timings:

Benchmark 1: /tmp/bench/before callable.php
  Time (mean ± σ):     368.9 ms ±   5.2 ms    [User: 365.0 ms, System: 3.6 ms]
  Range (min … max):   359.8 ms … 374.5 ms    10 runs

Benchmark 2: /tmp/bench/before closure.php
  Time (mean ± σ):     283.3 ms ±   6.0 ms    [User: 279.6 ms, System: 3.4 ms]
  Range (min … max):   274.1 ms … 293.2 ms    10 runs

Benchmark 3: /tmp/bench/after callable.php
  Time (mean ± σ):     279.9 ms ±  10.1 ms    [User: 276.3 ms, System: 3.4 ms]
  Range (min … max):   269.6 ms … 301.5 ms    10 runs

Benchmark 4: /tmp/bench/after closure.php
  Time (mean ± σ):     283.4 ms ±   2.3 ms    [User: 279.5 ms, System: 3.6 ms]
  Range (min … max):   279.7 ms … 286.6 ms    10 runs

Summary
  /tmp/bench/after callable.php ran
    1.01 ± 0.04 times faster than /tmp/bench/before closure.php
    1.01 ± 0.04 times faster than /tmp/bench/after closure.php
    1.32 ± 0.05 times faster than /tmp/bench/before callable.php

The “standard” array_find() micro-benchmark of:

<?php

$array = range(1, 10000);
$result = 0;
for ($i = 0; $i < 5000; $i++) {
    $result += array_find($array, static function ($item) {
            return $item === 5000;
    });
}
var_dump($result);

Results in:

$ hyperfine -L version before,after '/tmp/bench/{version} native.php'
Benchmark 1: /tmp/bench/before native.php
  Time (mean ± σ):     637.3 ms ±   6.4 ms    [User: 632.4 ms, System: 3.9 ms]
  Range (min … max):   626.8 ms … 644.9 ms    10 runs

Benchmark 2: /tmp/bench/after native.php
  Time (mean ± σ):     572.0 ms ±   3.8 ms    [User: 567.8 ms, System: 3.8 ms]
  Range (min … max):   567.0 ms … 580.1 ms    10 runs

Summary
  /tmp/bench/after native.php ran
    1.11 ± 0.01 times faster than /tmp/bench/before native.php

see #18157 (comment)

…re` to a `callable` parameter

With `Closure`s existing since 5.3 and first class callables since 8.1,
nowadays the most common type of callable likely is a `Closure`. Type checking
a `Closure` argument for a `callable` parameter however does quite a bit of
work, when a simple CE check would suffice. We thus do this.

For:

    <?php

    function closure_(\Closure $f)
    {
    	$f('abc');
    }

    function callable_(callable $f)
    {
    	$f('abc');
    }

    $c = strrev(...);
    for ($i = 0; $i < 10000000; $i++) {
    	closure_($c);
    }

with the call in the loop appropriately replaced and running on a Intel(R)
Core(TM) i7-1365U, I get the following timings:

    Benchmark 1: /tmp/bench/before callable.php
      Time (mean ± σ):     368.9 ms ±   5.2 ms    [User: 365.0 ms, System: 3.6 ms]
      Range (min … max):   359.8 ms … 374.5 ms    10 runs

    Benchmark 2: /tmp/bench/before closure.php
      Time (mean ± σ):     283.3 ms ±   6.0 ms    [User: 279.6 ms, System: 3.4 ms]
      Range (min … max):   274.1 ms … 293.2 ms    10 runs

    Benchmark 3: /tmp/bench/after callable.php
      Time (mean ± σ):     279.9 ms ±  10.1 ms    [User: 276.3 ms, System: 3.4 ms]
      Range (min … max):   269.6 ms … 301.5 ms    10 runs

    Benchmark 4: /tmp/bench/after closure.php
      Time (mean ± σ):     283.4 ms ±   2.3 ms    [User: 279.5 ms, System: 3.6 ms]
      Range (min … max):   279.7 ms … 286.6 ms    10 runs

    Summary
      /tmp/bench/after callable.php ran
        1.01 ± 0.04 times faster than /tmp/bench/before closure.php
        1.01 ± 0.04 times faster than /tmp/bench/after closure.php
        1.32 ± 0.05 times faster than /tmp/bench/before callable.php

The “standard” `array_find()` micro-benchmark of:

    <?php

    $array = range(1, 10000);
    $result = 0;
    for ($i = 0; $i < 5000; $i++) {
        $result += array_find($array, static function ($item) {
                return $item === 5000;
        });
    }
    var_dump($result);

Results in:

    $ hyperfine -L version before,after '/tmp/bench/{version} native.php'
    Benchmark 1: /tmp/bench/before native.php
      Time (mean ± σ):     637.3 ms ±   6.4 ms    [User: 632.4 ms, System: 3.9 ms]
      Range (min … max):   626.8 ms … 644.9 ms    10 runs

    Benchmark 2: /tmp/bench/after native.php
      Time (mean ± σ):     572.0 ms ±   3.8 ms    [User: 567.8 ms, System: 3.8 ms]
      Range (min … max):   567.0 ms … 580.1 ms    10 runs

    Summary
      /tmp/bench/after native.php ran
        1.11 ± 0.01 times faster than /tmp/bench/before native.php

see php#18157 (comment)
(type_mask & MAY_BE_CALLABLE)
&& (
/* Fast check for closures. */
EXPECTED(Z_OBJCE_P(arg) == zend_ce_closure)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think arg is guaranteed to be an object at this point. Maybe this can be combined with the Z_TYPE_P(arg) == IS_OBJECT) above.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you can combine it with the one above as a type declaration that only contains built-in types won't be complex.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was referring to something like:

if (Z_TYPE_P(arg) == IS_OBJECT) {
    if ((type_mask & MAY_BE_CALLABLE) && EXPECTED(Z_OBJCE_P(arg) == zend_ce_closure)) {
        return true;
    } else if (ZEND_TYPE_IS_COMPLEX(*type)) {
        ...
    }
}

Not sure if that's better

Copy link
Member

@Girgias Girgias Apr 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is reasonable to have if(EXPECTED(Z_TYPE_P(arg) == IS_OBJECT) {} first, as to get to this point you need to cross if (EXPECTED(ZEND_TYPE_CONTAINS_CODE(*type, Z_TYPE_P(arg)))) { from zend_check_type().

Meaning that we are either dealing with an object, a callable, or a scalar type that needs to be coerced which depends on strict_type.

The nice benefit of this approach is that it would mean that the call to zend_value_instanceof_static() could be moved into that new if block and remove the check for if (Z_TYPE_P(zv) != IS_OBJECT) {

Copy link
Member

@iluuu1994 iluuu1994 Apr 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. The may also still check for ZEND_TYPE_IS_COMPLEX(*type) first (before type_mask & MAY_BE_CALLABLE) to avoid any degradation of complex types.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adjusted according to Gina's suggestion.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is static really more common than callable?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is static really more common than callable?

I'm seeing static used quite a bit for return types. The closure check is probably cheaper, though. No strong opinion here. Can reorder if desired (ideally after someone else verified the numbers, because I don't really trust the results on my CPU).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same. My benchmarks are usually all over the place... I have yet to find a setting for me that works well.

Copy link
Member

@Girgias Girgias Apr 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we can have the static type check after the quick Closure check, especially as we could now transform static to the class name for final classes.

… object is given

For the callable.php vs closure.php benchmark:

    $ taskset -c 0 hyperfine -L file callable.php,closure.php -L version before,after '/tmp/bench/{version} {file}'
    Benchmark 1: /tmp/bench/before callable.php
      Time (mean ± σ):     351.1 ms ±   2.8 ms    [User: 348.3 ms, System: 2.0 ms]
      Range (min … max):   349.3 ms … 359.0 ms    10 runs

    Benchmark 2: /tmp/bench/before closure.php
      Time (mean ± σ):     274.6 ms ±   2.4 ms    [User: 270.9 ms, System: 2.9 ms]
      Range (min … max):   273.3 ms … 281.5 ms    10 runs

    Benchmark 3: /tmp/bench/after callable.php
      Time (mean ± σ):     272.4 ms ±   0.5 ms    [User: 270.3 ms, System: 2.0 ms]
      Range (min … max):   271.6 ms … 273.3 ms    10 runs

    Benchmark 4: /tmp/bench/after closure.php
      Time (mean ± σ):     277.4 ms ±   2.2 ms    [User: 274.3 ms, System: 2.4 ms]
      Range (min … max):   275.7 ms … 283.3 ms    10 runs

    Summary
      /tmp/bench/after callable.php ran
        1.01 ± 0.01 times faster than /tmp/bench/before closure.php
        1.02 ± 0.01 times faster than /tmp/bench/after closure.php
        1.29 ± 0.01 times faster than /tmp/bench/before callable.php

For the array_find benchmark:

    Benchmark 1: /tmp/bench/before native.php
      Time (mean ± σ):     627.6 ms ±   7.1 ms    [User: 622.5 ms, System: 2.8 ms]
      Range (min … max):   622.1 ms … 641.4 ms    10 runs

    Benchmark 2: /tmp/bench/after native.php
      Time (mean ± σ):     598.0 ms ±   5.5 ms    [User: 594.4 ms, System: 2.7 ms]
      Range (min … max):   589.9 ms … 604.9 ms    10 runs

    Summary
      /tmp/bench/after native.php ran
        1.05 ± 0.02 times faster than /tmp/bench/before native.php

For a test with scalar types:

    <?php

    function func(string $f): string
    {
    	return strrev($f);
    }

    for ($i = 0; $i < 10000000; $i++) {
    	func('abc');
    }

the timings are:

    Benchmark 1: /tmp/bench/before scalar.php
      Time (mean ± σ):     212.5 ms ±   1.7 ms    [User: 209.8 ms, System: 2.2 ms]
      Range (min … max):   211.4 ms … 218.1 ms    14 runs

      Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

    Benchmark 2: /tmp/bench/after scalar.php
      Time (mean ± σ):     200.9 ms ±   2.0 ms    [User: 198.0 ms, System: 2.4 ms]
      Range (min … max):   199.7 ms … 207.2 ms    14 runs

      Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

    Summary
      /tmp/bench/after scalar.php ran
        1.06 ± 0.01 times faster than /tmp/bench/before scalar.php

And a union type using only scalars:

    <?php

    function func(string|int $f): string
    {
    	return strrev($f);
    }

    for ($i = 0; $i < 10000000; $i++) {
    	func('abc');
    }

results in:

    Benchmark 1: /tmp/bench/before union.php
      Time (mean ± σ):     212.8 ms ±   1.8 ms    [User: 210.0 ms, System: 2.2 ms]
      Range (min … max):   212.0 ms … 219.0 ms    14 runs

      Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

    Benchmark 2: /tmp/bench/after union.php
      Time (mean ± σ):     200.1 ms ±   0.4 ms    [User: 197.7 ms, System: 2.3 ms]
      Range (min … max):   199.4 ms … 200.8 ms    14 runs

    Summary
      /tmp/bench/after union.php ran
        1.06 ± 0.01 times faster than /tmp/bench/before union.php

Union types with objects are the only thing tested that were more or less the
same:

    <?php

    class A { public function get() { return 'abc'; } }
    class B { }

    function func(A|B $f): string
    {
    	return strrev($f->get());
    }

    for ($i = 0; $i < 10000000; $i++) {
    	func(new A());
    }

results in:

    Benchmark 1: /tmp/bench/before union_obj.php
      Time (mean ± σ):     700.6 ms ±  10.4 ms    [User: 696.5 ms, System: 3.9 ms]
      Range (min … max):   688.6 ms … 717.8 ms    10 runs

    Benchmark 2: /tmp/bench/after union_obj.php
      Time (mean ± σ):     706.7 ms ±  15.0 ms    [User: 702.5 ms, System: 3.8 ms]
      Range (min … max):   688.9 ms … 725.8 ms    10 runs

    Summary
      /tmp/bench/before union_obj.php ran
        1.01 ± 0.03 times faster than /tmp/bench/after union_obj.php
@TimWolla TimWolla changed the title zend_execute: Improve type checking performance when passing a Closure to a callable parameter zend_execute: Streamline typechecks in zend_check_type_slow() if an object is given Apr 8, 2025
@TimWolla
Copy link
Member Author

TimWolla commented Apr 8, 2025

Quoting from the follow-up commit:


For the callable.php vs closure.php benchmark:

$ taskset -c 0 hyperfine -L file callable.php,closure.php -L version before,after '/tmp/bench/{version} {file}'
Benchmark 1: /tmp/bench/before callable.php
  Time (mean ± σ):     351.1 ms ±   2.8 ms    [User: 348.3 ms, System: 2.0 ms]
  Range (min … max):   349.3 ms … 359.0 ms    10 runs

Benchmark 2: /tmp/bench/before closure.php
  Time (mean ± σ):     274.6 ms ±   2.4 ms    [User: 270.9 ms, System: 2.9 ms]
  Range (min … max):   273.3 ms … 281.5 ms    10 runs

Benchmark 3: /tmp/bench/after callable.php
  Time (mean ± σ):     272.4 ms ±   0.5 ms    [User: 270.3 ms, System: 2.0 ms]
  Range (min … max):   271.6 ms … 273.3 ms    10 runs

Benchmark 4: /tmp/bench/after closure.php
  Time (mean ± σ):     277.4 ms ±   2.2 ms    [User: 274.3 ms, System: 2.4 ms]
  Range (min … max):   275.7 ms … 283.3 ms    10 runs

Summary
  /tmp/bench/after callable.php ran
    1.01 ± 0.01 times faster than /tmp/bench/before closure.php
    1.02 ± 0.01 times faster than /tmp/bench/after closure.php
    1.29 ± 0.01 times faster than /tmp/bench/before callable.php

For the array_find benchmark:

Benchmark 1: /tmp/bench/before native.php
  Time (mean ± σ):     627.6 ms ±   7.1 ms    [User: 622.5 ms, System: 2.8 ms]
  Range (min … max):   622.1 ms … 641.4 ms    10 runs

Benchmark 2: /tmp/bench/after native.php
  Time (mean ± σ):     598.0 ms ±   5.5 ms    [User: 594.4 ms, System: 2.7 ms]
  Range (min … max):   589.9 ms … 604.9 ms    10 runs

Summary
  /tmp/bench/after native.php ran
    1.05 ± 0.02 times faster than /tmp/bench/before native.php

For a test with scalar types:

<?php

function func(string $f): string
{
	return strrev($f);
}

for ($i = 0; $i < 10000000; $i++) {
	func('abc');
}

the timings are:

Benchmark 1: /tmp/bench/before scalar.php
  Time (mean ± σ):     212.5 ms ±   1.7 ms    [User: 209.8 ms, System: 2.2 ms]
  Range (min … max):   211.4 ms … 218.1 ms    14 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark 2: /tmp/bench/after scalar.php
  Time (mean ± σ):     200.9 ms ±   2.0 ms    [User: 198.0 ms, System: 2.4 ms]
  Range (min … max):   199.7 ms … 207.2 ms    14 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Summary
  /tmp/bench/after scalar.php ran
    1.06 ± 0.01 times faster than /tmp/bench/before scalar.php

And a union type using only scalars:

<?php

function func(string|int $f): string
{
	return strrev($f);
}

for ($i = 0; $i < 10000000; $i++) {
	func('abc');
}

results in:

Benchmark 1: /tmp/bench/before union.php
  Time (mean ± σ):     212.8 ms ±   1.8 ms    [User: 210.0 ms, System: 2.2 ms]
  Range (min … max):   212.0 ms … 219.0 ms    14 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark 2: /tmp/bench/after union.php
  Time (mean ± σ):     200.1 ms ±   0.4 ms    [User: 197.7 ms, System: 2.3 ms]
  Range (min … max):   199.4 ms … 200.8 ms    14 runs

Summary
  /tmp/bench/after union.php ran
    1.06 ± 0.01 times faster than /tmp/bench/before union.php

Union types with objects are the only thing tested that were more or less the
same:

<?php

class A { public function get() { return 'abc'; } }
class B { }

function func(A|B $f): string
{
	return strrev($f->get());
}

for ($i = 0; $i < 10000000; $i++) {
	func(new A());
}

results in:

Benchmark 1: /tmp/bench/before union_obj.php
  Time (mean ± σ):     700.6 ms ±  10.4 ms    [User: 696.5 ms, System: 3.9 ms]
  Range (min … max):   688.6 ms … 717.8 ms    10 runs

Benchmark 2: /tmp/bench/after union_obj.php
  Time (mean ± σ):     706.7 ms ±  15.0 ms    [User: 702.5 ms, System: 3.8 ms]
  Range (min … max):   688.9 ms … 725.8 ms    10 runs

Summary
  /tmp/bench/before union_obj.php ran
    1.01 ± 0.03 times faster than /tmp/bench/after union_obj.php

I don't really trust my CPU to provide reliable benchmark results, though. So if someone wants to test this themselves, please do. I've created a Gist with the PHP files at: https://gist.github.com/TimWolla/d8ffbe9b9668d4d05805982ae591dcba

@iluuu1994
Copy link
Member

Summary
  sapi/cli/php-old callable-micro-opt/closure.php ran
    1.03 ± 0.02 times faster than sapi/cli/php callable-micro-opt/callable.php
    1.04 ± 0.02 times faster than sapi/cli/php callable-micro-opt/closure.php
    1.29 ± 0.02 times faster than sapi/cli/php-old callable-micro-opt/callable.php

Summary
  sapi/cli/php-old callable-micro-opt/native.php ran
    1.01 ± 0.02 times faster than sapi/cli/php callable-micro-opt/native.php

Summary
  sapi/cli/php-old callable-micro-opt/scalar.php ran
    1.02 ± 0.01 times faster than sapi/cli/php callable-micro-opt/scalar.php

Summary
  sapi/cli/php-old callable-micro-opt/union.php ran
    1.02 ± 0.01 times faster than sapi/cli/php callable-micro-opt/union.php

Summary
  sapi/cli/php-old callable-micro-opt/union_obj.php ran
    1.04 ± 0.01 times faster than sapi/cli/php callable-micro-opt/union_obj.php

With the exception of callable.php, which was significantly faster on the new version, most other benchmark seem to have slightly regressed. This result is consistent, I tried each benchmark multiple times.

@nielsdos
Copy link
Member

nielsdos commented Apr 10, 2025

A run on an i7-1185G7:

Summary
  ./sapi/cli/php ~/Downloads/tim/callable.php ran
    1.21 ± 0.05 times faster than ./sapi/cli/php_old ~/Downloads/tim/callable.php

  ./sapi/cli/php ~/Downloads/tim/closure.php ran
    1.00 ± 0.02 times faster than ./sapi/cli/php_old ~/Downloads/tim/closure.php

  ./sapi/cli/php ~/Downloads/tim/native.php ran
    1.00 ± 0.02 times faster than ./sapi/cli/php_old ~/Downloads/tim/native.php

  ./sapi/cli/php ~/Downloads/tim/scalar.php ran
    1.00 ± 0.04 times faster than ./sapi/cli/php_old ~/Downloads/tim/scalar.php

  ./sapi/cli/php_old ~/Downloads/tim/union.php ran
    1.01 ± 0.02 times faster than ./sapi/cli/php ~/Downloads/tim/union.php

  ./sapi/cli/php ~/Downloads/tim/union_obj.php ran
    1.00 ± 0.03 times faster than ./sapi/cli/php_old ~/Downloads/tim/union_obj.php

So more or less status quo, and a slight degradation for union.php.
The performance could be highly dependent on the machine you run it on, and probably the union checking code was hand optimized.
If we're worried about degrading but still want to have a fast path for closures, why not do this instead of this PR (which may be beneficial for other callers of this function too):

diff --git a/Zend/zend_API.c b/Zend/zend_API.c
index e0006e7d727..c8a506d8366 100644
--- a/Zend/zend_API.c
+++ b/Zend/zend_API.c
@@ -4293,6 +4293,10 @@ ZEND_API bool zend_is_callable_at_frame(
 
 ZEND_API bool zend_is_callable_ex(zval *callable, zend_object *object, uint32_t check_flags, zend_string **callable_name, zend_fcall_info_cache *fcc, char **error) /* {{{ */
 {
+	if (Z_TYPE_P(callable) == IS_OBJECT && Z_OBJCE_P(callable) == zend_ce_closure) {
+		return true;
+	}
+
 	/* Determine callability at the first parent user frame. */
 	zend_execute_data *frame = EG(current_execute_data);
 	while (frame && (!frame->func || !ZEND_USER_CODE(frame->func->type))) {

This only affects the callable bench and has no influence on the others. Sure the win isn't as great for type checking as this PR had, but this may be a pragmatic solution?

  ./sapi/cli/php ~/Downloads/tim/callable.php ran
    1.16 ± 0.03 times faster than ./sapi/cli/php_old ~/Downloads/tim/callable.php
  ./sapi/cli/php ~/Downloads/tim/closure.php ran
    1.00 ± 0.02 times faster than ./sapi/cli/php_old ~/Downloads/tim/closure.php

@iluuu1994
Copy link
Member

iluuu1994 commented Apr 10, 2025

I can confirm that Niels' patch significantly improves performance for callable over master:

Benchmark 1: sapi/cli/php-old benchmarks/callable.php
  Time (mean ± σ):     407.1 ms ±   3.5 ms    [User: 404.3 ms, System: 1.4 ms]
  Range (min … max):   399.0 ms … 411.5 ms    10 runs
 
Benchmark 2: sapi/cli/php benchmarks/callable.php
  Time (mean ± σ):     327.9 ms ±   5.1 ms    [User: 324.4 ms, System: 2.3 ms]
  Range (min … max):   318.9 ms … 338.6 ms    10 runs
 
Summary
  sapi/cli/php benchmarks/callable.php ran
    1.24 ± 0.02 times faster than sapi/cli/php-old benchmarks/callable.php

However, that might not mean much for my CPU because it also significantly improves \Closure, even though the modified code path is not even hit. 😄

Benchmark 1: sapi/cli/php-old benchmarks/closure.php
  Time (mean ± σ):     329.7 ms ±  10.7 ms    [User: 324.8 ms, System: 1.9 ms]
  Range (min … max):   316.0 ms … 344.9 ms    10 runs
 
Benchmark 2: sapi/cli/php benchmarks/closure.php
  Time (mean ± σ):     308.0 ms ±   5.6 ms    [User: 304.6 ms, System: 2.3 ms]
  Range (min … max):   300.8 ms … 321.3 ms    10 runs
 
Summary
  sapi/cli/php benchmarks/closure.php ran
    1.07 ± 0.04 times faster than sapi/cli/php-old benchmarks/closure.php

So, take that with a grain of salt. I hate my CPU. 😋

@nielsdos
Copy link
Member

nielsdos commented Apr 10, 2025

However, that might not mean much for my CPU because it also significantly improves \Closure, even though the modified code path is not even hit.

#justCodeLayoutThings
#LoveOoOExecution

So, take that with a grain of salt. I hate my CPU. 😋

But as long as the others don't slow down I take it as a win 😎


EDIT: ofc my patch is not "complete" as we need to fetch the callable name in some cases too, but not in the type check case. But if this seems like a good plan to Tim as well, I can make it a formal patch with a PR.

@TimWolla
Copy link
Member Author

TimWolla commented Apr 11, 2025

EDIT: ofc my patch is not "complete" as we need to fetch the callable name in some cases too, but not in the type check case.

Yes, the callable name was the reason I did not touch that function, but zend_check_type_slow instead.

But if this seems like a good plan to Tim as well, I can make it a formal patch with a PR.

No strong opinion here, please feel free to propose something, given that I can't reliably benchmark this with the hardware I have available. Alternatively for 8e216a2 (the initial commit in this PR), the EXPECTED(Z_OBJCE_P(arg) == zend_ce_closure) could perhaps just be extended to EXPECTED(Z_TYPE_P(arg) == IS_OBJECT && Z_OBJCE_P(arg) == zend_ce_closure) (unless the larger assembly also is somehow bad for the other cases, who knows ¯\_(ツ)_/¯).

@nielsdos
Copy link
Member

The path from zend_check_type_slow doesn't request a callable name anyway so that doesn't matter.
I'll propose something this evening.

@nielsdos
Copy link
Member

All sorts of complicated things going on here, function seems very sensitive to changes.

The best improvement for callable.php, on my desktop, is when you add Z_TYPE_P(arg) == IS_OBJECT && Z_OBJCE_P(arg) == zend_ce_closure in zend_check_type_slow after checking for MAY_BE_CALLABLE. However, if you add an EXPECTED annotation on top of that (like originally suggested), then the compiler over-optimizes that code path and the function zend_check_user_type_slow grows from 786 bytes (without EXPECTED but with the check) to 873 bytes (with EXPECTED and with the check). This is related to code layout and different register usages/spills, (on first sight).
This increase in code size caused by the EXPECTED annotation yields a noticeable decrease in performance for native.php. However, the annotation does not seem to improve the performance of callable.php anyway.

My suggestion only influences the callable.php case and with all the extra plumbing added it yields a 18% improvement for callable.php on my system and a minor improvement of 4% for native.php, whereas the suggestion to change zend_check_type_slow yields a 25% improvement and no improvement for native.php.
My suggestion is more isolated and doesn't change any code layout of zend_check_type_slow, but the gains are more limited. I think it's still worthwhile to further explore the idea of adding a check in zend_check_type_slow, but without the EXPECTED annotation.

A "in the middle" solution may be to declare a helper function in the same file as zend_execute.c so that the compiler can perform IPA to get a call with less overhead, and we can force the helper to be outlined. This should keep the same code layout in zend_check_type_slow while still adding a fast-path for closures close to zend_check_type_slow:

diff --git a/Zend/zend_execute.c b/Zend/zend_execute.c
index 89fbcf2cbd7..58be57596e3 100644
--- a/Zend/zend_execute.c
+++ b/Zend/zend_execute.c
@@ -1140,6 +1140,14 @@ static bool zend_check_intersection_type_from_list(
 	return true;
 }
 
+static zend_never_inline bool zend_is_callable_idk_what_to_call_this(zval *callable, uint32_t check_flags)
+{
+	if (EXPECTED(Z_TYPE_P(callable) == IS_OBJECT && Z_OBJCE_P(callable) == zend_ce_closure)) {
+		return true;
+	}
+	return zend_is_callable_ex(callable, NULL, check_flags, NULL, NULL, NULL);
+}
+
 static zend_always_inline bool zend_check_type_slow(
 		const zend_type *type, zval *arg, const zend_reference *ref,
 		bool is_return_type, bool is_internal)
@@ -1178,7 +1186,7 @@ static zend_always_inline bool zend_check_type_slow(
 
 	const uint32_t type_mask = ZEND_TYPE_FULL_MASK(*type);
 	if ((type_mask & MAY_BE_CALLABLE) &&
-		zend_is_callable(arg, is_internal ? IS_CALLABLE_SUPPRESS_DEPRECATIONS : 0, NULL)) {
+		zend_is_callable_idk_what_to_call_this(arg, is_internal ? IS_CALLABLE_SUPPRESS_DEPRECATIONS : 0)) {
 		return 1;
 	}
 	if ((type_mask & MAY_BE_STATIC) && zend_value_instanceof_static(arg)) {

This yields a 25% improvement on callable.php, so the same as the first idea, but the chance is high that this does not unexpectedly influence any other test case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants