zend_execute: Streamline typechecks in `zend_check_type_slow()` if an object is given #18277

TimWolla · 2025-04-08T10:13:22Z

With Closures existing since 5.3 and first class callables since 8.1, nowadays the most common type of callable likely is a Closure. Type checking a Closure argument for a callable parameter however does quite a bit of work, when a simple CE check would suffice. We thus do this.

For:

<?php

function closure_(\Closure $f)
{
	$f('abc');
}

function callable_(callable $f)
{
	$f('abc');
}

$c = strrev(...);
for ($i = 0; $i < 10000000; $i++) {
	closure_($c);
}

with the call in the loop appropriately replaced and running on a Intel(R) Core(TM) i7-1365U, I get the following timings:

Benchmark 1: /tmp/bench/before callable.php
  Time (mean ± σ):     368.9 ms ±   5.2 ms    [User: 365.0 ms, System: 3.6 ms]
  Range (min … max):   359.8 ms … 374.5 ms    10 runs

Benchmark 2: /tmp/bench/before closure.php
  Time (mean ± σ):     283.3 ms ±   6.0 ms    [User: 279.6 ms, System: 3.4 ms]
  Range (min … max):   274.1 ms … 293.2 ms    10 runs

Benchmark 3: /tmp/bench/after callable.php
  Time (mean ± σ):     279.9 ms ±  10.1 ms    [User: 276.3 ms, System: 3.4 ms]
  Range (min … max):   269.6 ms … 301.5 ms    10 runs

Benchmark 4: /tmp/bench/after closure.php
  Time (mean ± σ):     283.4 ms ±   2.3 ms    [User: 279.5 ms, System: 3.6 ms]
  Range (min … max):   279.7 ms … 286.6 ms    10 runs

Summary
  /tmp/bench/after callable.php ran
    1.01 ± 0.04 times faster than /tmp/bench/before closure.php
    1.01 ± 0.04 times faster than /tmp/bench/after closure.php
    1.32 ± 0.05 times faster than /tmp/bench/before callable.php

The “standard” array_find() micro-benchmark of:

<?php

$array = range(1, 10000);
$result = 0;
for ($i = 0; $i < 5000; $i++) {
    $result += array_find($array, static function ($item) {
            return $item === 5000;
    });
}
var_dump($result);

Results in:

$ hyperfine -L version before,after '/tmp/bench/{version} native.php'
Benchmark 1: /tmp/bench/before native.php
  Time (mean ± σ):     637.3 ms ±   6.4 ms    [User: 632.4 ms, System: 3.9 ms]
  Range (min … max):   626.8 ms … 644.9 ms    10 runs

Benchmark 2: /tmp/bench/after native.php
  Time (mean ± σ):     572.0 ms ±   3.8 ms    [User: 567.8 ms, System: 3.8 ms]
  Range (min … max):   567.0 ms … 580.1 ms    10 runs

Summary
  /tmp/bench/after native.php ran
    1.11 ± 0.01 times faster than /tmp/bench/before native.php

see #18157 (comment)

…re` to a `callable` parameter With `Closure`s existing since 5.3 and first class callables since 8.1, nowadays the most common type of callable likely is a `Closure`. Type checking a `Closure` argument for a `callable` parameter however does quite a bit of work, when a simple CE check would suffice. We thus do this. For: <?php function closure_(\Closure $f) { $f('abc'); } function callable_(callable $f) { $f('abc'); } $c = strrev(...); for ($i = 0; $i < 10000000; $i++) { closure_($c); } with the call in the loop appropriately replaced and running on a Intel(R) Core(TM) i7-1365U, I get the following timings: Benchmark 1: /tmp/bench/before callable.php Time (mean ± σ): 368.9 ms ± 5.2 ms [User: 365.0 ms, System: 3.6 ms] Range (min … max): 359.8 ms … 374.5 ms 10 runs Benchmark 2: /tmp/bench/before closure.php Time (mean ± σ): 283.3 ms ± 6.0 ms [User: 279.6 ms, System: 3.4 ms] Range (min … max): 274.1 ms … 293.2 ms 10 runs Benchmark 3: /tmp/bench/after callable.php Time (mean ± σ): 279.9 ms ± 10.1 ms [User: 276.3 ms, System: 3.4 ms] Range (min … max): 269.6 ms … 301.5 ms 10 runs Benchmark 4: /tmp/bench/after closure.php Time (mean ± σ): 283.4 ms ± 2.3 ms [User: 279.5 ms, System: 3.6 ms] Range (min … max): 279.7 ms … 286.6 ms 10 runs Summary /tmp/bench/after callable.php ran 1.01 ± 0.04 times faster than /tmp/bench/before closure.php 1.01 ± 0.04 times faster than /tmp/bench/after closure.php 1.32 ± 0.05 times faster than /tmp/bench/before callable.php The “standard” `array_find()` micro-benchmark of: <?php $array = range(1, 10000); $result = 0; for ($i = 0; $i < 5000; $i++) { $result += array_find($array, static function ($item) { return $item === 5000; }); } var_dump($result); Results in: $ hyperfine -L version before,after '/tmp/bench/{version} native.php' Benchmark 1: /tmp/bench/before native.php Time (mean ± σ): 637.3 ms ± 6.4 ms [User: 632.4 ms, System: 3.9 ms] Range (min … max): 626.8 ms … 644.9 ms 10 runs Benchmark 2: /tmp/bench/after native.php Time (mean ± σ): 572.0 ms ± 3.8 ms [User: 567.8 ms, System: 3.8 ms] Range (min … max): 567.0 ms … 580.1 ms 10 runs Summary /tmp/bench/after native.php ran 1.11 ± 0.01 times faster than /tmp/bench/before native.php see php#18157 (comment)

iluuu1994 · 2025-04-08T10:26:37Z

Zend/zend_execute.c

+		(type_mask & MAY_BE_CALLABLE)
+		&& (
+			/* Fast check for closures. */
+			EXPECTED(Z_OBJCE_P(arg) == zend_ce_closure)


I don't think arg is guaranteed to be an object at this point. Maybe this can be combined with the Z_TYPE_P(arg) == IS_OBJECT) above.

I don't think you can combine it with the one above as a type declaration that only contains built-in types won't be complex.

I was referring to something like:

if (Z_TYPE_P(arg) == IS_OBJECT) { if ((type_mask & MAY_BE_CALLABLE) && EXPECTED(Z_OBJCE_P(arg) == zend_ce_closure)) { return true; } else if (ZEND_TYPE_IS_COMPLEX(*type)) { ... } }

Not sure if that's better

I think it is reasonable to have if(EXPECTED(Z_TYPE_P(arg) == IS_OBJECT) {} first, as to get to this point you need to cross if (EXPECTED(ZEND_TYPE_CONTAINS_CODE(*type, Z_TYPE_P(arg)))) { from zend_check_type().

Meaning that we are either dealing with an object, a callable, or a scalar type that needs to be coerced which depends on strict_type.

The nice benefit of this approach is that it would mean that the call to zend_value_instanceof_static() could be moved into that new if block and remove the check for if (Z_TYPE_P(zv) != IS_OBJECT) {

Good point. The may also still check for ZEND_TYPE_IS_COMPLEX(*type) first (before type_mask & MAY_BE_CALLABLE) to avoid any degradation of complex types.

Adjusted according to Gina's suggestion.

Is static really more common than callable?

Is static really more common than callable?

I'm seeing static used quite a bit for return types. The closure check is probably cheaper, though. No strong opinion here. Can reorder if desired (ideally after someone else verified the numbers, because I don't really trust the results on my CPU).

Same. My benchmarks are usually all over the place... I have yet to find a setting for me that works well.

I guess we can have the static type check after the quick Closure check, especially as we could now transform static to the class name for final classes.

… object is given For the callable.php vs closure.php benchmark: $ taskset -c 0 hyperfine -L file callable.php,closure.php -L version before,after '/tmp/bench/{version} {file}' Benchmark 1: /tmp/bench/before callable.php Time (mean ± σ): 351.1 ms ± 2.8 ms [User: 348.3 ms, System: 2.0 ms] Range (min … max): 349.3 ms … 359.0 ms 10 runs Benchmark 2: /tmp/bench/before closure.php Time (mean ± σ): 274.6 ms ± 2.4 ms [User: 270.9 ms, System: 2.9 ms] Range (min … max): 273.3 ms … 281.5 ms 10 runs Benchmark 3: /tmp/bench/after callable.php Time (mean ± σ): 272.4 ms ± 0.5 ms [User: 270.3 ms, System: 2.0 ms] Range (min … max): 271.6 ms … 273.3 ms 10 runs Benchmark 4: /tmp/bench/after closure.php Time (mean ± σ): 277.4 ms ± 2.2 ms [User: 274.3 ms, System: 2.4 ms] Range (min … max): 275.7 ms … 283.3 ms 10 runs Summary /tmp/bench/after callable.php ran 1.01 ± 0.01 times faster than /tmp/bench/before closure.php 1.02 ± 0.01 times faster than /tmp/bench/after closure.php 1.29 ± 0.01 times faster than /tmp/bench/before callable.php For the array_find benchmark: Benchmark 1: /tmp/bench/before native.php Time (mean ± σ): 627.6 ms ± 7.1 ms [User: 622.5 ms, System: 2.8 ms] Range (min … max): 622.1 ms … 641.4 ms 10 runs Benchmark 2: /tmp/bench/after native.php Time (mean ± σ): 598.0 ms ± 5.5 ms [User: 594.4 ms, System: 2.7 ms] Range (min … max): 589.9 ms … 604.9 ms 10 runs Summary /tmp/bench/after native.php ran 1.05 ± 0.02 times faster than /tmp/bench/before native.php For a test with scalar types: <?php function func(string $f): string { return strrev($f); } for ($i = 0; $i < 10000000; $i++) { func('abc'); } the timings are: Benchmark 1: /tmp/bench/before scalar.php Time (mean ± σ): 212.5 ms ± 1.7 ms [User: 209.8 ms, System: 2.2 ms] Range (min … max): 211.4 ms … 218.1 ms 14 runs Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options. Benchmark 2: /tmp/bench/after scalar.php Time (mean ± σ): 200.9 ms ± 2.0 ms [User: 198.0 ms, System: 2.4 ms] Range (min … max): 199.7 ms … 207.2 ms 14 runs Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options. Summary /tmp/bench/after scalar.php ran 1.06 ± 0.01 times faster than /tmp/bench/before scalar.php And a union type using only scalars: <?php function func(string|int $f): string { return strrev($f); } for ($i = 0; $i < 10000000; $i++) { func('abc'); } results in: Benchmark 1: /tmp/bench/before union.php Time (mean ± σ): 212.8 ms ± 1.8 ms [User: 210.0 ms, System: 2.2 ms] Range (min … max): 212.0 ms … 219.0 ms 14 runs Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options. Benchmark 2: /tmp/bench/after union.php Time (mean ± σ): 200.1 ms ± 0.4 ms [User: 197.7 ms, System: 2.3 ms] Range (min … max): 199.4 ms … 200.8 ms 14 runs Summary /tmp/bench/after union.php ran 1.06 ± 0.01 times faster than /tmp/bench/before union.php Union types with objects are the only thing tested that were more or less the same: <?php class A { public function get() { return 'abc'; } } class B { } function func(A|B $f): string { return strrev($f->get()); } for ($i = 0; $i < 10000000; $i++) { func(new A()); } results in: Benchmark 1: /tmp/bench/before union_obj.php Time (mean ± σ): 700.6 ms ± 10.4 ms [User: 696.5 ms, System: 3.9 ms] Range (min … max): 688.6 ms … 717.8 ms 10 runs Benchmark 2: /tmp/bench/after union_obj.php Time (mean ± σ): 706.7 ms ± 15.0 ms [User: 702.5 ms, System: 3.8 ms] Range (min … max): 688.9 ms … 725.8 ms 10 runs Summary /tmp/bench/before union_obj.php ran 1.01 ± 0.03 times faster than /tmp/bench/after union_obj.php

TimWolla · 2025-04-08T12:59:01Z

Quoting from the follow-up commit:

For the callable.php vs closure.php benchmark:

$ taskset -c 0 hyperfine -L file callable.php,closure.php -L version before,after '/tmp/bench/{version} {file}'
Benchmark 1: /tmp/bench/before callable.php
  Time (mean ± σ):     351.1 ms ±   2.8 ms    [User: 348.3 ms, System: 2.0 ms]
  Range (min … max):   349.3 ms … 359.0 ms    10 runs

Benchmark 2: /tmp/bench/before closure.php
  Time (mean ± σ):     274.6 ms ±   2.4 ms    [User: 270.9 ms, System: 2.9 ms]
  Range (min … max):   273.3 ms … 281.5 ms    10 runs

Benchmark 3: /tmp/bench/after callable.php
  Time (mean ± σ):     272.4 ms ±   0.5 ms    [User: 270.3 ms, System: 2.0 ms]
  Range (min … max):   271.6 ms … 273.3 ms    10 runs

Benchmark 4: /tmp/bench/after closure.php
  Time (mean ± σ):     277.4 ms ±   2.2 ms    [User: 274.3 ms, System: 2.4 ms]
  Range (min … max):   275.7 ms … 283.3 ms    10 runs

Summary
  /tmp/bench/after callable.php ran
    1.01 ± 0.01 times faster than /tmp/bench/before closure.php
    1.02 ± 0.01 times faster than /tmp/bench/after closure.php
    1.29 ± 0.01 times faster than /tmp/bench/before callable.php

For the array_find benchmark:

Benchmark 1: /tmp/bench/before native.php
  Time (mean ± σ):     627.6 ms ±   7.1 ms    [User: 622.5 ms, System: 2.8 ms]
  Range (min … max):   622.1 ms … 641.4 ms    10 runs

Benchmark 2: /tmp/bench/after native.php
  Time (mean ± σ):     598.0 ms ±   5.5 ms    [User: 594.4 ms, System: 2.7 ms]
  Range (min … max):   589.9 ms … 604.9 ms    10 runs

Summary
  /tmp/bench/after native.php ran
    1.05 ± 0.02 times faster than /tmp/bench/before native.php

For a test with scalar types:

<?php

function func(string $f): string
{
	return strrev($f);
}

for ($i = 0; $i < 10000000; $i++) {
	func('abc');
}

the timings are:

Benchmark 1: /tmp/bench/before scalar.php
  Time (mean ± σ):     212.5 ms ±   1.7 ms    [User: 209.8 ms, System: 2.2 ms]
  Range (min … max):   211.4 ms … 218.1 ms    14 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark 2: /tmp/bench/after scalar.php
  Time (mean ± σ):     200.9 ms ±   2.0 ms    [User: 198.0 ms, System: 2.4 ms]
  Range (min … max):   199.7 ms … 207.2 ms    14 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Summary
  /tmp/bench/after scalar.php ran
    1.06 ± 0.01 times faster than /tmp/bench/before scalar.php

And a union type using only scalars:

<?php

function func(string|int $f): string
{
	return strrev($f);
}

for ($i = 0; $i < 10000000; $i++) {
	func('abc');
}

results in:

Benchmark 1: /tmp/bench/before union.php
  Time (mean ± σ):     212.8 ms ±   1.8 ms    [User: 210.0 ms, System: 2.2 ms]
  Range (min … max):   212.0 ms … 219.0 ms    14 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark 2: /tmp/bench/after union.php
  Time (mean ± σ):     200.1 ms ±   0.4 ms    [User: 197.7 ms, System: 2.3 ms]
  Range (min … max):   199.4 ms … 200.8 ms    14 runs

Summary
  /tmp/bench/after union.php ran
    1.06 ± 0.01 times faster than /tmp/bench/before union.php

Union types with objects are the only thing tested that were more or less the
same:

<?php

class A { public function get() { return 'abc'; } }
class B { }

function func(A|B $f): string
{
	return strrev($f->get());
}

for ($i = 0; $i < 10000000; $i++) {
	func(new A());
}

results in:

Benchmark 1: /tmp/bench/before union_obj.php
  Time (mean ± σ):     700.6 ms ±  10.4 ms    [User: 696.5 ms, System: 3.9 ms]
  Range (min … max):   688.6 ms … 717.8 ms    10 runs

Benchmark 2: /tmp/bench/after union_obj.php
  Time (mean ± σ):     706.7 ms ±  15.0 ms    [User: 702.5 ms, System: 3.8 ms]
  Range (min … max):   688.9 ms … 725.8 ms    10 runs

Summary
  /tmp/bench/before union_obj.php ran
    1.01 ± 0.03 times faster than /tmp/bench/after union_obj.php

I don't really trust my CPU to provide reliable benchmark results, though. So if someone wants to test this themselves, please do. I've created a Gist with the PHP files at: https://gist.github.com/TimWolla/d8ffbe9b9668d4d05805982ae591dcba

iluuu1994 · 2025-04-08T16:29:19Z

Summary
  sapi/cli/php-old callable-micro-opt/closure.php ran
    1.03 ± 0.02 times faster than sapi/cli/php callable-micro-opt/callable.php
    1.04 ± 0.02 times faster than sapi/cli/php callable-micro-opt/closure.php
    1.29 ± 0.02 times faster than sapi/cli/php-old callable-micro-opt/callable.php

Summary
  sapi/cli/php-old callable-micro-opt/native.php ran
    1.01 ± 0.02 times faster than sapi/cli/php callable-micro-opt/native.php

Summary
  sapi/cli/php-old callable-micro-opt/scalar.php ran
    1.02 ± 0.01 times faster than sapi/cli/php callable-micro-opt/scalar.php

Summary
  sapi/cli/php-old callable-micro-opt/union.php ran
    1.02 ± 0.01 times faster than sapi/cli/php callable-micro-opt/union.php

Summary
  sapi/cli/php-old callable-micro-opt/union_obj.php ran
    1.04 ± 0.01 times faster than sapi/cli/php callable-micro-opt/union_obj.php

With the exception of callable.php, which was significantly faster on the new version, most other benchmark seem to have slightly regressed. This result is consistent, I tried each benchmark multiple times.

nielsdos · 2025-04-10T19:08:40Z

A run on an i7-1185G7:

Summary
  ./sapi/cli/php ~/Downloads/tim/callable.php ran
    1.21 ± 0.05 times faster than ./sapi/cli/php_old ~/Downloads/tim/callable.php

  ./sapi/cli/php ~/Downloads/tim/closure.php ran
    1.00 ± 0.02 times faster than ./sapi/cli/php_old ~/Downloads/tim/closure.php

  ./sapi/cli/php ~/Downloads/tim/native.php ran
    1.00 ± 0.02 times faster than ./sapi/cli/php_old ~/Downloads/tim/native.php

  ./sapi/cli/php ~/Downloads/tim/scalar.php ran
    1.00 ± 0.04 times faster than ./sapi/cli/php_old ~/Downloads/tim/scalar.php

  ./sapi/cli/php_old ~/Downloads/tim/union.php ran
    1.01 ± 0.02 times faster than ./sapi/cli/php ~/Downloads/tim/union.php

  ./sapi/cli/php ~/Downloads/tim/union_obj.php ran
    1.00 ± 0.03 times faster than ./sapi/cli/php_old ~/Downloads/tim/union_obj.php

So more or less status quo, and a slight degradation for union.php.
The performance could be highly dependent on the machine you run it on, and probably the union checking code was hand optimized.
If we're worried about degrading but still want to have a fast path for closures, why not do this instead of this PR (which may be beneficial for other callers of this function too):

diff --git a/Zend/zend_API.c b/Zend/zend_API.c
index e0006e7d727..c8a506d8366 100644
--- a/Zend/zend_API.c
+++ b/Zend/zend_API.c
@@ -4293,6 +4293,10 @@ ZEND_API bool zend_is_callable_at_frame(
 
 ZEND_API bool zend_is_callable_ex(zval *callable, zend_object *object, uint32_t check_flags, zend_string **callable_name, zend_fcall_info_cache *fcc, char **error) /* {{{ */
 {
+	if (Z_TYPE_P(callable) == IS_OBJECT && Z_OBJCE_P(callable) == zend_ce_closure) {
+		return true;
+	}
+
 	/* Determine callability at the first parent user frame. */
 	zend_execute_data *frame = EG(current_execute_data);
 	while (frame && (!frame->func || !ZEND_USER_CODE(frame->func->type))) {

This only affects the callable bench and has no influence on the others. Sure the win isn't as great for type checking as this PR had, but this may be a pragmatic solution?

  ./sapi/cli/php ~/Downloads/tim/callable.php ran
    1.16 ± 0.03 times faster than ./sapi/cli/php_old ~/Downloads/tim/callable.php
  ./sapi/cli/php ~/Downloads/tim/closure.php ran
    1.00 ± 0.02 times faster than ./sapi/cli/php_old ~/Downloads/tim/closure.php

iluuu1994 · 2025-04-10T21:22:23Z

I can confirm that Niels' patch significantly improves performance for callable over master:

Benchmark 1: sapi/cli/php-old benchmarks/callable.php
  Time (mean ± σ):     407.1 ms ±   3.5 ms    [User: 404.3 ms, System: 1.4 ms]
  Range (min … max):   399.0 ms … 411.5 ms    10 runs
 
Benchmark 2: sapi/cli/php benchmarks/callable.php
  Time (mean ± σ):     327.9 ms ±   5.1 ms    [User: 324.4 ms, System: 2.3 ms]
  Range (min … max):   318.9 ms … 338.6 ms    10 runs
 
Summary
  sapi/cli/php benchmarks/callable.php ran
    1.24 ± 0.02 times faster than sapi/cli/php-old benchmarks/callable.php

However, that might not mean much for my CPU because it also significantly improves \Closure, even though the modified code path is not even hit. 😄

Benchmark 1: sapi/cli/php-old benchmarks/closure.php
  Time (mean ± σ):     329.7 ms ±  10.7 ms    [User: 324.8 ms, System: 1.9 ms]
  Range (min … max):   316.0 ms … 344.9 ms    10 runs
 
Benchmark 2: sapi/cli/php benchmarks/closure.php
  Time (mean ± σ):     308.0 ms ±   5.6 ms    [User: 304.6 ms, System: 2.3 ms]
  Range (min … max):   300.8 ms … 321.3 ms    10 runs
 
Summary
  sapi/cli/php benchmarks/closure.php ran
    1.07 ± 0.04 times faster than sapi/cli/php-old benchmarks/closure.php

So, take that with a grain of salt. I hate my CPU. 😋

nielsdos · 2025-04-10T21:26:41Z

However, that might not mean much for my CPU because it also significantly improves \Closure, even though the modified code path is not even hit.

#justCodeLayoutThings
#LoveOoOExecution

So, take that with a grain of salt. I hate my CPU. 😋

But as long as the others don't slow down I take it as a win 😎

EDIT: ofc my patch is not "complete" as we need to fetch the callable name in some cases too, but not in the type check case. But if this seems like a good plan to Tim as well, I can make it a formal patch with a PR.

TimWolla · 2025-04-11T06:49:14Z

EDIT: ofc my patch is not "complete" as we need to fetch the callable name in some cases too, but not in the type check case.

Yes, the callable name was the reason I did not touch that function, but zend_check_type_slow instead.

But if this seems like a good plan to Tim as well, I can make it a formal patch with a PR.

No strong opinion here, please feel free to propose something, given that I can't reliably benchmark this with the hardware I have available. Alternatively for 8e216a2 (the initial commit in this PR), the EXPECTED(Z_OBJCE_P(arg) == zend_ce_closure) could perhaps just be extended to EXPECTED(Z_TYPE_P(arg) == IS_OBJECT && Z_OBJCE_P(arg) == zend_ce_closure) (unless the larger assembly also is somehow bad for the other cases, who knows ¯\_(ツ)_/¯).

nielsdos · 2025-04-11T07:21:37Z

The path from zend_check_type_slow doesn't request a callable name anyway so that doesn't matter.
I'll propose something this evening.

nielsdos · 2025-04-11T17:52:40Z

All sorts of complicated things going on here, function seems very sensitive to changes.

The best improvement for callable.php, on my desktop, is when you add Z_TYPE_P(arg) == IS_OBJECT && Z_OBJCE_P(arg) == zend_ce_closure in zend_check_type_slow after checking for MAY_BE_CALLABLE. However, if you add an EXPECTED annotation on top of that (like originally suggested), then the compiler over-optimizes that code path and the function zend_check_user_type_slow grows from 786 bytes (without EXPECTED but with the check) to 873 bytes (with EXPECTED and with the check). This is related to code layout and different register usages/spills, (on first sight).
This increase in code size caused by the EXPECTED annotation yields a noticeable decrease in performance for native.php. However, the annotation does not seem to improve the performance of callable.php anyway.

My suggestion only influences the callable.php case and with all the extra plumbing added it yields a 18% improvement for callable.php on my system and a minor improvement of 4% for native.php, whereas the suggestion to change zend_check_type_slow yields a 25% improvement and no improvement for native.php.
My suggestion is more isolated and doesn't change any code layout of zend_check_type_slow, but the gains are more limited. I think it's still worthwhile to further explore the idea of adding a check in zend_check_type_slow, but without the EXPECTED annotation.

A "in the middle" solution may be to declare a helper function in the same file as zend_execute.c so that the compiler can perform IPA to get a call with less overhead, and we can force the helper to be outlined. This should keep the same code layout in zend_check_type_slow while still adding a fast-path for closures close to zend_check_type_slow:

diff --git a/Zend/zend_execute.c b/Zend/zend_execute.c
index 89fbcf2cbd7..58be57596e3 100644
--- a/Zend/zend_execute.c
+++ b/Zend/zend_execute.c
@@ -1140,6 +1140,14 @@ static bool zend_check_intersection_type_from_list(
 	return true;
 }
 
+static zend_never_inline bool zend_is_callable_idk_what_to_call_this(zval *callable, uint32_t check_flags)
+{
+	if (EXPECTED(Z_TYPE_P(callable) == IS_OBJECT && Z_OBJCE_P(callable) == zend_ce_closure)) {
+		return true;
+	}
+	return zend_is_callable_ex(callable, NULL, check_flags, NULL, NULL, NULL);
+}
+
 static zend_always_inline bool zend_check_type_slow(
 		const zend_type *type, zval *arg, const zend_reference *ref,
 		bool is_return_type, bool is_internal)
@@ -1178,7 +1186,7 @@ static zend_always_inline bool zend_check_type_slow(
 
 	const uint32_t type_mask = ZEND_TYPE_FULL_MASK(*type);
 	if ((type_mask & MAY_BE_CALLABLE) &&
-		zend_is_callable(arg, is_internal ? IS_CALLABLE_SUPPRESS_DEPRECATIONS : 0, NULL)) {
+		zend_is_callable_idk_what_to_call_this(arg, is_internal ? IS_CALLABLE_SUPPRESS_DEPRECATIONS : 0)) {
 		return 1;
 	}
 	if ((type_mask & MAY_BE_STATIC) && zend_value_instanceof_static(arg)) {

This yields a 25% improvement on callable.php, so the same as the first idea, but the chance is high that this does not unexpectedly influence any other test case.

TimWolla requested review from iluuu1994 and nielsdos April 8, 2025 10:13

TimWolla requested a review from dstogov as a code owner April 8, 2025 10:13

github-actions bot added the Category: Engine label Apr 8, 2025

iluuu1994 reviewed Apr 8, 2025

View reviewed changes

TimWolla changed the title ~~zend_execute: Improve type checking performance when passing a Closure to a callable parameter~~ zend_execute: Streamline typechecks in zend_check_type_slow() if an object is given Apr 8, 2025

zend_execute: Streamline typechecks in zend_check_type_slow() if an object is given #18277

Are you sure you want to change the base?

zend_execute: Streamline typechecks in zend_check_type_slow() if an object is given #18277

Uh oh!

Conversation

TimWolla commented Apr 8, 2025

Uh oh!

iluuu1994 Apr 8, 2025

Choose a reason for hiding this comment

Uh oh!

Girgias Apr 8, 2025

Choose a reason for hiding this comment

Uh oh!

iluuu1994 Apr 8, 2025

Choose a reason for hiding this comment

Uh oh!

Girgias Apr 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

iluuu1994 Apr 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

TimWolla Apr 8, 2025

Choose a reason for hiding this comment

Uh oh!

iluuu1994 Apr 8, 2025

Choose a reason for hiding this comment

Uh oh!

TimWolla Apr 8, 2025

Choose a reason for hiding this comment

Uh oh!

iluuu1994 Apr 8, 2025

Choose a reason for hiding this comment

Uh oh!

Girgias Apr 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

TimWolla commented Apr 8, 2025

Uh oh!

iluuu1994 commented Apr 8, 2025

Uh oh!

nielsdos commented Apr 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

iluuu1994 commented Apr 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

nielsdos commented Apr 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

TimWolla commented Apr 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

nielsdos commented Apr 11, 2025

Uh oh!

nielsdos commented Apr 11, 2025

Uh oh!

Uh oh!

zend_execute: Streamline typechecks in `zend_check_type_slow()` if an object is given #18277

zend_execute: Streamline typechecks in `zend_check_type_slow()` if an object is given #18277

Girgias Apr 8, 2025 •

edited

Loading

iluuu1994 Apr 8, 2025 •

edited

Loading

Girgias Apr 8, 2025 •

edited

Loading

nielsdos commented Apr 10, 2025 •

edited

Loading

iluuu1994 commented Apr 10, 2025 •

edited

Loading

nielsdos commented Apr 10, 2025 •

edited

Loading

TimWolla commented Apr 11, 2025 •

edited

Loading