Using `reduce_window` with a `jnp.float32(0.0)` init value:

```python
import jax
import jax.numpy as jnp
from jax import jit


def maxpool_step(x, kernel_size, stride):
    return jax.lax.reduce_window(
        x,
        jnp.float32(0.0),
        jax.lax.max,
        window_dimensions=(1, 1, kernel_size, kernel_size),
        window_strides=(1, 1, stride, stride),
        padding="VALID",
    )


_grad = jax.grad(lambda x: jnp.sum(maxpool_step(x, kernel_size=2, stride=2)))
_grad_jit = jit(_grad)  # => ValueError: Linearization failed ...
```

Interestingly, removing the jit compilation from the above makes the error go away, even though the function is the same. Below is the working version, in which both `grad` and `jit(grad)` work:

```python
def maxpool_step(x, kernel_size, stride):
    return jax.lax.reduce_window(
        x,
        -jnp.inf,  # 0.0, jnp.float32(0.0), jnp.array(0.0) or jnp.array(-jnp.inf) all cause errors
        jax.lax.max,
        window_dimensions=(1, 1, kernel_size, kernel_size),
        window_strides=(1, 1, stride, stride),
        padding="VALID",
    )
```

My questions are:
There are a few things going on here.
1. `init_value = 0` should always fail. The gradient of `jax.lax.reduce_window` is only defined for `jax.lax.max` when `init_value` is `-inf`. If you want to use a different init value, you need to define the gradient yourself (a sketch of this is at the end of this reply).
2. When you jit-compile the function using JAX array types for `init_value` (`jnp.float32(...)` and `jnp.array(...)`), the arrays are traced. This breaks the logic of `jax.lax.reduce_window`.
3. The gradient computation of `jax.lax.reduce_window` has static branching logic that depends on the actual scalar value of `init_value`. Since values in traced arrays are not accessible, the code fails when you call `jit` on top of `grad`.
4. `jnp.inf` is an ordinary Python float, so `-jnp.inf` is a concrete scalar whose value is available at trace time, which is why the `-jnp.inf` version works even under `jit`.
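For reference, here is a quick self-contained check (the input shape and values are chosen purely for illustration) showing that the `-jnp.inf` version from the question both differentiates and jit-compiles:

```python
import jax
import jax.numpy as jnp


def maxpool_step(x, kernel_size, stride):
    # -jnp.inf is a plain Python float, so its value is visible at trace time.
    return jax.lax.reduce_window(
        x,
        -jnp.inf,
        jax.lax.max,
        window_dimensions=(1, 1, kernel_size, kernel_size),
        window_strides=(1, 1, stride, stride),
        padding="VALID",
    )


x = jnp.arange(16.0).reshape(1, 1, 4, 4)
grad_fn = jax.jit(jax.grad(lambda x: jnp.sum(maxpool_step(x, kernel_size=2, stride=2))))
print(grad_fn(x))  # 1.0 at each window's maximum, 0.0 elsewhere
```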
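If you really do need a non-standard init value, one option is to wrap the pooling in `jax.custom_vjp` and write the backward pass by hand. The sketch below is only illustrative: the name `maxpool_init_zero`, the `jnp.kron` upsampling trick, and the tie handling are illustrative choices, it only covers non-overlapping windows (`stride == kernel_size` with evenly divisible spatial dims, matching the 2x2 / stride-2 pooling in the question), and it assumes non-negative inputs so that an init value of `0.0` still computes a true max-pool.

```python
import functools

import jax
import jax.numpy as jnp


@functools.partial(jax.custom_vjp, nondiff_argnums=(1, 2))
def maxpool_init_zero(x, kernel_size, stride):
    # Forward pass with init_value = 0.0; JAX defines no gradient rule for this.
    return jax.lax.reduce_window(
        x,
        0.0,
        jax.lax.max,
        window_dimensions=(1, 1, kernel_size, kernel_size),
        window_strides=(1, 1, stride, stride),
        padding="VALID",
    )


def _maxpool_fwd(x, kernel_size, stride):
    y = maxpool_init_zero(x, kernel_size, stride)
    return y, (x, y)  # save input and pooled output as residuals


def _maxpool_bwd(kernel_size, stride, res, g):
    x, y = res
    # Hand-written backward pass: broadcast each pooled value and its cotangent
    # back over its (non-overlapping) window, then keep the gradient only at
    # positions that achieved the window maximum (ties share the gradient here).
    tile = jnp.ones((1, 1, kernel_size, kernel_size), x.dtype)
    y_up = jnp.kron(y, tile)
    g_up = jnp.kron(g, tile)
    return ((x == y_up).astype(x.dtype) * g_up,)


maxpool_init_zero.defvjp(_maxpool_fwd, _maxpool_bwd)

# jit(grad(...)) now works because JAX never differentiates reduce_window itself.
x = jnp.arange(16.0).reshape(1, 1, 4, 4)
print(jax.jit(jax.grad(lambda x: jnp.sum(maxpool_init_zero(x, 2, 2))))(x))
```

The same pattern should extend to other init values or reductions; only the forward call and the hand-written backward rule need to change.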