I was trying to gain some speed-ups with JIT for autoregressive decoding:

```python
import jax
from flax.nnx import MultiHeadAttention, Rngs, jit

rngs = Rngs(0)
attn = MultiHeadAttention(8, 512, rngs=rngs)
jited_attn = jit(attn)
attn.init_cache((1, 100, 512))

for i in range(1, 100):
    test_input = jax.random.uniform(jax.random.PRNGKey(0), (1, 1, 512))
    resp = jited_attn(test_input, decode=True)
    print("Itering {i}".format(i=i))
```

This fails with an error:
Autoregressive decoding without JIT works well:

```python
import jax
from flax.nnx import MultiHeadAttention, Rngs

rngs = Rngs(0)
attn = MultiHeadAttention(8, 512, rngs=rngs)
attn.init_cache((1, 100, 512))

for i in range(1, 100):
    test_input = jax.random.uniform(jax.random.PRNGKey(0), (1, 1, 512))
    resp = attn(test_input, decode=True)
    print("Itering {i}".format(i=i))
```

In my understanding, a JAX-jitted function should be a pure function; however, keeping a KV-cache inside the module makes it stateful and therefore impure. I wonder if this architecture was intentional or if I'm missing something.
Hey @Neulus, because of how instance methods work in Python you should not transform them directly, since `self` will be passed as a capture. Instead, transform a regular function that takes the Module as an explicit input. Also, you can use `set_attributes` to recursively set Module properties like `decode`. Here is a working example:

```python
import jax
from flax import nnx

rngs = nnx.Rngs(0)
attn = nnx.MultiHeadAttention(8, 512, rngs=rngs)
attn.init_cache((1, 100, 512))
attn.set_attributes(decode=True)

@nnx.jit
def forward(attn, inputs):
    return attn(inputs)

for i in range(4):
    test_input = jax.random.uniform(jax.random.key(i), (1, 1, 512))
    resp = forward(attn, test_input)
    print(attn.cached_key.value[0, :4, 0, :5])  # watch the cache grow
```
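For intuition on why this works: `nnx.jit` manages the Module's state for you, so the cache update made inside `forward` is propagated back to `attn` even though the compiled function itself stays pure. Conceptually it is close to splitting the Module into a static graph definition and a state pytree and threading that state explicitly. A rough manual sketch of that idea using the functional `nnx.split` / `nnx.merge` API (not the actual `nnx.jit` internals):

```python
import jax
from flax import nnx

rngs = nnx.Rngs(0)
attn = nnx.MultiHeadAttention(8, 512, rngs=rngs)
attn.init_cache((1, 100, 512))
attn.set_attributes(decode=True)

# Split the Module into a static graph definition and a pytree of state
# (parameters + cache), so a plain jax.jit function can remain pure.
graphdef, state = nnx.split(attn)

@jax.jit
def forward(state, inputs):
    module = nnx.merge(graphdef, state)  # rebuild the Module inside the trace
    out = module(inputs)                 # this call updates the cache on `module`
    _, new_state = nnx.split(module)     # pull the updated cache back out explicitly
    return out, new_state

x = jax.random.uniform(jax.random.key(0), (1, 1, 512))
out, state = forward(state, x)  # the grown cache comes back in `state`
```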