Hey guys, I'm trying to figure something out: I'm using `jax.jit` for the entire model forward pass, but I want to switch the mesh inside the MoE layer. For example, the attention weights use tp=4, while the MoE weights use ep=2, tp=2. The goal is to decouple the MoE layer's mesh from the one used by attention. What's the best approach to achieve this?
Answered by yashk2810 on Nov 7, 2025
Great question! Yes, you can switch the abstract mesh inside jit. For example:

Note that this kind of switching is best done under
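The code from the accepted answer did not survive in this copy of the thread. As a rough stand-in, here is a minimal sketch that does *not* use the abstract-mesh switching the answer mentions; instead it builds two concrete meshes over the same four devices (tp=4 for attention, ep=2 × tp=2 for MoE) and moves between them with `jax.lax.with_sharding_constraint` inside a single `jax.jit`. The names `attn_mesh`, `moe_mesh`, `forward`, the layer shapes, and the dense "every expert sees every token" MoE stand-in are hypothetical, chosen only for illustration. The `XLA_FLAGS` line is an assumption for local testing that fakes four CPU devices.

```python
import os

# Assumption for local testing only: expose 4 fake CPU devices.
# This flag must be set before jax is imported.
os.environ["XLA_FLAGS"] = (
    os.environ.get("XLA_FLAGS", "") + " --xla_force_host_platform_device_count=4"
)

import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# Both meshes are views of the SAME 4 devices, in the same order.
devices = np.array(jax.devices()[:4])
attn_mesh = Mesh(devices.reshape(4), ("tp",))          # attention: tp=4
moe_mesh = Mesh(devices.reshape(2, 2), ("ep", "tp"))   # MoE: ep=2, tp=2

ATTN_W = NamedSharding(attn_mesh, P(None, "tp"))       # columns split over tp=4
MOE_W = NamedSharding(moe_mesh, P("ep", None, "tp"))   # experts over ep, d_ff over tp


@jax.jit
def forward(x, w_attn, w_moe):
    # Attention stand-in (hypothetical): one projection laid out on attn_mesh.
    w_attn = jax.lax.with_sharding_constraint(w_attn, ATTN_W)
    h = x @ w_attn
    h = jax.lax.with_sharding_constraint(h, NamedSharding(attn_mesh, P(None, "tp")))

    # Hand the activations over to moe_mesh (fully replicated here for simplicity).
    h = jax.lax.with_sharding_constraint(h, NamedSharding(moe_mesh, P()))

    # Dense MoE stand-in (hypothetical): every expert processes every token, no router.
    w_moe = jax.lax.with_sharding_constraint(w_moe, MOE_W)
    y = jnp.einsum("bd,edf->ebf", h, w_moe)
    y = jax.lax.with_sharding_constraint(
        y, NamedSharding(moe_mesh, P("ep", None, "tp"))
    )
    return y.sum(axis=0)  # combine the expert outputs


x = jnp.ones((8, 16))          # [batch, d_model]
w_attn = jnp.ones((16, 16))    # [d_model, d_model]
w_moe = jnp.ones((4, 16, 32))  # [n_experts, d_model, d_ff]
out = forward(x, w_attn, w_moe)
print(out.shape, out.sharding)
```

Because both meshes wrap the same devices in the same order, the compiled program still has a single device assignment, and the partitioner inserts whatever communication is needed to move `h` from the tp=4 layout to the ep × tp layout. Whether this approach or the abstract-mesh switching described in the answer suits a real routed MoE layer better likely depends on your JAX version and sharding mode.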
Answer selected by Prayer3th