
Conversation


@Shreya-gaur (Contributor) commented Nov 21, 2025

This PR introduces a Blockscaled Ragged Contiguous Grouped GEMM for MoEs.

In MoEs, the weights have the same dimensions across all experts and are stored contiguously in memory. The activations, by contrast, can have different dimensions across experts and may not be stored contiguously either.

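As a rough illustration of that layout contrast (a minimal sketch, not CUTLASS code; the struct and field names below are invented for this example), the per-expert weight pointer is just a fixed-stride offset into one contiguous allocation, while the per-expert activation pointer has to be looked up per group:

```cpp
// Illustrative layout sketch (hypothetical names, not CUTLASS code): weights
// for all experts share the same N x K shape and sit back to back in one
// allocation, so expert e's weight is a fixed-stride offset. Activations are
// ragged: each expert has its own M_e and its buffer may live anywhere, so a
// per-group pointer must be supplied explicitly.
#include <cstdint>
#include <vector>

struct MoeGemmArguments {
  int num_experts;                     // number of groups / experts
  int N, K;                            // shared weight dimensions
  const float* weights;                // contiguous [num_experts, N, K]
  std::vector<int> M_per_expert;       // ragged activation row counts
  std::vector<const float*> act_ptrs;  // one activation pointer per expert
};

// Weight pointer for expert e: a pure offset, no per-group metadata needed.
inline const float* weight_ptr(const MoeGemmArguments& args, int e) {
  return args.weights + static_cast<int64_t>(e) * args.N * args.K;
}

// Activation pointer for expert e: must be looked up, since the buffers are
// neither uniformly sized nor guaranteed to be contiguous.
inline const float* activation_ptr(const MoeGemmArguments& args, int e) {
  return args.act_ptrs[e];
}
```
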
In this PR, I modified the grouped GEMM kernel in CUTLASS to avoid tensormap updates for the weight matrix (matrix A) and its corresponding scale factor (SFA). The weights are still loaded using TMA. Note that the activation matrix (matrix B) and its corresponding scale factor (SFB) still require tensormap updates whenever the group the kernel is working on changes.

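The scheduling idea behind the change can be sketched as follows (hypothetical types and helpers, not the actual CUTLASS kernel code): because every expert's A and SFA have the same shape and are stored contiguously, their TMA descriptors (tensormaps) can be built once, while the B and SFB tensormaps are rebuilt only when the scheduler moves to a tile belonging to a different group.

```cpp
// Hedged sketch of the per-group tensormap handling (invented types/helpers;
// real code would use the CUDA tensormap encode/replace machinery).
#include <vector>

struct TensorMap { const void* base = nullptr; };          // stand-in for a TMA descriptor
struct GroupOperand { const void* ptr; int rows, cols; };  // per-group tensor view
struct Tile { int group; };                                // work unit tagged with its group

// Hypothetical descriptor builder.
TensorMap make_tensormap(const void* base) { return TensorMap{base}; }

void process_tiles(const void* weights, const void* weight_scales,
                   const std::vector<GroupOperand>& acts,
                   const std::vector<GroupOperand>& act_scales,
                   const std::vector<Tile>& tiles) {
  // A / SFA: one descriptor for the whole run; per-expert data is reached by
  // a fixed offset inside the contiguous allocation, so no update is needed.
  TensorMap tma_A   = make_tensormap(weights);
  TensorMap tma_SFA = make_tensormap(weight_scales);

  TensorMap tma_B, tma_SFB;
  int prev_group = -1;

  for (const Tile& tile : tiles) {
    if (tile.group != prev_group) {
      // Group changed: B / SFB have a new base pointer and possibly a new
      // extent, so their tensormaps must be refreshed before issuing loads.
      tma_B      = make_tensormap(acts[tile.group].ptr);
      tma_SFB    = make_tensormap(act_scales[tile.group].ptr);
      prev_group = tile.group;
    }
    // ... issue TMA loads with tma_A / tma_SFA / tma_B / tma_SFB and run the
    // blockscaled MMA for this tile ...
    (void)tma_A; (void)tma_SFA; (void)tma_B; (void)tma_SFB;
  }
}
```

The sketch only captures the asymmetry described above: the A/SFA descriptors are group-invariant, while the B/SFB descriptors are not.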