
[tx] Make sharding explicit in LoRA constructors #997

Merged
pcmoritz merged 4 commits into NovaSky-AI:main from pcmoritz:tx-update-sharding on Jan 31, 2026

Conversation

@pcmoritz (Collaborator) commented on Jan 31, 2026

This is in preparation for merging #996, so we don't need to depend on the JAX tracer. It is also slightly cleaner this way, and the assert is no longer needed since the error is "defined away".

It also adds the FSDP sharding for llama3.
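For concreteness, here is a minimal sketch of what an explicitly sharded LoRA constructor can look like with Flax NNX. The class name LoRALinear matches the PR description, but the signature, parameter names (lora_a_sharding, lora_b_sharding), axis names, and initializers below are assumptions for illustration, not the actual tx code.

```python
import jax
from flax import nnx


class LoRALinear(nnx.Module):
    """Sketch: LoRA adapter whose factor shardings are constructor arguments."""

    def __init__(
        self,
        in_features: int,
        out_features: int,
        *,
        rank: int,
        lora_a_sharding: tuple,   # e.g. ("fsdp", None)
        lora_b_sharding: tuple,   # e.g. (None, "fsdp")
        rngs: nnx.Rngs,
    ):
        # Sharding metadata is attached here, inside the constructor, so model
        # definitions no longer wrap initializers in nnx.with_partitioning at
        # every instantiation site.
        a_init = nnx.with_partitioning(nnx.initializers.lecun_normal(), lora_a_sharding)
        b_init = nnx.with_partitioning(nnx.initializers.zeros_init(), lora_b_sharding)
        self.lora_a = nnx.Param(a_init(rngs.params(), (in_features, rank)))
        self.lora_b = nnx.Param(b_init(rngs.params(), (rank, out_features)))

    def __call__(self, x: jax.Array, base_out: jax.Array) -> jax.Array:
        # Low-rank update added on top of the frozen base layer's output.
        return base_out + x @ self.lora_a.value @ self.lora_b.value


# Call site in a model definition: each tensor's sharding sits right next to it.
layer = LoRALinear(
    4096, 4096, rank=16,
    lora_a_sharding=("fsdp", None),
    lora_b_sharding=(None, None),
    rngs=nnx.Rngs(0),
)
```

Because the sharding spec is an ordinary constructor argument, there is nothing left to assert at runtime: a missing or malformed spec fails at construction time rather than inside a traced computation.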

pcmoritz added the tx label on Jan 31, 2026
@gemini-code-assist (bot, Contributor) left a comment


Code Review

This pull request refactors the LoRA layer constructors (LoRAEmbed, LoRALinear, LoRAExpert) to accept an explicit sharding parameter. This removes the need for nnx.with_partitioning calls at the instantiation sites within the model definitions (deepseekv3.py, llama3.py, qwen3.py) and eliminates the associated runtime assertions. The refactoring is applied consistently across all affected files, improving clarity and control over sharding for LoRA layers, and aligns with the stated objective of making sharding explicit.

@raulchen (Contributor) left a comment


we can define constants for the shardings, as they are repeated.

@pcmoritz (Collaborator, Author) commented on Jan 31, 2026

Do you mean across models? Within a model there is generally not much repetition, only a little, and it is also nice that each tensor has its sharding explicitly right next to it. I'm going to merge this for now so we can unblock the other PR, but we can think about whether there is a better way to structure this to reduce repetition (and avoid forgetting certain shardings) across models. Let me know if you have some ideas :)
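One possible shape for that suggestion, sketched below with hypothetical constant and axis names (nothing here is taken from the tx code): a small shared module of named sharding specs that each model imports, trading per-tensor locality for a single place to audit.

```python
# Hypothetical shardings.py shared across model definitions; the axis labels
# ("fsdp") and constant names are illustrative assumptions only.
EMBED_SHARDING = ("fsdp", None)       # shard the vocabulary dimension
ATTN_QKV_SHARDING = (None, "fsdp")    # shard the projection output dimension
ATTN_OUT_SHARDING = ("fsdp", None)
MLP_UP_SHARDING = (None, "fsdp")
MLP_DOWN_SHARDING = ("fsdp", None)

# A model definition would then reference the constants instead of inline tuples,
# e.g. LoRALinear(..., lora_a_sharding=ATTN_QKV_SHARDING, ...), at the cost of the
# sharding no longer being visible directly next to each tensor.
```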

@pcmoritz pcmoritz merged commit 7103a2f into NovaSky-AI:main Jan 31, 2026
4 of 6 checks passed