@NouamaneTazi
Member

What does this PR do?

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guidelines?
  • Did you write any new necessary tests?
  • Did you log the throughput and loss you get to ensure the PR works as expected in actual training?
  • Did you log the memory usage? You can use this tool to understand the memory usage breakdown in nanotron.
  • If you modified anything related to checkpoints, did you verify that saving and reloading checkpoints still works correctly?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

NouamaneTazi and others added 5 commits April 24, 2025 11:59
* fix makefile, sync with datatrove, update lighteval config

* fix path

---------

Co-authored-by: Hynek Kydlicek <[email protected]>

* Nouamane/lighteval (#356)

* InitScalingMethod

* InitScalingMethod

* eval

* try adding lightevalrunner to trainer

* amend

* amend

* amend

* amend

* amend

* amend

* .

* amend

* amend

* .

* qos to low

* add nanotron_path

* some fix: logs, and config

* cp instead of sync

* eval_interval

* serialize sanity checks

* add output dir and s3_save path in the config

* fix s3 only if define

* fixes

* add requeue

* add wandb with lighteval and fix eval interval

* fix this little space :(

* folder_path should always have s3 when using s3 (fix consumed tokens issue)

* config qwen

* .

---------

Co-authored-by: elie <[email protected]>
Co-authored-by: “eliebak” <[email protected]>

* fix inference in case of varlen (input with paddings)

* .

* legacy

* remove bos token

* max-micro-batch

* separate inference from training

* use_decode_text

* add no use cache case to decode_tokenized

---------

Co-authored-by: elie <[email protected]>
Co-authored-by: “eliebak” <[email protected]>

@NouamaneTazi NouamaneTazi changed the base branch from main to dev June 23, 2025 15:50

3 participants