[train][FullyAsync] Implement customizable weight sync frequency #1293
Open
tamoghnokandar wants to merge 7 commits into NovaSky-AI:main from
Conversation
Contributor
Code Review
This pull request successfully implements customizable weight sync frequency in fully-async training by introducing a distinction between an outer train_batch_size and an inner policy_mini_batch_size. The changes are consistently applied across the trainer logic, configuration files, and documentation. The addition of new tests ensures the new outer-batch semantics and resume logic are working as expected. My only feedback is a minor correction in the documentation.
Comment on lines 302 to +304
```diff
  - AReal: https://arxiv.org/abs/2505.24298v3
  - PipelineRL: https://arxiv.org/abs/2509.19128v2
- - ScaleRL: https://arxiv.org/abs/2510.13786 (no newline at end of file)
+ - ScaleRL: https://arxiv.org/abs/2510.13786
```
Contributor
…kyRL into add_weight_sync
Contributor
Author
@CharlieFRuan This PR is ready for review.
Summary
This PR changes fully-async training to treat `trainer.train_batch_size` as the outer async training step, while keeping `trainer.policy_mini_batch_size` as the inner optimizer minibatch size. Each fully-async step now waits for a full outer batch of prompt-groups, runs the existing inner minibatch loop over that batch, performs a single weight sync, and then advances `global_step`. Fixes issue #1205.

What Changed
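At a glance, the new outer-step flow can be sketched as follows. This is an illustrative sketch, not the PR's actual code: method names such as `wait_for_groups`, `build_training_batch`, and `sync_weights` are assumed; only `_run_training` and the two batch-size config keys appear in this PR.

```python
def num_policy_minibatches_per_outer_step(train_batch_size: int,
                                          policy_mini_batch_size: int) -> int:
    """Validate the new config invariants and return the inner minibatch count."""
    assert train_batch_size >= policy_mini_batch_size
    assert train_batch_size % policy_mini_batch_size == 0
    return train_batch_size // policy_mini_batch_size


def run_outer_step(trainer, train_batch_size: int, policy_mini_batch_size: int) -> None:
    # Block until a full outer batch of completed prompt-groups is available.
    groups = trainer.wait_for_groups(train_batch_size)
    batch = trainer.build_training_batch(groups)

    # The existing inner optimizer loop slices this batch into
    # num_policy_minibatches_per_outer_step minibatches of policy_mini_batch_size.
    trainer._run_training(batch)

    trainer.sync_weights()    # exactly one weight sync per outer step
    trainer.global_step += 1  # global_step advances once per outer step
```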
Fully-Async trainer semantics

- `train_batch_size` is now the unit for the outer async training step.
- `policy_mini_batch_size` remains the inner slicing unit used by the existing optimizer path.
- Added `num_policy_minibatches_per_outer_step` to make the outer-step to inner-minibatch relationship explicit.
- Config validation now requires `train_batch_size >= policy_mini_batch_size` and `train_batch_size % policy_mini_batch_size == 0`.

Training loop behavior

- Waits for `train_batch_size` completed generation groups before building a training batch.
- Builds the training batch from those `train_batch_size` groups and passes it to `_run_training(...)`.
- Marks the `train_batch_size` UIDs consumed and advances `global_step` once per outer step.

Staleness and capacity
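The worker-count bound this section describes can be captured as a small predicate. The helper name is illustrative (not the PR's actual code); the parameter names mirror the config keys mentioned in the PR.

```python
def worker_count_in_bounds(train_batch_size: int,
                           num_parallel_generation_workers: int,
                           max_staleness_steps: int) -> bool:
    """Check: train_batch_size <= workers <= train_batch_size * (max_staleness_steps + 1)."""
    lower = train_batch_size
    upper = train_batch_size * (max_staleness_steps + 1)
    return lower <= num_parallel_generation_workers <= upper


# With train_batch_size = 1024 and max_staleness_steps = 1,
# the allowed worker range is [1024, 2048].
assert worker_count_in_bounds(1024, 2048, 1)
assert not worker_count_in_bounds(1024, 512, 1)
```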
Fully-async staleness capacity now uses `train_batch_size` as the consumer quantum. Worker-count bounds are now enforced as:

`train_batch_size <= num_parallel_generation_workers <= train_batch_size * (max_staleness_steps + 1)`

This aligns `max_staleness_steps` with outer fully-async training steps rather than inner optimizer minibatches.

Tests
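A minimal sketch of the kind of invariant such tests can assert, assuming `global_step` is recovered from the count of consumed prompt-group UIDs on resume. The helper is hypothetical, not the repository's actual test code.

```python
def resumed_global_step(uids_consumed: int, train_batch_size: int) -> int:
    """On resume, one outer step has elapsed per full outer batch of consumed UIDs."""
    return uids_consumed // train_batch_size


# Outer-batch semantics: global_step advances once per train_batch_size UIDs.
assert resumed_global_step(4 * 1024, 1024) == 4
# A partially consumed outer batch does not advance the step.
assert resumed_global_step(1023, 1024) == 0
```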
Added/updated tests in `tests/train/test_fully_async_trainer.py` to cover:

- the new outer-batch semantics driven by `train_batch_size`
- resume logic that treats `train_batch_size` as the outer step unit

Impact
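As a quick arithmetic check of the example config below (the numbers come from this PR description):

```python
train_batch_size = 1024
policy_mini_batch_size = 256

# Inner policy minibatches per fully-async outer step.
inner_minibatches = train_batch_size // policy_mini_batch_size
assert inner_minibatches == 4

# Per outer step: 1024 prompt-groups consumed, 4 inner minibatches,
# one weight sync, and global_step incremented once.
```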
With this change, fully-async training behaves as intended for configs such as:
- `train_batch_size = 1024`
- `policy_mini_batch_size = 256`

In that setup, one fully-async outer step consumes 1024 prompt-groups, runs 4 inner policy minibatches, performs one weight sync, and increments `global_step` once.

Reward Curve
Eval Curve
Epochs
I ran for around 9 epochs before running out of GPU credits. PR description written by Codex.