Add loss_fn parameterization to forward_backward #924
tyler-griggs wants to merge 4 commits into NovaSky-AI:main from
Conversation
Code Review
This pull request adds a `loss_fn` parameter to `forward_backward` for Tinker API compatibility and refactors weight synchronization into a new `save_weights_for_sampler` method. The changes are generally positive: they simplify the API and improve test coverage. However, I've identified a critical issue in `trainer.py` where the critic training path appears to be broken due to incorrect arguments being passed to `_execute_training_step`. There is also an inconsistency in a new test file regarding a test utility function call.
```python
with Timer("critic_train", self.all_timings):
    critic_status = self._execute_training_step("critic", data, "critic")
with Timer("policy_train", self.all_timings):
    policy_status = self._execute_training_step("policy", data, "policy")
```
There appears to be a critical issue with the arguments passed to _execute_training_step for both the critic and policy models. The function signature for _execute_training_step is (self, model: str, data: TrainingInputBatch), but it's being called with three arguments here (e.g., self._execute_training_step("critic", data, "critic")). This will result in a TypeError at runtime.
While the intent seems to be to pass a loss_fn, the implementation appears incomplete. Specifically:
- The signature of `_execute_training_step` hasn't been updated to accept a third argument.
- Even if it were updated, the critic training path would likely fail: the `loss_fn` would be `"critic"`, which is not handled by `PolicyWorkerBase._get_loss_fn`, and `CriticWorkerBase` doesn't have a comparable method to handle a parameterized loss function.
To fix this, you'll need to update the signature of _execute_training_step and ensure that both policy and critic workers can correctly handle the new loss_fn parameter. For the critic, you might want to pass None as the loss_fn if it's not meant to be parameterized, and handle that case in _execute_training_step.
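A minimal sketch of the suggested fix, assuming an optional third parameter that defaults to `None` for the critic path; the worker class below is a stand-in for illustration, not SkyRL's actual API:

```python
from typing import Optional


class _FakeWorker:
    """Minimal stand-in for PolicyWorkerBase / CriticWorkerBase."""

    def forward_backward(self, data, loss_fn: Optional[str] = None):
        # Record which loss was requested and how much data arrived.
        return {"loss_fn": loss_fn, "n": len(data)}


class Trainer:
    def __init__(self):
        self.workers = {"policy": _FakeWorker(), "critic": _FakeWorker()}

    def _execute_training_step(self, model: str, data, loss_fn: Optional[str] = None):
        # New optional third parameter: callers that don't parameterize the
        # loss (the critic path) simply omit it.
        worker = self.workers[model]
        if loss_fn is None:
            return worker.forward_backward(data)
        return worker.forward_backward(data, loss_fn=loss_fn)


trainer = Trainer()
policy_status = trainer._execute_training_step("policy", [1, 2, 3], loss_fn="ppo")
critic_status = trainer._execute_training_step("critic", [1, 2, 3])
```

This keeps a single dispatch path while letting the critic fall back to its default loss.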
```python
# === Step 1: Do a training step ===
dp_size = policy_group.actor_infos[0].rank.dp_size
dummy_batch = make_dummy_training_batch(batch_size=dp_size)
```
The call to make_dummy_training_batch here and on line 190 seems inconsistent with changes in other test files. In other files like test_save_load_checkpoint.py and test_training_step.py, the batch_size argument was removed from this call (e.g., make_dummy_training_batch()).
If the signature of make_dummy_training_batch has changed, this could lead to test failures. For consistency across the test suite, please update this call to match the new pattern.
```diff
- dummy_batch = make_dummy_training_batch(batch_size=dp_size)
+ dummy_batch = make_dummy_training_batch()
```
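If the two call patterns are both meant to stay valid, one reconciliation is to give the helper a default `batch_size`. This is a hypothetical shape — the field names and defaults are illustrative, not the real SkyRL test utility:

```python
# Hypothetical version of the test helper with an optional batch_size, so both
# make_dummy_training_batch() and make_dummy_training_batch(batch_size=dp_size)
# remain valid call sites. Field names here are invented for illustration.
def make_dummy_training_batch(batch_size: int = 1, seq_len: int = 8):
    return {
        "input_ids": [[0] * seq_len for _ in range(batch_size)],
        "attention_mask": [[1] * seq_len for _ in range(batch_size)],
    }


default_batch = make_dummy_training_batch()
sized_batch = make_dummy_training_batch(batch_size=4)
```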
- Remove `ppo_train()` from PolicyWorkerBase and CriticWorkerBase
- Workers now use `forward_backward()` + `optim_step()` with gradient scaling
- Trainer branches on strategy: Megatron uses `ppo_train`, FSDP uses `forward_backward` + `optim_step`
- WorkerDispatch `forward_backward` no longer takes Tinker params (`loss_fn`, `loss_fn_config`)
- Update tests to use TrainingInputBatch and remove `ppo_train` tests

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…anges
- Megatron: Remove redundant `batch_to_experience` call (iterator already yields Experience)
- test_save_load_model.py: Use TrainingInputBatch, remove extra `forward_backward` arg
- test_worker_offload.py: Use TrainingInputBatch, remove extra `forward_backward` arg

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
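The gradient-scaling pattern these commits describe — accumulate per-micro-batch gradients in `forward_backward()`, then scale by the accumulated count at `optim_step()` — can be sketched numerically as follows. All names here are illustrative stand-ins, not SkyRL's worker API:

```python
# Toy 1-parameter worker: gradients from each micro batch accumulate unscaled,
# and optim_step() divides by the micro-batch count so the update matches a
# single forward_backward over the full batch.
class Worker:
    def __init__(self, lr: float = 0.1):
        self.param = 1.0
        self.grad = 0.0
        self.lr = lr
        self.accumulated_micro_batches = 0

    def forward_backward(self, micro_batch):
        # Gradient of 0.5 * (param - target)^2, averaged over the micro batch.
        grads = [self.param - target for target in micro_batch]
        self.grad += sum(grads) / len(grads)
        self.accumulated_micro_batches += 1

    def optim_step(self):
        # Scale by the number of accumulated micro batches, then apply and reset.
        scaled = self.grad / max(self.accumulated_micro_batches, 1)
        self.param -= self.lr * scaled
        self.grad = 0.0
        self.accumulated_micro_batches = 0


w = Worker()
for micro in ([0.0, 0.0], [2.0, 2.0]):
    w.forward_backward(micro)
w.optim_step()
```

The two micro batches pull in opposite directions with equal magnitude, so the scaled update is zero — the same result a single pass over the combined batch would give.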
7a0f4c3 to 70fb844
Summary
- Add `loss_fn` and `loss_fn_config` parameters to `forward_backward()` for Tinker API compatibility
- Remove `ppo_train()` from FSDP workers - uses gradient scaling at `optim_step` instead

Changes
WorkerDispatch (`worker_dispatch.py`):
- Add `loss_fn` and `loss_fn_config` parameters to `forward_backward()`

PolicyWorkerBase (`worker.py`):
- Add `convert_tinker_loss_config()` static method to convert Tinker's absolute ratio bounds to SkyRL's offset format
- Scale gradients at `optim_step` time based on accumulated micro batches
- Remove `ppo_train()` path for FSDP workers

Tests:
- Add `test_convert_tinker_loss_config` for Tinker config conversion
- Update tests for `pass_through` routing and positional batch parameters

Test Plan
- `test_normalize_mini_batch_size`, `test_convert_tinker_loss_config`
- `pytest tests/gpu/gpu_ci/test_training_step.py`

Stack
🤖 Generated with Claude Code
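As a closing illustration of the `convert_tinker_loss_config()` change described in the summary: a hedged sketch, assuming Tinker expresses PPO clip bounds as absolute importance-ratio limits (e.g. 0.8 and 1.2) while SkyRL expects offsets from 1.0 (e.g. eps of 0.2 on each side). The key names are invented for illustration, not the PR's actual schema:

```python
# Hypothetical conversion from absolute ratio bounds to offset form.
# Keys "ratio_low"/"ratio_high" and "clip_eps_low"/"clip_eps_high" are
# illustrative assumptions, not SkyRL's real config fields.
def convert_tinker_loss_config(tinker_config: dict) -> dict:
    ratio_low = tinker_config["ratio_low"]
    ratio_high = tinker_config["ratio_high"]
    return {
        "clip_eps_low": 1.0 - ratio_low,    # e.g. 0.8 -> 0.2
        "clip_eps_high": ratio_high - 1.0,  # e.g. 1.2 -> 0.2
    }


converted = convert_tinker_loss_config({"ratio_low": 0.8, "ratio_high": 1.2})
```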