
Remove batch-level padding from tokenize_sft_batch#582

Merged
Kovbo merged 4 commits into main from remove-sft-batch-padding on Feb 26, 2026

Conversation


@Kovbo Kovbo commented Feb 25, 2026

Summary

  • Remove unnecessary padding from tokenize_sft_batch: each trajectory tensor now keeps its natural sequence length instead of being padded to the longest in the batch
  • Add padding-strip in unsloth training loop (service.py): trims any trailing padding before .to(device) for robustness
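The padding-strip described in the second bullet could look roughly like this (a hypothetical sketch, not the actual service.py code; the helper name and tensor layout are assumptions):

```python
import torch

def strip_trailing_padding(input_ids: torch.Tensor,
                           attention_mask: torch.Tensor):
    """Trim trailing padding from one trajectory before moving it to GPU.

    Illustrative helper: the attention mask tells us how many tokens are
    real, so we slice both tensors down to that length and never spend
    device memory or compute on pad tokens.
    """
    real_len = int(attention_mask.sum().item())
    return input_ids[:real_len], attention_mask[:real_len]

# Example: a trajectory padded out to length 5.
ids = torch.tensor([5, 6, 7, 0, 0])
mask = torch.tensor([1, 1, 1, 0, 0])
ids, mask = strip_trailing_padding(ids, mask)
# ids and mask now have length 3; only then would the loop call .to(device).
```

Because the slice is a no-op when the tensors are already unpadded, this keeps the loop correct whether its inputs arrive padded or not.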

Context

tokenize_sft_batch was padding all trajectories to the longest sequence in the batch. Since every consumer (unsloth, megatron) processes trajectories individually with gradient accumulation, this padding was pure waste — inflating CPU tensors and forcing extra GPU compute on padding tokens. The megatron trainer already stripped padding on its own.

Now padding is eliminated at the source. The unsloth training loop also defensively strips padding before moving to GPU, so it works correctly whether tensors arrive padded or not.
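The shape of the change at the source can be sketched as follows (an illustrative outline, not the real tokenize_sft_batch; the tokenizer interface here is an assumption):

```python
import torch

def tokenize_sft_batch(trajectories, tokenizer):
    """Sketch: tokenize each trajectory at its natural length.

    No batch-level padding is applied; the result is a list of 1-D
    tensors of varying lengths, which suits consumers that process
    trajectories one at a time with gradient accumulation.
    """
    tensors = []
    for text in trajectories:
        ids = tokenizer(text)  # assumed to return a list of token ids
        tensors.append(torch.tensor(ids, dtype=torch.long))
    return tensors
```

Under the old behavior every tensor in the returned batch would share the longest length; here a two-token trajectory stays two tokens long.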

tokenize_sft_batch was padding all trajectories to the longest sequence
in the batch, but every consumer (unsloth, megatron) processes them
individually. This wasted CPU memory and GPU compute on padding tokens.

Now each trajectory tensor keeps its natural length. The unsloth
training loop strips any residual padding before .to(device) for
robustness.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@Kovbo Kovbo marked this pull request as draft February 25, 2026 20:03
Kovbo and others added 2 commits February 25, 2026 20:05
Match the serverless-training microbatch approach: process trajectories
in configurable microbatch groups with padding trimmed to the longest
in each group. Changing microbatch_size is a one-line change.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
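The microbatch approach described in that commit could be sketched like this (a hypothetical illustration of the idea, later removed from the PR; function and variable names are assumptions):

```python
import torch
import torch.nn.functional as F

def iter_microbatches(trajectory_tensors, microbatch_size=1):
    """Yield microbatches padded only to the longest sequence per group.

    Each group of `microbatch_size` trajectories is right-padded to the
    group's own max length, so wasted compute is bounded by the length
    spread within a microbatch rather than the whole batch.
    """
    for i in range(0, len(trajectory_tensors), microbatch_size):
        group = trajectory_tensors[i:i + microbatch_size]
        max_len = max(t.size(0) for t in group)
        yield torch.stack([
            F.pad(t, (0, max_len - t.size(0)))  # right-pad with zeros
            for t in group
        ])
```

Changing `microbatch_size` is the advertised one-line change: at 1, no padding is ever added; larger values trade a little padding for bigger fused batches.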
@Kovbo Kovbo marked this pull request as ready for review February 25, 2026 20:25
@Kovbo Kovbo requested a review from angkywilliam February 25, 2026 20:50
# Process trajectories in microbatches, trimming padding to the
# longest sequence in each microbatch to avoid wasted GPU compute.
microbatch_size = 1
for i in range(0, len(batch.trajectory_tensors), microbatch_size):
Collaborator

Not sure if we need this flow.
We already removed the extra padding logic in the processing flow above.

Collaborator Author

You are right, no need to have it. Removed!

Padding is now removed at the source in tokenize_sft_batch, so the
training loop doesn't need microbatch trimming logic.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@Kovbo Kovbo merged commit 5207db0 into main Feb 26, 2026
2 checks passed
