Merged

80 commits
498d3df
SFT data iterator
angkywilliam Nov 13, 2025
3bd818f
Add SFT LR utils
angkywilliam Nov 14, 2025
66ec620
train_sft skeleton
angkywilliam Nov 14, 2025
4aeda2f
SFT Shape 0.1
angkywilliam Nov 14, 2025
4ff152b
Add shuffle to SFTConfig
angkywilliam Nov 14, 2025
b6f0380
change SFT args order
angkywilliam Nov 14, 2025
e32db37
Refactor SFT to accept batched trajectories
angkywilliam Nov 18, 2025
9138b07
Tokenize SFT Batch
angkywilliam Nov 19, 2025
18a7897
Add num_trainable_tokens to SFTBatch
angkywilliam Nov 19, 2025
90bf94b
draft train_sft
angkywilliam Nov 19, 2025
12e2142
Flatten trajectory for train_sft
angkywilliam Nov 21, 2025
4ea6c5e
Tokenize SFT Batches support flat list and add padding
angkywilliam Nov 21, 2025
f7bb203
Fix max_length duplicate name issue
angkywilliam Nov 21, 2025
d59e524
Remove unused file
angkywilliam Nov 21, 2025
7f6309a
remove unused typing
angkywilliam Nov 21, 2025
5ec5575
sft iterator
angkywilliam Nov 22, 2025
d6688cf
SFT Iterator
angkywilliam Nov 22, 2025
6c63af5
Use Unsloth for train on response
angkywilliam Nov 25, 2025
d2b39d5
Merge branch 'main' of github.com:OpenPipe/ART into sft
Kovbo Jan 14, 2026
ca5177b
refactoring
Kovbo Jan 14, 2026
c3a06b4
implement local backend SFT training
Kovbo Jan 15, 2026
9cf747d
Add SFT to Local Backend
Kovbo Jan 15, 2026
28205cb
avg loss
Kovbo Jan 15, 2026
64454b1
refactor, sft works good
Kovbo Jan 17, 2026
739eb45
Merge branch 'sft' of github.com:OpenPipe/ART into sft
Kovbo Jan 17, 2026
9918f65
Merge remote-tracking branch 'origin/main' into sft
Kovbo Jan 20, 2026
fb706f9
remove logging
Kovbo Jan 20, 2026
08d87d1
move tokenizer, update backend
Kovbo Jan 20, 2026
0573bc8
update lr schedule and tests
Kovbo Jan 20, 2026
904c3ff
refactor sft training from file
Kovbo Jan 20, 2026
2078d5e
change batch sft
Kovbo Jan 21, 2026
381ac7d
refactor step count based on checkpoints
Kovbo Jan 21, 2026
4bc79ed
update sft warmup script
Kovbo Jan 21, 2026
db6833c
fix model registration
Kovbo Jan 21, 2026
9544df9
make local random
Kovbo Jan 22, 2026
c6b2874
refactor backend
Kovbo Jan 22, 2026
834b37e
refactor
Kovbo Jan 22, 2026
736f259
Merge branch 'main' of github.com:OpenPipe/ART into sft
Kovbo Jan 22, 2026
84e6ceb
update example
Kovbo Jan 22, 2026
e2ea1ec
Pyright fix
Kovbo Jan 22, 2026
0fa52f8
remove iterate file epochs, refactor
Kovbo Jan 22, 2026
e43cbea
refactor
Kovbo Jan 22, 2026
2fae9c8
Merge branch 'main' of github.com:OpenPipe/ART into sft-local-backend
Kovbo Jan 22, 2026
d336f18
add serverless endpoint
Kovbo Jan 22, 2026
c9f63fe
Rename training_folder_url to training_data_url
Kovbo Jan 23, 2026
61ff551
update defaults, change reporting
Kovbo Jan 23, 2026
997b69f
update lables
Kovbo Jan 24, 2026
e67accd
make sft to produce only one checkpoint step
Kovbo Jan 26, 2026
3238810
refactor train from file
Kovbo Jan 26, 2026
393495f
refactor
Kovbo Jan 29, 2026
eb39441
Merge origin/main into sft-local-backend
Kovbo Jan 30, 2026
ae21b5b
Refactor SFTTrainConfig
Kovbo Feb 2, 2026
4daedeb
refactor
Kovbo Feb 2, 2026
e5ee192
Merge remote-tracking branch 'origin/main' into sft-local-backend
Kovbo Feb 2, 2026
2991645
correctly register lora, fix unsloth proxy check
Kovbo Feb 4, 2026
d2513eb
Merge branch 'main' of github.com:OpenPipe/ART into sft-local-backend
Kovbo Feb 4, 2026
24dcc4c
add sft train from file streaming
Kovbo Feb 4, 2026
f38ff55
add openpipe qwen back
Kovbo Feb 4, 2026
e8c9f9a
lint fix
Kovbo Feb 5, 2026
5896871
calculate pbar
Kovbo Feb 5, 2026
0667087
rename to training_data_url
Kovbo Feb 5, 2026
ced5ce6
accept model run_id from server
Kovbo Feb 5, 2026
c3bf7c3
update optimizer hparams
Kovbo Feb 6, 2026
d897dd6
add claude command
Kovbo Feb 7, 2026
892ce97
remove queue, add skills
Kovbo Feb 10, 2026
264ec5c
add docs and colab example
Kovbo Feb 14, 2026
e798e64
move zero_grad
Kovbo Feb 14, 2026
a1dcf1d
Merge branch 'main' of github.com:OpenPipe/ART into sft-local-backend
Kovbo Feb 14, 2026
0fe0948
add final step arg
Kovbo Feb 14, 2026
2ccd819
update docs
Kovbo Feb 16, 2026
78fc058
Merge branch 'sft-local-backend' of github.com:OpenPipe/ART into sft-…
Kovbo Feb 16, 2026
60d0cac
update docs and trajectories
Kovbo Feb 16, 2026
f0ded2d
lint fix
Kovbo Feb 16, 2026
e6fb81f
add cli skills
Kovbo Feb 17, 2026
8797dff
add chunking
Kovbo Feb 17, 2026
413ef3b
lint fix
Kovbo Feb 17, 2026
1c1372c
remove inline trajectories from skills
Kovbo Feb 17, 2026
a68f925
update chunking
Kovbo Feb 18, 2026
8b9c8a2
change default chunk to 10
Kovbo Feb 18, 2026
8904bd1
remove leftovers
Kovbo Feb 18, 2026
386 changes: 386 additions & 0 deletions .agents/skills/train-rl/SKILL.md

Large diffs are not rendered by default.

298 changes: 298 additions & 0 deletions .agents/skills/train-sft/SKILL.md
@@ -0,0 +1,298 @@
---
name: train-sft
description: SFT training reference for the ART framework. Use when the user asks to create, write, or help with an SFT training script, fine-tune a model, train from a JSONL dataset, do distillation, or anything related to supervised fine-tuning.
---

# SFT Training Wizard

You are guiding the user through setting up Supervised Fine-Tuning (SFT) for a language model using the ART framework. Act as an interactive wizard: ask questions, validate inputs, and generate a complete runnable script.

**Important**: Ask ONE question at a time. Wait for the user's response before asking the next question. Never bundle multiple questions into a single message.

**Adaptability note**: Some steps reference tools like AskUserQuestion, Glob, or Bash. If you don't have access to these tools, simply ask the user the same questions as plain text and skip any steps that require running code (e.g., file search, dataset validation, hyperparameter computation). Do NOT fabricate results — never pretend you ran a tool or searched for files when you didn't.

## Step 1: Determine Training Scenario

Ask the user ONE question at a time. Wait for their response before moving to the next question.

**Training scenario:**
1. **Train from a JSONL file** — They have a dataset file with chat-formatted examples
2. **Distillation** — They want to train a smaller model using outputs from a larger teacher model

## Step 2: Determine Backend

**Backend:**
1. **ServerlessBackend (Recommended)** — Train on remote managed GPUs. No local GPU needed, production-ready inference endpoint.
2. **LocalBackend** — Train on your local GPU. Full control, fast iteration.

## Step 3: Select and Validate Dataset (JSONL scenario)

**IMPORTANT**: Do NOT assume a dataset. Do NOT make up or hallucinate file paths. Never pretend you searched for files if you didn't actually run a search tool.

If you have access to file system tools (Glob) and can actually execute them, search for `.jsonl` files using Glob (`**/*.jsonl`). Present real results as options. Always include "Provide my own file path" as the last option.

Otherwise, ask the user: "What is the path to your JSONL training file?" — nothing more.

Once the user has provided a file path, validate it if you can run code using the script below. If you cannot run code, skip validation and move on.

```python
import json, sys
ROLES = {"system", "user", "assistant", "developer", "tool", "function"}
errors = []
for i, line in enumerate(open(sys.argv[1]), 1):
    try:
        r = json.loads(line)
        msgs = r.get("input", r).get("messages", [])
        assert isinstance(msgs, list) and msgs, "no messages"
        for j, m in enumerate(msgs):
            assert m.get("role") in ROLES, f"messages[{j}]: invalid role {m.get('role')!r}"
            assert m.get("content") or m.get("function_call") or m.get("tool_calls"), f"messages[{j}]: no content"
        if "input" not in r:
            assert msgs[-1]["role"] == "assistant", "last message must be from assistant"
        tools = r.get("tools")
        if tools is not None:
            assert isinstance(tools, list), "tools must be a list"
    except Exception as e:
        errors.append(f"  Line {i}: {e}")
print(f"{len(errors)} error(s):\n" + "\n".join(errors) if errors else f"Valid! {i} rows")
sys.exit(1 if errors else 0)
```

The JSONL format supports these fields per row:
- **`messages`** (required): List of chat messages
- **`tools`** (optional): List of tool/function definitions for tool-call training
- **`response_format`** (optional): Structured output schema (not used during training, but useful as metadata)

Report the row count and validation result to the user. Do NOT read the whole dataset file. Do NOT name the dataset. If the format is wrong, help them fix it or convert their data.
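For illustration, a hypothetical row in that format (the message text and the `calculator` tool are invented, not taken from any real dataset) can be built and serialized like this:

```python
import json

# Hypothetical example row. Only the field names ("messages", "tools")
# come from the format description above; everything else is made up.
row = {
    "messages": [
        {"role": "system", "content": "You are a concise math tutor."},
        {"role": "user", "content": "What is 2 + 2?"},
        {"role": "assistant", "content": "2 + 2 = 4."},
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "calculator",
                "description": "Evaluate an arithmetic expression.",
                "parameters": {
                    "type": "object",
                    "properties": {"expression": {"type": "string"}},
                    "required": ["expression"],
                },
            },
        }
    ],
}

# One JSONL line is this row serialized onto a single line.
line = json.dumps(row)
print(line[:40])
```

Writing one such `json.dumps(row)` line per example yields a file the validation script accepts.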

## Step 4: Gather Base Parameters

Do NOT ask the user to review or confirm their answers after collecting them — just proceed to the next step.

- **Base model**: Recommend ONLY these models:
- `OpenPipe/Qwen3-14B-Instruct`
- `Qwen/Qwen3-30B-A3B-Instruct-2507`
- `meta-llama/Llama-3.1-8B-Instruct`
- **Project name**: A name for this training project (default: `sft-project`)
- **Run name**: A static, descriptive name (e.g., `agent-001`, `pii-redactor-001`, `math-tutor-001`). Ask the user for a meaningful name. Do NOT generate random names.

For **distillation** also ask:
- **Teacher model**: The larger model to distill from (e.g., an OpenRouter model)
- **Teacher API base URL and key**: If using a third-party provider
- **Prompts**: What prompts to send to the teacher model

## Step 5: Gather Hyperparameters

This step only applies if you can run code AND know the row count from validation. If you cannot run code, skip this step entirely — do NOT make up or guess hyperparameter values. The `train_sft_from_file` function has sensible built-in defaults.

Run this Python snippet via Bash to compute defaults, passing the actual row count as the first argument (it is read via `sys.argv[1]`). Do NOT show any formulas or calculation steps to the user — only show the final values.

```python
import math, sys
n = int(sys.argv[1])
epochs = max(1, min(10, round(10000 / n)))
batch_size = 2
total_steps = math.ceil(n * epochs / batch_size)
steps_per_epoch = math.ceil(n / batch_size)
warmup_steps = max(10, min(1000, round(steps_per_epoch * 0.05)))
warmup_ratio = round(warmup_steps / total_steps, 4)
print(f"epochs={epochs} batch_size={batch_size} lr=2e-4 schedule=linear warmup_ratio={warmup_ratio}")
```

Present the output values to the user, then ask:
- **Use defaults (Recommended)** — show all values in the description
- **Customize** — adjust individual hyperparameters

If they choose "Customize", ask which parameters to change.

### For distillation:
Use the same defaults computation as JSONL, passing the number of trajectories as the row count. `create_sft_dataset_iterator` handles the LR schedule automatically.

## Step 6: Generate the Training Script

Write a complete, runnable Python script. Use the patterns below. Every script MUST:
- Call `await backend.close()` at the end so the process doesn't hang
- Print post-training info and usage examples (see shared block below)

### Post-training block (append to ALL scripts before `backend.close()`):
```python
# --- Training complete ---
step = await model.get_step()
inference_name = model.get_inference_name()
client = model.openai_client()

print("\n" + "=" * 60)
print("SFT TRAINING COMPLETE")
print("=" * 60)
print(f" Model: {inference_name}")
print(f" Base model: <BASE_MODEL>")
print(f" Training step: {step}")
print(f" Inference URL: {client.base_url}")
print(f" W&B run: https://wandb.ai/<YOUR_TEAM>/<PROJECT_NAME>/runs/<RUN_NAME>")
print("=" * 60)

print("\n--- Python usage (openai SDK) ---\n")
print(f'''\
from openai import OpenAI

client = OpenAI(
    base_url="{client.base_url}",
    api_key="not-needed",
)

response = client.chat.completions.create(
    model="{inference_name}",
    messages=[
        {{"role": "user", "content": "Your prompt here"}},
    ],
)
print(response.choices[0].message.content)
''')

print("--- curl usage ---\n")
print(f'''\
curl {client.base_url}chat/completions \\
  -H "Content-Type: application/json" \\
  -d '{{
    "model": "{inference_name}",
    "messages": [
      {{"role": "user", "content": "Your prompt here"}}
    ]
  }}'
''')

await backend.close()
```

### Backend setup

Use the appropriate backend based on the user's choice:

**LocalBackend:**
```python
from art.local import LocalBackend

backend = LocalBackend()
model = art.TrainableModel(
    name="<RUN_NAME>",
    project="<PROJECT_NAME>",
    base_model="<BASE_MODEL>",
    _internal_config=art.dev.InternalModelConfig(
        engine_args={"gpu_memory_utilization": 0.7},
    ),
)
await model.register(backend)
```

**ServerlessBackend:**
```python
from art.serverless.backend import ServerlessBackend

backend = ServerlessBackend() # uses WANDB_API_KEY env var
model = art.TrainableModel(
    name="<RUN_NAME>",
    project="<PROJECT_NAME>",
    base_model="<BASE_MODEL>",
)
await model.register(backend)
```

Note: `_internal_config` with `gpu_memory_utilization` is only used with LocalBackend. Do NOT include it for ServerlessBackend.

### JSONL file training pattern:

If hyperparameters were computed in Step 5, pass them explicitly. If Step 5 was skipped, omit them — `train_sft_from_file` has sensible defaults.

```python
"""SFT training script generated by /train-sft wizard."""
import asyncio
import art
<BACKEND_IMPORT>
from art.utils.sft import train_sft_from_file

async def main():
    <BACKEND_SETUP>

    await train_sft_from_file(
        model=model,
        file_path="<FILE_PATH>",
        # Only include these if hyperparameters were computed:
        # epochs=<EPOCHS>,
        # batch_size=<BATCH_SIZE>,
        # peak_lr=<PEAK_LR>,
        # schedule_type="<SCHEDULE_TYPE>",
        # warmup_ratio=<WARMUP_RATIO>,
        verbose=True,
    )

    # ... post-training block + backend.close() ...

if __name__ == "__main__":
    asyncio.run(main())
```

### Distillation pattern:
```python
"""Distillation SFT script generated by /train-sft wizard."""
import asyncio, os
from dotenv import load_dotenv
from openai import AsyncOpenAI
import art
<BACKEND_IMPORT>
from art.utils.sft import create_sft_dataset_iterator

load_dotenv()

async def main():
    teacher_client = AsyncOpenAI(
        api_key=os.environ["<API_KEY_ENV_VAR>"],
        base_url="<TEACHER_API_BASE>",
    )
    prompts = ["<PROMPT_1>", "<PROMPT_2>"]

    trajectories = []
    for prompt in prompts:
        completion = await teacher_client.chat.completions.create(
            model="<TEACHER_MODEL>",
            messages=[{"role": "user", "content": prompt}],
        )
        trajectories.append(
            art.Trajectory(
                messages_and_choices=[
                    {"role": "user", "content": prompt},
                    {"role": "assistant", "content": completion.choices[0].message.content},
                ],
                tools=<TOOLS_OR_NONE>,
            )
        )

    <BACKEND_SETUP>

    for chunk in create_sft_dataset_iterator(
        trajectories,
        epochs=<EPOCHS>,
        batch_size=<BATCH_SIZE>,
        peak_lr=<PEAK_LR>,
        schedule_type="<SCHEDULE_TYPE>",
        warmup_ratio=<WARMUP_RATIO>,
    ):
        await model.train_sft(chunk.trajectories, chunk.config, verbose=True)

    # ... post-training block + backend.close() ...

if __name__ == "__main__":
    asyncio.run(main())
```

## Step 7: Write and Offer to Run

1. Write the script to a file (suggest `sft_train.py`)
2. Ask the user if they want to run it now with `uv run python <script_path>`
3. If yes, run it **directly using the Bash tool** (do NOT delegate to a Task subagent) so training logs stream live to the user. Use a **2-minute timeout**. If it times out, check progress and decide whether to continue.
4. **LocalBackend only — GPU memory errors**: If training fails with OOM, lower `gpu_memory_utilization` in the existing `_internal_config` (e.g. from `0.7` to `0.5`).
5. **LocalBackend only — Stale GPU memory**: If available GPU memory looks too small, previous training runs may still be occupying memory. Before retrying, run `nvidia-smi` to check, and if needed kill leftover processes with `kill <pid>` to free memory.

## Important Notes

- LocalBackend requires a GPU.
- ServerlessBackend requires a `WANDB_API_KEY` environment variable.
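The serverless requirement above can be wrapped in a small preflight check. This is a sketch, assuming only that ServerlessBackend reads the variable from the process environment as stated:

```python
import os

def check_serverless_env() -> bool:
    """Return True if the ServerlessBackend credential is present."""
    # Assumption: ServerlessBackend reads WANDB_API_KEY from the environment.
    return bool(os.environ.get("WANDB_API_KEY"))

if not check_serverless_env():
    print("WANDB_API_KEY is not set; export it or add it to a .env file.")
```

Running this before generating or launching a ServerlessBackend script surfaces the missing credential early instead of at registration time.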
1 change: 1 addition & 0 deletions .claude/skills
2 changes: 1 addition & 1 deletion .gitignore
@@ -15,7 +15,7 @@ replays/
trajectories/
.DS_Store
.local/
-.claude/
+.claude/settings.local.json
.vscode/
.ruff_cache/
!/src/art/wandb/