Notebook fix #208

surfiniaburger · 2025-11-18T02:00:21Z

I replaced the print statements with a logger and now save the logs from the moment the server boots, so we can easily monitor its health while it runs. Previously, the reward system focused only on outcome-based scoring; inspired by RLVR, I moved to process-based scoring. After a few training runs, I observed that some of the model's outputs were malformed, so I moved to a "Format-First" Hierarchical Curriculum: if the model fails the format check, we don't consider its reasoning trace. The details of all relevant changes I made are in here.
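To make the "Format-First" idea concrete, here is a minimal sketch of a hierarchical reward gate. It is not the code in this PR: the `<reasoning>`/`<answer>` tag layout, the `RewardConfig` weights, and the toy `score_reasoning` heuristic are all assumptions chosen for illustration. The only point it shows is the ordering: the format check runs first, and a malformed response is penalized without its reasoning trace ever being scored.

```python
import re
from dataclasses import dataclass

# Hypothetical response layout; the real environment's format may differ.
RESPONSE_PATTERN = re.compile(
    r"\A\s*<reasoning>(?P<reasoning>.+?)</reasoning>\s*"
    r"<answer>(?P<answer>.+?)</answer>\s*\Z",
    re.DOTALL,
)


@dataclass(frozen=True)
class RewardConfig:
    # Illustrative weights only, not values taken from the PR.
    format_penalty: float = -1.0
    format_bonus: float = 0.2
    reasoning_weight: float = 0.4
    answer_weight: float = 0.4


def _words(text: str) -> set[str]:
    """Lowercased alphabetic words longer than three characters."""
    return {w.lower() for w in re.findall(r"[A-Za-z]+", text) if len(w) > 3}


def score_reasoning(reasoning: str, context: str) -> float:
    """Toy process score: fraction of reasoning sentences that share a word with the context."""
    context_words = _words(context)
    sentences = [s for s in re.split(r"[.!?]\s*", reasoning) if s.strip()]
    if not sentences:
        return 0.0
    grounded = sum(1 for s in sentences if context_words & _words(s))
    return grounded / len(sentences)


def score_response(
    response: str,
    context: str,
    expected_answer: str,
    cfg: RewardConfig = RewardConfig(),
) -> float:
    """Format gate first; process and outcome terms only for well-formed responses."""
    match = RESPONSE_PATTERN.match(response)
    if match is None:
        # Level 1 failed: the reasoning trace is not considered at all.
        return cfg.format_penalty

    reward = cfg.format_bonus
    # Level 2: process-based term computed from the reasoning trace.
    reward += cfg.reasoning_weight * score_reasoning(match.group("reasoning"), context)
    # Level 3: outcome-based term computed from the final answer.
    if match.group("answer").strip().lower() == expected_answer.strip().lower():
        reward += cfg.answer_weight
    return reward
```

With this shape, a response that is missing the expected tags always receives `format_penalty`, however good its content is, which is what pushes the model to learn the format before anything else.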

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
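The switch from print statements to a logger described above could look roughly like the following sketch. It uses only the standard library `logging` module; the logger name `dipg_safety_env`, the `logs/server.log` path, and the rotation limits are assumptions for illustration and are not taken from the PR.

```python
import logging
from logging.handlers import RotatingFileHandler
from pathlib import Path


def configure_logging(log_dir: str = "logs", level: int = logging.INFO) -> logging.Logger:
    """Set up a logger that writes to both a rotating file and the console.

    Intended to be called once when the server boots, so that every later
    message ends up in the log file and can be inspected while it runs.
    """
    directory = Path(log_dir)
    directory.mkdir(parents=True, exist_ok=True)

    # Hypothetical logger name; any stable name works.
    logger = logging.getLogger("dipg_safety_env")
    logger.setLevel(level)
    logger.propagate = False  # avoid duplicate lines via the root logger

    if not logger.handlers:  # guard against adding handlers twice
        formatter = logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")

        file_handler = RotatingFileHandler(
            directory / "server.log", maxBytes=1_000_000, backupCount=3
        )
        file_handler.setFormatter(formatter)

        stream_handler = logging.StreamHandler()
        stream_handler.setFormatter(formatter)

        logger.addHandler(file_handler)
        logger.addHandler(stream_handler)
    return logger
```

Calling `configure_logging()` at startup and then using `logger.info(...)` in place of `print(...)` keeps a persistent record that can be followed with a tool such as `tail -f` while the server is running.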

pankit-eng · 2025-12-02T22:26:22Z

@surfiniaburger - do you mind adding a description to the PR explaining what the issue was and what the fix was?

surfiniaburger · 2025-12-03T20:26:40Z

@pankit-eng I have included the description. I have been undertaking a major refactor, which provides for MCP and A2A. The goal is to build out the pipeline for the DIPG safety Gym as a means to learn more about LLM Alignment and document my findings. It's no small task. Aside from the PR, I'd love to know the areas I should do more research on. Your feedback would be highly appreciated.

burtenshaw · 2025-12-04T09:19:58Z

@surfiniaburger This looks ok because the changes are purely within the DIPG env. Could you do the following so we can merge:

1. Merge changes from main.
2. Push the env to the HF Hub.
3. Update docs/environments.md with the DIPG env and a link to the env on the Hub.

In a future PR, you might also want to add a guide to the docs that explains how to use the DIPG env.

surfiniaburger · 2025-12-04T14:45:36Z

I'll look into it now

surfiniaburger · 2025-12-04T22:01:55Z

@burtenshaw Thanks for the heads up, I'll work towards adding a guide.

surfiniaburger and others added 12 commits November 18, 2025 02:02

Update examples/dipg/dipg-rl.ipynb

5b88e37

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

hyperparameter finetuning

f988134

+1

30f2e60

removed security vulnerability

86e733a

update notebook

18d2ad2

sft eval

d43b8c3

update notebook

ad2f0c6

Update src/envs/dipg_safety_env/README.md

7b32d18

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

add new test

66b7131

hierarchical logic

9e7e0a5

update notebook

ddb101a

update notebook

a4dee23

meta-cla bot added the CLA Signed label Nov 18, 2025

jspisak added the bug label Nov 19, 2025

burtenshaw self-requested a review December 4, 2025 09:20

surfiniaburger added 2 commits December 4, 2025 18:09

Merge remote-tracking branch 'upstream/main' into notebook-fix

2d89c28

docs: add link to dipg_safety_env hugginface space

bd3d905

4 participants
