@surfiniaburger
Contributor

@surfiniaburger surfiniaburger commented Nov 18, 2025

I replaced the print statements with a logger and now save the logs from the moment the server boots, so we can easily monitor its health while it runs. The previous reward system focused only on outcome-based scoring; inspired by RLVR, I moved to process-based scoring. After a few training runs, I observed that some of the model's outputs were malformed, so I moved to a "Format-First" Hierarchical Curriculum: if the model fails the format check, we don't consider its reasoning trace. The details of all relevant changes I made are in here.
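
A minimal sketch of the format-first gating described above, not the code from this PR. The tag layout (`<reasoning>`/`<answer>`), the reward weights, the word-overlap reasoning check, and every function name here are assumptions made for illustration; the actual DIPG response format and scoring rules are not shown in this thread.

```python
import logging
import re

# Module-level logger instead of print statements; handlers are left to the caller.
logger = logging.getLogger("dipg_env_sketch")

# Hypothetical response layout: one reasoning block followed by one answer block.
RESPONSE_PATTERN = re.compile(
    r"^\s*<reasoning>(?P<reasoning>.+?)</reasoning>\s*"
    r"<answer>(?P<answer>.+?)</answer>\s*$",
    re.DOTALL,
)

# Illustrative weights only.
FORMAT_PENALTY = -1.0
FORMAT_REWARD = 0.2
REASONING_REWARD = 0.3
ANSWER_REWARD = 0.5


def score_reasoning(reasoning: str, context: str) -> float:
    """Placeholder process score: fraction of reasoning words found in the context."""
    context_words = set(context.lower().split())
    words = reasoning.lower().split()
    if not words:
        return 0.0
    grounded = sum(1 for word in words if word in context_words)
    return grounded / len(words)


def format_first_reward(response: str, context: str, expected_answer: str) -> float:
    """Gate on format first; reasoning and answer are scored only if the format passes."""
    match = RESPONSE_PATTERN.match(response)
    if match is None:
        # Format failed: the reasoning trace is never inspected.
        logger.warning("Malformed response; skipping reasoning and answer checks")
        return FORMAT_PENALTY

    reward = FORMAT_REWARD
    # Process-based component: partial credit for the reasoning trace itself.
    reward += REASONING_REWARD * score_reasoning(match["reasoning"], context)
    # Outcome-based component: exact match on the final answer.
    if match["answer"].strip().lower() == expected_answer.strip().lower():
        reward += ANSWER_REWARD
    return reward
```

The early return is the point of the hierarchy: a malformed output receives a fixed penalty regardless of what its reasoning contains, so the model has to learn the format before the process and outcome terms can contribute.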

surfiniaburger and others added 12 commits November 18, 2025 02:02
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@meta-cla meta-cla bot added the CLA Signed label Nov 18, 2025
@jspisak jspisak added the bug label Nov 19, 2025
@pankit-eng
Contributor

@surfiniaburger - do you mind adding a description to the PR explaining what the issue was and what the fix was?

@surfiniaburger
Contributor Author

@pankit-eng I have added the description. I have been undertaking a major refactor that provides for MCP and A2A. The goal is to build out the pipeline for the DIPG Safety Gym as a means to learn more about LLM alignment and document my findings. It's no small task. Aside from the PR, I'd love to know which areas I should research further. Your feedback would be highly appreciated.

@burtenshaw
Collaborator

@surfiniaburger This looks OK because the changes are purely within the DIPG env. Could you do the following so we can merge:

  • merge changes from main.
  • push the env to the HF Hub.
  • update docs/environments.md with the DIPG env and a link to the env on the Hub.

In a future PR, you might also want to add a guide to the docs that explains how to use the DIPG env.

@burtenshaw burtenshaw self-requested a review December 4, 2025 09:20
@surfiniaburger
Contributor Author

surfiniaburger commented Dec 4, 2025

I'll look into it now

@surfiniaburger
Contributor Author

@burtenshaw Thanks for the heads up, I'll work towards adding a guide.
