Notebook fix #208
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@surfiniaburger, do you mind adding a description to the PR explaining what the issue was and what the fix was?
@pankit-eng I have included the description. I have been undertaking a major refactor that adds support for MCP and A2A. The goal is to build out the pipeline for the DIPG Safety Gym as a way to learn more about LLM alignment and to document my findings. It's no small task. Aside from the PR, I'd love to know which areas I should research further. Your feedback would be highly appreciated.
@surfiniaburger This looks OK because the changes are purely within the DIPG env. Could you do these to merge:
In a future PR, you might also want to add a guide to the docs that explains how to use the DIPG env.
I'll look into it now.
@burtenshaw Thanks for the heads-up; I'll work toward adding a guide.
Instead of print statements, I switched to a logger, and the logs are now also saved to a file when the server boots, so we can easily monitor the server's health while it runs. Previously, I used a reward system that focused only on outcome-based scoring; inspired by RLVR, I moved to process-based scoring. Over a few training runs, I observed that some of the model's outputs were malformed, so I moved to a "Format-First" hierarchical curriculum: if the model fails the format check, we don't consider its reasoning trace at all. The details of all relevant changes I made are in here.
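A minimal sketch of the print-to-logger switch described above, assuming Python's standard `logging` module. The helper `configure_server_logging`, the logger name `dipg_env.server`, and the file name `dipg_server.log` are hypothetical and not taken from the PR. Calling the helper once at server startup sends every record both to stdout and to a log file that can be tailed while the server runs.

```python
import logging
import sys

# Timestamp, level, and logger name make the file easy to grep for health checks.
LOG_FORMAT = "%(asctime)s %(levelname)s %(name)s: %(message)s"


def configure_server_logging(log_path: str = "dipg_server.log") -> logging.Logger:
    """Hypothetical helper: log to stdout and to a file; call once at boot."""
    logger = logging.getLogger("dipg_env.server")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate lines via the root logger
    if not logger.handlers:  # calling the helper again does not add handlers
        formatter = logging.Formatter(LOG_FORMAT)
        for handler in (logging.StreamHandler(sys.stdout), logging.FileHandler(log_path)):
            handler.setFormatter(formatter)
            logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    log = configure_server_logging()
    log.info("DIPG environment server starting")  # replaces a bare print()
```

Unlike `print`, each record carries a level and timestamp, and the file handler keeps a persistent record after the console output scrolls away.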
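The "Format-First" gating can be sketched as follows. This is a hypothetical illustration, not the reward actually used in the DIPG env: the `<reasoning>`/`<answer>` tags, the weights in `RewardWeights`, and the keyword-matching process score in `score_process` are all assumptions. A malformed response receives a fixed penalty and its reasoning is never inspected; only well-formed responses earn a format bonus plus process and outcome credit.

```python
import re
from dataclasses import dataclass

# Hypothetical output format: a reasoning block followed by a final answer block.
FORMAT_RE = re.compile(
    r"^\s*<reasoning>(?P<reasoning>.+?)</reasoning>\s*"
    r"<answer>(?P<answer>.+?)</answer>\s*$",
    re.DOTALL,
)


@dataclass(frozen=True)
class RewardWeights:
    # Illustrative values only; not taken from the PR.
    format_penalty: float = -1.0
    format_bonus: float = 0.2
    process_weight: float = 0.4
    outcome_weight: float = 0.4


def score_process(reasoning: str, required_steps: list[str]) -> float:
    """Crude stand-in process score: fraction of required steps mentioned."""
    if not required_steps:
        return 0.0
    text = reasoning.lower()
    hits = sum(1 for step in required_steps if step.lower() in text)
    return hits / len(required_steps)


def format_first_reward(
    response: str,
    expected_answer: str,
    required_steps: list[str],
    weights: RewardWeights = RewardWeights(),
) -> float:
    match = FORMAT_RE.match(response)
    if match is None:
        # Tier 1 failed: malformed output gets the penalty; reasoning is ignored.
        return weights.format_penalty
    # Tier 2: only well-formed outputs are scored on process and outcome.
    process = score_process(match.group("reasoning"), required_steps)
    answer = match.group("answer").strip().lower()
    outcome = 1.0 if answer == expected_answer.strip().lower() else 0.0
    return (
        weights.format_bonus
        + weights.process_weight * process
        + weights.outcome_weight * outcome
    )
```

Because the format check short-circuits, the policy can only collect process or outcome reward once it reliably produces the expected structure. A real process score would likely use a verifier or grader rather than substring matching.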