Conversation

@zbloss zbloss commented Oct 1, 2025

This pull request introduces infrastructure and documentation improvements to streamline development, testing, and deployment for the HRM repository. The most significant changes are the addition of a multi-stage GPU-enabled Dockerfile, a GitHub Actions CI workflow for Python linting and tests, and expanded installation and usage instructions in the README.md. There are also updates to dataset and training script usage, and minor code cleanups in the evaluation notebook.

Infrastructure & Deployment

  • Added a multi-stage Dockerfile supporting both FlashAttention 2 and 3 for Ampere and Hopper GPUs, including CUDA 12.6, Python 3.12, and optimized dependency installation using uv; a minimal sketch follows this list. This enables reproducible GPU builds for both development and production.
  • Added .dockerignore to exclude unnecessary files and directories (such as caches, data, checkpoints, notebooks, and test artifacts) from Docker build context, reducing image size and improving build performance.
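
A rough sketch of what such a Dockerfile can look like (the base image tag, stage names, and install commands below are illustrative assumptions, not the exact file added in this PR):

```dockerfile
# Illustrative sketch only; tags, stage names, and commands are assumptions,
# not the exact Dockerfile added in this PR.
FROM nvidia/cuda:12.6.2-devel-ubuntu24.04 AS base
RUN apt-get update && apt-get install -y --no-install-recommends \
        python3.12 python3.12-venv git curl ca-certificates \
    && rm -rf /var/lib/apt/lists/*
# Install the uv package manager
RUN curl -LsSf https://astral.sh/uv/install.sh | sh
ENV PATH="/root/.local/bin:${PATH}"
WORKDIR /app
COPY pyproject.toml ./
RUN uv sync

# Ampere GPUs: prebuilt FlashAttention 2 wheels
FROM base AS ampere
RUN uv pip install flash-attn --no-build-isolation
COPY . .

# Hopper GPUs: FlashAttention 3, built from source
FROM base AS hopper
RUN git clone https://github.com/Dao-AILab/flash-attention.git \
    && cd flash-attention/hopper \
    && uv run python setup.py install
COPY . .
```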

Continuous Integration

  • Introduced a GitHub Actions workflow (.github/workflows/ci.yml) to automatically lint and test the codebase on Python 3.11, 3.12, and 3.13, ensuring code quality and compatibility across multiple Python versions (see the workflow sketch after this list).
  • Specified the default Python version as 3.12 in .python-version for consistent local and CI environments.
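
For reference, a matrix workflow of this shape typically looks like the following; the lint and test tools named here (ruff, pytest) are assumptions, and the actual ci.yml may use different steps:

```yaml
# Illustrative sketch; the actual .github/workflows/ci.yml may differ.
name: CI

on: [push, pull_request]

jobs:
  lint-and-test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.11", "3.12", "3.13"]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
      - name: Install dependencies
        run: python -m pip install -e . ruff pytest
      - name: Lint
        run: ruff check .
      - name: Test
        run: pytest
```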

Documentation & Usage

  • Expanded the README.md with detailed package structure, installation options (including uv, pip, and Docker), FlashAttention setup instructions, a Python API usage example, and updated commands for dataset preparation, training, and evaluation to use the new scripts/ directory and uv run (see the sketch below). [1] [2] [3] [4] [5]
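
The updated invocation pattern looks roughly like this; the exact script names and arguments live in the README and the ones below are guesses, not verbatim commands:

```bash
# Illustrative commands; actual script names/arguments may differ.
uv sync                                        # install dependencies
uv run scripts/build_sudoku_dataset.py         # dataset preparation
uv run scripts/pretrain.py                     # training
uv run scripts/evaluate.py checkpoint=<ckpt>   # evaluation
```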

Notebook Cleanup

  • Minor import reordering and formatting improvements in arc_eval.ipynb for clarity and consistency. [1] [2]

Issues

Closes #88

zbloss commented Oct 1, 2025

I ran examples/02_train_sudoku_extreme.py to confirm the code still works as expected, and it does.

WandB Results

I was not able to replicate the results in the paper, but I am running on a much smaller GPU, so I had to decrease the batch size and learning rate, which I believe is the core reason for the discrepancy.

alexander-rakhlin commented Oct 2, 2025

@zbloss were you able to run evaluation of their ARC-2 checkpoint? I'm getting errors regarding a size mismatch.

zbloss commented Oct 2, 2025

@zbloss were you able to run evaluation of their ARC-2 checkpoint? I'm getting errors regarding a size mismatch.

I did not try to load the existing checkpoints; I couldn't get them to load before these changes either, and I saw similar errors.

@alexander-rakhlin

I did not try to load the existing checkpoints; I couldn't get them to load before these changes either, and I saw similar errors.

So this checkpoint works after your changes?

zbloss commented Oct 2, 2025

No, it does not work before or with the changes.

zbloss commented Oct 4, 2025

@alexander-rakhlin I have opened a PR to add this model to Hugging Face's Transformers library, with working checkpoints in safetensors format.

I'm waiting on the 🤗 team to review and approve, so you'll have to pull my fork if you want to use it immediately.

huggingface/transformers#41272

Weights are here:
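
If you want to try them before that PR lands, a minimal loading sketch (this assumes my transformers fork is installed, and the model id below is a placeholder, not the actual weights repository):

```python
# Minimal sketch: assumes the transformers fork from PR #41272 is installed.
# The model id is a placeholder, not the actual weights repository.
from transformers import AutoModel

model = AutoModel.from_pretrained("your-namespace/hrm-checkpoint")
print(model.config)
```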

@alexander-rakhlin

@zbloss thank you. I also trained Sudoku and it works just fine, except for invalid puzzles with multiple solutions. Currently, I am training ARC-1. I think I found the reason why their checkpoint fails and will let you know once I verify it.

@alexander-rakhlin

@zbloss
#90 (comment)
