Hi all! I find this interesting, and I would like to participate.
However, it's unclear to me what the "goal" is. I.e., when should we stop the clock?
- When we reach a certain validation loss during training?
- When we reach a certain generation quality, according to some fidelity metric?
- Both?
Additionally, when should the clock be running? In the modded-nanogpt speedrun, the clock only runs during the training loop, including data fetching between steps, but not during validation. I propose we do the same as modded-nanogpt, make this explicit, and log everything to text files.
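To make the timing proposal concrete, here is a rough sketch of what I have in mind. It is not taken from modded-nanogpt; `TrainingClock`, `run_speedrun`, `train_step`, `validate`, and the log line format are all hypothetical names I made up for illustration. The only point is that the clock accumulates time inside the training step (data fetching included), stays paused during validation, and everything gets written to a plain text log.

```python
import time
from contextlib import contextmanager


class TrainingClock:
    """Accumulates wall-clock time only while inside a `running()` block."""

    def __init__(self, now=time.perf_counter):
        self._now = now      # injectable time source (assumption: handy for tests)
        self.elapsed = 0.0   # total timed seconds so far

    @contextmanager
    def running(self):
        start = self._now()
        try:
            yield
        finally:
            # Only time spent inside the `with` block is counted.
            self.elapsed += self._now() - start


def run_speedrun(num_steps, val_every, train_step, validate, log_path, clock=None):
    """Hypothetical loop: `train_step(step)` and `validate(step)` each return a loss."""
    if clock is None:
        clock = TrainingClock()
    with open(log_path, "w") as log:
        for step in range(1, num_steps + 1):
            # Timed: the training step, including fetching the next batch.
            with clock.running():
                train_loss = train_step(step)
            log.write(
                f"step:{step} train_loss:{train_loss:.4f} "
                f"train_time:{clock.elapsed:.3f}s\n"
            )
            if step % val_every == 0:
                # Untimed: validation runs while the clock is paused.
                val_loss = validate(step)
                log.write(
                    f"step:{step} val_loss:{val_loss:.4f} "
                    f"train_time:{clock.elapsed:.3f}s\n"
                )
    return clock.elapsed
```

On a GPU you would likely also need to synchronize the device (e.g. `torch.cuda.synchronize()`) before each clock read, since kernels launch asynchronously and the measured time would otherwise be too short.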
Also, IMO it would be best to have an initial, downloadable set of benchmark logs we can compare against.