Background
Megatron-LM and Megatron Core are geared toward large-scale, multi-GPU and multi-node training. While this is essential for production and research at scale, it raises the entry barrier for new contributors and developers who don't have access to large GPU clusters. A minimal, local quickstart that runs on a single GPU (and provides a CPU fallback) would make the project much more accessible: it enables quicker development cycles, easier PR testing for external contributors, and a fast sanity-check path in CI.
Proposed change
- Add an `examples/quickstart/` directory containing:
  - `train_small_gpt.py`: a simplified training wrapper that runs a tiny GPT-like model (a few layers, small hidden size) for a configurable number of steps.
  - `configs/quickstart_small.yaml`: a minimal config covering model size, dataset (synthetic/random data), optimizer, and runtime settings.
- Add a doc page `docs/get-started/local_quickstart.md` with step-by-step instructions for:
  - setting up a Python virtual environment,
  - installing only the minimal dependencies required for the quickstart (torch, numpy, etc.),
  - running the example on a single GPU and on CPU (fallback).
- Ensure the quickstart does not require TransformerEngine, FP8, or other optional binary dependencies. Use synthetic/random data so no dataset downloads are needed.
- Optionally, add a lightweight script or Makefile target to run the quickstart easily (e.g., `tools/quickstart/run_quickstart.sh`).
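As a sketch only, `configs/quickstart_small.yaml` could look something like the fragment below. All field names and values are illustrative suggestions, not an existing Megatron-LM config schema, and would need to be adapted to repo conventions:

```yaml
# Illustrative quickstart config -- field names are suggestions,
# not an existing Megatron-LM schema.
model:
  num_layers: 2
  hidden_size: 128
  num_attention_heads: 4
  seq_length: 64
  vocab_size: 512
data:
  kind: synthetic        # random tokens, no dataset downloads
  batch_size: 4
optimizer:
  name: adamw
  lr: 1.0e-3
runtime:
  device: cuda           # overridable via --device cpu
  steps: 100
  log_interval: 10
```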
Acceptance criteria
- A user can follow `docs/get-started/local_quickstart.md` and run a full training loop locally on:
  - a single GPU (if available), completing in a short time (e.g., under 10 minutes for the default number of steps),
  - or on CPU (with a longer but still reasonable runtime for verification).
- The example uses synthetic/random input data and does not require external datasets or heavy optional dependencies.
- The new files are placed under `examples/quickstart/`, and the docs are linked from the main README.md under "Getting Started".
- The example is minimal, easy to read, and well-documented; it includes suggested command lines and the expected lightweight output.
Implementation notes / suggestions
- `train_small_gpt.py` can be a thin wrapper that constructs a minimal transformer model, or a small custom `torch.nn.Module` that mimics the model shape used by Megatron modules. It should:
  - accept a `--device` flag (`cuda` or `cpu`) and a `--steps` flag for the number of optimization steps,
  - simulate datasets using random tensors to avoid downloads,
  - log a couple of metrics (loss) to stdout so users can verify training progress.
- Keep the dependency list in the docs minimal: `torch`, `numpy`, and `pyyaml` for configs. Mark transformer-engine and other heavy extras as optional for this quickstart.
- Add a short CI sanity job (optional) that runs the example on CPU to verify the quickstart remains functional. This job can be lightweight and time-limited.
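To make the idea concrete, here is one possible shape for the training wrapper: a self-contained sketch using only `torch` and `argparse`. The model sizes, flag names, and module layout below are placeholders chosen for illustration, not Megatron-LM APIs:

```python
# Hypothetical sketch of examples/quickstart/train_small_gpt.py.
# Trains a tiny GPT-like model on random tokens; no downloads, no
# TransformerEngine, runs on CPU or a single GPU.
import argparse
import torch
import torch.nn as nn

VOCAB, SEQ, HIDDEN, LAYERS, HEADS = 512, 64, 128, 2, 4

class TinyGPT(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, HIDDEN)
        block = nn.TransformerEncoderLayer(
            d_model=HIDDEN, nhead=HEADS, dim_feedforward=4 * HIDDEN,
            batch_first=True)
        self.blocks = nn.TransformerEncoder(block, num_layers=LAYERS)
        self.head = nn.Linear(HIDDEN, VOCAB)

    def forward(self, tokens):
        # Causal mask so each position only attends to earlier tokens.
        mask = nn.Transformer.generate_square_subsequent_mask(
            tokens.size(1)).to(tokens.device)
        h = self.blocks(self.embed(tokens), mask=mask)
        return self.head(h)

def train(device="cpu", steps=10, batch_size=4):
    model = TinyGPT().to(device)
    opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for step in range(steps):
        # Synthetic data: random token ids; target is the next token.
        tokens = torch.randint(VOCAB, (batch_size, SEQ), device=device)
        logits = model(tokens[:, :-1])
        loss = loss_fn(logits.reshape(-1, VOCAB),
                       tokens[:, 1:].reshape(-1))
        opt.zero_grad()
        loss.backward()
        opt.step()
        print(f"step {step}: loss {loss.item():.4f}")
    return loss.item()

if __name__ == "__main__":
    p = argparse.ArgumentParser()
    p.add_argument("--device",
                   default="cuda" if torch.cuda.is_available() else "cpu")
    p.add_argument("--steps", type=int, default=10)
    args = p.parse_args()
    train(args.device, args.steps)
```

A user would then run something like `python train_small_gpt.py --device cpu --steps 20` and watch the printed loss decrease, which is exactly the verification path the acceptance criteria describe.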
Suggested branch name
doc/quickstart-single-gpu
Labels
documentation, enhancement, good-first-issue
Estimated difficulty
Low
I am willing to submit a PR
I’m happy to implement this and submit a PR that adds examples/quickstart/ and docs/get-started/local_quickstart.md. The PR will include the minimal script(s), config, and documentation. If maintainers prefer a slightly different layout, I can adapt the changes to match repo conventions.