
[ENHANCEMENT] Add a minimal "quickstart" example for single-GPU / CPU local runs #3994

@nathon-lee

Description

Background

Megatron-LM and Megatron Core are geared toward large-scale, multi-GPU and multi-node training. While this is essential for production and research at scale, it raises the entry barrier for new contributors and developers who don't have access to large GPU clusters. A minimal, local quickstart that runs on a single GPU (and provides a CPU fallback) would make the project much more accessible: it enables quicker development cycles, easier PR testing for external contributors, and a fast sanity-check path in CI.

Proposed change

  • Add an examples/quickstart/ directory containing:
    • train_small_gpt.py: a simplified training wrapper that runs a tiny GPT-like model (a few layers, a small hidden size) for a configurable number of steps.
    • configs/quickstart_small.yaml: minimal config for model size, dataset (synthetic/random data), optimizer, and runtime settings.
  • Add a doc page docs/get-started/local_quickstart.md with step-by-step instructions for:
    • setting up a Python virtual environment,
    • installing only the minimal dependencies required for the quickstart (torch, numpy, etc.),
    • running the example on a single GPU and on CPU (fallback).
  • Ensure the quickstart does not require TransformerEngine, FP8, or other optional binary dependencies. Use synthetic/random data so no dataset downloads are needed.
  • Optionally, add a lightweight script or Makefile target to run the quickstart easily (e.g., tools/quickstart/run_quickstart.sh).
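To make the proposal concrete, here is a rough sketch of what train_small_gpt.py could look like. The class names, sizes, and hyperparameters below are illustrative placeholders, not a committed design; the only requirements from this issue are that the model is tiny, the data is synthetic, and only torch is needed:

```python
import torch
import torch.nn as nn

class TinyGPTBlock(nn.Module):
    """One pre-norm-free transformer block: self-attention + MLP with residuals."""
    def __init__(self, hidden, heads):
        super().__init__()
        self.attn = nn.MultiheadAttention(hidden, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(hidden)
        self.mlp = nn.Sequential(
            nn.Linear(hidden, 4 * hidden), nn.GELU(), nn.Linear(4 * hidden, hidden)
        )
        self.norm2 = nn.LayerNorm(hidden)

    def forward(self, x):
        attn_out, _ = self.attn(x, x, x, need_weights=False)
        x = self.norm1(x + attn_out)
        return self.norm2(x + self.mlp(x))

class TinyGPT(nn.Module):
    """A deliberately tiny GPT-like stack: embedding, a few blocks, LM head."""
    def __init__(self, vocab=256, hidden=64, layers=2, heads=2):
        super().__init__()
        self.vocab = vocab
        self.embed = nn.Embedding(vocab, hidden)
        self.blocks = nn.ModuleList(TinyGPTBlock(hidden, heads) for _ in range(layers))
        self.head = nn.Linear(hidden, vocab)

    def forward(self, tokens):
        x = self.embed(tokens)
        for block in self.blocks:
            x = block(x)
        return self.head(x)

def train(steps=5, device="cpu", batch=4, seq=32):
    """Run a few optimization steps on synthetic tokens; returns the last loss."""
    model = TinyGPT().to(device)
    opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for step in range(steps):
        # Synthetic "dataset": random token ids, so no downloads are needed.
        tokens = torch.randint(0, model.vocab, (batch, seq), device=device)
        logits = model(tokens[:, :-1])
        loss = loss_fn(logits.reshape(-1, model.vocab), tokens[:, 1:].reshape(-1))
        opt.zero_grad()
        loss.backward()
        opt.step()
        print(f"step {step}: loss {loss.item():.4f}")
    return loss.item()
```

At these sizes a short run completes in seconds on CPU, which is the property the quickstart (and any CI sanity job) would rely on.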

Acceptance criteria

  • A user can follow docs/get-started/local_quickstart.md and run a full training loop locally on:
    • a single GPU (if available), completing in a short time (e.g., <10 minutes for default steps),
    • or on CPU (with a longer but still reasonable runtime for verification).
  • The example uses synthetic/random input data and does not require external datasets or heavy optional dependencies.
  • The new files are placed under examples/quickstart/ and the docs are linked from the main README.md under "Getting Started".
  • The example is minimal, easy to read, and well-documented; it includes suggested command lines and the expected lightweight output.

Implementation notes / suggestions

  • train_small_gpt.py can be a thin wrapper that either reuses existing transformer modules or defines a small custom torch.nn.Module that mimics the model shapes used by Megatron modules. It should:
    • accept a --device flag (cuda or cpu) and a --steps flag for the number of optimization steps,
    • generate batches from random tensors so no dataset downloads are needed,
    • log a basic metric such as loss to stdout so users can verify training progress.
  • Keep the dependency list in docs minimal: torch, numpy, and pyyaml for configs. Mark transformer-engine and other heavy extras as optional for this quickstart.
  • Add a short CI sanity job (optional) that runs the example on CPU to verify the quickstart remains functional. This job can be lightweight and time-limited.
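The CLI handling described above could look roughly like the following. The flag names match the issue text; the fallback behavior (silently switching to CPU when CUDA is unavailable, rather than erroring) is a suggestion, not a settled decision:

```python
import argparse

def parse_args(argv=None):
    # Hypothetical CLI for train_small_gpt.py; only the two flags from the issue.
    parser = argparse.ArgumentParser(description="Minimal single-GPU/CPU quickstart trainer")
    parser.add_argument("--device", choices=["cuda", "cpu"], default="cpu",
                        help="where to run; falls back to cpu if cuda is unavailable")
    parser.add_argument("--steps", type=int, default=20,
                        help="number of optimization steps")
    return parser.parse_args(argv)

def resolve_device(requested):
    # Degrade gracefully on machines without a GPU (or without torch's CUDA build),
    # so the same command line works everywhere, including a CPU-only CI job.
    try:
        import torch
        if requested == "cuda" and not torch.cuda.is_available():
            print("CUDA not available; falling back to cpu")
            return "cpu"
    except ImportError:
        return "cpu"
    return requested
```

A CPU-only CI sanity job would then just invoke the script with a small `--steps` value and assert a zero exit code.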

Suggested branch name
doc/quickstart-single-gpu

Labels
documentation, enhancement, good-first-issue

Estimated difficulty
Low

I am willing to submit a PR
I’m happy to implement this and submit a PR that adds examples/quickstart/ and docs/get-started/local_quickstart.md. The PR will include the minimal script(s), config, and documentation. If maintainers prefer a slightly different layout, I can adapt the changes to match repo conventions.
