A Python-based training pipeline for fine-tuning GPT models on the Nvidia Nemotron dataset.
- Multi-phase training with progressive learning rates (see the sketch after this list)
- Memory-optimized data loading and processing
- CUDA optimizations for better GPU utilization
- Dataset validation and analysis tools
- Progress tracking with detailed statistics
- Checkpoint management and safe model saving
- Interactive dataset management system
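The progressive learning-rate feature steps the rate between training phases. The actual phases and rates are defined in model_trainer_v_1_1.py; the sketch below is only a minimal illustration of the pattern, and every name and value in it is a placeholder, not the script's real configuration.

```python
import torch

# Illustrative phases only -- the real schedule lives in
# model_trainer_v_1_1.py. Each phase trains with its own learning rate.
PHASES = [
    {"name": "warmup", "epochs": 1, "lr": 5e-6},
    {"name": "main",   "epochs": 3, "lr": 3e-5},
    {"name": "refine", "epochs": 1, "lr": 1e-5},
]

def run_phases(model, train_one_epoch):
    """Run each phase with a fresh optimizer at that phase's rate."""
    for phase in PHASES:
        optimizer = torch.optim.AdamW(model.parameters(), lr=phase["lr"])
        for _ in range(phase["epochs"]):
            train_one_epoch(model, optimizer)
```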
- Python 3.8+
- PyTorch with CUDA support
- Transformers library
- Datasets library
- tqdm for progress bars
- CUDA-capable GPU with 8GB+ VRAM
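To verify the GPU requirement once PyTorch is installed, a quick standalone check (not part of the pipeline itself):

```python
import torch

# Report whether a CUDA device is visible and how much VRAM it has.
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"GPU: {props.name}, VRAM: {props.total_memory / 1e9:.1f} GB")
else:
    print("No CUDA device found; training will be slow or fail.")
```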
```bash
pip install torch transformers datasets tqdm
```
- Set up your environment variables:

```python
os.environ["HUGGINGFACE_TOKEN"] = "your_token_here"
```
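How the script consumes this token is defined in model_trainer_v_1_1.py; one common pattern, shown here purely as an assumption, is to pass it to huggingface_hub.login() so model and dataset downloads are authenticated:

```python
import os
from huggingface_hub import login

# Assumed usage: authenticate Hugging Face downloads with the token.
login(token=os.environ["HUGGINGFACE_TOKEN"])
```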
- Run the training script:

```bash
python model_trainer_v_1_1.py
```
- Replace [User] with the username you are currently logged in as.
- Follow the interactive prompts to manage datasets and start training.
- Chat with the trained model by running chat.py.
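chat.py ships with the repository; its internals aren't documented here, but a minimal interactive loop over a fine-tuned causal LM looks roughly like the sketch below, where the checkpoint path and generation settings are placeholders:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder path: point this at your fine-tuned checkpoint directory.
CHECKPOINT = "./checkpoints/latest"

tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
model = AutoModelForCausalLM.from_pretrained(CHECKPOINT)
model.eval()

while True:
    prompt = input("You: ")
    if prompt.strip().lower() in {"quit", "exit"}:
        break
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        output = model.generate(**inputs, max_new_tokens=100)
    # Decode only the tokens generated after the prompt.
    reply = tokenizer.decode(
        output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
    print("AI:", reply)
```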
The training pipeline is configured through the DATASET_CONFIGS dictionary in the script. Training parameters such as batch size, learning rate, and model checkpointing frequency can be adjusted in the train() function, as sketched below.
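The real schema of DATASET_CONFIGS is whatever model_trainer_v_1_1.py defines; the entry below is a hypothetical sketch of the shape such a config usually takes, with every key and value invented for illustration:

```python
# Hypothetical schema -- field names and values are illustrative only;
# see model_trainer_v_1_1.py for the actual structure.
DATASET_CONFIGS = {
    "nemotron": {
        "path": "nvidia/<dataset-id>",  # placeholder Hugging Face dataset ID
        "text_field": "text",           # column containing the training text
        "max_length": 1024,             # max tokens per training example
    },
}

# Knobs of this kind are what train() exposes for tuning:
BATCH_SIZE = 4          # sequences per step; lower this if you hit OOM on 8 GB
LEARNING_RATE = 3e-5
CHECKPOINT_EVERY = 500  # optimizer steps between checkpoint saves
```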
Licensed under the BSD 3-Clause License.