diff --git a/sdk/python/foundation-models/system/reinforcement-learning/reinforcement-learning.ipynb b/sdk/python/foundation-models/system/reinforcement-learning/reinforcement-learning.ipynb new file mode 100644 index 000000000..8eac40859 --- /dev/null +++ b/sdk/python/foundation-models/system/reinforcement-learning/reinforcement-learning.ipynb @@ -0,0 +1,763 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
\n", + "

\n", + " Ignite Demo to Train, Customize, Optimize and Host Reasoning Models in AzureML\n", + "

\n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "
\n", + "

Sections Breakdown

\n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
    \n", + "
1. 🔧 Setup Workspace: Configure the Azure ML workspace and authenticate
\n", + "
2. 🧠 RFT Training (GRPO): Fine-tune a reasoning model using Group Relative Policy Optimization
\n", + "
3. RFT Training (Reinforce++): Fine-tune using critic-free reinforcement learning
\n", + "
4. 📦 Create Data Assets: Convert pipeline outputs to reusable data assets
\n", + "
5. 📊 Model Performance Comparison: Evaluate and compare the base model vs. GRPO vs. Reinforce++
\n", + "
6. 🎯 Create Draft Model: Train an EAGLE3 draft model for speculative decoding
\n", + "
7. 🔗 Combine Draft and Base Model: Package the base and draft models for deployment
\n", + "
8. 🚀 Deploy Speculative Endpoint: Deploy a managed online endpoint with speculative decoding
\n", + "
9. 📡 Deploy Base Endpoint: Deploy a baseline endpoint for performance comparison
\n", + "
10. 🧪 Test Base and Speculative Decoding Endpoints: Validate both endpoints with inference requests
\n", + "
11. 📈 Endpoints Performance Evaluation: Compare metrics between the base and speculative decoding endpoints
\n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
\n", + "

Prerequisites & Requirements

\n", + "
\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "##### Compute Requirements\n", + "* **Training:** Standard_ND96isr_H100_v5, Standard_ND96amsr_A100_v4\n", + "* **Deployment:** Kubernetes cluster with GPU instances (octagpu)\n", + "##### Dataset & Models\n", + "* **Dataset:** [FinQA](https://finqasite.github.io/) - 2.8k financial reports with 8k Q&A pairs\n", + "* **Models:** [Llama-3.1-8B-Instruct-FP8](https://huggingface.co/nvidia/Llama-3.1-8B-Instruct-FP8), [DeepSeek-R1-Distill-Qwen-7B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B)\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
\n", + "

\n", + " 💡 Note: Ensure your Azure ML workspace has access to the required compute resources and GPU instances before proceeding with the training and deployment steps.\n", + "

\n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
\n", + "

\n", + " RFT Fine-tuning: GRPO & Reinforce++\n", + "

\n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "
\n", + "

⚙️ Section 1. Setup Workspace and Register Components

\n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "vscode": { + "languageId": "markdown" + } + }, + "source": [ + "

This section establishes connectivity to your workspace and sets up the required authentication.

" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%pip install -r requirements.txt" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import matplotlib.pyplot as plt\n", + "from scripts.utils import setup_workspace\n", + "from scripts.dataset import prepare_finqa_dataset\n", + "from scripts.run import get_run_metrics\n", + "from scripts.reinforcement_learning import run_rl_training_pipeline\n", + "from scripts.evaluation import run_evaluation_pipeline\n", + "from scripts.speculative_decoding import (\n", + " run_draft_model_pipeline,\n", + " prepare_combined_model_for_deployment,\n", + " deploy_speculative_decoding_endpoint,\n", + ")\n", + "from scripts.deployment import create_managed_deployment, test_deployment" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Setup Azure ML workspace and registry connections\n", + "ml_client, registry_ml_client = setup_workspace(\n", + " config_path=\"./config.json\", registry_name=\"Ignite_2025_Demo\"\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "

Prepare the dataset for fine-tuning. This saves the train, test, and validation splits under the data folder.

" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "train_data_path, test_data_path, valid_data_path = prepare_finqa_dataset(\n", + " ml_client, data_dir=\"data\", register_datasets=False\n", + ") # Prepare the FinQA dataset for training and evaluation" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "##### 📖 The components and pipelines used in this notebook can be installed locally by following the instructions here: [Ignite Components and Pipelines](Ignite_Components_And_Pipelines/README.md)\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## \n", + "\n", + "
\n", + "

🧩 Section 2. Run RFT Training Pipeline (GRPO)

\n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "

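GRPO (Group Relative Policy Optimization) is a reinforcement learning technique for fine-tuning LLMs that scores each response relative to other responses sampled for the same prompt, rather than against a separately learned value baseline. For every prompt, the model generates a group of completions; each completion's advantage is its reward minus the group's mean reward, divided by the group's standard deviation, so no critic model is needed. \n", + "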
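The group-relative advantage described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the training pipeline's implementation: the helper name `group_relative_advantages`, the binary correct/incorrect rewards, and the use of the population standard deviation plus a small `eps` are all hypothetical choices made for the example.

```python
import math
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Hypothetical helper: normalize each completion's reward against its own group.

    All rewards in `rewards` are assumed to belong to completions sampled
    for the SAME prompt. Uses the population standard deviation (an
    assumption; some implementations use the sample standard deviation).
    """
    if not rewards:
        return []
    n = len(rewards)
    mean = sum(rewards) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    # eps avoids division by zero when every completion gets the same reward.
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical rewards for 4 completions sampled for one FinQA question
# (1.0 = final answer matched the reference, 0.0 = it did not).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print([round(a, 3) for a in advantages])  # [1.0, -1.0, -1.0, 1.0]
```

Correct completions in the group receive positive advantages and incorrect ones negative advantages. If every completion in a group earns the same reward, all advantages are zero, so that prompt contributes no learning signal for that step.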