Robotics-Ark · Refinath · Jan 6, 2026 · Nov 30, 2025 · Dec 3, 2025 · Dec 3, 2025
diff --git a/README.md b/README.md
@@ -94,4 +94,88 @@ arkml.tools.train algo=<ml_algorithm> \
  data.dataset_path=/path/to/dataset \
  output_dir=/output/path
 
-```
+```
+
+## Pi0.5
+
+Pi0.5 is the successor to the Pi0 Vision-Language-Action (VLA) model for robotic manipulation. It is trained in two stages: pretraining with cross-entropy losses on text and FAST action tokens, then post-training with subtask prediction and flow matching for continuous action prediction.
+
+### Training Stages
+
+#### Pretraining Stage
+The pretraining stage focuses on learning foundational representations using multiple modalities and FAST tokenization:
+
+```bash
+CUDA_VISIBLE_DEVICES=0 HYDRA_FULL_ERROR=1 \
+arkml-train algo=pi05 \
+ data.dataset_path=/path/to/pi05/dataset \
+ output_dir=/output/path \
+ algo.model.policy_type=pi0.5 \
+ algo.training.stage=pretrain \
+ algo.training.pretrain_steps=280000
+```
+
+The pretraining stage optimizes:
+- Cross-entropy loss for text tokens (CE(text))
+- Cross-entropy loss for FAST tokens (CE(FAST tokens))
+
+#### Post-training Stage
+The post-training stage refines the model with flow matching and subtask prediction:
+
+```bash
+CUDA_VISIBLE_DEVICES=0 HYDRA_FULL_ERROR=1 \
+arkml-train algo=pi05 \
+ data.dataset_path=/path/to/pi05/dataset \
+ output_dir=/output/path \
+ algo.model.policy_type=pi0.5 \
+ algo.training.stage=posttrain \
+ algo.training.posttrain_steps=80000 \
+ algo.training.flow_alpha=10.0
+```
+
+The post-training stage optimizes:
+- Cross-entropy loss for subtasks (CE(subtask))
+- Flow matching loss weighted by alpha (alpha * flow_matching_loss)
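
The two stage objectives listed above can be sketched in plain Python. This is a minimal illustration, not the trainer's code: `cross_entropy`, `stage_loss`, and all argument names are hypothetical, and the individual loss terms are assumed to be already-computed scalars.

```python
import math


def cross_entropy(logits, target_index):
    """Cross-entropy of one categorical prediction, computed from raw logits."""
    max_logit = max(logits)  # subtract the max for numerical stability
    log_sum_exp = max_logit + math.log(sum(math.exp(x - max_logit) for x in logits))
    return log_sum_exp - logits[target_index]


def stage_loss(stage, *, ce_text=0.0, ce_fast=0.0, ce_subtask=0.0,
               flow_matching=0.0, flow_alpha=10.0):
    """Combine per-term losses according to the training stage (hypothetical helper)."""
    if stage == "pretrain":
        return ce_text + ce_fast
    if stage == "posttrain":
        return ce_subtask + flow_alpha * flow_matching
    raise ValueError(f"unknown stage: {stage!r}")
```

With the default `flow_alpha=10.0`, a flow matching term of 0.1 contributes as much to the post-training objective as a subtask cross-entropy of 1.0.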
+
+### Running Inference
+
+To run inference with a trained Pi0.5 model:
+
+```bash
+HYDRA_FULL_ERROR=1 arkml-policy algo=pi05 \
+  algo.model.model_path=path/to/pi05/model \
+  policy_node_name=pi05_node
+```
+
+You can then call the inference endpoints:
+- `pi05_node/policy/predict` - Get next action prediction
+- `pi05_node/policy/reset` - Reset policy state
+- `pi05_node/policy/start` - Start policy service
+- `pi05_node/policy/stop` - Stop policy service
+
+### Configuration Explanation
+
+The Pi0.5 configuration includes several key parameters:
+
+**Model Configuration:**
+- `model.backbone_type`: Vision-language backbone architecture (e.g., 'siglip_gemma')
+- `model.use_fast_tokens`: Whether to use FAST tokenizer for action discretization
+- `model.use_flow_matching`: Whether to use flow matching for action prediction
+
+**Training Configuration:**
+- `training.stage`: Current training stage ('pretrain' or 'posttrain')
+- `training.pretrain_steps`: Number of pretraining steps (default: 280000)
+- `training.posttrain_steps`: Number of post-training steps (default: 80000)
+- `training.integration_steps`: Number of Euler integration steps used in flow matching (default: 10)
+- `training.flow_alpha`: Weight of the flow matching loss (default: 10.0)
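
To make `training.integration_steps` concrete, here is a minimal sketch of fixed-step Euler integration of a velocity field. It assumes integration runs from t=0 to t=1 and uses a toy constant velocity field in place of the learned flow head; `euler_integrate` and the time convention are assumptions, not the actual Pi0.5 code.

```python
def euler_integrate(velocity_fn, x0, integration_steps=10):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with fixed Euler steps."""
    dt = 1.0 / integration_steps
    x = list(x0)
    for step in range(integration_steps):
        t = step * dt
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Toy velocity field: a constant vector pointing from the start to a fixed target.
target = [0.5, -0.25]
start = [0.0, 0.0]
action = euler_integrate(lambda x, t: [g - s for g, s in zip(target, start)], start)
```

More steps reduce discretization error for a curved velocity field at the cost of one extra model evaluation per step.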
+
+**Dataset Configuration:**
+The dataset configuration uses mixture sampling with:
+- Primary dataset for main training data
+- Secondary datasets for auxiliary data
+- Configurable weights for balancing different data sources
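
A minimal sketch of weighted mixture sampling, assuming each dataset is an indexable sequence and each source is chosen with probability proportional to its weight. The function name, dataset names, and weights are hypothetical; the real dataset configuration may work differently.

```python
import random


def sample_mixture(datasets, weights, num_samples, seed=0):
    """Draw (dataset_name, item) pairs, picking the source dataset by weight."""
    rng = random.Random(seed)
    names = list(datasets)
    picks = rng.choices(names, weights=[weights[n] for n in names], k=num_samples)
    return [(name, rng.choice(datasets[name])) for name in picks]


datasets = {"primary": ["p0", "p1", "p2"], "secondary": ["s0", "s1"]}
weights = {"primary": 0.8, "secondary": 0.2}
batch = sample_mixture(datasets, weights, num_samples=1000)
```

Setting a source's weight to zero removes it from the mixture without touching the dataset list.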
+
+The model uses a multi-head architecture with:
+- Subtask head for high-level task planning
+- FAST head for discretized action prediction
+- Flow head for continuous action prediction using flow matching
diff --git a/arkml/algos/vla/pi05/README.md b/arkml/algos/vla/pi05/README.md
@@ -0,0 +1,190 @@
+# Pi0.5 Implementation
+
+This directory contains the complete Pi0.5 implementation following the HuggingFace wrapper pattern for the Ark ML framework.
+
+## Architecture Overview
+
+Pi0.5 is a Vision-Language-Action model. This implementation provides:
+- **Multi-stage training**: Pretraining (CE(text) + CE(FAST tokens)) and Post-training (CE(subtask) + α × flow_matching_loss)
+- **Flow matching**: For precise action prediction using vector field networks
+- **Multiple prediction heads**: Subtask, FAST, and flow heads
+- **Enhanced backbone**: Support for SigLIP-Gemma vision-language architecture
+
+## Directory Structure
+
+```
+pi05/
+├── models.py           # Core Pi0.5 policy (HuggingFace wrapper)
+├── algorithm.py        # Training algorithm
+├── trainer.py          # Multi-stage trainer
+├── evaluator.py        # Evaluation metrics
+├── dataset.py          # Multi-modality dataset
+├── config_utils.py     # Configuration utilities
+├── compute_stats.py    # Statistics computation
+├── utils.py            # Utility functions
+└── README.md           # This file
+```
+
+## Usage Instructions
+
+### 1. Loading a Pre-trained Model
+
+```python
+from arkml.algos.vla.pi05.models import Pi05Policy
+
+# Load from Hugging Face Hub or local path
+policy = Pi05Policy(
+    policy_type='pi0.5',
+    model_path='your-huggingface-username/pi05-model',  # or local path
+    backbone_type='siglip_gemma',  # Vision-language backbone
+    use_fast_tokens=True,          # Enable FAST tokenization
+    use_flow_matching=True,        # Enable flow matching
+    obs_dim=9,                     # Observation dimension
+    action_dim=8,                  # Action dimension
+    image_dim=(3, 480, 640),       # Image dimensions (C, H, W)
+    pred_horizon=1                 # Prediction horizon
+)
+
+# Move to device
+policy = policy.to_device('cuda')
+```
+
+### 2. Making Predictions
+
+```python
+import torch
+
+# Prepare observation dictionary (shapes match `image_dim` and `obs_dim` above)
+observation = {
+    'image': torch.randn(1, 3, 480, 640),  # Image tensor (B, C, H, W)
+    'state': torch.randn(1, 9),            # State vector (B, obs_dim)
+    'task': 'pick up the red block'        # Task instruction (optional)
+}
+
+# Get action prediction
+action = policy.predict(observation)
+print(f"Predicted action: {action}")
+```
+
+### 3. Training a New Model
+
+```python
+from arkml.algos.vla.pi05.algorithm import Pi05Algorithm
+from arkml.algos.vla.pi05.dataset import create_pi05_dataloader
+from arkml.algos.vla.pi05.models import Pi05Policy
+from omegaconf import DictConfig
+
+# Create your dataset and dataloader
+train_dataloader = create_pi05_dataloader(
+    dataset_path='path/to/your/dataset',
+    batch_size=8,
+    shuffle=True
+)
+
+# Load your policy
+policy = Pi05Policy(
+    policy_type='pi0.5',
+    model_path='path/to/pretrained/model',  # Or use a base model
+    # ... other parameters
+)
+
+# Configure training
+config = DictConfig({
+    'trainer': {
+        'lr': 2e-4,
+        'batch_size': 8,
+        'max_epochs': 10,
+        'weight_decay': 0.01,
+        'num_workers': 4,
+        'use_bf16': True
+    },
+    'training': {
+        'stage': 'pretrain',      # 'pretrain' or 'posttrain'
+        'flow_alpha': 10.0,       # Weight for flow matching loss
+        'pretrain_steps': 280000, # Steps for pretraining
+        'posttrain_steps': 80000  # Steps for post-training
+    }
+})
+
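+# Create algorithm and train. `train()` builds its own DataLoader, so pass the
+# underlying dataset (assumes `create_pi05_dataloader` returns a torch DataLoader).
+algorithm = Pi05Algorithm(policy=policy, device='cuda', cfg=config)
+results = algorithm.train(train_dataset=train_dataloader.dataset)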
+```
+
+### 4. Configuration Options
+
+Key configuration parameters:
+
+- `backbone_type`: Vision-language backbone ('siglip_gemma', etc.)
+- `use_fast_tokens`: Whether to use FAST tokenization for action discretization
+- `use_flow_matching`: Whether to use flow matching for action prediction
+- `training.stage`: Training stage, either 'pretrain' or 'posttrain'
+- `training.flow_alpha`: Weight of the flow matching loss (default: 10.0)
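
The options in this section can be gathered into a small validated container. The sketch below is hypothetical: `Pi05Options` is not part of the package, the field names are flattened for illustration, the boolean defaults are taken from the loading example rather than from the real config, and the non-negativity check on `flow_alpha` is an assumption.

```python
from dataclasses import dataclass

VALID_STAGES = ("pretrain", "posttrain")


@dataclass(frozen=True)
class Pi05Options:
    """Hypothetical container for the configuration options in this section."""
    backbone_type: str = "siglip_gemma"
    use_fast_tokens: bool = True
    use_flow_matching: bool = True
    stage: str = "pretrain"
    flow_alpha: float = 10.0

    def __post_init__(self):
        # Fail early on a misspelled stage instead of silently training the wrong objective.
        if self.stage not in VALID_STAGES:
            raise ValueError(f"stage must be one of {VALID_STAGES}, got {self.stage!r}")
        if self.flow_alpha < 0:
            raise ValueError("flow_alpha must be non-negative")


posttrain_options = Pi05Options(stage="posttrain")
```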
+
+## Training Stages
+
+Pi0.5 supports multi-stage training:
+
+### Pretraining Stage
+```
+CE(text) + CE(FAST tokens)
+```
+- Focuses on learning foundational representations
+- Uses multiple modalities and FAST tokenization
+
+### Post-training Stage
+```
+CE(subtask) + α × flow_matching_loss
+```
+- Refines the model with flow matching and subtask prediction
+- Enables precise action prediction using flow matching
+
+## Evaluation Metrics
+
+The evaluator reports the following metrics:
+- Action MSE and MAE
+- Accuracy within threshold
+- Subtask prediction accuracy
+- Multi-modality evaluation
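
The action metrics can be illustrated with a short, dependency-free sketch. It assumes per-element errors averaged over all action dimensions and a hypothetical default threshold of 0.05; the evaluator's actual definitions and return keys may differ.

```python
def action_metrics(predicted, target, threshold=0.05):
    """Return MSE, MAE, and the fraction of elements within `threshold` of the target."""
    errors = [p - t
              for p_row, t_row in zip(predicted, target)
              for p, t in zip(p_row, t_row)]
    n = len(errors)
    return {
        "mse": sum(e * e for e in errors) / n,
        "mae": sum(abs(e) for e in errors) / n,
        "accuracy_within_threshold": sum(abs(e) <= threshold for e in errors) / n,
    }


metrics = action_metrics(predicted=[[0.0, 1.0], [2.0, 3.0]],
                         target=[[0.0, 1.0], [2.0, 3.5]])
```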
+
+## Integration with LeRobot
+
+This implementation uses the LeRobot Pi0.5 policy under the hood:
+- Follows LeRobot's model architecture
+- Compatible with LeRobot datasets and tools
+- Supports LeRobot's training and evaluation pipelines
+
+## Example Usage Script
+
+For a complete example, see the example script that demonstrates:
+- Model loading
+- Training setup
+- Prediction workflow
+- Evaluation process
+
+## Requirements
+
+- LeRobot >= 0.4.3
+- Transformers
+- PyTorch >= 1.12
+- The Ark ML (`ark_ml`) framework
+
+## Testing
+
+Run tests to verify functionality:
+```bash
+python -m pytest tests_and_benchmarks/pi05_tests/
+```
+
+## Benchmarks
+
+Run performance benchmarks:
+```bash
+python tests_and_benchmarks/pi05_benchmarks/benchmark_pi05.py
+```
+
+## Notes
+
+- This implementation follows the same pattern as PiZero for consistency
+- Multi-stage training requires different dataset configurations for each stage
+- Flow matching is particularly effective for precise manipulation tasks
+- FAST tokenization enables efficient action discretization during pretraining
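
To illustrate what "action discretization" means, the sketch below uses naive per-dimension uniform binning. This is not the FAST tokenizer, which is a more elaborate compression-based scheme; the function names, the [-1, 1] action range, and the 256-bin vocabulary are all assumptions chosen for the example.

```python
def discretize(action, low=-1.0, high=1.0, num_bins=256):
    """Map each continuous action value to an integer bin index in [0, num_bins - 1]."""
    tokens = []
    for value in action:
        clipped = min(max(value, low), high)
        fraction = (clipped - low) / (high - low)
        tokens.append(min(int(fraction * num_bins), num_bins - 1))
    return tokens


def undiscretize(tokens, low=-1.0, high=1.0, num_bins=256):
    """Map bin indices back to the centre of each bin."""
    width = (high - low) / num_bins
    return [low + (token + 0.5) * width for token in tokens]
```

Once actions are integer tokens, they can be trained with the same cross-entropy loss as text, which is what the pretraining objective relies on. The round trip loses at most half a bin width per dimension.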
diff --git a/__init__.py → arkml/algos/vla/pi05/__init__.py b/__init__.py → arkml/algos/vla/pi05/__init__.py
diff --git a/arkml/algos/vla/pi05/algorithm.py b/arkml/algos/vla/pi05/algorithm.py
@@ -0,0 +1,103 @@
+from typing import Any
+import torch
+from torch.utils.data import DataLoader
+from arkml.core.algorithm import BaseAlgorithm
+from arkml.core.policy import BasePolicy
+from arkml.core.registry import ALGOS
+from arkml.algos.vla.pi05.trainer import Pi05Trainer
+from arkml.algos.vla.pi05.evaluator import Pi05Evaluator
+from omegaconf import DictConfig
+
+@ALGOS.register("pi05")
+class Pi05Algorithm(BaseAlgorithm):
+    """
+    Algorithm wrapper for Pi0.5 training and evaluation.
+    Implements the complete training pipeline for Pi0.5 with multi-stage training.
+    """
+
+    def __init__(self, policy: BasePolicy, device: str, cfg: DictConfig) -> None:
+        self.policy = policy
+        self.device = device
+        self.cfg = cfg
+
+        # Extract training configuration
+        self.lr = cfg.trainer.get('lr', 2e-4)
+        self.batch_size = cfg.trainer.get('batch_size', 8)
+        self.max_epochs = cfg.trainer.get('max_epochs', 10)
+        self.weight_decay = cfg.trainer.get('weight_decay', 0.0)
+        self.num_workers = cfg.trainer.get('num_workers', 4)
+        self.use_bf16 = cfg.trainer.get('use_bf16', True)
+
+        # Training-specific config
+        self.training_stage = cfg.training.get('stage', 'pretrain')
+        self.flow_alpha = cfg.training.get('flow_alpha', 10.0)
+        self.pretrain_steps = cfg.training.get('pretrain_steps', 280000)
+        self.posttrain_steps = cfg.training.get('posttrain_steps', 80000)
+        self.integration_steps = cfg.training.get('integration_steps', 10)
+
+    def train(self, train_dataset, val_dataset=None) -> Any:
+        """
+        Train the Pi0.5 model with multi-stage approach.
+        """
+        # Create data loaders
+        train_dataloader = torch.utils.data.DataLoader(
+            train_dataset,
+            batch_size=self.batch_size,
+            shuffle=True,
+            num_workers=self.num_workers,
+            pin_memory=True
+        )
+
+        val_dataloader = None
+        if val_dataset is not None:  # Avoid truthiness, which calls __len__ on the dataset
+            val_dataloader = torch.utils.data.DataLoader(
+                val_dataset,
+                batch_size=self.batch_size,
+                shuffle=False,
+                num_workers=self.num_workers,
+                pin_memory=True
+            )
+
+        # Initialize trainer with config
+        trainer = Pi05Trainer(
+            model=self.policy,
+            dataloader=train_dataloader,
+            device=self.device,
+            lr=self.lr,
+            weight_decay=self.weight_decay,
+            num_epochs=self.max_epochs,
+            grad_accum=1.0,  # Gradient accumulation
+            output_dir=self.cfg.get('output_dir', './output'),  # Falls back to ./output if unset
+            use_bf16=self.use_bf16,
+            flow_alpha=self.flow_alpha,
+            val_dataloader=val_dataloader,
+            eval_every=1
+        )
+
+        # Set the training stage on the model
+        self.policy.training_stage = self.training_stage
+
+        # Perform training based on stage
+        return trainer.fit()
+
+    def eval(self, eval_dataset) -> dict:
+        """
+        Evaluate the Pi0.5 model performance.
+        """
+        eval_dataloader = torch.utils.data.DataLoader(
+            eval_dataset,
+            batch_size=self.batch_size,
+            shuffle=False,
+            num_workers=self.num_workers,
+            pin_memory=True
+        )
+
+        # Initialize evaluator
+        evaluator = Pi05Evaluator(
+            model=self.policy,
+            dataloader=eval_dataloader,
+            device=self.device
+        )
+
+        # Perform evaluation
+        return evaluator.evaluate()
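
`Pi05Algorithm.__init__` reads `pretrain_steps` and `posttrain_steps`, but `train()` as shown forwards only `num_epochs` to the trainer. A minimal sketch of how a step budget could be selected by stage, assuming a mapping-like config with a `.get` method (true of both `dict` and `DictConfig`); `steps_for_stage` is hypothetical and not part of this change.

```python
def steps_for_stage(training_cfg):
    """Pick the step budget matching the configured stage (hypothetical helper)."""
    stage = training_cfg.get("stage", "pretrain")
    # Defaults mirror the values used in Pi05Algorithm.__init__.
    defaults = {
        "pretrain": ("pretrain_steps", 280000),
        "posttrain": ("posttrain_steps", 80000),
    }
    if stage not in defaults:
        raise ValueError(f"unknown training stage: {stage!r}")
    key, default = defaults[stage]
    return training_cfg.get(key, default)
```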