🚀 Simplicial Transformer with KV Cache Optimization

Python 3.8+ PyTorch License: MIT Code style: black

Efficient 2-Simplicial Transformer with Low-Rank KV Cache Compression – A research implementation for memory-efficient autoregressive generation.

📖 Overview

This repository implements a 2-Simplicial Transformer with optimized KV cache management, following the architecture from the Fast & Simplex paper. The project focuses on memory-efficient autoregressive generation through innovative cache compression techniques.

🔬 Research Contributions

  1. 2-Simplicial Attention: Implements the novel attention mechanism with (K₁, K₂, V₁, V₂) cache structure
  2. Low-Rank Compression: SVD-based compression of KV cache matrices for significant memory reduction
  3. Hybrid Selection: Combines L2-norm selection with low-rank compression for optimal quality-memory tradeoff
  4. Incremental Optimization: vanilla PyTorch → Triton kernels → compression techniques
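The low-rank compression idea (item 2 above) can be sketched as follows. The function name, shapes, and API here are illustrative assumptions, not the repository's actual interface in simplicial/cache/:

```python
import torch


def compress_kv_lowrank(kv: torch.Tensor, rank: int):
    """Hypothetical sketch of SVD-based KV cache compression.

    kv: a (seq_len, dim) cache matrix (e.g. one of K1, K2, V1, V2).
    Keeps only the top-`rank` singular components, so the cache is
    stored as two thin factors instead of the full matrix.
    """
    u, s, vh = torch.linalg.svd(kv, full_matrices=False)
    a = u[:, :rank] * s[:rank]   # (seq_len, rank) left factor, scaled
    b = vh[:rank]                # (rank, dim) right factor
    return a, b                  # approximate reconstruction: a @ b
```

Memory drops from `seq_len * dim` to `rank * (seq_len + dim)` per cached matrix, at the cost of an approximation error controlled by the discarded singular values.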

🏗️ Architecture

Core Components

# 2-Simplicial Attention with KV Cache
Attention(Q, K₁, K₂, V₁, V₂) = σ(Q·K₁) ⊙ σ(Q·K₂) · V₁ · V₂
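A minimal PyTorch sketch of the formula above, assuming σ is a per-key softmax, ⊙ is an elementwise product of the two score maps, and V₁ · V₂ is an elementwise product of the value caches. This is a reading of the formula, not the repository's actual implementation in simplicial/attention/, which may normalize and mask differently:

```python
import torch


def two_simplicial_attention(q, k1, k2, v1, v2):
    """Sketch of 2-simplicial attention over a (K1, K2, V1, V2) cache.

    All tensors are assumed to have shape (batch, seq, dim).
    """
    d = q.size(-1)
    # One score map per key cache, each softmax-normalized over keys.
    s1 = torch.softmax(q @ k1.transpose(-1, -2) / d**0.5, dim=-1)
    s2 = torch.softmax(q @ k2.transpose(-1, -2) / d**0.5, dim=-1)
    # Elementwise product of the score maps, applied to combined values.
    return (s1 * s2) @ (v1 * v2)
```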

Project Structure

simplicial-transformer/
├── simplicial/
│   ├── attention/              # 2-simplicial attention mechanisms
│   ├── cache/                  # KV cache with compression (K₁, K₂, V₁, V₂)
│   ├── layers/                 # Feedforward and simplicial blocks
│   ├── models/                 # Transformer implementations
│   ├── utils/                  # Utility functions (RoPE, sliding window)
│   └── validation/             # Correctness validation tools
├── training/                   # Training scripts and configs
├── scripts/                    # Inference and data preparation scripts
├── tests/                      # Comprehensive test suite
└── debug_tools/                # Debug and validation scripts

🚀 Quick Start

Installation

# Clone the repository
git clone https://github.com/and-per-i/too-simplex.git
cd too-simplex

# Install dependencies
pip install -r requirements.txt

# Install in development mode
pip install -e .

Training

# Train with default config
python training/train.py

# Or use the CLI entry point
simplicial-train

Inference

# Generate text
python scripts/generate_text.py

# Or use the CLI entry point
simplicial-generate

☁️ Deployment on Vast.ai

Vast.ai servers run Nvidia GPUs on x86 (AMD64) processors, so Docker images must be built for that architecture.

Build for Vast.ai (AMD64)

# Build and automatically push to Docker Hub
./build-for-vast.sh

# Local build without push (for testing)
./build-for-vast.sh --no-push

# With a custom tag
./build-for-vast.sh --tag v1.0.0

⚠️ Architecture pitfalls

If you are using a Mac with an Apple Silicon processor (M1, M2, M3), Docker builds ARM64 images by default. Running an ARM image on an AMD64 host fails with errors such as "invalid argument".

Solution: always use build-for-vast.sh, which forces the linux/amd64 architecture.

Local build (ARM64/AMD64)

For local development on your Mac:

# Build for the native architecture
./build-local.sh

# Local test
docker run --rm --gpus all -it too-simplex:local

Available scripts

| Script | Architecture | Use |
| --- | --- | --- |
| `build-local.sh` | Native (ARM64/AMD64) | Local development |
| `build-for-vast.sh` | AMD64 | Deployment on Vast.ai |

📝 License

MIT License
