Efficient 2-Simplicial Transformer with Low-Rank KV Cache Compression – A research implementation for memory-efficient autoregressive generation.
This repository implements a 2-Simplicial Transformer with optimized KV cache management, following the architecture from the Fast & Simplex paper. The project focuses on memory-efficient autoregressive generation through KV cache compression (low-rank SVD and L2-norm selection).
- 2-Simplicial Attention: Implements the novel attention mechanism with a (K₁, K₂, V₁, V₂) cache structure
- Low-Rank Compression: SVD-based compression of KV cache matrices for significant memory reduction
- Hybrid Selection: Combines L2-norm selection with low-rank compression to balance output quality against memory use
- Incremental Optimization: vanilla PyTorch → Triton kernels → compression techniques
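
To make the compression features concrete, here is a minimal NumPy sketch of the two ideas named above: ranking cached positions by key L2 norm, then storing the surviving matrix as truncated-SVD factors. It is not the repository's implementation. The function names (`select_by_l2_norm`, `low_rank_compress`, `decompress`), the choice to keep the lowest-norm keys, the shapes, and the use of NumPy instead of PyTorch are all assumptions made for illustration.

```python
import numpy as np


def select_by_l2_norm(keys, values, keep, keep_lowest=True):
    """Keep `keep` cached positions ranked by the L2 norm of their key vectors.

    keep_lowest=True is an assumption of this sketch, not a documented default.
    """
    norms = np.linalg.norm(keys, axis=-1)      # one norm per cached position
    order = np.argsort(norms)                  # ascending norm
    if not keep_lowest:
        order = order[::-1]
    idx = np.sort(order[:keep])                # restore temporal order
    return keys[idx], values[idx], idx


def low_rank_compress(matrix, rank):
    """Truncated SVD: store two thin factors instead of the full matrix."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    left = u[:, :rank] * s[:rank]              # (positions, rank)
    right = vt[:rank]                          # (rank, head_dim)
    return left, right


def decompress(left, right):
    """Rebuild the (approximate) cache matrix from its two factors."""
    return left @ right


rng = np.random.default_rng(0)
seq_len, head_dim, true_rank = 128, 64, 8
# Synthetic cache that is exactly rank 8, so truncation is lossless here
k1 = rng.standard_normal((seq_len, true_rank)) @ rng.standard_normal((true_rank, head_dim))
v1 = rng.standard_normal((seq_len, head_dim))

k1_kept, v1_kept, kept_idx = select_by_l2_norm(k1, v1, keep=64)
left, right = low_rank_compress(k1_kept, rank=true_rank)

stored = left.size + right.size
print(f"floats stored: {stored} vs {k1.size} uncompressed")
print("max reconstruction error:", np.abs(decompress(left, right) - k1_kept).max())
```

With these toy sizes, the kept K₁ slice is stored as 64×8 + 8×64 = 1,024 floats instead of the original 8,192. The reconstruction is exact only because the synthetic cache is rank 8 by construction; a real cache would trade some reconstruction error for the chosen rank.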
# 2-Simplicial Attention with KV Cache
Attention(K₁, K₂, V₁, V₂) = σ(Q·K₁) ⊙ σ(Q·K₂) · V₁ · V₂

simplicial-transformer/
├── simplicial/
│ ├── attention/ # 2-simplicial attention mechanisms
│ ├── cache/ # KV cache with compression (K₁, K₂, V₁, V₂)
│ ├── layers/ # Feedforward and simplicial blocks
│ ├── models/ # Transformer implementations
│ ├── utils/ # Utility functions (RoPE, sliding window)
│ └── validation/ # Correctness validation tools
├── training/ # Training scripts and configs
├── scripts/ # Inference and data preparation scripts
├── tests/ # Comprehensive test suite
└── debug_tools/ # Debug and validation scripts
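
As an illustration of how the four cached tensors are used at decode time, the sketch below implements one single-head step of trilinear 2-simplicial attention in NumPy: the query scores every pair of cached positions (j, k) through K₁ and K₂, a softmax is taken over all pairs, and each pair contributes the elementwise product of V₁[j] and V₂[k]. This is one common formulation and only an assumption about how the compact attention formula above is meant to be read. The function name, the 1/√d scaling, and the absence of RoPE, masking, and sliding windows are simplifications of this sketch, not the repository's code.

```python
import numpy as np


def two_simplicial_attention_step(q, k1, k2, v1, v2):
    """One decoding step for a single head.

    q:              (d,)   query of the current token
    k1, k2, v1, v2: (n, d) cached tensors for the n previous positions
    """
    d = q.shape[-1]
    # Trilinear logits: logits[j, k] = sum_c q[c] * k1[j, c] * k2[k, c]
    logits = (k1 * q) @ k2.T / np.sqrt(d)      # the scaling is an assumption
    logits = logits - logits.max()             # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum()                   # softmax over all (j, k) pairs
    # Each pair (j, k) contributes the elementwise product v1[j] * v2[k]
    out = np.einsum("jk,jd,kd->d", weights, v1, v2)
    return out, weights


rng = np.random.default_rng(0)
n, d = 16, 8
k1, k2, v1, v2 = (rng.standard_normal((n, d)) for _ in range(4))
q = rng.standard_normal(d)

out, weights = two_simplicial_attention_step(q, k1, k2, v1, v2)
print(out.shape, weights.shape)  # (8,) (16, 16)
```

In this formulation the weight matrix has one entry per pair of cached positions, so its size grows quadratically with the cache length for every generated token. That is one reason a smaller cache helps.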
# Clone the repository
git clone https://github.com/and-per-i/too-simplex.git
cd too-simplex
# Install dependencies
pip install -r requirements.txt
# Install in development mode
pip install -e .

# Train with default config
python training/train.py
# Or use the CLI entry point
simplicial-train

# Generate text
python scripts/generate_text.py
# Or use the CLI entry point
simplicial-generate

Vast.ai servers use Nvidia GPUs with x86 (AMD64) processors, so Docker images must be built for this architecture.
# Build and push to Docker Hub automatically
./build-for-vast.sh
# Local build without pushing (for testing)
./build-for-vast.sh --no-push
# With a custom tag
./build-for-vast.sh --tag v1.0.0

If you are using a Mac with an Apple Silicon processor (M1, M2, M3), Docker builds ARM64 images by default. Running an ARM image on an AMD64 host fails with errors such as "invalid argument".
Solution: Always use build-for-vast.sh, which forces the linux/amd64 architecture.
For local development on your Mac:
# Build for the native architecture
./build-local.sh
# Local test
docker run --rm --gpus all -it too-simplex:local

| Script | Architecture | Use |
|---|---|---|
| build-local.sh | Native (ARM64/AMD64) | Local development |
| build-for-vast.sh | AMD64 | Deployment to Vast.ai |
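
The choice summarized in the table can also be checked programmatically before a build. The following is a small Python sketch, using only the standard library, that maps the host's reported machine type to Docker's architecture names and flags when a Vast.ai build would be a cross-architecture build. `docker_arch`, `pick_build_script`, and `is_cross_build` are hypothetical helpers, not part of this repository, and the alias table covers only the common machine strings.

```python
import platform

# Common values reported by platform.machine(), normalised to Docker's names
_ARCH_ALIASES = {
    "x86_64": "amd64",
    "amd64": "amd64",
    "arm64": "arm64",
    "aarch64": "arm64",
}


def docker_arch(machine=None):
    """Map a machine string (default: this host) to a Docker architecture name."""
    machine = (machine or platform.machine()).lower()
    return _ARCH_ALIASES.get(machine, machine)


def pick_build_script(target):
    """Hypothetical helper mirroring the table; target is 'vast' or 'local'."""
    scripts = {"vast": "./build-for-vast.sh", "local": "./build-local.sh"}
    return scripts[target]


def is_cross_build(target, machine=None):
    """True when building for Vast.ai (AMD64) from a non-AMD64 host."""
    return target == "vast" and docker_arch(machine) != "amd64"


# Example: an Apple Silicon Mac preparing a Vast.ai image
print(pick_build_script("vast"), is_cross_build("vast", machine="arm64"))
```

On an Apple Silicon host the check reports a cross-build, which is the case where build-for-vast.sh is needed rather than a plain native build.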
MIT License