
Flash Examples

Auto-generated by /analyze-repos on 2026-02-22. Manual edits will be overwritten on next analysis.

Project Overview

Production-ready examples demonstrating Flash framework capabilities. Flat-file pattern: each worker is a standalone .py file with @Endpoint decorator, auto-discovered by flash run. 6 categories, 18 worker files. Root pyproject.toml declares only runpod-flash dependency; runtime deps declared inline via Endpoint(dependencies=[...]).

Architecture

Key Abstractions

  1. @Endpoint decorator (QB) -- Core pattern. async def marked with @Endpoint(name=..., gpu=..., ...) for queue-based remote execution.
  2. Endpoint routes (LB) -- Load-balanced pattern. api = Endpoint(...) with @api.get()/@api.post() route decorators for HTTP endpoints.
  3. @Endpoint decorator (class) -- Used on SimpleSD class (05_data_workflows). Class-based pattern for stateful workers.
  4. Cross-worker orchestration -- Pipeline files import from QB workers, chain with await. LB endpoint orchestrates QB workers.
  5. Flat-file discovery -- No FastAPI boilerplate, no routers, no main.py. flash run auto-generates routes from decorated functions.
  6. In-function imports -- Heavy libs (torch, transformers, etc.) imported inside @Endpoint body, only runpod_flash at module level.
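
The queue-based and load-balanced patterns are shown under Worker File Patterns below; the class-based pattern (abstraction 3) has no snippet in this document, so here is a hedged sketch. The `Endpoint` stand-in defined below, the `generate` method name, and the config fields are illustrative assumptions, not the real runpod_flash API:

```python
# Hypothetical sketch of the class-based pattern (abstraction 3). A stand-in
# Endpoint decorator is defined here so the sketch is self-contained; the
# real runpod_flash decorator and SimpleSD class may differ.
def Endpoint(**config):
    """Stand-in: attach the endpoint config to the decorated class."""
    def wrap(target):
        target._endpoint_config = config
        return target
    return wrap

@Endpoint(name="simple-sd", gpu="NVIDIA_GEFORCE_RTX_4090")  # assumed fields
class SimpleSD:
    """Stateful worker: model loaded once, reused across requests."""

    def __init__(self):
        self.model = None  # a real worker would lazy-load a heavy model here

    async def generate(self, payload: dict) -> dict:
        if self.model is None:
            self.model = "loaded"  # placeholder for the real model load
        return {"status": "success", "prompt": payload.get("prompt")}
```

The point of the class form is that `__init__` state survives across requests on the same worker, which the flat function form cannot express.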

Entry Points

All worker files across 6 categories. Each file is an independent entry point discovered by flash run.

Module Structure

```
01_getting_started/          # Fundamentals
  01_hello_world/            # Basic GPU worker
  02_cpu_worker/             # CPU-only worker
  03_mixed_workers/          # Cross-worker orchestration (CPU -> GPU -> LB)
  04_dependencies/           # Runtime dependency declaration
02_ml_inference/             # ML deployment
  01_text_to_speech/         # Qwen3-TTS model serving
03_advanced_workers/         # Advanced patterns
  05_load_balancer/          # LB endpoints with custom HTTP routes
04_scaling_performance/      # Autoscaling
  01_autoscaling/            # Scaling strategy examples
05_data_workflows/           # Data pipelines
  01_network_volumes/        # Network volume usage with @Endpoint class
06_real_world/               # Placeholder for production patterns
```

Worker File Patterns

Queue-based (function decorator):

```python
from runpod_flash import Endpoint, GpuType

@Endpoint(
    name="my-worker",
    gpu=GpuType.NVIDIA_GEFORCE_RTX_4090,
)
async def my_function(payload: dict) -> dict:
    """All runtime imports inside the function body."""
    import torch
    return {"status": "success"}
```

Load-balanced (route decorators):

```python
from runpod_flash import Endpoint

api = Endpoint(name="my-api", cpu="cpu3c-1-2", workers=(1, 3))

@api.post("/process")
async def process(data: dict) -> dict:
    return {"result": data}

@api.get("/health")
async def health() -> dict:
    return {"status": "ok"}
```

Resource Configuration

GPU vs CPU is a parameter, not a class choice:

| Config | Syntax | Use Case |
| --- | --- | --- |
| GPU endpoint | `@Endpoint(name=..., gpu=GpuType.NVIDIA_GEFORCE_RTX_4090)` | GPU workers |
| CPU endpoint | `@Endpoint(name=..., cpu="cpu3c-1-2")` | CPU workers |
| GPU LB | `api = Endpoint(name=..., gpu=GpuType.NVIDIA_GEFORCE_RTX_4090); @api.post(...)` | GPU LB endpoints |
| CPU LB | `api = Endpoint(name=..., cpu="cpu3c-1-2"); @api.post(...)` | CPU LB endpoints |

Cross-Worker Orchestration

Pipeline files import functions from other workers and chain them:

```python
from cpu_worker import preprocess_text
from gpu_worker import gpu_inference
from runpod_flash import Endpoint

pipeline = Endpoint(name="pipeline", cpu="cpu3c-1-2", workers=(1, 3))

@pipeline.post("/classify")
async def classify(text: str) -> dict:
    result = await preprocess_text({"text": text})
    return await gpu_inference(result)
```

Public API Surface

All examples import from runpod_flash. Import frequency by symbol:

| Symbol | Files Using It | Breakage Risk |
| --- | --- | --- |
| Endpoint | 18 | ALL examples break |
| GpuType | 7 | GPU config breaks |
| CpuInstanceType | 4 | CPU config breaks |
| NetworkVolume | 2 | Volume examples break |
| ServerlessScalerType | 1 | Scaling example breaks |

Cross-Repo Dependencies

Depends On

  • flash (runpod_flash package) -- all files import from it. Any breaking change to Endpoint constructor, enum values, or route decorator signature breaks examples at import time.

Depended On By

  • None. This is a leaf repo (documentation/examples only).

Interface Contracts

  • Endpoint(name=..., gpu=..., cpu=..., workers=...) constructor -- parameter rename/removal breaks all files
  • .get()/.post()/.put()/.delete()/.patch() route decorator signatures
  • GpuGroup, GpuType, CpuInstanceType enum values -- value removals break GPU/CPU configs
  • NetworkVolume constructor -- field changes break volume examples

Dependency Chain

flash-examples --> flash (runpod_flash) --> runpod-python (runpod)

Known Drift

  • No automated tests -- changes caught only at import time or flash run
  • No CI that validates examples against current flash version
  • Python version: inherits from flash (3.10+)

Development Commands

Setup

```bash
uv venv && source .venv/bin/activate
uv sync --all-groups
```

Testing

```bash
flash run                     # Start local dev server (localhost:8888)
# Visit http://localhost:8888/docs for interactive API docs
python gpu_worker.py          # Test a single worker directly (if __name__ == "__main__" block)
```

Quality

```bash
make quality-check            # REQUIRED BEFORE ALL COMMITS
make lint                     # Ruff linter
make format                   # Ruff formatter
make format-check             # Check formatting
```

Build and Deploy

```bash
flash build                   # Package build artifacts
flash deploy                  # Build + upload + provision endpoints
flash deploy --preview        # Local Docker Compose preview
flash build --use-local-flash # Use local flash library instead of PyPI
```

Code Health

High Severity

  • No test infrastructure at all. No conftest.py, no tests/ directory, no pytest config. Only if __name__ == "__main__" blocks for manual testing. Any flash API change is caught only at import time.

Medium Severity

  • Broad except Exception catches in 4 files -- they swallow specific errors and make debugging harder
  • Duplicated GPU inference logic in 04_scaling_performance -- 3 near-identical functions that should be extracted
  • No CI validation that examples work against the current flash version
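
The broad-catch issue above can be narrowed in plain Python; the handler names and exception choices here are illustrative, not taken from the example files:

```python
# Sketch of narrowing a broad exception handler. run_bad mirrors the
# pattern flagged above; run_good catches only failures the worker can
# meaningfully report and keeps the detail for debugging.
def run_bad(fn, payload):
    try:
        return fn(payload)
    except Exception:  # BAD: also masks typos such as NameError
        return {"status": "error"}

def run_good(fn, payload):
    try:
        return fn(payload)
    except (ValueError, KeyError) as exc:  # expected input failures only
        return {"status": "error", "detail": f"{type(exc).__name__}: {exc}"}
```

Anything outside the expected set (e.g. RuntimeError) then propagates and surfaces in logs instead of being silently converted to a generic error.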

Low Severity

  • Duplicated speakers/languages lists in 02_ml_inference/01_text_to_speech
  • Missing input validation in some workers (accepts arbitrary dict without schema)

Testing

Structure

No formal test infrastructure exists. Each worker has an optional if __name__ == "__main__" block for manual execution.
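
That optional block can look like the following sketch. The `@Endpoint` decorator is omitted here; the sketch assumes the decorated coroutine remains directly awaitable in-process, which may not hold for every Flash version:

```python
# worker.py -- sketch of the optional manual-test block.
import asyncio

async def my_function(payload: dict) -> dict:
    # heavy imports would go here in a real worker
    return {"status": "success", "echo": payload}

if __name__ == "__main__":
    # Invoke the worker locally without flash run
    print(asyncio.run(my_function({"text": "hello"})))
```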

Coverage Gaps

  • 100% uncovered -- no test framework, no conftest, no pytest config
  • No smoke tests that verify examples import successfully
  • No integration tests that run flash run against examples

Patterns

To test manually:

cd 01_getting_started/01_hello_world
flash run                    # Starts dev server, auto-discovers workers
# Use http://localhost:8888/docs to invoke endpoints

Recommended Test Strategy

  1. Add tests/test_imports.py that imports every worker file (catches Endpoint signature drift)
  2. Add tests/test_configs.py that validates all resource configs construct without error
  3. Add CI job that runs flash run --check (dry-run mode) against each example category
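
Recommendation 1 could look like the following sketch; the repo-relative glob and directory layout are assumptions based on the category tree under Module Structure:

```python
"""Sketch of a tests/test_imports.py smoke test: import every worker
file so any drift in the runpod_flash Endpoint signature fails fast."""
import importlib.util
import pathlib

def iter_worker_files(root: pathlib.Path):
    # Worker files live under the numbered category directories,
    # e.g. 01_getting_started/01_hello_world/*.py (assumed layout)
    yield from sorted(root.glob("0[1-6]_*/**/*.py"))

def import_worker(path: pathlib.Path):
    # Importing executes the @Endpoint decorator, so drift in the
    # Endpoint constructor or enum values raises right here.
    spec = importlib.util.spec_from_file_location(path.stem, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module

def test_all_workers_import(repo_root: pathlib.Path = pathlib.Path(".")):
    failures = []
    for path in iter_worker_files(repo_root):
        try:
            import_worker(path)
        except Exception as exc:  # collect everything, report at once
            failures.append(f"{path}: {exc!r}")
    assert not failures, "\n".join(failures)
```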

Common Mistakes

  1. Accessing external scope in @Endpoint functions -- only local variables, parameters, and internal imports work. The function body is serialized and sent to a remote worker.
  2. Module-level imports of heavy libraries -- import torch, numpy, transformers, etc. inside the function body, not at module level.
  3. Missing if __name__ == "__main__" test block -- each worker should be independently testable.
  4. Mutable default arguments -- use None and initialize in function body.
  5. Importing from flash instead of runpod_flash -- the package name is runpod_flash.
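
Mistake 4 is plain Python behavior, so it can be shown without any Flash API; the function names here are illustrative:

```python
# A mutable default is evaluated once at definition time and shared
# across every call -- and, for serialized workers, potentially across
# requests on the same worker.
def bad_collect(item, bucket=[]):      # BAD: one shared list
    bucket.append(item)
    return bucket

def good_collect(item, bucket=None):   # GOOD: fresh list per call
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket
```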

Last analyzed: 2026-02-22