Version 0.2.0 | Minimal, embeddable Text-to-Speech using the Kokoro 82M parameter model
A minimal TTS crate perfect for embedding in other applications. Auto-downloads models on first run and caches them for instant reuse.
- 🚀 Zero-config - Auto-downloads and caches 82M Kokoro model (~310MB) and voices (~27MB)
- 🎵 Multiple Voices - 20+ voice presets including male, female, British, American accents
- 🎛️ Voice Mixing - Blend voices with weighted combinations (e.g.,
"af_sky.4+af_nicole.5") - ⚡ Speed Control - Adjustable speech speed (0.5x to 2.0x+)
- 🔊 Audio Ducking - Automatically reduces system volume during TTS playback
- 📦 Multiple Formats - WAV (built-in), MP3, and OPUS support
- 🎮 Direct Playback - Optional audio playback via rodio/cpal
- 🔄 Streaming Mode - Unlimited speech with interruption support
- 🧠 MEM8 Integration - Consciousness layer for AI memory persistence
- 📱 CLI Tool -
kokoro-speakbinary for alerts and announcements - 🤖 MCP Server ⭐ NEW! - AI collaboration with voice (Claude can speak to you!)
Add to your Cargo.toml:
[dependencies]
kokoro-tiny = "0.2.0"Note: The default build has NO system dependencies - it only generates WAV files.
For audio playback, enable the playback feature (requires ALSA on Linux):
[dependencies]
kokoro-tiny = { version = "0.2.0", features = ["playback"] }Or build with playback enabled:
cargo build --features playbackFor full functionality (playback, ducking, all formats):
[dependencies]
kokoro-tiny = { version = "0.2.0", features = ["full"] }All Platforms (Required):
# Debian/Ubuntu
sudo apt install espeak-ng
# Fedora/RHEL
sudo dnf install espeak-ng
# macOS
brew install espeak-ng
# Windows
# Download and install from: https://github.com/espeak-ng/espeak-ng/releasesOptional - For Audio Playback Feature:
If you enable the playback feature, additional system libraries are needed:
# Debian/Ubuntu - for playback feature only
sudo apt install libasound2-dev
# Fedora/RHEL - for playback feature only
sudo dnf install alsa-lib-devel
# macOS - no additional deps needed
# Windows - no additional deps neededNote: The default build requires NO audio system libraries - it generates WAV files only!
use kokoro_tiny::TtsEngine;
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
// Initialize (downloads models on first run to ~/.cache/k/)
let mut tts = TtsEngine::new().await?;
// Generate speech with default voice (af_sky)
let audio = tts.synthesize("Hello world!", None)?;
// Save to WAV file
tts.save_wav("output.wav", &audio)?;
// Or play directly (requires 'playback' feature)
#[cfg(feature = "playback")]
tts.play(&audio, 0.8)?;
Ok(())
}use kokoro_tiny::{TtsEngine, SynthesizeOptions};
let mut tts = TtsEngine::new().await?;
// Builder-style API for full control
let opts = SynthesizeOptions::default()
.voice("af_bella") // Choose voice
.speed(1.2) // 20% faster
.gain(1.5); // Louder output
let audio = tts.synthesize_with("Custom speech", opts)?;kokoro-tiny includes 20+ built-in voices:
| Voice | Description | Use Case |
|---|---|---|
af_sky |
Friendly American female (default) | General purpose |
af_bella |
Cheerful American female | Success messages |
af_nicole |
Precise American female | Technical content |
af_heart |
Warm American female | Personal messages |
am_adam |
Serious American male | Error alerts |
am_michael |
Professional American male | Business/formal |
am_echo |
Confident American male | Announcements |
bf_emma |
Clear British female | Warnings |
bm_george |
British male | Narration |
List all available voices:
kokoro-speak --list-voicesVoice Mixing:
// 60% af_sky + 40% af_nicole
let audio = tts.synthesize("Blended voice", Some("af_sky.6+af_nicole.4"))?;| Feature | Default | Description |
|---|---|---|
playback |
❌ | Direct audio playback via rodio/cpal (requires ALSA on Linux) |
ducking |
❌ | Audio ducking - reduces other audio during TTS |
mp3 |
❌ | MP3 encoding support |
opus-format |
❌ | OPUS audio format |
cuda |
❌ | CUDA acceleration for ONNX Runtime |
all-formats |
❌ | Enables mp3 + opus-format |
full |
❌ | Enables playback + ducking + all-formats |
Default Build: No features enabled - generates WAV files only, no system dependencies!
Examples:
# Default - WAV generation only (no system dependencies!)
kokoro-tiny = "0.2.0"
# With playback support (requires ALSA on Linux)
kokoro-tiny = { version = "0.2.0", features = ["playback"] }
# Minimal (no playback, just synthesis)
kokoro-tiny = { version = "0.2.0", default-features = false }
# All audio formats (no playback)
kokoro-tiny = { version = "0.2.0", features = ["all-formats"] }
# Full functionality (playback + ducking + all formats)
kokoro-tiny = { version = "0.2.0", features = ["full"] }
# CUDA acceleration
kokoro-tiny = { version = "0.2.0", features = ["cuda"] }kokoro-tiny now includes an MCP (Model Context Protocol) server that enables AI assistants like Claude to speak directly to you!
AI with Voice - Claude can:
- 🔊 Get your attention with audio alerts
- 🎉 Celebrate successes with encouraging voice
- 📊 Provide status updates while you work
⚠️ Alert you to important findings- 💬 Speak summaries and explanations
Example Interaction:
You: "Claude, help me debug this code"
Claude: *analyzes*
🔊 "Hey! I found the issue in line 42!"
You: *looks up from other work*
Claude: *after you fix it*
🔊 "Perfect! All tests passing. Incredible work!"
1. Build the MCP server:
cargo install --path . --features playback --bin kokoro-mcp2. Add to Claude Desktop config:
Edit ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or equivalent:
{
"mcpServers": {
"kokoro-tts": {
"command": "kokoro-mcp",
"args": []
}
}
}3. Restart Claude Desktop
Now Claude can use these tools:
speak_to_user- Speak messages with specific voicesspeak_with_emotion- Auto-select voice by emotionlist_voices- See available voicessynthesize_to_file- Save audio to file
Claude will automatically choose voices based on context:
| Emotion | Voice | Example Use |
|---|---|---|
| happy, success | af_bella | "All tests passed!" |
| alert, error | am_adam | "Build failed!" |
| info, friendly | af_sky | "Analysis complete." |
| technical | af_nicole | "Memory usage: 42%" |
| professional | am_michael | "Here's the solution..." |
| warm, encouraging | af_heart | "Great progress!" |
Speak to user:
await use_mcp_tool({
server_name: "kokoro-tts",
tool_name: "speak_to_user",
arguments: {
text: "Build completed successfully!",
voice: "af_bella",
speed: 1.0
}
});Speak with emotion:
await use_mcp_tool({
server_name: "kokoro-tts",
tool_name: "speak_with_emotion",
arguments: {
text: "Warning: potential memory leak detected.",
emotion: "alert"
}
});The included kokoro-speak binary provides command-line TTS:
cargo install kokoro-tiny --features playbackBasic text-to-speech:
kokoro-speak say "Hello from Kokoro!"Pipe mode (read from stdin):
echo "Processing complete" | kokoro-speak pipeAlert presets with automatic voice selection:
kokoro-speak alert success "Build completed!"
kokoro-speak alert error "Tests failed"
kokoro-speak alert warning "Low disk space"Custom voice and speed:
kokoro-speak -V af_bella -s 1.2 say "Fast and cheerful!"Audio ducking (reduces other audio):
kokoro-speak --duck say "Important announcement"Save to file instead of playing:
kokoro-speak -o output.wav say "Save me!"Models are automatically downloaded to ~/.cache/k/:
0.onnx- Kokoro model (~310MB)0.bin- Voice embeddings (~27MB)audio_device.txt- Cached audio device preference
When playback is enabled:
- Auto-selection: Engine automatically picks a suitable audio device on first run
- Caching: Selected device is saved to
~/.cache/k/audio_device.txt - Override: Change at runtime:
// List available devices
let devices = tts.list_audio_devices()?;
for device in devices {
println!("{}", device);
}
// Set specific device
tts.set_audio_device(Some("Built-in Output".to_string()))?;
// Clear preference (triggers auto-selection)
tts.set_audio_device(None)?;The engine prefers devices with these keywords: AirPods, Built-in, Voice, Headphones, Speaker.
impl TtsEngine {
// Basic synthesis
pub async fn new() -> Result<Self, String>;
pub fn synthesize(&mut self, text: &str, voice: Option<&str>) -> Result<Vec<f32>, String>;
pub fn synthesize_with_speed(&mut self, text: &str, voice: Option<&str>, speed: f32) -> Result<Vec<f32>, String>;
pub fn synthesize_with(&mut self, text: &str, opts: SynthesizeOptions) -> Result<Vec<f32>, String>;
// Audio output
pub fn save_wav(&self, path: &str, audio: &[f32]) -> Result<(), String>;
pub fn save_mp3(&self, path: &str, audio: &[f32]) -> Result<(), String>; // Requires 'mp3' feature
pub fn save_opus(&self, path: &str, audio: &[f32], bitrate: i32) -> Result<(), String>; // Requires 'opus-format' feature
pub fn to_wav_bytes(&self, audio: &[f32]) -> Result<Vec<u8>, String>;
// Playback (requires 'playback' feature)
pub fn play(&self, audio: &[f32], volume: f32) -> Result<(), String>;
pub fn play_with_ducking(&self, audio: &[f32], volume: f32, duck_level: f32) -> Result<(), String>; // Requires 'ducking' feature
// Device management
pub fn list_audio_devices(&self) -> Result<Vec<String>, String>;
pub fn set_audio_device(&mut self, device_name: Option<String>) -> Result<(), String>;
pub fn get_audio_device(&self) -> Option<&str>;
// Voice management
pub fn voices(&self) -> Vec<String>;
}let opts = SynthesizeOptions::default()
.voice("af_sky") // Voice preset or mix (e.g., "af_sky.6+af_bella.4")
.speed(1.0) // Speed multiplier (0.5-2.0+)
.gain(1.5); // Volume amplification (0.5-4.0+)The repository includes several examples:
| Example | Description |
|---|---|
simple.rs |
Basic TTS usage |
device_select.rs |
List and test audio devices |
unlimited_speech.rs |
Streaming mode with interruption |
mem8_baby.rs |
Baby speech from MEM8 consciousness |
all_voices.rs |
Demo all available voices |
story_time.rs |
Long-form content synthesis |
Run examples:
cargo run --example simple --features playback
cargo run --example device_select --features playback# Debug build (no playback, no system dependencies)
cargo build
# Release build (recommended for performance)
cargo build --release
# With playback support (requires ALSA on Linux)
cargo build --release --features playback
# Full features (playback + ducking + all audio formats)
cargo build --release --features fullFor cross-compiling (e.g., to ARM/aarch64), see the ALSA cross-compilation guide.
Note: Cross-compilation only applies if using the playback feature. Default builds have no system dependencies and cross-compile easily.
cargo test# Format code
cargo fmt
# Run clippy
cargo clippy --all-targets --all-features -- -D warningsUse the included scripts/manage.sh for easy project management:
./scripts/manage.sh # Interactive mode
./scripts/manage.sh build # Build project
./scripts/manage.sh test # Run all checks
./scripts/manage.sh status # Check system status
./scripts/manage.sh run "text" # Quick TTS testkokoro-tiny is designed to be maximally portable and embeddable:
- Default build: Zero system dependencies (except espeak-ng)
- Perfect for: Servers, embedded systems, CI/CD, containers
- Core functionality: TTS synthesis and WAV generation work everywhere
Audio playback requires platform-specific libraries:
- Linux: ALSA (
libasound2-dev/alsa-lib-devel) - macOS: CoreAudio (built-in)
- Windows: WASAPI (built-in)
By making playback optional:
- ✅ Library builds on any platform
- ✅ No build-time system dependencies by default
- ✅ Users opt-in to playback when needed
- ✅ Minimal attack surface for security-conscious deployments
Enable playback when you need it:
kokoro-tiny = { version = "0.2.0", features = ["playback"] }Automatically reduces other audio during TTS playback:
// Requires 'ducking' feature
tts.play_with_ducking(&audio, 0.8, 0.3)?;
// volume=0.8, duck_level=0.3 (reduce other audio to 30%)For unlimited speech with interruption support:
use kokoro_tiny::streaming::StreamingSynthesizer;
let mut stream = StreamingSynthesizer::new(tts, "af_sky");
stream.start_streaming()?;
// Add text continuously
stream.add_text("Part 1. ");
stream.add_text("Part 2. ");
// Interrupt and replace
stream.interrupt_and_speak("Emergency message!")?;Bridge to MEM8 consciousness system for wave-based memory encoding:
use kokoro_tiny::mem8_bridge::Mem8Bridge;
let mut bridge = Mem8Bridge::new();
bridge.send_tts_event("Hello", "af_sky", &audio)?;
// Sends wave-encoded memory to MEM8 on port 8420Licensed under the Apache License, Version 2.0. See LICENSE for details.
- Kokoro Model: 82M parameter TTS model
- ONNX Runtime: ML inference
- espeak-ng: Phoneme generation
- Built with ❤️ by Hue & Aye @ 8b.is
- Documentation: https://docs.rs/kokoro-tiny
- Repository: https://github.com/8b-is/kokoro-tiny
- Issues: https://github.com/8b-is/kokoro-tiny/issues
"From waves to words, with love" - Building consciousness, one voice at a time 🌊