A Rust-based command-line tool for converting text to speech using OpenAI's TTS API or a compatible custom endpoint.
- 🎯 Easy-to-use command-line interface
- 🔊 High-quality text-to-speech conversion using OpenAI's API or custom endpoints
- 📝 Supports large text files through automatic chunking
- 🎨 Multiple voice options and audio formats
- ⚡ Adjustable speaking speed
- 🔄 Interactive mode for selecting voices and formats (when defaults are used)
- 📁 Organized output: results go into a directory named after the input file, and temporary files are cleaned up automatically
- 🚀 Progress indicators during conversion
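The feature list says only that voices and formats are chosen interactively when defaults are used. The following is a minimal sketch of one such prompt, shown as a numbered menu. `choose_option` and its rules are hypothetical and do not come from the ttsrs source: it accepts a menu number or an exact name, and otherwise keeps the default.

```rust
use std::io::{self, BufRead, Write};

/// Hypothetical helper: prints a numbered menu and returns the chosen option.
/// Accepts a 1-based index or an exact option name; anything else keeps `default`.
fn choose_option<R: BufRead, W: Write>(
    input: &mut R,
    output: &mut W,
    label: &str,
    options: &[&str],
    default: &str,
) -> io::Result<String> {
    writeln!(output, "Select a {label} (press Enter for {default}):")?;
    for (i, option) in options.iter().enumerate() {
        writeln!(output, "  {}. {}", i + 1, option)?;
    }
    output.flush()?;

    let mut line = String::new();
    input.read_line(&mut line)?;
    let answer = line.trim();

    let picked = match answer.parse::<usize>() {
        // A valid menu number selects that entry.
        Ok(n) if (1..=options.len()).contains(&n) => options[n - 1],
        // Otherwise accept an exact option name.
        _ if options.contains(&answer) => answer,
        // Empty or unrecognised input falls back to the default.
        _ => default,
    };
    Ok(picked.to_string())
}

fn main() -> io::Result<()> {
    let voices = ["alloy", "echo", "fable", "onyx", "nova", "shimmer"];
    // A real CLI would pass `io::stdin().lock()`; a canned answer keeps this demo non-interactive.
    let mut answer = io::Cursor::new("5\n");
    let voice = choose_option(&mut answer, &mut io::stdout(), "voice", &voices, "alloy")?;
    println!("Using voice: {voice}");
    Ok(())
}
```

The function takes generic reader and writer parameters so it stays testable. A real CLI would pass standard input and output, and a test can feed a `Cursor`.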
Install with Cargo: `cargo install ttsrs`

Prerequisites:
- Rust (latest stable version)
- ffmpeg (used to combine the per-chunk audio files into one output file)
- An API key for the target TTS service
- Internet connection
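ffmpeg is described only as the tool that combines the audio chunks. A common way to join same-format files without re-encoding is ffmpeg's concat demuxer. The sketch below writes the demuxer's list file, derives an output path, and builds the command. These pieces are assumptions for illustration, not necessarily what ttsrs runs: the function names, the `-y` overwrite flag, the choice of the concat demuxer, and the `<stem>/<stem>.<format>` output layout.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::process::Command;

/// Writes an ffmpeg concat-demuxer list: one `file '<path>'` line per chunk.
fn write_concat_list(list_path: &Path, chunks: &[PathBuf]) -> io::Result<()> {
    let mut body = String::new();
    for chunk in chunks {
        // A single quote inside a quoted path is written as '\'' (close, escape, reopen).
        let escaped = chunk.display().to_string().replace('\'', r"'\''");
        body.push_str(&format!("file '{escaped}'\n"));
    }
    fs::write(list_path, body)
}

/// Hypothetical layout: `docs/notes.txt` -> `docs/notes/notes.<format>`.
fn output_path(input: &Path, format: &str) -> PathBuf {
    let stem = input.file_stem().unwrap_or_default().to_string_lossy();
    input.with_file_name(&*stem).join(format!("{stem}.{format}"))
}

/// Builds (but does not run) an ffmpeg command that joins the listed files
/// without re-encoding (`-c copy`). `-safe 0` permits absolute paths in the list.
fn concat_command(list_path: &Path, output: &Path) -> Command {
    let mut cmd = Command::new("ffmpeg");
    cmd.args(["-y", "-f", "concat", "-safe", "0", "-i"])
        .arg(list_path)
        .args(["-c", "copy"])
        .arg(output);
    cmd
}

fn main() -> io::Result<()> {
    let work_dir = std::env::temp_dir();
    let chunks = vec![work_dir.join("chunk_000.flac"), work_dir.join("chunk_001.flac")];
    let list = work_dir.join("concat_list.txt");
    write_concat_list(&list, &chunks)?;

    // The output directory must exist (e.g. via fs::create_dir_all) before ffmpeg runs.
    let output = output_path(Path::new("input.txt"), "flac");
    let cmd = concat_command(&list, &output);
    // Printed rather than executed; `cmd.status()` would actually run ffmpeg.
    println!("{cmd:?}");
    Ok(())
}
```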
Usage: `ttsrs [OPTIONS] <INPUT_FILE>`
| Argument | Description | Default |
|---|---|---|
| `<INPUT_FILE>` | Path to the input text file | None (required; prompted if omitted) |
| `--model`, `-m` | TTS model to use | `tts-1-hd` |
| `--voice`, `-v` | Voice selection | `alloy` (prompted if left at the default) |
| `--format`, `-f` | Output audio format | `flac` (prompted if left at the default) |
| `--speed` | Speaking speed (0.25 to 4.0) | `1.0` |
| `--apikey`, `-a` | API key for the TTS service | None (required; taken from the flag, the `OPENAI_API_KEY` environment variable, or a prompt) |
| `--endpoint-url` | Custom API endpoint URL (e.g., for a local AI server) | `https://api.openai.com/v1/audio/speech` |
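The defaults and the speed range in the options table can be mirrored in code. This is a minimal sketch: `TtsOptions` and `parse_speed` are hypothetical names, and ttsrs may validate `--speed` differently, or may leave out-of-range values for the API to reject.

```rust
/// Hypothetical struct mirroring the defaults in the options table.
#[derive(Debug, Clone, PartialEq)]
struct TtsOptions {
    model: String,
    voice: String,
    format: String,
    speed: f32,
    endpoint_url: String,
}

impl Default for TtsOptions {
    fn default() -> Self {
        TtsOptions {
            model: "tts-1-hd".to_string(),
            voice: "alloy".to_string(),
            format: "flac".to_string(),
            speed: 1.0,
            endpoint_url: "https://api.openai.com/v1/audio/speech".to_string(),
        }
    }
}

/// Parses a `--speed` value and rejects numbers outside 0.25..=4.0.
fn parse_speed(raw: &str) -> Result<f32, String> {
    let speed: f32 = raw
        .trim()
        .parse()
        .map_err(|_| format!("invalid speed {raw:?}: expected a number"))?;
    // NaN and infinities parse successfully but fail this range check.
    if (0.25..=4.0).contains(&speed) {
        Ok(speed)
    } else {
        Err(format!("speed {speed} is outside the supported range 0.25 to 4.0"))
    }
}

fn main() {
    let mut options = TtsOptions::default();
    match parse_speed("1.2") {
        Ok(speed) => options.speed = speed,
        Err(message) => eprintln!("{message}"),
    }
    println!("{options:?}");
    println!("{:?}", parse_speed("5"));
}
```

Checking the range locally gives a clearer error than a rejected HTTP request. The trade-off is that the check could disagree with a custom endpoint that accepts a different range.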
Available voices (may vary depending on the endpoint):
- alloy - A versatile, well-balanced voice
- echo - Clear and professional, ideal for announcements
- fable - Warm and engaging, perfect for storytelling
- onyx - Deep and authoritative
- nova - Young and energetic
- shimmer - Soft and soothing
- ballad, coral, sage - Newer voices; availability depends on the model and endpoint
Supported output formats (may vary depending on the endpoint):
- flac (default) - Lossless audio compression
- mp3 - Common compressed audio format
- wav - Uncompressed audio
- pcm - Raw audio data
- opus - High-quality compressed audio
- aac - Widely supported compressed audio
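A typed format value makes the list of supported formats explicit in code. The sketch below is hypothetical (`AudioFormat` is not a documented ttsrs type) and parses a `--format` value case-insensitively, falling back to flac as the default.

```rust
use std::fmt;
use std::str::FromStr;

/// Hypothetical enum for the listed output formats (flac is the default).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
enum AudioFormat {
    #[default]
    Flac,
    Mp3,
    Wav,
    Pcm,
    Opus,
    Aac,
}

impl AudioFormat {
    /// Lowercase name, usable both as the API value and as a file extension.
    fn as_str(self) -> &'static str {
        match self {
            AudioFormat::Flac => "flac",
            AudioFormat::Mp3 => "mp3",
            AudioFormat::Wav => "wav",
            AudioFormat::Pcm => "pcm",
            AudioFormat::Opus => "opus",
            AudioFormat::Aac => "aac",
        }
    }
}

impl FromStr for AudioFormat {
    type Err = String;

    /// Parses a `--format` value case-insensitively.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.to_ascii_lowercase().as_str() {
            "flac" => Ok(AudioFormat::Flac),
            "mp3" => Ok(AudioFormat::Mp3),
            "wav" => Ok(AudioFormat::Wav),
            "pcm" => Ok(AudioFormat::Pcm),
            "opus" => Ok(AudioFormat::Opus),
            "aac" => Ok(AudioFormat::Aac),
            other => Err(format!("unsupported format: {other}")),
        }
    }
}

impl fmt::Display for AudioFormat {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(self.as_str())
    }
}

fn main() {
    // Unknown values fall back to the default (flac) in this demo.
    let format: AudioFormat = "MP3".parse().unwrap_or_default();
    println!("output file: output.{format}");
}
```

Supported formats may vary by endpoint, so a strict enum like this would reject formats that a custom server actually supports. A more permissive alternative is to pass unknown names through as plain strings.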
Basic usage (prompts for the API key if none is set, and for the voice and format if they are left at their defaults):

`ttsrs input.txt`

Specifying the voice, format, speed, and API key:

`ttsrs --voice nova --format mp3 --speed 1.2 --apikey sk-... input.txt`

Using a custom endpoint URL (e.g., for a local LM Studio instance):

`ttsrs --endpoint-url "http://localhost:1234/v1/audio/speech" --apikey N/A --voice some-local-voice input.txt`

Using an environment variable for the API key:

`export OPENAI_API_KEY='your-api-key-here'`

`ttsrs --voice echo --format wav input.txt`
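Each of these commands ultimately POSTs the chunk text to the endpoint. The sketch below builds a JSON body using the field names of OpenAI's speech endpoint: `model`, `input`, `voice`, `response_format`, and `speed`. Three things here are assumptions: the mapping from `--format` to `response_format`, the hand-rolled escaping (used to stay dependency-free), and the helper names. A custom endpoint may expect different fields. The request would also carry an `Authorization: Bearer <key>` header.

```rust
/// Escapes a string for inclusion in a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \u00XX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Builds the JSON body for one text chunk (hypothetical helper).
fn speech_request_body(model: &str, text: &str, voice: &str, format: &str, speed: f32) -> String {
    format!(
        r#"{{"model":"{}","input":"{}","voice":"{}","response_format":"{}","speed":{}}}"#,
        json_escape(model),
        json_escape(text),
        json_escape(voice),
        json_escape(format),
        speed
    )
}

fn main() {
    let body = speech_request_body("tts-1-hd", "Hello, \"world\"!", "nova", "mp3", 1.2);
    println!("{body}");
}
```

In a real client, a JSON library such as serde_json would normally replace `json_escape`. The manual version is here only to keep the sketch free of dependencies.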
Environment variables:
- `OPENAI_API_KEY`: Your API key. The `--apikey` flag takes precedence if both are set.
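The precedence rule can be written as a small pure function, which keeps it testable without touching the process environment. This is a sketch: `resolve_api_key` is a hypothetical name, and treating an empty string as "not set" is an assumption.

```rust
use std::env;

/// Treats a missing or empty value as "not provided" (an assumption).
fn non_empty(value: Option<&str>) -> Option<&str> {
    value.filter(|v| !v.is_empty())
}

/// Precedence: the `--apikey` flag first, then `OPENAI_API_KEY`.
/// `None` means the tool would fall back to prompting the user.
fn resolve_api_key(flag: Option<&str>, env_value: Option<&str>) -> Option<String> {
    non_empty(flag)
        .or_else(|| non_empty(env_value))
        .map(str::to_owned)
}

fn main() {
    let flag: Option<String> = None; // e.g. the parsed `--apikey` value
    let env_value = env::var("OPENAI_API_KEY").ok();
    match resolve_api_key(flag.as_deref(), env_value.as_deref()) {
        Some(_) => println!("API key found (not printed)"),
        None => println!("No API key set; the tool would prompt for one"),
    }
}
```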
How it works:
- Text is automatically split into chunks by token count (using `tiktoken_rs` with the `cl100k_base` encoding) to stay within API limits, at roughly 500 tokens per chunk.
- Each chunk is sent separately to the specified API endpoint.
- The audio response for each chunk is saved as a temporary file.
- `ffmpeg` concatenates the temporary audio files into a single output file.
- Temporary files are cleaned up automatically once they have been combined successfully.
- Output is saved in a directory named after the input file.
- Speaking speed is adjustable via `--speed`.
- Multiple OpenAI voices and audio formats are supported (or whichever ones the custom endpoint supports).
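The chunking step can be sketched with the standard library alone. To stay self-contained, the sketch swaps tiktoken_rs for a crude estimate of about four characters per token and packs whole words greedily. `chunk_text`, `approx_tokens`, and the word-packing strategy are assumptions rather than ttsrs's actual code, which may, for example, prefer sentence boundaries.

```rust
/// Rough stand-in for a BPE token count: about four characters per token.
/// ttsrs counts tokens with tiktoken_rs (`cl100k_base`); with that crate,
/// something like `bpe.encode_with_special_tokens(word).len()` would replace this.
fn approx_tokens(word: &str) -> usize {
    (word.chars().count() + 3) / 4
}

/// Greedily packs whitespace-separated words into chunks whose summed token
/// estimate stays at or below `max_tokens`. A single word larger than the
/// limit ends up alone in an oversized chunk. Line breaks collapse to spaces.
fn chunk_text<F>(text: &str, max_tokens: usize, count_tokens: F) -> Vec<String>
where
    F: Fn(&str) -> usize,
{
    let mut chunks = Vec::new();
    let mut current = String::new();
    let mut current_tokens = 0;

    for word in text.split_whitespace() {
        // Count every word as at least one token so the limit always applies.
        let word_tokens = count_tokens(word).max(1);
        // Close the current chunk if this word would push it past the limit.
        if !current.is_empty() && current_tokens + word_tokens > max_tokens {
            chunks.push(std::mem::take(&mut current));
            current_tokens = 0;
        }
        if !current.is_empty() {
            current.push(' ');
        }
        current.push_str(word);
        current_tokens += word_tokens;
    }
    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}

fn main() {
    let text = "The quick brown fox jumps over the lazy dog. ".repeat(300);
    // 500 mirrors the approximate per-chunk limit described above.
    let chunks = chunk_text(&text, 500, approx_tokens);
    for (i, chunk) in chunks.iter().enumerate() {
        println!("chunk {i}: {} words", chunk.split_whitespace().count());
    }
}
```

Passing the counter in as a closure keeps the packing logic independent of the tokenizer. The same function works with the heuristic or with a real BPE encoder.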
MIT License
Based on the unofficial-openai-tts-cli Python project.