Commit 43b0dd1: improve logging and error handling

also merge README

Parent 7a98841; 3 files changed (+237, -238 lines)

plugins/elevenlabs/README.md: 205 additions, 2 deletions

<!--
SPDX-FileCopyrightText: 2024 LiveKit, Inc.

SPDX-License-Identifier: Apache-2.0
-->
# ElevenLabs Plugin for LiveKit Agents

The Agents Framework is designed for building realtime, programmable
participants that run on servers. Use it to create conversational, multi-modal
voice agents that can see, hear, and understand.

This package contains the ElevenLabs plugin, which provides:

- **Text-to-Speech (TTS)**: High-quality voice synthesis with multiple voices and models
- **Speech-to-Text (STT)**: Real-time and batch transcription with the ElevenLabs Scribe models

Refer to the [documentation](https://docs.livekit.io/agents/overview/) for
information on how to use it, or browse the [API
reference](https://docs.livekit.io/agents-js/modules/plugins_agents_plugin_elevenlabs.html).
See the [repository](https://github.com/livekit/agents-js) for more information
about the framework as a whole.

## Installation

```bash
pnpm add @livekit/agents-plugin-elevenlabs
```

Set your ElevenLabs API key:

```bash
export ELEVEN_API_KEY=your_api_key_here
```

---

## Text-to-Speech (TTS)

For TTS documentation, refer to the [API reference](https://docs.livekit.io/agents-js/modules/plugins_agents_plugin_elevenlabs.html).

### Quick Example

```typescript
import { TTS } from '@livekit/agents-plugin-elevenlabs';

const tts = new TTS();
// Use tts for voice synthesis
```

---

## Speech-to-Text (STT)

### Features

- **Multiple Model Support**: Choose between Scribe v1, Scribe v2, and Scribe v2 Realtime
- **Streaming & Non-Streaming**: Support for both batch and real-time transcription
- **Multi-Language**: Supports 35+ languages with automatic language detection
- **Audio Event Tagging**: Optional tagging of non-speech audio events (laughter, footsteps, etc.)
- **VAD Configuration**: Customizable voice activity detection for streaming mode

### Supported Models

#### Scribe v1 (`scribe_v1`)
- **Type**: Non-streaming
- **Method**: HTTP POST
- **Use Case**: Batch transcription of pre-recorded audio
- **Features**: Audio event tagging, language detection

#### Scribe v2 (`scribe_v2`)
- **Type**: Non-streaming
- **Method**: HTTP POST
- **Use Case**: Improved accuracy for batch transcription
- **Features**: Enhanced model, language detection

#### Scribe v2 Realtime (`scribe_v2_realtime`)
- **Type**: Streaming (default model)
- **Method**: WebSocket
- **Use Case**: Real-time conversation transcription
- **Features**: Interim results, VAD-based segmentation, manual commit support

### Quick Start

#### Non-Streaming (Scribe v1)

```typescript
import { STT } from '@livekit/agents-plugin-elevenlabs';

const stt = new STT({
  apiKey: process.env.ELEVEN_API_KEY, // or set the ELEVEN_API_KEY env var
  model: 'scribe_v1',
  languageCode: 'en',
  tagAudioEvents: true,
});
```
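
The non-streaming models are described above as HTTP POST requests. As a minimal sketch, not the plugin's code, this is one way the request fields could be assembled from the options. The field names `model_id`, `language_code`, and `tag_audio_events` come from ElevenLabs' public speech-to-text REST API and are an assumption about what this plugin actually sends; `buildBatchFormFields` is a hypothetical helper.

```typescript
// Hypothetical helper (not part of @livekit/agents-plugin-elevenlabs).
// Assumption: field names follow ElevenLabs' public speech-to-text REST API.
interface BatchRequestOptions {
  model: 'scribe_v1' | 'scribe_v2';
  languageCode?: string;
  tagAudioEvents?: boolean;
}

function buildBatchFormFields(opts: BatchRequestOptions): Array<[string, string]> {
  const fields: Array<[string, string]> = [['model_id', opts.model]];
  // Leaving language_code out lets the service auto-detect the language.
  if (opts.languageCode !== undefined) {
    fields.push(['language_code', opts.languageCode]);
  }
  // The README documents tagAudioEvents as defaulting to true.
  fields.push(['tag_audio_events', String(opts.tagAudioEvents ?? true)]);
  return fields;
}
```

In a real request these fields, plus the audio file itself, would be sent as a `multipart/form-data` body (per ElevenLabs' docs, to `${baseURL}/speech-to-text` with an `xi-api-key` header). Check the ElevenLabs API reference rather than relying on this sketch.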

#### Streaming (Scribe v2 Realtime)

```typescript
import { STT } from '@livekit/agents-plugin-elevenlabs';

const stt = new STT({
  model: 'scribe_v2_realtime', // default
  sampleRate: 16000,
  languageCode: 'en',
  commitStrategy: 'vad', // auto-commit on speech end
  vadSilenceThresholdSecs: 1.0,
});
```
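
The streaming model is configured with a `sampleRate` and `numChannels`. To reason about how much data each streamed chunk carries, here is a small sketch that assumes 16-bit (2-byte) PCM samples. The README does not state the wire format, so treat that as an assumption. `pcmChunkBytes` and `pcmChunkDurationMs` are hypothetical helpers.

```typescript
// Hypothetical helpers for sizing raw audio chunks.
// Assumption: samples are 16-bit signed PCM (2 bytes per sample per channel).
const BYTES_PER_SAMPLE = 2;

// Bytes needed for `chunkMs` milliseconds of interleaved audio.
function pcmChunkBytes(sampleRate: number, numChannels: number, chunkMs: number): number {
  const samplesPerChannel = Math.round((sampleRate * chunkMs) / 1000);
  return samplesPerChannel * numChannels * BYTES_PER_SAMPLE;
}

// Inverse: duration in milliseconds represented by `bytes` of audio.
function pcmChunkDurationMs(bytes: number, sampleRate: number, numChannels: number): number {
  const samplesPerChannel = bytes / (numChannels * BYTES_PER_SAMPLE);
  return (samplesPerChannel * 1000) / sampleRate;
}
```

At the default of 16 kHz mono, a 100 ms chunk is 1,600 samples, or 3,200 bytes under this assumption.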

### Configuration Options

#### Common Options

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `apiKey` | `string` | `process.env.ELEVEN_API_KEY` | ElevenLabs API key |
| `baseURL` | `string` | `https://api.elevenlabs.io/v1` | API base URL |
| `model` | `STTModels` | `'scribe_v2_realtime'` | Model to use: `scribe_v1`, `scribe_v2`, or `scribe_v2_realtime` |
| `languageCode` | `string` | `undefined` | Language code (auto-detected if not set) |
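
The defaults in this table can be read as a simple merge of user options over fallback values. The sketch below is hypothetical and illustrative, not the plugin's source: `resolveCommonOptions` is an invented name, the environment is passed in (for example `process.env`) to keep it testable, and throwing when no key is found is an assumption about how a missing key might be handled.

```typescript
// Hypothetical sketch of resolving the documented defaults for common options.
type STTModels = 'scribe_v1' | 'scribe_v2' | 'scribe_v2_realtime';

interface CommonSTTOptions {
  apiKey?: string;
  baseURL?: string;
  model?: STTModels;
  languageCode?: string;
}

interface ResolvedCommonOptions {
  apiKey: string;
  baseURL: string;
  model: STTModels;
  languageCode?: string; // undefined means "let ElevenLabs auto-detect"
}

function resolveCommonOptions(
  opts: CommonSTTOptions,
  env: Record<string, string | undefined>,
): ResolvedCommonOptions {
  // Explicit option wins; otherwise fall back to the ELEVEN_API_KEY env var.
  const apiKey = opts.apiKey ?? env['ELEVEN_API_KEY'];
  if (!apiKey) {
    // Assumption: failing early is one reasonable way to handle a missing key.
    throw new Error('ElevenLabs API key is required: pass apiKey or set ELEVEN_API_KEY');
  }
  return {
    apiKey,
    baseURL: opts.baseURL ?? 'https://api.elevenlabs.io/v1',
    model: opts.model ?? 'scribe_v2_realtime',
    languageCode: opts.languageCode,
  };
}
```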

#### Non-Streaming Options (v1, v2)

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `tagAudioEvents` | `boolean` | `true` | Tag non-speech audio events, e.g. `(laughter)` |
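
If you want tagged events during transcription but need plain text downstream (for example, to feed an LLM), you can strip the tags afterwards. This is a minimal sketch that assumes events appear inline as parenthesized tags such as `(laughter)`, as the table describes. `stripAudioEventTags` is a hypothetical helper, and it will also remove any genuine parenthesized text.

```typescript
// Hypothetical post-processing helper (not part of the plugin).
// Assumption: audio events appear inline as "(tag)" tokens.
function stripAudioEventTags(text: string): string {
  return text
    .replace(/\([^()]*\)/g, ' ') // drop "(...)" tags
    .replace(/\s+/g, ' ') // collapse the whitespace left behind
    .trim();
}
```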

#### Streaming Options (Scribe v2 Realtime)

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `sampleRate` | `number` | `16000` | Audio sample rate in Hz (16000, 22050, or 44100) |
| `numChannels` | `number` | `1` | Number of audio channels |
| `commitStrategy` | `'vad' \| 'manual'` | `'vad'` | How transcripts are committed: `'vad'` commits automatically when speech ends |
| `vadSilenceThresholdSecs` | `number` | `undefined` | Silence required before a VAD commit, in seconds (0.3-3.0) |
| `vadThreshold` | `number` | `undefined` | Speech detection threshold; higher is stricter (0.1-0.9) |
| `minSpeechDurationMs` | `number` | `undefined` | Minimum speech duration, in ms (50-2000) |
| `minSilenceDurationMs` | `number` | `undefined` | Minimum silence duration, in ms (50-2000) |
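
The ranges in this table are easy to get wrong when values come from user config. The following hypothetical client-side check (`validateStreamingOptions` is not a plugin API) reports options that fall outside the documented ranges. The plugin or the ElevenLabs service may validate differently, so treat this as a sketch.

```typescript
// Hypothetical validator for the documented streaming option ranges.
interface StreamingOptions {
  sampleRate?: number;
  vadSilenceThresholdSecs?: number;
  vadThreshold?: number;
  minSpeechDurationMs?: number;
  minSilenceDurationMs?: number;
}

const SUPPORTED_SAMPLE_RATES = [16000, 22050, 44100];

// Records an error if `value` is set and outside [min, max].
function checkRange(errors: string[], name: string, value: number | undefined, min: number, max: number): void {
  if (value !== undefined && (value < min || value > max)) {
    errors.push(`${name} must be between ${min} and ${max}, got ${value}`);
  }
}

// Returns a list of human-readable problems; an empty list means "looks valid".
function validateStreamingOptions(opts: StreamingOptions): string[] {
  const errors: string[] = [];
  const sampleRate = opts.sampleRate ?? 16000; // documented default
  if (SUPPORTED_SAMPLE_RATES.indexOf(sampleRate) === -1) {
    errors.push(`sampleRate must be one of ${SUPPORTED_SAMPLE_RATES.join(', ')}, got ${sampleRate}`);
  }
  checkRange(errors, 'vadSilenceThresholdSecs', opts.vadSilenceThresholdSecs, 0.3, 3.0);
  checkRange(errors, 'vadThreshold', opts.vadThreshold, 0.1, 0.9);
  checkRange(errors, 'minSpeechDurationMs', opts.minSpeechDurationMs, 50, 2000);
  checkRange(errors, 'minSilenceDurationMs', opts.minSilenceDurationMs, 50, 2000);
  return errors;
}
```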

### Supported Languages

The STT plugin supports 35+ languages, including:

English (`en`), Spanish (`es`), French (`fr`), German (`de`), Italian (`it`), Portuguese (`pt`), Polish (`pl`), Dutch (`nl`), Swedish (`sv`), Finnish (`fi`), Danish (`da`), Norwegian (`no`), Czech (`cs`), Romanian (`ro`), Slovak (`sk`), Ukrainian (`uk`), Greek (`el`), Turkish (`tr`), Russian (`ru`), Bulgarian (`bg`), Croatian (`hr`), Serbian (`sr`), Hungarian (`hu`), Lithuanian (`lt`), Latvian (`lv`), Estonian (`et`), Japanese (`ja`), Chinese (`zh`), Korean (`ko`), Hindi (`hi`), Arabic (`ar`), Persian (`fa`), Hebrew (`he`), Indonesian (`id`), Malay (`ms`), Thai (`th`), Vietnamese (`vi`), Tamil (`ta`), Urdu (`ur`)
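
Applications often hold a full locale such as `en-US` rather than the two-letter codes listed above. A minimal sketch of mapping one to the other follows; `toLanguageCode` is a hypothetical helper, and its code list is copied from this README rather than fetched from ElevenLabs, so it may lag behind the service. Returning `undefined` means "leave `languageCode` unset and let ElevenLabs auto-detect".

```typescript
// Hypothetical helper (not part of the plugin). Codes copied from the list above.
const LISTED_LANGUAGE_CODES = new Set<string>([
  'en', 'es', 'fr', 'de', 'it', 'pt', 'pl', 'nl', 'sv', 'fi',
  'da', 'no', 'cs', 'ro', 'sk', 'uk', 'el', 'tr', 'ru', 'bg',
  'hr', 'sr', 'hu', 'lt', 'lv', 'et', 'ja', 'zh', 'ko', 'hi',
  'ar', 'fa', 'he', 'id', 'ms', 'th', 'vi', 'ta', 'ur',
]);

// Maps "en-US" or "pt_BR" to "en" / "pt"; unknown or missing locales map to
// undefined so the caller can fall back to auto-detection.
function toLanguageCode(locale?: string): string | undefined {
  if (!locale) return undefined;
  const base = locale.trim().toLowerCase().split(/[-_]/)[0];
  return LISTED_LANGUAGE_CODES.has(base) ? base : undefined;
}
```

The result can be passed straight through as `languageCode`; an `undefined` value simply leaves the option unset.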

### Advanced Usage

#### Custom VAD Parameters

Fine-tune voice activity detection for your use case:

```typescript
import { STT } from '@livekit/agents-plugin-elevenlabs';

const stt = new STT({
  model: 'scribe_v2_realtime',
  commitStrategy: 'vad',

  // Longer silence before committing (good for thoughtful speakers)
  vadSilenceThresholdSecs: 2.0,

  // Higher threshold = stricter about what counts as speech
  vadThreshold: 0.7,

  // Ignore very short speech bursts (reduces false positives)
  minSpeechDurationMs: 200,

  // Require longer silence to end speech (reduces fragmentation)
  minSilenceDurationMs: 500,
});
```
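
ElevenLabs applies these VAD parameters on its side, and this README does not describe the algorithm. To build intuition for how `vadThreshold`, `minSpeechDurationMs`, and `minSilenceDurationMs` interact, here is a simplified, hypothetical frame-based segmenter. It is a conceptual model, not ElevenLabs' implementation: it assumes a per-frame speech probability is already available, and it leaves out `vadSilenceThresholdSecs` (the commit delay).

```typescript
// Conceptual model only: NOT how ElevenLabs implements VAD.
interface VadParams {
  vadThreshold: number; // a frame counts as speech if its probability >= this
  minSpeechDurationMs: number; // segments shorter than this are dropped
  minSilenceDurationMs: number; // silence needed to close a segment
}

interface Segment {
  startMs: number;
  endMs: number;
}

function segmentSpeech(probs: number[], frameMs: number, p: VadParams): Segment[] {
  const segments: Segment[] = [];
  let start: number | null = null; // start of the open segment, if any
  let lastSpeechEnd = 0; // end time of the most recent speech frame
  for (let i = 0; i < probs.length; i++) {
    const t = i * frameMs;
    if (probs[i] >= p.vadThreshold) {
      if (start === null) start = t;
      lastSpeechEnd = t + frameMs;
    } else if (start !== null && t + frameMs - lastSpeechEnd >= p.minSilenceDurationMs) {
      // Enough trailing silence: close the segment, keeping it only if long enough.
      if (lastSpeechEnd - start >= p.minSpeechDurationMs) {
        segments.push({ startMs: start, endMs: lastSpeechEnd });
      }
      start = null;
    }
  }
  // Flush a segment still open when the input ends.
  if (start !== null && lastSpeechEnd - start >= p.minSpeechDurationMs) {
    segments.push({ startMs: start, endMs: lastSpeechEnd });
  }
  return segments;
}
```

With 100 ms frames, `vadThreshold: 0.5`, `minSpeechDurationMs: 200`, and `minSilenceDurationMs: 300`, the probabilities `[0.9, 0.9, 0.9, 0.1, 0.1, 0.1, 0.8, 0.1, 0.1, 0.1]` yield a single segment from 0 to 300 ms. The isolated 100 ms burst at 600 ms is shorter than `minSpeechDurationMs` and is dropped, while a pause shorter than `minSilenceDurationMs` would be bridged rather than splitting the segment.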

#### Multi-Language Support

Let ElevenLabs auto-detect the language:

```typescript
import { STT } from '@livekit/agents-plugin-elevenlabs';

const stt = new STT({
  model: 'scribe_v1',
  // Don't set languageCode: the language will be auto-detected
});

// `audioBuffer` is audio you have already captured (e.g. an AudioFrame or an array of AudioFrames)
const event = await stt.recognize(audioBuffer);
console.log('Detected language:', event.alternatives?.[0]?.language);
console.log('Text:', event.alternatives?.[0]?.text);
```

Or specify a language:

```typescript
const stt = new STT({
  model: 'scribe_v2_realtime',
  languageCode: 'es', // Spanish
});
```

### Model Comparison

| Feature | Scribe v1 | Scribe v2 | Scribe v2 Realtime |
|---------|-----------|-----------|--------------------|
| **Type** | Non-streaming | Non-streaming | Streaming |
| **Latency** | High (batch) | High (batch) | Low (real-time) |
| **Interim Results** | ❌ | ❌ | ✅ |
| **Audio Event Tagging** | ✅ | ❌ | ❌ |
| **VAD Configuration** | ❌ | ❌ | ✅ |
| **Manual Commit** | ❌ | ❌ | ✅ |
| **Best For** | Batch jobs with event detection | High-accuracy batch | Real-time conversations |
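
The comparison can be read as a simple decision rule. The sketch below is a hypothetical helper, not part of the plugin. It encodes the "Type" and "Best For" rows together with the transport listed under Supported Models; `chooseModel`, `MODEL_INFO`, and the `Requirements` shape are invented for illustration.

```typescript
// Hypothetical decision helper derived from the comparison table.
type STTModels = 'scribe_v1' | 'scribe_v2' | 'scribe_v2_realtime';

interface ModelInfo {
  streaming: boolean;
  transport: 'http-post' | 'websocket';
}

const MODEL_INFO: Record<STTModels, ModelInfo> = {
  scribe_v1: { streaming: false, transport: 'http-post' },
  scribe_v2: { streaming: false, transport: 'http-post' },
  scribe_v2_realtime: { streaming: true, transport: 'websocket' },
};

interface Requirements {
  realtime: boolean; // live conversation with low latency / interim results
  wantsAudioEvents?: boolean; // batch transcripts with event detection
}

function chooseModel(req: Requirements): STTModels {
  if (req.realtime) return 'scribe_v2_realtime'; // "Real-time conversations"
  // "Batch jobs with event detection" vs "High-accuracy batch"
  return req.wantsAudioEvents ? 'scribe_v1' : 'scribe_v2';
}
```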

---

## Resources

- [ElevenLabs TTS Documentation](https://elevenlabs.io/docs/api-reference/text-to-speech)
- [ElevenLabs STT Documentation](https://elevenlabs.io/docs/api-reference/speech-to-text)
- [Scribe v2 Streaming Guide](https://elevenlabs.io/docs/cookbooks/speech-to-text/streaming)
- [LiveKit Agents Documentation](https://docs.livekit.io/agents/)
- [LiveKit Agents JS Repository](https://github.com/livekit/agents-js)

## License

Copyright 2025 LiveKit, Inc.

Licensed under the Apache License, Version 2.0.
