@@ -3,15 +3,218 @@ SPDX-FileCopyrightText: 2024 LiveKit, Inc.
3 3
4 4 SPDX-License-Identifier: Apache-2.0
5 5 -->
6- # ElevenLabs plugin for LiveKit Agents
6+ # ElevenLabs Plugin for LiveKit Agents
7 7
8 8 The Agents Framework is designed for building realtime, programmable
9 9 participants that run on servers. Use it to create conversational, multi-modal
10 10 voice agents that can see, hear, and understand.
11 11
12- This package contains the ElevenLabs plugin, which allows for voice synthesis.
12+ This package contains the ElevenLabs plugin, which provides:
13+ - **Text-to-Speech (TTS)**: High-quality voice synthesis with multiple voices and models
14+ - **Speech-to-Text (STT)**: Real-time and batch transcription with the Scribe API
15+
13 16 Refer to the [documentation](https://docs.livekit.io/agents/overview/) for
14 17 information on how to use it, or browse the [API
15 18 reference](https://docs.livekit.io/agents-js/modules/plugins_agents_plugin_elevenlabs.html).
16 19 See the [repository](https://github.com/livekit/agents-js) for more information
17 20 about the framework as a whole.
21+
22+ ## Installation
23+
24+ ``` bash
25+ pnpm add @livekit/agents-plugin-elevenlabs
26+ ```
27+
28+ Set your ElevenLabs API key:
29+ ``` bash
30+ export ELEVEN_API_KEY=your_api_key_here
31+ ```
32+
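Because the plugin falls back to `ELEVEN_API_KEY` when no key is passed, it can help to fail fast at startup if the variable is missing. The sketch below is one possible way to do that; `requireApiKey` is a hypothetical helper and not part of the plugin.

```typescript
// Hypothetical helper (not part of the plugin): fail fast when the key is missing.
type Env = Record<string, string | undefined>;

function requireApiKey(env: Env, name = 'ELEVEN_API_KEY'): string {
  // Treat an empty or whitespace-only value the same as an unset variable.
  const value = env[name]?.trim();
  if (!value) {
    throw new Error(`${name} is not set; export it before starting the agent`);
  }
  return value;
}

// In an agent entry point this would typically be called as:
//   const apiKey = requireApiKey(process.env);
```

Passing the environment in as an argument keeps the check easy to unit-test without mutating `process.env`.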
33+ ---
34+
35+ ## Text-to-Speech (TTS)
36+
37+ For TTS documentation, refer to the [API reference](https://docs.livekit.io/agents-js/modules/plugins_agents_plugin_elevenlabs.html).
38+
39+ ### Quick Example
40+
41+ ``` typescript
42+ import { TTS } from '@livekit/agents-plugin-elevenlabs';
43+
44+ const tts = new TTS();
45+ // Use tts for voice synthesis
46+ ```
47+
48+ ---
49+
50+ ## Speech-to-Text (STT)
51+
52+ ### Features
53+
54+ - **Multiple Model Support**: Choose between Scribe v1, v2, and v2 realtime
55+ - **Streaming & Non-Streaming**: Support for both batch and real-time transcription
56+ - **Multi-Language**: Supports 35+ languages with automatic language detection
57+ - **Audio Event Tagging**: Optional tagging of non-speech audio events (laughter, footsteps, etc.)
58+ - **VAD Configuration**: Customizable voice activity detection for streaming mode
59+
60+ ### Supported Models
61+
62+ #### Scribe v1 (`scribe_v1`)
63+ - **Type**: Non-streaming
64+ - **Method**: HTTP POST
65+ - **Use Case**: Batch transcription of pre-recorded audio
66+ - **Features**: Audio event tagging, language detection
67+
68+ #### Scribe v2 (`scribe_v2`)
69+ - **Type**: Non-streaming
70+ - **Method**: HTTP POST
71+ - **Use Case**: Improved accuracy for batch transcription
72+ - **Features**: Enhanced model, language detection
73+
74+ #### Scribe v2 Realtime (`scribe_v2_realtime`)
75+ - **Type**: Streaming (default)
76+ - **Method**: WebSocket
77+ - **Use Case**: Real-time conversation transcription
78+ - **Features**: Interim results, VAD-based segmentation, manual commit support
79+
80+ ### Quick Start
81+
82+ #### Non-Streaming (Scribe v1)
83+
84+ ``` typescript
85+ import { STT } from '@livekit/agents-plugin-elevenlabs';
86+
87+ const stt = new STT({
88+   apiKey: process.env.ELEVEN_API_KEY, // optional: defaults to the ELEVEN_API_KEY env var
89+   model: 'scribe_v1',
90+   languageCode: 'en',
91+   tagAudioEvents: true,
92+ });
93+ ```
94+
95+ #### Streaming (Scribe v2 Realtime)
96+
97+ ``` typescript
98+ import { STT } from '@livekit/agents-plugin-elevenlabs';
99+ import { SpeechEventType } from '@livekit/agents';
100+
101+ const stt = new STT({
102+   model: 'scribe_v2_realtime', // default
103+   sampleRate: 16000,
104+   languageCode: 'en',
105+   commitStrategy: 'vad', // auto-commit on speech end
106+   vadSilenceThresholdSecs: 1.0,
107+ });
108+ ```
109+
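The realtime model is described as producing interim results alongside committed transcripts. The self-contained sketch below illustrates one way an application might fold both kinds of result into a single display string. It assumes each interim result supersedes the previous one; `TranscriptEvent` and `TranscriptBuffer` are stand-in names for illustration only and are not the event types exported by `@livekit/agents`.

```typescript
// Stand-in event shape for illustration only; real events come from @livekit/agents.
type TranscriptEvent = { kind: 'interim' | 'final'; text: string };

class TranscriptBuffer {
  private committed: string[] = [];
  private pending = '';

  apply(event: TranscriptEvent): void {
    if (event.kind === 'interim') {
      // Assumption: a newer interim result replaces the previous guess.
      this.pending = event.text;
    } else {
      // A final result is appended once and clears the in-flight text.
      this.committed.push(event.text);
      this.pending = '';
    }
  }

  // Committed segments followed by whatever is still in flight.
  get display(): string {
    return [...this.committed, this.pending].filter(Boolean).join(' ');
  }
}
```

Keeping interim text separate from committed text avoids duplicated words when a final transcript arrives for a segment that was already shown as interim.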
110+ ### Configuration Options
111+
112+ #### Common Options
113+
114+ | Option | Type | Default | Description |
115+ |--------|------|---------|-------------|
116+ | `apiKey` | `string` | `process.env.ELEVEN_API_KEY` | ElevenLabs API key |
117+ | `baseURL` | `string` | `https://api.elevenlabs.io/v1` | API base URL |
118+ | `model` | `STTModels` | `'scribe_v2_realtime'` | Model to use |
119+ | `languageCode` | `string` | `undefined` | Language code (auto-detected if not set) |
120+
121+ #### Non-Streaming Options (v1, v2)
122+
123+ | Option | Type | Default | Description |
124+ |--------|------|---------|-------------|
125+ | `tagAudioEvents` | `boolean` | `true` | Tag non-speech audio events such as laughter |
126+
127+ #### Streaming Options (v2_realtime)
128+
129+ | Option | Type | Default | Description |
130+ |--------|------|---------|-------------|
131+ | `sampleRate` | `number` | `16000` | Audio sample rate in Hz (16000, 22050, or 44100) |
132+ | `numChannels` | `number` | `1` | Number of audio channels |
133+ | `commitStrategy` | `'vad' \| 'manual'` | `'vad'` | How to commit transcripts |
134+ | `vadSilenceThresholdSecs` | `number` | `undefined` | VAD silence threshold (0.3-3.0 seconds) |
135+ | `vadThreshold` | `number` | `undefined` | VAD threshold (0.1-0.9) |
136+ | `minSpeechDurationMs` | `number` | `undefined` | Minimum speech duration (50-2000 ms) |
137+ | `minSilenceDurationMs` | `number` | `undefined` | Minimum silence duration (50-2000 ms) |
138+
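The ranges in the streaming options table can be checked before constructing the STT instance. The following is a minimal sketch of such a pre-flight check; `validateStreamingOptions` and the `StreamingOptions` interface are hypothetical names that mirror the table, and whether the plugin itself validates these values is not stated here.

```typescript
// Hypothetical pre-flight check mirroring the ranges in the streaming options table.
interface StreamingOptions {
  sampleRate?: number;
  vadSilenceThresholdSecs?: number;
  vadThreshold?: number;
  minSpeechDurationMs?: number;
  minSilenceDurationMs?: number;
}

type RangedKey =
  | 'vadSilenceThresholdSecs'
  | 'vadThreshold'
  | 'minSpeechDurationMs'
  | 'minSilenceDurationMs';

const SAMPLE_RATES = [16000, 22050, 44100];

// [option name, minimum, maximum], both bounds inclusive.
const RANGES: Array<[RangedKey, number, number]> = [
  ['vadSilenceThresholdSecs', 0.3, 3.0],
  ['vadThreshold', 0.1, 0.9],
  ['minSpeechDurationMs', 50, 2000],
  ['minSilenceDurationMs', 50, 2000],
];

// Returns a list of human-readable problems; an empty list means the options look valid.
function validateStreamingOptions(opts: StreamingOptions): string[] {
  const problems: string[] = [];
  if (opts.sampleRate !== undefined && SAMPLE_RATES.indexOf(opts.sampleRate) === -1) {
    problems.push(`sampleRate must be one of ${SAMPLE_RATES.join(', ')}`);
  }
  for (const [key, min, max] of RANGES) {
    const value = opts[key];
    if (value !== undefined && (value < min || value > max)) {
      problems.push(`${key} must be between ${min} and ${max}`);
    }
  }
  return problems;
}
```

Returning a list instead of throwing on the first problem lets a caller report every misconfigured option at once.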
139+ ### Supported Languages
140+
141+ The STT plugin supports 35+ languages, including:
142+
143+ English (`en`), Spanish (`es`), French (`fr`), German (`de`), Italian (`it`), Portuguese (`pt`), Polish (`pl`), Dutch (`nl`), Swedish (`sv`), Finnish (`fi`), Danish (`da`), Norwegian (`no`), Czech (`cs`), Romanian (`ro`), Slovak (`sk`), Ukrainian (`uk`), Greek (`el`), Turkish (`tr`), Russian (`ru`), Bulgarian (`bg`), Croatian (`hr`), Serbian (`sr`), Hungarian (`hu`), Lithuanian (`lt`), Latvian (`lv`), Estonian (`et`), Japanese (`ja`), Chinese (`zh`), Korean (`ko`), Hindi (`hi`), Arabic (`ar`), Persian (`fa`), Hebrew (`he`), Indonesian (`id`), Malay (`ms`), Thai (`th`), Vietnamese (`vi`), Tamil (`ta`), Urdu (`ur`)
144+
145+ ### Advanced Usage
146+
147+ #### Custom VAD Parameters
148+
149+ Fine-tune voice activity detection for your use case:
150+
151+ ``` typescript
152+ const stt = new STT({
153+   model: 'scribe_v2_realtime',
154+   commitStrategy: 'vad',
155+
156+   // Longer silence before committing (good for thoughtful speakers)
157+   vadSilenceThresholdSecs: 2.0,
158+
159+   // Higher threshold = stricter about what counts as speech
160+   vadThreshold: 0.7,
161+
162+   // Ignore very short speech bursts (reduces false positives)
163+   minSpeechDurationMs: 200,
164+
165+   // Require longer silence to end speech (reduces fragmentation)
166+   minSilenceDurationMs: 500,
167+ });
168+ ```
169+
170+ #### Multi-Language Support
171+
172+ Let ElevenLabs auto-detect the language:
173+
174+ ``` typescript
175+ const stt = new STT({
176+   model: 'scribe_v1',
177+   // Leave languageCode unset to auto-detect the language
178+ });
179+
180+ const event = await stt.recognize(audioBuffer);
181+ console.log('Detected language:', event.alternatives?.[0].language);
182+ console.log('Text:', event.alternatives?.[0].text);
183+ ```
184+
185+ Or specify a language:
186+
187+ ``` typescript
188+ const stt = new STT({
189+   model: 'scribe_v2_realtime',
190+   languageCode: 'es', // Spanish
191+ });
192+ ```
193+
194+ ### Model Comparison
195+
196+ | Feature | Scribe v1 | Scribe v2 | Scribe v2 Realtime |
197+ |---------|-----------|-----------|--------------------|
198+ | **Type** | Non-streaming | Non-streaming | Streaming |
199+ | **Latency** | High (batch) | High (batch) | Low (real-time) |
200+ | **Interim Results** | ❌ | ❌ | ✅ |
201+ | **Audio Event Tagging** | ✅ | ❌ | ❌ |
202+ | **VAD Configuration** | ❌ | ❌ | ✅ |
203+ | **Manual Commit** | ❌ | ❌ | ✅ |
204+ | **Best For** | Batch jobs with event detection | High-accuracy batch | Real-time conversations |
205+
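The comparison table can be read as a small decision rule. The sketch below encodes it as a function under that reading; `pickModel` and the `Needs` interface are hypothetical names for illustration, and only the model identifiers come from this README.

```typescript
// Hypothetical selector that encodes the comparison table; not part of the plugin.
type ScribeModel = 'scribe_v1' | 'scribe_v2' | 'scribe_v2_realtime';

interface Needs {
  streaming: boolean; // low latency and interim results are required
  audioEventTags?: boolean; // tags for non-speech events such as laughter
}

function pickModel(needs: Needs): ScribeModel {
  if (needs.streaming) {
    if (needs.audioEventTags) {
      // Per the table, audio event tagging is listed only for the non-streaming v1 model.
      throw new Error('audio event tagging is only listed for scribe_v1 (non-streaming)');
    }
    return 'scribe_v2_realtime';
  }
  // For batch work, v1 is the only row with event tagging; otherwise prefer v2.
  return needs.audioEventTags ? 'scribe_v1' : 'scribe_v2';
}
```

For example, `pickModel({ streaming: false, audioEventTags: true })` selects `'scribe_v1'`, matching the "Batch jobs with event detection" row.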
206+ ---
207+
208+ ## Resources
209+
210+ - [ElevenLabs TTS Documentation](https://elevenlabs.io/docs/api-reference/text-to-speech)
211+ - [ElevenLabs STT Documentation](https://elevenlabs.io/docs/api-reference/speech-to-text)
212+ - [Scribe v2 Streaming Guide](https://elevenlabs.io/docs/cookbooks/speech-to-text/streaming)
213+ - [LiveKit Agents Documentation](https://docs.livekit.io/agents/)
214+ - [LiveKit Agents JS Repository](https://github.com/livekit/agents-js)
215+
216+ ## License
217+
218+ Copyright 2025 LiveKit, Inc.
219+
220+ Licensed under the Apache License, Version 2.0.