
Commit fd63e11

[whisper] Add OpenAI API compatibility (openhab#17921)
* [whisper] Add OpenAI API compatibility

Signed-off-by: Gwendal Roulleau <[email protected]>
1 parent d0bac82 commit fd63e11

File tree

5 files changed (+303, -97 lines)


bundles/org.openhab.voice.whisperstt/README.md

+33 -8
@@ -5,6 +5,8 @@ It also uses [libfvad](https://github.com/dpirch/libfvad) for voice activity det
 
 [Whisper.cpp](https://github.com/ggerganov/whisper.cpp) is a high-optimized lightweight c++ implementation of [whisper](https://github.com/openai/whisper) that allows to easily integrate it in different platforms and applications.
 
+Alternatively, if you do not want to perform speech-to-text on the computer hosting openHAB, this add-on can consume an OpenAI/Whisper compatible transcription API.
+
 Whisper enables speech recognition for multiple languages and dialects:
 
 english, chinese, german, spanish, russian, korean, french, japanese, portuguese, turkish, polish, catalan, dutch, arabic, swedish,
@@ -15,9 +17,11 @@ marathi, punjabi, sinhala, khmer, shona, yoruba, somali, afrikaans, occitan, geo
 uzbek, faroese, haitian, pashto, turkmen, nynorsk, maltese, sanskrit, luxembourgish, myanmar, tibetan, tagalog, malagasy, assamese, tatar, lingala,
 hausa, bashkir, javanese and sundanese.
 
-## Supported platforms
+## Local mode (offline)
+
+### Supported platforms
 
-This add-on uses some native binaries to work.
+This add-on uses some native binaries to work when performing offline recognition.
 You can find here the used [whisper.cpp Java wrapper](https://github.com/GiviMAD/whisper-jni) and [libfvad Java wrapper](https://github.com/GiviMAD/libfvad-jni).
 
 The following platforms are supported:
@@ -28,7 +32,7 @@ The following platforms are supported:
 
 The native binaries for those platforms are included in this add-on provided with the openHAB distribution.
 
-## CPU compatibility
+### CPU compatibility
 
 To use this binding it's recommended to use a device at least as powerful as the RaspberryPI 5 with a modern CPU.
 The execution times on Raspberry PI 4 are x2, so just the tiny model can be run on under 5 seconds.
@@ -40,18 +44,18 @@ You can check those flags on Windows using a program like `CPU-Z`.
 If you are going to use the binding in a `arm64` host the CPU should support the flags: `fphp`.
 You can check those flags on linux using the terminal with `lscpu`.
 
-## Transcription time
+### Transcription time
 
 On a Raspberry PI 5, the approximate transcription times are:
 
 | model      | exec time |
-| ---------- | --------: |
+|------------|----------:|
 | tiny.bin   |      1.5s |
 | base.bin   |        3s |
 | small.bin  |      8.5s |
 | medium.bin |       17s |
 
-## Configuring the model
+### Configuring the model
 
 Before you can use this service you should configure your model.
 
@@ -64,7 +68,7 @@ You should place the downloaded .bin model in '\<openHAB userdata\>/whisper/' so
 
 Remember to check that you have enough RAM to load the model, estimated RAM consumption can be checked on the huggingface link.
 
-## Using alternative whisper.cpp library
+### Using alternative whisper.cpp library
 
 It's possible to use your own build of the whisper.cpp shared library with this add-on.
 
@@ -76,7 +80,7 @@ In the [Whisper.cpp](https://github.com/ggerganov/whisper.cpp) README you can fi
 
 Note: You need to restart openHAB to reload the library.
 
-## Grammar
+### Grammar
 
 The whisper.cpp library allows to define a grammar to alter the transcription results without fine-tuning the model.
 
@@ -99,6 +103,14 @@ tv_channel ::= ("set ")? "tv channel to " [0-9]+
 
 You can provide the grammar and enable its usage using the binding configuration.
 
+## API mode
+
+You can also use this add-on with a remote API compatible with the 'transcription' API from OpenAI. Online services exposing such an API may require an API key (paid services, such as OpenAI).
+
+You can host your own compatible service elsewhere on your network, with third-party software such as faster-whisper-server.
+
+Please note that API mode also uses libfvad for voice activity detection, and that grammar parameters are not available.
+
 ## Configuration
 
 Use your favorite configuration UI to edit the Whisper settings:
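Before pointing the add-on at a remote endpoint, it can help to verify from the openHAB host that the endpoint really accepts OpenAI-style transcription requests. The following is a minimal sketch (not part of the add-on), assuming a hypothetical recording `utterance.wav` and the default OpenAI endpoint; self-hosted compatible servers expose the same request shape. The network call is guarded so the script is safe to run without a key:

```shell
# These values mirror the add-on's apiUrl / apiKey / apiModelName options.
API_URL="https://api.openai.com/v1/audio/transcriptions"
API_KEY="${OPENAI_API_KEY:-}"   # leave empty for self-hosted servers without auth
API_MODEL="whisper-1"

# The actual request needs network access, a valid key, and an audio file,
# so it only runs when those are present:
if [ -n "$API_KEY" ] && [ -f utterance.wav ]; then
  curl -s "$API_URL" \
    -H "Authorization: Bearer $API_KEY" \
    -F "model=$API_MODEL" \
    -F file=@utterance.wav
fi
```

A successful response is a JSON object containing the transcribed `text`, which is the same payload the add-on consumes in API mode.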
@@ -107,6 +119,7 @@ Use your favorite configuration UI to edit the Whisper settings:
 
 General options.
 
+- **Mode: LOCAL or API** - Choose either local computation or remote API use.
 - **Model Name** - Model name. The 'ggml-' prefix and '.bin' extension are optional here but required on the filename. (ex: tiny.en -> ggml-tiny.en.bin)
 - **Preload Model** - Keep whisper model loaded.
 - **Single Utterance Mode** - When enabled recognition stops listening after a single utterance.
@@ -139,6 +152,13 @@ Configure whisper options.
 - **Initial Prompt** - Initial prompt for whisper.
 - **OpenVINO Device** - Initialize OpenVINO encoder. (built-in binaries do not support OpenVINO, this has no effect)
 - **Use GPU** - Enables GPU usage. (built-in binaries do not support GPU usage, this has no effect)
+- **Language** - If specified, speeds up recognition by avoiding language auto-detection. Defaults to the system locale.
+
+### API Configuration
+
+- **API key** - Optional API key, for online services that require one.
+- **API url** - You may use your own service and define its URL here. Defaults to the OpenAI transcription API.
+- **API model name** - Your hosted service may offer other models. Defaults to OpenAI's only model, 'whisper-1'.
 
 ### Grammar Configuration
 
@@ -199,7 +219,9 @@ In case you would like to set up the service via a text file, create a new file
 Its contents should look similar to:
 
 ```ini
+org.openhab.voice.whisperstt:mode=LOCAL
 org.openhab.voice.whisperstt:modelName=tiny
+org.openhab.voice.whisperstt:language=en
 org.openhab.voice.whisperstt:initSilenceSeconds=0.3
 org.openhab.voice.whisperstt:removeSilence=true
 org.openhab.voice.whisperstt:stepSeconds=0.3
@@ -229,6 +251,9 @@ org.openhab.voice.whisperstt:useGPU=false
 org.openhab.voice.whisperstt:useGrammar=false
 org.openhab.voice.whisperstt:grammarPenalty=80.0
 org.openhab.voice.whisperstt:grammarLines=
+org.openhab.voice.whisperstt:apiKey=mykeyaaaa
+org.openhab.voice.whisperstt:apiUrl=https://api.openai.com/v1/audio/transcriptions
+org.openhab.voice.whisperstt:apiModelName=whisper-1
 ```
 
 ### Default Speech-to-Text Configuration
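The sample configuration above runs in LOCAL mode while also listing the API properties. For contrast, a minimal API-mode variant might look like the sketch below (key and URL are placeholders; in a real installation the file lives at `services/whisperstt.cfg` under the openHAB config directory, while here it is written to the working directory just to illustrate the contents):

```shell
# Hypothetical minimal API-mode variant of whisperstt.cfg (placeholder values).
# Local-only options such as modelName or grammar settings have no effect in API mode.
cat > whisperstt.cfg <<'EOF'
org.openhab.voice.whisperstt:mode=API
org.openhab.voice.whisperstt:language=en
org.openhab.voice.whisperstt:apiKey=mykeyaaaa
org.openhab.voice.whisperstt:apiUrl=https://api.openai.com/v1/audio/transcriptions
org.openhab.voice.whisperstt:apiModelName=whisper-1
EOF
grep -c '^org.openhab' whisperstt.cfg   # prints 5: one line per property
```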

bundles/org.openhab.voice.whisperstt/src/main/java/org/openhab/voice/whisperstt/internal/WhisperSTTConfiguration.java

+25
@@ -146,4 +146,29 @@ public class WhisperSTTConfiguration {
      * Print whisper.cpp library logs as binding debug logs.
      */
     public boolean enableWhisperLog;
+    /**
+     * LOCAL to use embedded whisper, or API to use an external API
+     */
+    public Mode mode = Mode.LOCAL;
+    /**
+     * If mode is set to API, use this URL
+     */
+    public String apiUrl = "https://api.openai.com/v1/audio/transcriptions";
+    /**
+     * If mode is set to API, use this API key to access apiUrl
+     */
+    public String apiKey = "";
+    /**
+     * If specified, speed up recognition by avoiding auto-detection
+     */
+    public String language = "";
+    /**
+     * Model name (API only)
+     */
+    public String apiModelName = "whisper-1";
+
+    public static enum Mode {
+        LOCAL,
+        API;
+    }
 }
