bundles/org.openhab.voice.whisperstt/README.md
+33-8
@@ -5,6 +5,8 @@ It also uses [libfvad](https://github.com/dpirch/libfvad) for voice activity det
 
 [Whisper.cpp](https://github.com/ggerganov/whisper.cpp) is a highly optimized, lightweight C++ implementation of [whisper](https://github.com/openai/whisper) that makes it easy to integrate into different platforms and applications.
 
+Alternatively, if you do not want to perform speech-to-text on the computer hosting openHAB, this add-on can consume an OpenAI/Whisper compatible transcription API.
+
 Whisper enables speech recognition for multiple languages and dialects:
 This add-on uses some native binaries to work when performing offline recognition.
 You can find here the used [whisper.cpp Java wrapper](https://github.com/GiviMAD/whisper-jni) and [libfvad Java wrapper](https://github.com/GiviMAD/libfvad-jni).
 
 The following platforms are supported:
@@ -28,7 +32,7 @@ The following platforms are supported:
 
 The native binaries for those platforms are included in this add-on provided with the openHAB distribution.
 
-## CPU compatibility
+### CPU compatibility
 
 To use this binding, it's recommended to use a device at least as powerful as the Raspberry Pi 5 with a modern CPU.
 The execution times on a Raspberry Pi 4 are about 2x, so only the tiny model can run in under 5 seconds.
@@ -40,18 +44,18 @@ You can check those flags on Windows using a program like `CPU-Z`.
 
 If you are going to use the binding on an `arm64` host, the CPU should support the `fphp` flag.
 You can check those flags on Linux using the terminal with `lscpu`.
 
-## Transcription time
+### Transcription time
 
 On a Raspberry Pi 5, the approximate transcription times are:
 
 | model | exec time |
-|----------|--------: |
+|------------|----------:|
 | tiny.bin | 1.5s |
 | base.bin | 3s |
 | small.bin | 8.5s |
 | medium.bin | 17s |
 
-## Configuring the model
+### Configuring the model
 
 Before you can use this service, you should configure your model.
@@ -64,7 +68,7 @@ You should place the downloaded .bin model in '\<openHAB userdata\>/whisper/' so
 
 Remember to check that you have enough RAM to load the model; the estimated RAM consumption can be checked on the Hugging Face link.
 
-## Using alternative whisper.cpp library
+### Using alternative whisper.cpp library
 
 It's possible to use your own build of the whisper.cpp shared library with this add-on.
@@ -76,7 +80,7 @@ In the [Whisper.cpp](https://github.com/ggerganov/whisper.cpp) README you can fi
 
 Note: You need to restart openHAB to reload the library.
 
-## Grammar
+### Grammar
 
 The whisper.cpp library allows you to define a grammar to alter the transcription results without fine-tuning the model.
 You can provide the grammar and enable its usage using the binding configuration.
 
+## API mode
+
+You can also use this add-on with a remote API that is compatible with the 'transcription' API from OpenAI. Online services exposing such an API may require an API key (paid services, such as OpenAI).
+
+You can host your own compatible service elsewhere on your network, with third-party software such as faster-whisper-server.
+
+Please note that API mode also uses libfvad for voice activity detection, and that grammar parameters are not available.
+
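To make the API mode above concrete, here is a minimal sketch of the kind of request an OpenAI-compatible 'transcription' service accepts: a multipart/form-data POST carrying the audio file and a model name (OpenAI's endpoint is `/v1/audio/transcriptions`). This is illustrative only; the add-on's actual HTTP client code is not shown in this diff, and the class and method names here are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

/**
 * Illustrative sketch (not the add-on's code): builds the multipart/form-data
 * body that an OpenAI-compatible transcription endpoint expects, with a
 * "file" part holding the audio and a "model" part naming the model.
 */
public class TranscriptionRequest {

    public static byte[] buildMultipartBody(byte[] wavBytes, String model, String boundary) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        String crlf = "\r\n";
        // audio file part
        out.write(("--" + boundary + crlf
                + "Content-Disposition: form-data; name=\"file\"; filename=\"audio.wav\"" + crlf
                + "Content-Type: audio/wav" + crlf + crlf).getBytes(StandardCharsets.UTF_8));
        out.write(wavBytes);
        // model name part ("whisper-1" for OpenAI) and closing boundary
        out.write((crlf + "--" + boundary + crlf
                + "Content-Disposition: form-data; name=\"model\"" + crlf + crlf
                + model + crlf
                + "--" + boundary + "--" + crlf).getBytes(StandardCharsets.UTF_8));
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] body = buildMultipartBody(new byte[] { 0, 1, 2 }, "whisper-1", "oh-boundary");
        System.out.println(new String(body, StandardCharsets.UTF_8).contains("name=\"model\"")); // true
    }
}
```

The request would then be POSTed with header `Content-Type: multipart/form-data; boundary=oh-boundary` and, for paid services, `Authorization: Bearer <API key>`.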
## Configuration
Use your favorite configuration UI to edit the Whisper settings:
@@ -107,6 +119,7 @@ Use your favorite configuration UI to edit the Whisper settings:
 
 General options.
 
+- **Mode: LOCAL or API** - Choose either local computation or remote API use.
 - **Model Name** - Model name. The 'ggml-' prefix and '.bin' extension are optional here but required on the filename. (ex: tiny.en -> ggml-tiny.en.bin)
 - **Preload Model** - Keep the whisper model loaded.
 - **Single Utterance Mode** - When enabled, recognition stops listening after a single utterance.
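The **Model Name** rule above (the 'ggml-' prefix and '.bin' extension are optional in the setting but required on the filename) can be sketched as a small helper. This mirrors the documented rule only; the binding's actual resolution code is not part of this diff, and the class name is hypothetical.

```java
/**
 * Illustrative helper mirroring the documented model-name rule:
 * the configured name may omit the 'ggml-' prefix and '.bin' extension,
 * but the file on disk must carry both (e.g. tiny.en -> ggml-tiny.en.bin).
 */
public class ModelNames {

    public static String toFileName(String configured) {
        String name = configured;
        if (!name.startsWith("ggml-")) {
            name = "ggml-" + name; // add the required prefix if missing
        }
        if (!name.endsWith(".bin")) {
            name = name + ".bin"; // add the required extension if missing
        }
        return name;
    }

    public static void main(String[] args) {
        System.out.println(toFileName("tiny.en"));       // ggml-tiny.en.bin
        System.out.println(toFileName("ggml-base.bin")); // ggml-base.bin
    }
}
```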
@@ -139,6 +152,13 @@ Configure whisper options.
 - **Initial Prompt** - Initial prompt for whisper.
 - **OpenVINO Device** - Initialize OpenVINO encoder. (built-in binaries do not support OpenVINO, this has no effect)
 - **Use GPU** - Enables GPU usage. (built-in binaries do not support GPU usage, this has no effect)
+- **Language** - If specified, speeds up recognition by avoiding language auto-detection. Defaults to the system locale.
+
+### API Configuration
+
+- **API key** - Optional API key for online services that require it.
+- **API url** - You may use your own service and define its URL here. Defaults to the OpenAI transcription API.
+- **API model name** - Your hosted service may offer other models. Defaults to 'whisper-1', the only OpenAI model.
 
 ### Grammar Configuration
 
@@ -199,7 +219,9 @@ In case you would like to set up the service via a text file, create a new file
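As an illustration of the text-file setup referenced above, a hypothetical `services/whisperstt.cfg` fragment enabling API mode might look like the following. The parameter keys (`mode`, `apiKey`, `apiUrl`, `apiModelName`) are assumptions inferred from the option labels, not confirmed by this diff.

```ini
# hypothetical example; key names inferred from the documented options
org.openhab.voice.whisperstt:mode=API
org.openhab.voice.whisperstt:apiKey=<your API key, if the service requires one>
org.openhab.voice.whisperstt:apiUrl=https://api.openai.com/v1/audio/transcriptions
org.openhab.voice.whisperstt:apiModelName=whisper-1
```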
bundles/org.openhab.voice.whisperstt/src/main/java/org/openhab/voice/whisperstt/internal/WhisperSTTConfiguration.java
+25
@@ -146,4 +146,29 @@ public class WhisperSTTConfiguration {
      * Print whisper.cpp library logs as binding debug logs.
      */
     public boolean enableWhisperLog;
+    /**
+     * local to use embedded whisper or openaiapi to use an external API
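The remaining added lines of this class are truncated here. As a hedged illustration of what fields matching the documented API options could look like (field names and defaults are assumptions, not the author's actual code):

```java
/**
 * Hypothetical sketch of configuration fields matching the documented
 * API options; names and defaults are assumptions for illustration only.
 */
public class ApiConfigSketch {
    public String mode = "LOCAL"; // LOCAL or API
    public String apiKey = "";    // optional, for paid online services
    public String apiUrl = "https://api.openai.com/v1/audio/transcriptions";
    public String apiModelName = "whisper-1";

    /** True when the add-on should call the remote API instead of whisper.cpp. */
    public boolean isApiMode() {
        return "API".equalsIgnoreCase(mode);
    }

    public static void main(String[] args) {
        ApiConfigSketch cfg = new ApiConfigSketch();
        System.out.println(cfg.isApiMode()); // false
        cfg.mode = "API";
        System.out.println(cfg.isApiMode()); // true
    }
}
```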