Android: Switch default library used for Whisper voice typing #11881

personalizedrefrigerator · 2025-02-24T08:07:39Z

Summary

This pull request switches from onnx-runtime to whisper.cpp. Rather than add whisper.cpp as a submodule (as many of the language bindings/libraries that use it do), its code is copied to app-mobile/android/vendor/.

This library has a few benefits over onnx-runtime:

Smaller APK size: The built APK is now 150 MB (compared to 224 MB previously).
May fix a crash on 32-bit devices: Crashes related to onnx-runtime-extensions when starting voice typing have been observed on 32-bit Android devices.
Model size: Allows users to use larger models than tiny (must change the model URL in settings).
Custom prompts/post-processing: Per-locale prompts and post-processing replacements are now included in the model. Specifying a custom model URL allows users to customize the prompt. See "customize the model's prompt/postprocessing".
Custom fine-tuning: A notebook is available that can be forked to fine-tune Joplin-compatible Whisper models for different languages/tasks.
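
As an illustration of the post-processing replacements mentioned above, here is a minimal sketch of how a list of replacements shipped with a model might be applied to raw transcription output. The `Replacement` shape and the `applyReplacements` name are hypothetical and are not taken from this pull request; the actual format used by Joplin may differ.

```typescript
// Hypothetical shape for a replacement rule; the format bundled with
// Joplin-compatible models may be different.
interface Replacement {
	pattern: string; // Regular expression source
	replacement: string; // Text substituted for each match
}

// Applies each rule, in order, to the raw transcription.
const applyReplacements = (text: string, replacements: Replacement[]): string => {
	let result = text;
	for (const { pattern, replacement } of replacements) {
		result = result.replace(new RegExp(pattern, 'gi'), replacement);
	}
	return result;
};

// Example rules (illustrative only): drop a marker token, then collapse
// the extra whitespace left behind.
const replacements: Replacement[] = [
	{ pattern: '\\[BLANK_AUDIO\\]', replacement: '' },
	{ pattern: '\\s{2,}', replacement: ' ' },
];
const cleaned = applyReplacements('Hello [BLANK_AUDIO]  world', replacements);
console.log(cleaned);
```

Applying the rules in order matters here: the whitespace rule only has something to clean up after the marker has been removed.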

Note

At present, this pull request:

Only supports Android.
Only supports transcribing new audio from the microphone (and not existing attachments/recordings).

To-do

For this pull request:

Evaluate accuracy/performance of the model with non-English locales.
- Changing the initial prompt can affect accuracy. Currently, a prompt is set for English and Spanish, but for no other locales.
- If not accurate enough in non-English languages, it's possible to fine-tune Whisper. Fine-tuned French models compatible with this pull request can be found in this GitHub release.
Documentation: Document using a fine-tuned custom Whisper model with Joplin.
- Documented in this repository, but it should also be documented in the main Joplin repository.
Bug fix: After downloading the model, it's necessary to close and re-open the voice typing dialog.
- This should be fixed by this commit. However, an automated test for this part of the UI would be useful (to help prevent regressions).
Migration: Delete old .onnx files when the user deletes and redownloads local models.
Testing: Add automated tests for JavaScript.
Testing: Add automated tests for the silence-detection code.

Optional/for a follow-up pull request:

Downloading models: Currently, models are downloaded from https://github.com/personalizedrefrigerator/joplin-voice-typing-test/releases. Should these instead be downloaded from a different location?
Auto-select the best model: The ggml-tiny-q8_0 model is currently used on all devices. On faster devices, the ggml-base-q8_0 model should have better performance.
Thread count: The upstream whisper.cpp Android example has custom logic for determining the number of threads used to run Whisper. Should Joplin do something similar?
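
To make the two open questions above concrete, here is one possible heuristic, sketched under stated assumptions. It is not the upstream whisper.cpp logic and not what this pull request does. The function names, the cap of four threads, and the eight-core threshold are all hypothetical; only the model names come from the description above.

```typescript
// Hypothetical heuristic: leave one core free for the UI and cap the
// number of worker threads. The cap (4) is an arbitrary assumption.
const chooseThreadCount = (coreCount: number, maxThreads = 4): number => {
	const available = Math.max(1, Math.floor(coreCount) - 1);
	return Math.min(available, maxThreads);
};

// Hypothetical rule: use the larger model only on devices with many cores.
// The threshold (8) is an assumption, not a measured cut-off.
const chooseModel = (coreCount: number): string => {
	return coreCount >= 8 ? 'ggml-base-q8_0' : 'ggml-tiny-q8_0';
};

console.log(chooseThreadCount(8), chooseModel(8));
console.log(chooseThreadCount(2), chooseModel(2));
```

Core count is only a rough proxy for device speed, so any real threshold would need to be validated on actual hardware.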

Important

Huge pull request diff: At present, this pull request includes the code of whisper.cpp in a vendor/whisper.cpp folder. This is instead of adding whisper.cpp as a submodule.

Testing

Automated tests

This pull request includes automated tests for:

findLongestSilence.cpp, in findLongestSilence_test.cpp:
- Silence detection: These tests run on Android, in development mode, after opening the voice typing dialog.
whisper.ts, in whisper.test.ts:
- Clearing old models: A test is present to verify that legacy whisper_tiny.onnx models (used by Joplin before this pull request) are cleared when deleting and re-downloading models.
- Post-processing: A test is present to verify that post-processing operations included with the model file are applied.
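
For readers unfamiliar with silence detection, the sketch below shows the general idea behind finding the longest quiet stretch in a buffer of audio samples. It is a simplified TypeScript illustration, not a port of findLongestSilence.cpp: the function name, the `SilenceRange` type, and the plain amplitude threshold are assumptions, and the real implementation may preprocess the audio differently.

```typescript
// Half-open range [start, end), measured in samples.
interface SilenceRange {
	start: number;
	end: number;
}

// Returns the longest run of consecutive samples whose absolute value is
// below `threshold`, or null if no sample is that quiet.
const findLongestSilenceSketch = (samples: Float32Array, threshold: number): SilenceRange | null => {
	let best: SilenceRange | null = null;
	let runStart = -1;
	// Iterate one step past the end so that a trailing quiet run is closed.
	for (let i = 0; i <= samples.length; i++) {
		const quiet = i < samples.length && Math.abs(samples[i]) < threshold;
		if (quiet) {
			if (runStart === -1) runStart = i;
		} else if (runStart !== -1) {
			if (!best || i - runStart > best.end - best.start) {
				best = { start: runStart, end: i };
			}
			runStart = -1;
		}
	}
	return best;
};

const audio = new Float32Array([0.5, 0.0, 0.01, 0.4, 0.0, 0.0, 0.0, 0.6]);
const longest = findLongestSilenceSketch(audio, 0.05);
console.log(longest);
```

A transcription pipeline can use such a range to split audio at a natural pause rather than in the middle of a word.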

personalizedrefrigerator · 2025-02-25T00:27:20Z

readme/dev/spec/voice_typing.md

+
+### Downloading the models
+
+By default, Joplin downloads Whisper models from [this GitHub repository](https://github.com/personalizedrefrigerator/joplin-voice-typing-test/releases). It's possible to download models from a custom location by changing the **Voice typing language files (URL)** in from the "Note" tab of the configuration screen.


Suggested change

By default, Joplin downloads Whisper models from [this GitHub repository](https://github.com/personalizedrefrigerator/joplin-voice-typing-test/releases). It's possible to download models from a custom location by changing the **Voice typing language files (URL)** in from the "Note" tab of the configuration screen.

It may make sense to change this URL to something under the joplin GitHub organization.

laurent22 · Feb 27, 2025

Yes, if we could move them to a repository under github.com/joplin before the final 3.3 release, that would be better. If you need access for this, please let me know.

personalizedrefrigerator · Feb 27, 2025

> If you need access for this, please let me know

I currently don't have the ability to create new repositories in https://github.com/joplin/.

laurent22 · Feb 28, 2025

I've added you to the org now and created the repository https://github.com/joplin/voice-typing-models. Let me know if you're able to create releases there, as maybe I need to add you to the repository too.

personalizedrefrigerator · Feb 28, 2025

Thanks for creating that!

> maybe I need to add you to the repository too

At present, I don't seem to have permission to commit to or create releases in joplin/voice-typing-models.

laurent22 · Feb 28, 2025

I've changed you to admin; please give it another try.

personalizedrefrigerator added 30 commits February 18, 2025 15:34

Android: Whisper voice typing with whisper.cpp

b129cad

Attempting to improve segment joining with prompting

4401ffc

Refactoring

40a4167

WIP: Trying to change how Whisper is broken into chunks (voice activity detector)

a985bec

WIP: Draft implementation of splitting on silence

1581129

Mostly working: Splitting text into chunks (paragraphs) based on silence

28a46eb

Attempting to improve silence-finding logic

cddb54f

Adjusting model and parameters

565c8bd

Refactoring

5ecb521

Vendor whisper.cpp

d0cae87

Adjusting voice typing parameters

42a33c7

Merge remote-tracking branch 'upstream/dev' into pr/android/switch-whisper-lib

9b2f663

Default to the tiny model

09b94b5

Accuracy: Customize prompt for English, French, & Spanish

0e0ba23

Update licenses

453e1d2

Convert indentation to tabs

a2bebf1

Safer communication with C++

77a77f6

Remove TODO

ba89f95

Remove unused submodule

9fe1d85

Better error handling

017a80d

spaces -> tabs

15d90f9

Merge remote-tracking branch 'upstream/dev' into pr/android/switch-whisper-lib

6d8b6af

Remove prompt for French

122512c

Don't override the default number of threads

c84106b

Improve silence detection

93e3a06

Tests for silence detection

5d23174

Re-enable temperature increase when decoding fails

7ab7833

Make silence detection less sensitive

74f26cb

Clear old .onnx models (along with old ggml models) when redownloading models

340d5a4

Include prompts, word replacements with the downloaded model (rather than hardcoded with Joplin)

317a008

personalizedrefrigerator added 3 commits February 24, 2025 14:19

Testing: Verify that custom post-processing replacements are applied

b5deaa4

Improve error handling

0909f08

Update voice_typing.md with information about whisper.cpp

cd6f518

personalizedrefrigerator commented Feb 25, 2025

personalizedrefrigerator and others added 4 commits February 24, 2025 21:39

Disable timestamps to simplify creating custom Whisper models

f5db141

Update default model URL

02cc4ee

Document the whisper.cpp directory

f9bea0d

Merge branch 'dev' into pr/android/switch-whisper-lib

744e304

personalizedrefrigerator marked this pull request as ready for review February 26, 2025 22:32

personalizedrefrigerator added the android label Feb 27, 2025

personalizedrefrigerator changed the title ~~Android: Switch library used for Whisper voice typing backend~~ Android: Switch default library used for Whisper voice typing Feb 27, 2025

laurent22 merged commit 7f51712 into laurent22:dev Feb 27, 2025
7 checks passed

personalizedrefrigerator Feb 25, 2025

laurent22 Feb 27, 2025

personalizedrefrigerator Feb 27, 2025

laurent22 Feb 28, 2025

personalizedrefrigerator Feb 28, 2025

laurent22 Feb 28, 2025

