Android: Switch default library used for Whisper voice typing #11881

Merged

personalizedrefrigerator (Collaborator) commented on Feb 24, 2025

Summary

whisper.cpp library | demo APK

This pull request switches from onnx-runtime to whisper.cpp. Rather than add whisper.cpp as a submodule (as many of the language bindings/libraries that use it do), its code is copied to app-mobile/android/vendor/.

This library has a few benefits over onnx-runtime:

  • Smaller APK size: The built APK is now 150 MB (compared to 224 MB previously).
  • May fix a crash on 32-bit devices: Crashes related to onnx-runtime-extensions when starting voice typing have been observed on 32-bit Android devices.
  • Model size: Allows users to use larger models than tiny (must change the model URL in settings).
  • Custom prompts/post-processing: Per-locale prompts and post-processing replacements are now included in the model. Specifying a custom model URL allows users to customize the prompt. See "customize the model's prompt/postprocessing".
  • Custom fine-tuning: A notebook is available that can be forked to fine-tune Joplin-compatible Whisper models for different languages/tasks.
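
As a rough illustration of the per-locale prompt and post-processing idea above, the sketch below looks up a prompt for a locale and applies a list of replacements to a transcription. The `ModelConfig` shape, its field names, the helper names, and the sample values are all assumptions made for this example; they are not the format used by the model files in this pull request.

```typescript
// Hypothetical config shape: NOT the actual format bundled with the models.
interface ModelConfig {
  // Initial prompt passed to Whisper, keyed by language code (e.g. "en", "es").
  prompts: Record<string, string>;
  // Ordered [pattern, replacement] pairs applied to the raw transcription.
  replacements: Array<[string, string]>;
}

// Returns the prompt for a locale such as "en_US", or an empty string when
// no prompt is configured for that language.
const promptForLocale = (config: ModelConfig, locale: string): string => {
  const language = locale.toLowerCase().replace(/[-_].*$/, '');
  return config.prompts[language] ?? '';
};

// Applies each replacement in order, replacing every literal occurrence.
const postProcess = (config: ModelConfig, text: string): string => {
  let result = text;
  for (const [pattern, replacement] of config.replacements) {
    result = result.split(pattern).join(replacement);
  }
  return result;
};

// Sample values for illustration only.
const exampleConfig: ModelConfig = {
  prompts: { en: 'Joplin is a note-taking application.' },
  replacements: [[' um,', ''], ['joplin', 'Joplin']],
};
```

Keeping the replacements as an ordered list means later entries see the output of earlier ones, so the order in the config matters.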

Note

At present, this pull request:

  • Only supports Android.
  • Only supports transcribing new audio from the microphone (and not existing attachments/recordings).

To-do

For this pull request:

  • Evaluate accuracy/performance of the model with non-English locales.
    • Changing the initial prompt can affect accuracy. Currently, a prompt is set for English and Spanish, but for no other locales.
    • If not accurate enough in non-English languages, it's possible to fine-tune Whisper. Fine-tuned French models compatible with this pull request can be found in this GitHub release.
  • Documentation: Document using a fine-tuned custom Whisper model with Joplin.
    • Documented in this repository, but it should also be documented in the main Joplin repository.
  • Bug fix: After downloading the model, it's necessary to close and re-open the voice typing dialog.
    • This should be fixed by this commit. However, an automated test for this part of the UI would be useful (to help prevent regressions).
  • Migration: Delete old .onnx files when the user deletes and redownloads local models.
  • Testing: Add automated tests for JavaScript.
  • Testing: Add automated tests for the silence-detection code.
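
The silence-detection code mentioned above can be pictured with a small sketch. The version below is written in TypeScript for illustration only; it is not the `findLongestSilence.cpp` implementation from this pull request, and the amplitude threshold, parameter names, and return shape are assumptions.

```typescript
interface SilenceRange {
  start: number; // Index of the first silent sample
  end: number; // Index one past the last silent sample
}

// Finds the longest run of consecutive samples whose absolute amplitude is
// below `threshold`. Returns null if no run is at least `minLength` samples.
const findLongestSilence = (
  samples: Float32Array, threshold: number, minLength: number,
): SilenceRange | null => {
  let best: SilenceRange | null = null;
  let runStart = -1; // -1 means "not currently inside a silent run"

  // Iterate one past the end so that a run touching the end is also closed.
  for (let i = 0; i <= samples.length; i++) {
    const silent = i < samples.length && Math.abs(samples[i]) < threshold;
    if (silent) {
      if (runStart === -1) runStart = i;
    } else if (runStart !== -1) {
      const length = i - runStart;
      if (length >= minLength && (!best || length > best.end - best.start)) {
        best = { start: runStart, end: i };
      }
      runStart = -1;
    }
  }
  return best;
};
```

A function of this shape is straightforward to cover with automated tests, since it takes plain sample data and returns indices without touching the microphone or the model.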

Optional/for a follow-up pull request:

Important

Huge pull request diff: At present, this pull request includes the code of whisper.cpp in a vendor/whisper.cpp folder. This is instead of adding whisper.cpp as a submodule.

Testing

Automated tests

This pull request includes automated tests for:

  • findLongestSilence.cpp, in findLongestSilence_test.cpp:
    • Silence detection: These tests run on Android, in development mode, after opening the voice typing dialog.
  • whisper.ts, in whisper.test.ts:
    • Clearing old models: A test is present to verify that legacy whisper_tiny.onnx models (used by Joplin before this pull request) are cleared when deleting and re-downloading models.
    • Post-processing: A test is present to verify that post-processing operations included with the model file are applied.
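
The legacy-model cleanup described above could look roughly like the following. This is a sketch under stated assumptions rather than the code in `whisper.ts`: the helper names, the `RemoveFile` callback, and the idea of passing in a directory listing are hypothetical. Only the `.onnx` extension and the `whisper_tiny.onnx` file name come from the description above.

```typescript
// Hypothetical callback that deletes one file; injected so that the logic
// can be tested without touching the real file system.
type RemoveFile = (path: string) => Promise<void>;

// Returns the names that look like legacy ONNX model files.
const findLegacyModelFiles = (fileNames: string[]): string[] => {
  return fileNames.filter(name => name.toLowerCase().endsWith('.onnx'));
};

// Deletes every legacy model file in `modelDirectory` and returns the names
// of the files that were removed.
const clearLegacyModels = async (
  modelDirectory: string, fileNames: string[], removeFile: RemoveFile,
): Promise<string[]> => {
  const removed: string[] = [];
  for (const name of findLegacyModelFiles(fileNames)) {
    await removeFile(`${modelDirectory}/${name}`);
    removed.push(name);
  }
  return removed;
};
```

Injecting the delete callback is a design choice for this sketch: a test can pass a stub that records paths, then check that only the `.onnx` entries were removed.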


### Downloading the models

By default, Joplin downloads Whisper models from [this GitHub repository](https://github.com/personalizedrefrigerator/joplin-voice-typing-test/releases). It's possible to download models from a custom location by changing the **Voice typing language files (URL)** setting in the "Note" tab of the configuration screen.
Collaborator Author

Suggested change
By default, Joplin downloads Whisper models from [this GitHub repository](https://github.com/personalizedrefrigerator/joplin-voice-typing-test/releases). It's possible to download models from a custom location by changing the **Voice typing language files (URL)** setting in the "Note" tab of the configuration screen.

It may make sense to change this URL to something under the joplin GitHub organization.

Owner

Yes, if we could move them to a repository under github.com/joplin before the final 3.3 release, that would be better. If you need access for this, please let me know.

Collaborator Author

> If you need access for this please let me know

I currently don't have the ability to create new repositories in https://github.com/joplin/.

Owner

I've added you to the org now and created the repository https://github.com/joplin/voice-typing-models. Let me know if you're able to create releases there, as I may need to add you to the repository too.

Collaborator Author

Thanks for creating that!

> maybe I need to add you to the repository too

At present, I don't seem to have permission to commit to or create releases in joplin/voice-typing-models.

Owner

I've changed you to admin. Please give it another try.

@personalizedrefrigerator marked this pull request as ready for review February 26, 2025 22:32
@personalizedrefrigerator changed the title from "Android: Switch library used for Whisper voice typing backend" to "Android: Switch default library used for Whisper voice typing" Feb 27, 2025
@laurent22 merged commit 7f51712 into laurent22:dev Feb 27, 2025
7 checks passed