Add MiMo-V2.5-ASR STT support #719

Open
ailuntx wants to merge 7 commits into Blaizzy:main from ailuntx:feat/mimo-v25-asr

Conversation

@ailuntx ailuntx commented May 12, 2026

Add MiMo-V2.5-ASR support to mlx-audio STT.

Changes:

  • add a new mimo_v2_asr model family
  • detect MiMo checkpoints in mlx_audio.stt.load()
  • resolve the external MiMo-Audio-Tokenizer dependency from mlx_manifest.json
  • support both raw HF tokenizer weights and the MLX tokenizer export layout
  • add README and docs index entries

Validation:

  • intention.wav -> Intention.
  • conversational_a.wav -> expected coffee / Kaldi paragraph
  • CLI JSON output path works for MiMo

Closes #718.

@ailuntx ailuntx marked this pull request as ready for review May 12, 2026 11:15

@Blaizzy Blaizzy left a comment

Owner

Could you remove the "MLX" from the name?

The model repo already has tags to identify it as a supported model :)

Owner

No need for a separate file for this.

Owner

You can make this into a sanitize method, like we do for every other model.

Comment thread mlx_audio/stt/utils.py Outdated
Comment on lines +30 to +43
def _looks_like_mimo(model_name: list[str] | None, config: dict | None) -> bool:
    if config:
        architectures = config.get("architectures") or []
        if isinstance(architectures, list) and any(
            "MiMoV2ASR" in str(arch) for arch in architectures
        ):
            return True
    for part in model_name or []:
        lowered = part.lower()
        if "mimo" in lowered:
            return True
    return False


Owner

Please revert this

Comment thread mlx_audio/stt/utils.py

Owner

Please revert all changes to this file except line 16

Owner

The model will be auto-discovered with the simple mapping.

@Blaizzy Blaizzy left a comment

Owner

Looks good overall, just a few nits before we can merge.

ailuntx commented May 12, 2026

Author

Addressed.

Changes in this update:

  • removed the separate weight_loader.py
  • reverted the custom mlx_audio/stt/utils.py loading path and kept a simple mimo remapping entry
  • changed the displayed model name to MiMo-V2.5-ASR

I also added a small generic load_weights_from_path hook in base_load_model.
Without that hook, the default merged-weight path was spiking memory on the sharded MiMo checkpoint and hanging during local validation. The MiMo model now loads its main shards sequentially while still using the standard load() entry point.
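
The difference between merging all shards first and loading them one at a time can be sketched roughly as follows. This is not the code from the PR: `load_sequentially`, `load_shard`, and `apply_weights` are hypothetical names, and the shard contents in the demo are in-memory stand-ins. With MLX, one might pass `mx.load` as the loader and a wrapper around `model.load_weights(list(shard.items()), strict=False)` as the consumer, but that pairing is an assumption and is not exercised here.

```python
from typing import Any, Callable, Dict, Iterable


def load_sequentially(
    shard_paths: Iterable[str],
    load_shard: Callable[[str], Dict[str, Any]],
    apply_weights: Callable[[Dict[str, Any]], None],
) -> int:
    """Load shards one at a time so only one shard dict is held at once.

    Returns the total number of tensors applied.
    """
    total = 0
    for path in sorted(shard_paths):
        shard = load_shard(path)   # read a single shard
        apply_weights(shard)       # hand it to the model immediately
        total += len(shard)
        del shard                  # drop our reference before the next shard
    return total


# Demo with in-memory stand-ins for two checkpoint shards (hypothetical names).
fake_shards = {
    "model-00002-of-00002.safetensors": {"layers.1.weight": 2},
    "model-00001-of-00002.safetensors": {"layers.0.weight": 0, "layers.0.bias": 1},
}
applied_keys = []
count = load_sequentially(
    fake_shards.keys(),
    fake_shards.__getitem__,
    lambda shard: applied_keys.extend(shard.keys()),
)
```

The merged alternative would build one dictionary containing every shard before handing anything to the model, which keeps all shards referenced at the same time.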

Re-validated locally after the change:

  • load(.../MiMo-V2.5-ASR-MLX)
  • intention.wav -> Intention.
  • CLI JSON output still works

Blaizzy added 2 commits May 30, 2026 20:02
Signed-off-by: Prince Canuma <prince.gdt@gmail.com>
Blaizzy commented May 30, 2026

Owner

Thanks @ailuntx for the patience!

To make it easier, I will make the changes and you can look at what to improve in future PRs.

I would suggest asking the agent to follow the format of other existing models and not edit core files like the utils and loading paths.

ailuntx commented Jun 1, 2026

Author

Updated the branch to keep the MiMo changes within the model integration path.

Changes since your comment:

  • removed the base_load_model / core loading-path hook
  • moved MiMo key mapping into mimo_v2_asr.Model.sanitize
  • added model_quant_predicate so the existing loader handles quantization
  • ran pre-commit run --all-files successfully
  • ran python -m compileall -q mlx_audio/stt/models/mimo_v2_asr mlx_audio/utils.py mlx_audio/stt/utils.py
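
As an illustration of what moving key mapping into a sanitize step can look like, here is a minimal sketch. It is written as a free function to stay self-contained, whereas the PR places it on `mimo_v2_asr.Model.sanitize`. The prefixes and the skipped key suffix are hypothetical and are not taken from the MiMo checkpoint; the assumed contract is simply a dict of weights in and a remapped dict out.

```python
from typing import Any, Dict

# Hypothetical checkpoint-prefix -> module-prefix mapping (illustration only).
PREFIX_MAP = {
    "model.": "",
    "audio_tower.": "encoder.",
}

# Hypothetical buffers that are recomputed at runtime and need not be loaded.
SKIP_SUFFIXES = ("rotary_emb.inv_freq",)


def sanitize(weights: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict with checkpoint keys renamed to module paths."""
    sanitized: Dict[str, Any] = {}
    for key, value in weights.items():
        if key.endswith(SKIP_SUFFIXES):
            continue  # drop keys the model does not store
        for old, new in PREFIX_MAP.items():
            if key.startswith(old):
                key = new + key[len(old):]
                break  # apply at most one prefix rewrite per key
        sanitized[key] = value
    return sanitized


remapped = sanitize(
    {
        "model.layers.0.weight": 1,
        "audio_tower.conv.weight": 2,
        "model.rotary_emb.inv_freq": 3,
    }
)
```

Keeping the mapping in one table makes the renames easy to audit and leaves the shared loading code unaware of the model's checkpoint layout.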

The latest workflow runs are currently marked action_required, so they need approval before GitHub runs them.

Development

Successfully merging this pull request may close these issues.

Support MiMo-V2.5-ASR

2 participants