[Bug]: Some voice files report errors when using the MDXC model #189

Open
1 task done
Yale-7 opened this issue Mar 12, 2025 · 0 comments
Labels
bug Something isn't working

Yale-7 commented Mar 12, 2025

Describe the bug

When I use some MDXC models, such as "model_bs_roformer_ep_368_sdr_12.9628.ckpt" and "mel_band_roformer_kim_ft2_bleedless_unwa.ckpt", separation fails with an error. I suspect it is related to the sampling rate: I had previously set the sampling rate to 16000 when loading the model, and many models failed. After switching back to the default sampling rate, some models run successfully, such as "MDX23C-8KFFT-InstVoc_HQ.ckpt", but others still fail. The attachments below are the original example videos.
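
To help separate the sampling-rate question from other causes, a quick check of the extracted WAV file's sample rate and duration may be useful before running separation. The following is a minimal sketch using only Python's standard `wave` module; the helper name `wav_info` is hypothetical and not part of audio-separator, and it assumes an uncompressed PCM WAV such as the one produced by the `pcm_s16le` ffmpeg command in the logs.

```python
import wave


def wav_info(path):
    """Return (sample_rate_hz, channels, duration_seconds) for a PCM WAV file.

    Hypothetical helper for diagnosis only; not part of audio-separator.
    """
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()      # samples per second, per channel
        channels = wav.getnchannels()  # 1 = mono, 2 = stereo
        frames = wav.getnframes()      # number of sample frames
    return rate, channels, frames / rate
```

For example, `wav_info("./output/audio_20250312_143937_253147.wav")` would report the rate, channel count, and length of the file extracted in the log, which can then be compared against the sizes in the error message.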

Have you searched for existing issues? 🔎

  • I have searched and found no existing issues.

Screenshots or Videos

https://github.com/user-attachments/assets/7b583840-517f-4b5f-beea-38d641979f0a
https://github.com/user-attachments/assets/b8ead5d2-4c34-448b-a53c-4531c50c9529

Logs

2025-03-12 14:39:37.255 | INFO     | __main__:extract_audio_from_video:25 - Extracting audio from /workspace/a.mp4: ffmpeg -i /workspace/a.mp4 -vn -acodec pcm_s16le -ar 44100 -ac 2 ./output/audio_20250312_143937_253147.wav
2025-03-12 14:39:37.319 | INFO     | __main__:extract_audio_from_video:27 - Extracting successfully: ./output/audio_20250312_143937_253147.wav
2025-03-12 14:39:37,321 - INFO - separator - Separator version 0.30.1 instantiating with output_dir: ./output/, output_format: WAV
2025-03-12 14:39:37,321 - INFO - separator - Using model directory from model_file_dir parameter: /workspace/Audio-Tools/audio-separator-models
2025-03-12 14:39:37,322 - INFO - separator - Operating System: Linux #144-Ubuntu SMP Fri Feb 7 20:47:38 UTC 2025
2025-03-12 14:39:37,326 - INFO - separator - System: Linux Node: f82d4d9b4fed3-drtbm Release: 5.15.0-133-generic Machine: x86_64 Proc: x86_64
2025-03-12 14:39:37,326 - INFO - separator - Python Version: 3.10.13
2025-03-12 14:39:37,326 - INFO - separator - PyTorch Version: 2.6.0+cu124
2025-03-12 14:39:37,364 - INFO - separator - FFmpeg installed: ffmpeg version 7.1.1 Copyright (c) 2000-2025 the FFmpeg developers
2025-03-12 14:39:37,366 - INFO - separator - ONNX Runtime GPU package installed with version: 1.21.0
2025-03-12 14:39:37,454 - INFO - separator - CUDA is available in Torch, setting Torch device to CUDA
2025-03-12 14:39:37,455 - INFO - separator - ONNXruntime has CUDAExecutionProvider available, enabling acceleration
2025-03-12 14:39:37,455 - INFO - separator - Loading model mel_band_roformer_kim_ft2_bleedless_unwa.ckpt...
2025-03-12 14:39:44,810 - INFO - mdxc_separator - MDXC Separator initialisation complete
2025-03-12 14:39:44,811 - INFO - separator - Load model duration: 00:00:07
2025-03-12 14:39:44,811 - INFO - separator - Starting separation process for audio_file_path: /workspace/a.mp4
  0%|          | 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/workspace/Audio-Tools/test.py", line 113, in <module>
    vocals_file_path, instrumental_file_path = separate_audio_pipeline(file_path, output_dir)
  File "/workspace/Audio-Tools/test.py", line 103, in separate_audio_pipeline
    vocals_file_path, instrumental_file_path = separate_audio(file_path, output_dir, model_name, output_type)
  File "/workspace/Audio-Tools/test.py", line 81, in separate_audio
    output_files = separator.separate(audio_path, output_names)
  File "/opt/conda/envs/audio-tools/lib/python3.10/site-packages/audio_separator/separator/separator.py", line 769, in separate
    output_files = self.model_instance.separate(audio_file_path, custom_output_names)
  File "/opt/conda/envs/audio-tools/lib/python3.10/site-packages/audio_separator/separator/architectures/mdxc_separator.py", line 138, in separate
    source = self.demix(mix=mix)
  File "/opt/conda/envs/audio-tools/lib/python3.10/site-packages/audio_separator/separator/architectures/mdxc_separator.py", line 258, in demix
    result = self.overlap_add(result, x, window, result.shape[-1] - chunk_size, length)
  File "/opt/conda/envs/audio-tools/lib/python3.10/site-packages/audio_separator/separator/architectures/mdxc_separator.py", line 194, in overlap_add
    result[..., start : start + length] += x[..., :length] * weights[:length]
RuntimeError: The size of tensor a (288414) must match the size of tensor b (352800) at non-singleton dimension 1
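
One possible reading of the two sizes in the error, assuming they are per-channel sample counts at the 44100 Hz rate shown in the ffmpeg command above (an assumption; the log does not confirm the rate the separator itself used): 352800 samples is exactly 8 seconds and 288414 samples is 6.54 seconds. That would be consistent with the audio being shorter than one processing chunk, although this has not been verified against the `mdxc_separator.py` source. The sketch below only does that arithmetic and shows that Python-style slicing silently truncates when the source is shorter than the requested length, which is one way such a mismatch could arise. The name `samples_to_seconds` and the toy lists are hypothetical.

```python
SAMPLE_RATE = 44100  # assumption: the rate from the ffmpeg command in the log


def samples_to_seconds(n_samples, sample_rate=SAMPLE_RATE):
    """Convert a per-channel sample count to a duration in seconds."""
    return n_samples / sample_rate


size_a = 288414  # "tensor a" in the error message
size_b = 352800  # "tensor b" in the error message

print(samples_to_seconds(size_a))  # 6.54
print(samples_to_seconds(size_b))  # 8.0

# Toy illustration (plain lists, not the library's tensors):
# slicing past the end does not raise, it just returns fewer items,
# so two slices taken with the same `length` can end up with different sizes.
short_signal = [0.0] * 10  # stands in for audio shorter than one chunk
window = [1.0] * 16        # stands in for a chunk-sized window
length = 16

print(len(short_signal[:length]), len(window[:length]))  # 10 16
```

If this reading is right, inputs longer than 8 seconds at 44100 Hz would not trigger this particular mismatch, which might be worth testing with a longer clip.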

System Info

Operating System: Ubuntu 22.04
Python version: Python 3.10
Other...

Additional Information

No response

@Yale-7 Yale-7 added the bug Something isn't working label Mar 12, 2025