clblast gpu support error #1738

Open

@ningpengtao-coder

Description

Windows, tiny.en model:
(base) PS F:\githubsources\whisper.cpp> .\build\bin\Release\main.exe -m F:\Downloads\ggml-tiny.en.bin -l auto F:\githubsources\whisper.cpp\samples\jfk.wav
whisper_init_from_file_with_params_no_state: loading model from 'F:\Downloads\ggml-tiny.en.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab = 51864
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 384
whisper_model_load: n_audio_head = 6
whisper_model_load: n_audio_layer = 4
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 384
whisper_model_load: n_text_head = 6
whisper_model_load: n_text_layer = 4
whisper_model_load: n_mels = 80
whisper_model_load: ftype = 1
whisper_model_load: qntvr = 0
whisper_model_load: type = 1 (tiny)
whisper_model_load: adding 1607 extra tokens
whisper_model_load: n_langs = 99
ggml_opencl: selecting platform: 'NVIDIA CUDA'
ggml_opencl: selecting device: 'NVIDIA GeForce GTX 1650'
ggml_opencl: device FP16 support: false
whisper_model_load: CPU buffer size = 77.18 MB
whisper_model_load: model size = 77.11 MB
whisper_init_state: kv self size = 8.26 MB
whisper_init_state: kv cross size = 9.22 MB
whisper_init_state: compute buffer (conv) = 12.17 MB
whisper_init_state: compute buffer (encode) = 64.92 MB
whisper_init_state: compute buffer (cross) = 4.01 MB
whisper_init_state: compute buffer (decode) = 96.02 MB

system_info: n_threads = 4 / 12 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 0 | VSX = 0 | CUDA = 0 | COREML = 0 | OPENVINO = 0 |

main: WARNING: model is not multilingual, ignoring language and translation options
main: processing 'F:\githubsources\whisper.cpp\samples\jfk.wav' (176000 samples, 11.0 sec), 4 threads, 1 processors, 5 beams + best of 5, lang = en, task = transcribe, timestamps = 1 ...

[00:00:00.000 --> 00:00:07.960] And so my fellow Americans ask not what your country can do for you
[00:00:07.960 --> 00:00:10.760] ask what you can do for your country.

whisper_print_timings: load time = 1354.68 ms
whisper_print_timings: fallbacks = 0 p / 0 h
whisper_print_timings: mel time = 12.33 ms
whisper_print_timings: sample time = 103.26 ms / 139 runs ( 0.74 ms per run)
whisper_print_timings: encode time = 395.21 ms / 1 runs ( 395.21 ms per run)
whisper_print_timings: decode time = 7.05 ms / 2 runs ( 3.52 ms per run)
whisper_print_timings: batchd time = 166.18 ms / 133 runs ( 1.25 ms per run)
whisper_print_timings: prompt time = 0.00 ms / 1 runs ( 0.00 ms per run)
whisper_print_timings: total time = 2045.60 ms

Windows, small model (q4_k quantized):
(base) PS F:\githubsources\whisper.cpp> .\build\bin\Release\main.exe -m .\models\ggml-small-q4_k.bin -l auto F:\githubsources\whisper.cpp\samples\jfk.wav
whisper_init_from_file_with_params_no_state: loading model from '.\models\ggml-small-q4_k.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab = 51865
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 768
whisper_model_load: n_audio_head = 12
whisper_model_load: n_audio_layer = 12
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 768
whisper_model_load: n_text_head = 12
whisper_model_load: n_text_layer = 12
whisper_model_load: n_mels = 80
whisper_model_load: ftype = 12
whisper_model_load: qntvr = 2
whisper_model_load: type = 3 (small)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: n_langs = 99
ggml_opencl: selecting platform: 'NVIDIA CUDA'
ggml_opencl: selecting device: 'NVIDIA GeForce GTX 1650'
ggml_opencl: device FP16 support: false
whisper_model_load: CPU buffer size = 145.05 MB
whisper_model_load: model size = 144.86 MB
whisper_init_state: kv self size = 49.55 MB
whisper_init_state: kv cross size = 55.30 MB
whisper_init_state: compute buffer (conv) = 20.23 MB
whisper_init_state: compute buffer (encode) = 128.14 MB
whisper_init_state: compute buffer (cross) = 6.31 MB
whisper_init_state: compute buffer (decode) = 97.40 MB

system_info: n_threads = 4 / 12 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 0 | VSX = 0 | CUDA = 0 | COREML = 0 | OPENVINO = 0 |

main: processing 'F:\githubsources\whisper.cpp\samples\jfk.wav' (176000 samples, 11.0 sec), 4 threads, 1 processors, 5 beams + best of 5, lang = auto, task = transcribe, timestamps = 1 ...

whisper_full_with_state: auto-detected language: en (p = 0.472618)

[00:00:00.000 --> 00:00:30.000] .

whisper_print_timings: load time = 1398.95 ms
whisper_print_timings: fallbacks = 1 p / 1 h
whisper_print_timings: mel time = 12.33 ms
whisper_print_timings: sample time = 398.21 ms / 312 runs ( 1.28 ms per run)
whisper_print_timings: encode time = 4430.49 ms / 2 runs ( 2215.24 ms per run)
whisper_print_timings: decode time = 1839.86 ms / 227 runs ( 8.11 ms per run)
whisper_print_timings: batchd time = 456.34 ms / 82 runs ( 5.57 ms per run)
whisper_print_timings: prompt time = 0.00 ms / 1 runs ( 0.00 ms per run)
whisper_print_timings: total time = 8543.37 ms

Only the tiny.en model produces correct output with CLBlast on Windows (NVIDIA): the quantized small model transcribes nothing but "." as shown above, and every model running on Android gives incorrect results.
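One way to narrow this down is to compare a CPU-only build against the CLBlast build on the same model and input; if only the CLBlast build misbehaves with the quantized model, the OpenCL path (e.g. the quantized mat-mul kernels) is the likely suspect. A rough sketch, assuming the `WHISPER_CLBLAST` CMake option used by whisper.cpp at the time and the paths from the logs above:

```shell
# CPU-only build, to confirm the model file itself is fine
cmake -B build-cpu -DWHISPER_CLBLAST=OFF
cmake --build build-cpu --config Release

# CLBlast build, to reproduce the garbage output
cmake -B build-clblast -DWHISPER_CLBLAST=ON
cmake --build build-clblast --config Release

# Same model, same input, both builds:
.\build-cpu\bin\Release\main.exe -m .\models\ggml-small-q4_k.bin samples\jfk.wav
.\build-clblast\bin\Release\main.exe -m .\models\ggml-small-q4_k.bin samples\jfk.wav
```

The `device FP16 support: false` line in the logs may also be relevant: the GTX 1650's OpenCL driver does not advertise `cl_khr_fp16`, so any kernel path that assumes half-precision support could silently produce wrong results.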

Metadata

Labels: bug (Something isn't working)
