Description
Hi,
I have been learning more about ML, and Rust, recently, and I love Candle for giving people the opportunity to use Rust directly.
I have been testing a couple of models and hit some issues, mainly with quantized Nemo 2407 models, since a Q8 Nemo model is about the extent of what my device can handle.
At first I tried adapting my own code from the Mistral example, since it mentions 2407, until I realised from another issue (which I can't find immediately) that the 'quantized' example was the recommended route instead, as the Mistral example was built for a very specific set of models.
The error is exactly the same whether I run my own code emulating the quantized example or run the 'quantized' example directly:
Error: shape mismatch in reshape, lhs: [1, 11, 4096], rhs: [1, 11, 32, 160]
I tested with multiple Nemo 2407 models, notably TheBloke's. Since the error is identical in both my code and the example, I am guessing that 2407 isn't supported with quantization.
Your README and this line in the example seem to confirm that:
https://github.com/huggingface/candle/blob/main/candle-examples/examples/mistral/main.rs#L266
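If I'm reading the numbers right, the mismatch looks like a head_dim issue: Nemo 2407 publishes an explicit head_dim of 128 in its config rather than the usual hidden_size / num_heads. A rough sketch of the arithmetic (the config values here are my assumption from the published Nemo config, not something I've verified inside the GGUF):

```rust
fn main() {
    // Assumed Mistral Nemo 2407 config values (from its published config.json):
    let hidden_size = 5120; // embedding_length
    let num_heads = 32;
    let explicit_head_dim = 128; // Nemo sets head_dim explicitly

    // If the quantized loader derives head_dim from the hidden size,
    // it gets 160, which matches the rhs shape [1, 11, 32, 160]:
    let derived_head_dim = hidden_size / num_heads;
    assert_eq!(derived_head_dim, 160);

    // But the Q projection weights in the model were built with the
    // explicit head_dim, so the projection output width is 4096,
    // which matches the lhs shape [1, 11, 4096]:
    let q_proj_out = num_heads * explicit_head_dim;
    assert_eq!(q_proj_out, 4096);

    // 32 * 160 = 5120 != 4096 -- hence the reshape failure.
    assert_ne!(num_heads * derived_head_dim, q_proj_out);
    println!("derived head_dim = {derived_head_dim}, q projection width = {q_proj_out}");
}
```

If that's right, the fix would presumably be reading an explicit head_dim from the model metadata instead of deriving it, but I may well be misreading the shapes.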
Unfortunately, I don't know enough about tensors, or about how to deconstruct a GGUF model, to figure out the fix on my own without some guidance.