Description
Hi,
I have been learning more about ML, and Rust, recently, and I love Candle for giving people the opportunity to use Rust directly.
I have been testing a couple of models and hit some issues, mainly with quantized Nemo 2407 models, since a Q8 Nemo model is about the extent of what my device can handle.
At first I tried adapting my own code from the Mistral example, since it mentions 2407, until I realised from another issue (which I can't find immediately) that the 'quantized' example was the recommended route instead, as the Mistral example was built for a very specific set of models.
The error is exactly the same whether I run my own code emulating the quantized example or run the 'quantized' example directly:
Error: shape mismatch in reshape, lhs: [1, 11, 4096], rhs: [1, 11, 32, 160]
I tested with multiple Nemo 2407 models, notably TheBloke's. Since the error is identical in both my code and the example, I am guessing that 2407 isn't supported with quantization.
Your README and this line in the example seem to confirm that:
https://github.com/huggingface/candle/blob/main/candle-examples/examples/mistral/main.rs#L266
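If I'm reading the numbers right, the mismatch looks like a head_dim issue: Nemo 2407 publishes an explicit head_dim of 128 in its config rather than the usual hidden_size / num_heads. A rough sketch of the arithmetic (the config values here are my assumption from the published Nemo config, not something I've verified inside the GGUF):

```rust
fn main() {
    // Assumed Mistral Nemo 2407 config values (from its published config.json):
    let hidden_size = 5120; // embedding_length
    let num_heads = 32;
    let explicit_head_dim = 128; // Nemo sets head_dim explicitly

    // If the quantized loader derives head_dim from the hidden size,
    // it gets 160, which matches the rhs shape [1, 11, 32, 160]:
    let derived_head_dim = hidden_size / num_heads;
    assert_eq!(derived_head_dim, 160);

    // But the Q projection weights in the model were built with the
    // explicit head_dim, so the projection output width is 4096,
    // which matches the lhs shape [1, 11, 4096]:
    let q_proj_out = num_heads * explicit_head_dim;
    assert_eq!(q_proj_out, 4096);

    // 32 * 160 = 5120 != 4096 -- hence the reshape failure.
    assert_ne!(num_heads * derived_head_dim, q_proj_out);
    println!("derived head_dim = {derived_head_dim}, q projection width = {q_proj_out}");
}
```

If that's right, the fix would presumably be reading an explicit head_dim from the model metadata instead of deriving it, but I may well be misreading the shapes.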
Unfortunately, I don't know enough about tensors, or about how to deconstruct a GGUF model, to figure out the fix on my own without some guidance.