
Mistral Nemo Quantized Support #2727

Open

@leflambeur

Hi,

I have recently been learning more about ML, and also Rust, and I love Candle for giving people the opportunity to work in Rust directly.

I have been testing a few models and hit some issues, mainly with quantised Nemo 2407 models, since a Q8 Nemo model is about the extent of what my device can handle.

At first I tried writing my own code based on the Mistral example, since it mentions 2407. Then I realised from another issue (which I can't find right now) that the recommendation was to use the 'quantized' example instead, because the Mistral example was built for a very specific set of models.

Whether I run my own code modelled on the quantized example or the 'quantized' example directly, the error is exactly the same:

Error: shape mismatch in reshape, lhs: [1, 11, 4096], rhs: [1, 11, 32, 160]

I tested with multiple Nemo 2407 GGUFs, notably TheBloke's. Since the error is identical in my code and in the example, I am guessing the cause is that 2407 isn't supported by the quantized model code.
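For what it's worth, the shapes in the error do seem to line up with Nemo's published config, if I'm reading it correctly: Nemo 2407 uses hidden_size = 5120 with 32 attention heads but an explicit head_dim of 128, so the query projection outputs 32 × 128 = 4096 values per token (the lhs of the reshape), while the example appears to derive head_dim as 5120 / 32 = 160 (hence the rhs of [1, 11, 32, 160]), and 32 × 160 = 5120 ≠ 4096.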

Your README and this line in the example seem to confirm that 2407 isn't supported:
https://github.com/huggingface/candle/blob/main/candle-examples/examples/mistral/main.rs#L266

Unfortunately I don't know enough about tensors, or about how to deconstruct a GGUF model, to figure out the fix on my own without some guidance.
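That said, in case it helps, here is a sketch of where I think the fix might live, based on my reading of the quantized llama loading code. The `llama.attention.key_length` key and the fallback logic are my assumptions from the GGUF spec, not something I've tested:

```rust
use candle::quantized::gguf_file;

// My (possibly wrong) understanding: the quantized llama loader derives
// head_dim as embedding_length / head_count, which works for the original
// Mistral 7B (4096 / 32 = 128) but not for Nemo, where 5120 / 32 = 160
// even though the attention tensors are built for head_dim = 128.
// Newer GGUFs seem to carry the real per-head width in
// `llama.attention.key_length`, so something like this might be the fix:
fn head_dim(ct: &gguf_file::Content) -> candle::Result<usize> {
    let md_get = |s: &str| {
        ct.metadata
            .get(s)
            .ok_or_else(|| candle::Error::Msg(format!("cannot find {s} in metadata")))
    };
    let head_count = md_get("llama.attention.head_count")?.to_u32()? as usize;
    let embedding_length = md_get("llama.embedding_length")?.to_u32()? as usize;
    // Prefer the explicit per-head size when the GGUF provides it, and fall
    // back to the old embedding_length / head_count derivation otherwise.
    match ct.metadata.get("llama.attention.key_length") {
        Some(v) => Ok(v.to_u32()? as usize),
        None => Ok(embedding_length / head_count),
    }
}
```

If that guess is roughly right, the reshapes in the attention forward pass would also need to use this head_dim rather than recomputing it from the embedding length. Happy to test a branch on my hardware if that's useful.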
