---
meta:
  title: Supported models in Managed Inference
  description: Explore all AI models supported by Managed Inference
content:
  h1: Supported models in Managed Inference
  paragraph: Discover which AI models you can deploy using Managed Inference, either from the Scaleway Catalog or as custom models.
tags: support models custom catalog
dates:
  validation: 2025-04-08
  posted: 2025-04-08
categories:
  - ai-data
---

Scaleway Managed Inference allows you to deploy various AI models, either from:

- The [Scaleway catalog](#scaleway-catalog): a curated list of ready-to-deploy models
- [Custom models](#custom-models): models you import yourself, for example from Hugging Face

## Scaleway catalog

### Multimodal models (chat + vision)

More details to be added.

### Chat models

| Provider | Model identifier | Documentation | License |
|----------|------------------|---------------|---------|
| Allen AI | `molmo-72b-0924` | View Details | Apache 2.0 license |
| Deepseek | `deepseek-r1-distill-llama-70b` | View Details | MIT license |
| Deepseek | `deepseek-r1-distill-llama-8b` | View Details | MIT license |
| Meta | `llama-3-70b-instruct` | View Details | Llama 3 license |
| Meta | `llama-3-8b-instruct` | View Details | Llama 3 license |
| Meta | `llama-3.1-70b-instruct` | View Details | Llama 3.1 community license |
| Meta | `llama-3.1-8b-instruct` | View Details | Llama 3.1 community license |
| Meta | `llama-3.3-70b-instruct` | View Details | Llama 3.3 license |
| Nvidia | `llama-3.1-nemotron-70b-instruct` | View Details | Llama 3.1 community license |
| Mistral | `mixtral-8x7b-instruct-v0.1` | View Details | Apache 2.0 license |
| Mistral | `mistral-7b-instruct-v0.3` | View Details | Apache 2.0 license |
| Mistral | `mistral-nemo-instruct-2407` | View Details | Apache 2.0 license |
| Mistral | `mistral-small-24b-instruct-2501` | View Details | Apache 2.0 license |
| Mistral | `pixtral-12b-2409` | View Details | Apache 2.0 license |
| Qwen | `qwen2.5-coder-32b-instruct` | View Details | Apache 2.0 license |

### Vision models

More details to be added.

### Embedding models

| Provider | Model identifier | Documentation | License |
|----------|------------------|---------------|---------|
| BAAI | `bge-multilingual-gemma2` | View Details | Gemma Terms of Use |
| Sentence Transformers | `sentence-t5-xxl` | View Details | Apache 2.0 license |

## Custom models

Custom model support is currently in **beta**. If you encounter issues or limitations, please report them via our [Slack community channel](https://scaleway-community.slack.com/archives/C01SGLGRLEA) or [customer support](https://console.scaleway.com/support/tickets/create?for=product&productName=inference).

### Prerequisites

We recommend starting with a variation of a supported model from the Scaleway catalog. For example, you can deploy a [quantized (4-bit) version of Llama 3.3](https://huggingface.co/unsloth/Llama-3.3-70B-Instruct-bnb-4bit). If deploying a fine-tuned version of Llama 3.3, make sure your file structure matches the example linked above.

To deploy a custom model via Hugging Face, ensure the following:

#### Access requirements

- You must have access to the model using your Hugging Face credentials.
- For gated models, request access through your Hugging Face account.
- Credentials are not stored, but we recommend using read or fine-grained access tokens. You can verify your access ahead of time, as sketched below.
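
For gated repositories in particular, it can save time to confirm that your token actually grants access before creating a deployment. This is a minimal sketch using the `huggingface_hub` Python library (not a Managed Inference API); the token and model ID are placeholders:

```python
from huggingface_hub import HfApi
from huggingface_hub.utils import GatedRepoError, RepositoryNotFoundError

# Placeholder token: prefer a read-only or fine-grained access token.
api = HfApi(token="hf_xxx")

try:
    # Placeholder model ID: replace with the repository you plan to deploy.
    info = api.model_info("meta-llama/Llama-3.3-70B-Instruct")
    print("Access OK:", info.id)
except GatedRepoError:
    print("Gated model: request access on its Hugging Face page first.")
except RepositoryNotFoundError:
    print("Repository not found, or the token cannot see it.")
```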

#### Required files

Your model repository must include (a quick local check is sketched after this list):

- A `config.json` file containing:
  - An `architectures` array (see [supported architectures](#supported-model-architectures) for the exact list of supported values)
  - `max_position_embeddings`
- Model weights in the `.safetensors` format
- A chat template, included in either:
  - `tokenizer_config.json` as a `chat_template` field, or
  - `chat_template.json` as a `chat_template` field
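
As a sanity check before importing, you can verify these requirements on a local copy of the repository. This is a minimal Python sketch, assuming the repository has already been downloaded; the path is a placeholder:

```python
import json
from pathlib import Path

repo = Path("./my-model")  # placeholder: local copy of your model repository

# config.json must declare the architectures array and max_position_embeddings.
config = json.loads((repo / "config.json").read_text())
assert "architectures" in config, "config.json is missing the 'architectures' array"
assert "max_position_embeddings" in config, "config.json is missing max_position_embeddings"

# Weights must be provided in the .safetensors format.
assert list(repo.glob("*.safetensors")), "no .safetensors weight files found"

# A chat template must be present in tokenizer_config.json or chat_template.json.
has_template = any(
    (repo / name).exists() and "chat_template" in json.loads((repo / name).read_text())
    for name in ("tokenizer_config.json", "chat_template.json")
)
assert has_template, "no chat_template field found"
print("Repository structure looks compatible.")
```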

### Supported model types

Your model must be one of the following types:

- chat
- vision
- multimodal (chat + vision)
- embedding

**Security notice**

Models using formats that allow arbitrary code execution, such as Python [`pickle`](https://docs.python.org/3/library/pickle.html), are **not supported**.

### API support

The endpoints and features available depend on the model type.

#### Chat models

The Chat API is exposed for these models at the `/v1/chat/completions` endpoint. Structured outputs and function calling are not yet supported for custom models.
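
Since the endpoint follows the `/v1/chat/completions` convention, an OpenAI-compatible client can typically be pointed at it. A minimal sketch; the base URL, API key, and model name are placeholders for your own deployment's values:

```python
from openai import OpenAI

# Placeholders: substitute your deployment's endpoint URL and a valid API key.
client = OpenAI(
    base_url="https://<your-deployment-endpoint>/v1",
    api_key="<YOUR_API_KEY>",
)

response = client.chat.completions.create(
    model="<your-model-name>",  # the custom model served by the deployment
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```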

#### Vision models

The Chat API is exposed for these models at the `/v1/chat/completions` endpoint. Structured outputs and function calling are not yet supported for custom models.
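
For vision-capable models, image inputs are usually passed as OpenAI-style content parts on the same endpoint. A minimal sketch, reusing the placeholder `client` from the previous example and an illustrative image URL:

```python
response = client.chat.completions.create(
    model="<your-model-name>",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {"type": "image_url", "image_url": {"url": "https://example.com/image.png"}},
        ],
    }],
)
print(response.choices[0].message.content)
```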

#### Multimodal models

These models are treated as both chat and vision models, and expose the same Chat API.

#### Embedding models

The Embeddings API is exposed for these models at the `/v1/embeddings` endpoint.
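
A minimal sketch of an embeddings call, again reusing the placeholder `client` defined above:

```python
result = client.embeddings.create(
    model="<your-model-name>",
    input="Managed Inference serves embedding models too.",
)
vector = result.data[0].embedding
print(f"Embedding dimensionality: {len(vector)}")
```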

### Custom model lifecycle

Custom model deployments are currently considered long-term stable: we ensure that updates or changes to Managed Inference do not impact existing deployments. In the event of a breaking change that ends support for some custom models, we will notify you at least three months in advance.

### Licensing

When deploying custom models, you remain responsible for complying with any license requirements set by the model provider, just as you would when running the model on a GPU you provision yourself.

### Supported model architectures

Custom model deployment currently supports the following model architectures:

- `AquilaModel`
- `AquilaForCausalLM`
- `ArcticForCausalLM`
- `BaiChuanForCausalLM`
- `BaichuanForCausalLM`
- `BloomForCausalLM`
- `CohereForCausalLM`
- `Cohere2ForCausalLM`
- `DbrxForCausalLM`
- `DeciLMForCausalLM`
- `DeepseekForCausalLM`
- `DeepseekV2ForCausalLM`
- `DeepseekV3ForCausalLM`
- `ExaoneForCausalLM`
- `FalconForCausalLM`
- `Fairseq2LlamaForCausalLM`
- `GemmaForCausalLM`
- `Gemma2ForCausalLM`
- `GlmForCausalLM`
- `GPT2LMHeadModel`
- `GPTBigCodeForCausalLM`
- `GPTJForCausalLM`
- `GPTNeoXForCausalLM`
- `GraniteForCausalLM`
- `GraniteMoeForCausalLM`
- `GritLM`
- `InternLMForCausalLM`
- `InternLM2ForCausalLM`
- `InternLM2VEForCausalLM`
- `InternLM3ForCausalLM`
- `JAISLMHeadModel`
- `JambaForCausalLM`
- `LlamaForCausalLM`
- `LLaMAForCausalLM`
- `MambaForCausalLM`
- `FalconMambaForCausalLM`
- `MiniCPMForCausalLM`
- `MiniCPM3ForCausalLM`
- `MistralForCausalLM`
- `MixtralForCausalLM`
- `QuantMixtralForCausalLM`
- `MptForCausalLM`
- `MPTForCausalLM`
- `NemotronForCausalLM`
- `OlmoForCausalLM`
- `Olmo2ForCausalLM`
- `OlmoeForCausalLM`
- `OPTForCausalLM`
- `OrionForCausalLM`
- `PersimmonForCausalLM`
- `PhiForCausalLM`
- `Phi3ForCausalLM`
- `Phi3SmallForCausalLM`
- `PhiMoEForCausalLM`
- `Qwen2ForCausalLM`
- `Qwen2MoeForCausalLM`
- `RWForCausalLM`
- `StableLMEpochForCausalLM`
- `StableLmForCausalLM`
- `Starcoder2ForCausalLM`
- `SolarForCausalLM`
- `TeleChat2ForCausalLM`
- `XverseForCausalLM`
- `BartModel`
- `BartForConditionalGeneration`
- `Florence2ForConditionalGeneration`
- `BertModel`
- `RobertaModel`
- `RobertaForMaskedLM`
- `XLMRobertaModel`
- `Gemma2Model`
- `InternLM2ForRewardModel`
- `JambaForSequenceClassification`
- `LlamaModel`
- `MistralModel`
- `Qwen2Model`
- `Qwen2ForRewardModel`
- `Qwen2ForProcessRewardModel`
- `LlavaNextForConditionalGeneration`
- `Phi3VForCausalLM`
- `Qwen2VLForConditionalGeneration`
- `Qwen2ForSequenceClassification`
- `BertForSequenceClassification`
- `RobertaForSequenceClassification`
- `XLMRobertaForSequenceClassification`
- `AriaForConditionalGeneration`
- `Blip2ForConditionalGeneration`
- `ChameleonForConditionalGeneration`
- `ChatGLMModel`
- `ChatGLMForConditionalGeneration`
- `DeepseekVLV2ForCausalLM`
- `FuyuForCausalLM`
- `H2OVLChatModel`
- `InternVLChatModel`
- `Idefics3ForConditionalGeneration`
- `LlavaForConditionalGeneration`
- `LlavaNextVideoForConditionalGeneration`
- `LlavaOnevisionForConditionalGeneration`
- `MantisForConditionalGeneration`
- `MiniCPMO`
- `MiniCPMV`
- `MolmoForCausalLM`
- `NVLM_D`
- `PaliGemmaForConditionalGeneration`
- `PixtralForConditionalGeneration`
- `QWenLMHeadModel`
- `Qwen2_5_VLForConditionalGeneration`
- `Qwen2AudioForConditionalGeneration`
- `UltravoxModel`
- `MllamaForConditionalGeneration`
- `WhisperForConditionalGeneration`
- `EAGLEModel`
- `MedusaModel`
- `MLPSpeculatorPreTrainedModel`