docs(inference): update supported model information #4817


Merged · 9 commits · Apr 11, 2025
4 changes: 4 additions & 0 deletions menu/navigation.json
@@ -860,6 +860,10 @@
"label": "OpenAI API compatibility",
"slug": "openai-compatibility"
},
{
"label": "Supported models in Managed Inference",
"slug": "supported-models"
},
{
"label": "Support for function calling in Scaleway Managed Inference",
"slug": "function-calling-support"
7 changes: 5 additions & 2 deletions pages/managed-inference/how-to/create-deployment.mdx
@@ -7,7 +7,7 @@ content:
paragraph: This page explains how to deploy a model on Scaleway Managed Inference
tags: managed-inference ai-data creating dedicated
dates:
- validation: 2025-04-01
+ validation: 2025-04-09
posted: 2024-03-06
---

@@ -19,7 +19,10 @@ dates:
1. Click the **AI & Data** section of the [Scaleway console](https://console.scaleway.com/), and select **Managed Inference** from the side menu to access the Managed Inference dashboard.
2. Click **Deploy a model** to launch the model deployment wizard.
3. Provide the necessary information:
- - Select the desired model and quantization to use for your deployment [from the available options](/managed-inference/reference-content/)
+ - Select the desired model and quantization to use for your deployment [from the available options](/managed-inference/reference-content/).
<Message type="important">
Scaleway Managed Inference allows you to deploy various AI models, either from the Scaleway catalog or by importing a custom model. For detailed information about supported models, visit our [Supported models in Managed Inference](/managed-inference/reference-content/supported-models/) documentation.
</Message>
<Message type="note">
Some models may require acceptance of an end-user license agreement. If prompted, review the terms and conditions and accept the license accordingly.
</Message>
5 changes: 4 additions & 1 deletion pages/managed-inference/quickstart.mdx
@@ -38,7 +38,10 @@ Here are some of the key features of Scaleway Managed Inference:
1. Navigate to the **AI & Data** section of the [Scaleway console](https://console.scaleway.com/), and select **Managed Inference** from the side menu to access the Managed Inference dashboard.
2. Click **Create deployment** to launch the deployment creation wizard.
3. Provide the necessary information:
- - Select the desired model and the quantization to use for your deployment [from the available options](/managed-inference/reference-content/)
+ - Select the desired model and the quantization to use for your deployment [from the available options](/managed-inference/reference-content/).
<Message type="important">
Scaleway Managed Inference allows you to deploy various AI models, either from the Scaleway catalog or by importing a custom model. For detailed information about supported models, visit our [Supported models in Managed Inference](/managed-inference/reference-content/supported-models/) documentation.
</Message>
<Message type="note">
Some models may require acceptance of an end-user license agreement. If prompted, review the terms and conditions and accept the license accordingly.
</Message>
269 changes: 269 additions & 0 deletions pages/managed-inference/reference-content/supported-models.mdx
@@ -0,0 +1,269 @@
---
meta:
title: Supported models in Managed Inference
description: Explore all AI models supported by Managed Inference
content:
h1: Supported models in Managed Inference
paragraph: Discover which AI models you can deploy using Managed Inference, either from the Scaleway Catalog or as custom models.
tags: support models custom catalog
dates:
validation: 2025-04-08
posted: 2025-04-08
categories:
- ai-data
---

Scaleway Managed Inference allows you to deploy various AI models, either from:

* [Scaleway catalog](#scaleway-catalog): A curated set of ready-to-deploy models available through the [Scaleway console](https://console.scaleway.com/inference/deployments/) or the [Managed Inference models API](https://www.scaleway.com/en/developers/api/inference/#path-models-list-models) (see the example below).
* [Custom models](#custom-models): Models that you import, typically from sources like Hugging Face.
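
For instance, the catalog can be listed programmatically through the models API. Below is a minimal sketch in Python, assuming `requests`, a Scaleway secret key in the `SCW_SECRET_KEY` environment variable, and the `fr-par` region; the exact path and response shape are defined in the API reference linked above.

```python
import os

import requests

# Hedged sketch: list catalog models via the Managed Inference models API.
# The path below is an assumption; check the linked API reference for the
# exact version, path, and parameters.
region = "fr-par"
response = requests.get(
    f"https://api.scaleway.com/inference/v1/regions/{region}/models",
    headers={"X-Auth-Token": os.environ["SCW_SECRET_KEY"]},
)
response.raise_for_status()

for model in response.json().get("models", []):
    print(model["name"])
```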

## Scaleway catalog

### Multimodal models (chat + vision)

_More details to be added._

### Chat models

| Provider | Model identifier | Documentation | License |
|------------|-----------------------------------|--------------------------------------------------------------------------|-------------------------------------------------------|
| Allen AI | `molmo-72b-0924` | [View Details](/managed-inference/reference-content/molmo-72b-0924/) | [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0) |
| Deepseek | `deepseek-r1-distill-llama-70b` | [View Details](/managed-inference/reference-content/deepseek-r1-distill-llama-70b/) | [MIT license](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/mit.md) |
| Deepseek | `deepseek-r1-distill-llama-8b` | [View Details](/managed-inference/reference-content/deepseek-r1-distill-llama-8b/) | [MIT license](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/mit.md) |
| Meta | `llama-3-70b-instruct` | [View Details](/managed-inference/reference-content/llama-3-70b-instruct/) | [Llama 3 license](https://www.llama.com/llama3/license/) |
| Meta | `llama-3-8b-instruct` | [View Details](/managed-inference/reference-content/llama-3-8b-instruct/) | [Llama 3 license](https://www.llama.com/llama3/license/) |
| Meta | `llama-3.1-70b-instruct` | [View Details](/managed-inference/reference-content/llama-3.1-70b-instruct/) | [Llama 3.1 community license](https://www.llama.com/llama3_1/license/) |
| Meta | `llama-3.1-8b-instruct` | [View Details](/managed-inference/reference-content/llama-3.1-8b-instruct/) | [Llama 3.1 community license](https://www.llama.com/llama3_1/license/) |
| Meta | `llama-3.3-70b-instruct` | [View Details](/managed-inference/reference-content/llama-3.3-70b-instruct/) | [Llama 3.3 license](https://www.llama.com/llama3_3/license/) |
| Nvidia | `llama-3.1-nemotron-70b-instruct` | [View Details](/managed-inference/reference-content/llama-3.1-nemotron-70b-instruct/)| [Llama 3.1 community license](https://www.llama.com/llama3_1/license/) |
| Mistral | `mixtral-8x7b-instruct-v0.1` | [View Details](/managed-inference/reference-content/mixtral-8x7b-instruct-v0.1/) | [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0) |
| Mistral | `mistral-7b-instruct-v0.3` | [View Details](/managed-inference/reference-content/mistral-7b-instruct-v0.3/) | [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0) |
| Mistral | `mistral-nemo-instruct-2407` | [View Details](/managed-inference/reference-content/mistral-nemo-instruct-2407/) | [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0) |
| Mistral | `mistral-small-24b-instruct-2501` | [View Details](/managed-inference/reference-content/mistral-small-24b-instruct-2501/)| [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0) |
| Mistral | `pixtral-12b-2409` | [View Details](/managed-inference/reference-content/pixtral-12b-2409/) | [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0) |
| Qwen | `qwen2.5-coder-32b-instruct` | [View Details](/managed-inference/reference-content/qwen2.5-coder-32b-instruct/) | [Apache 2.0 license](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct/blob/main/LICENSE) |

### Vision models

_More details to be added._

### Embedding models

| Provider | Model identifier | Documentation | License |
|----------|------------------|----------------|---------|
| BAAI | `bge-multilingual-gemma2` | [View Details](/managed-inference/reference-content/bge-multilingual-gemma2/) | [Gemma Terms of Use](https://ai.google.dev/gemma/terms) |
| Sentence Transformers | `sentence-t5-xxl` | [View Details](/managed-inference/reference-content/sentence-t5-xxl/) | [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0) |


## Custom models

<Message type="note">
Custom model support is currently in **beta**. If you encounter issues or limitations, please report them via our [Slack community channel](https://scaleway-community.slack.com/archives/C01SGLGRLEA) or [customer support](https://console.scaleway.com/support/tickets/create?for=product&productName=inference).
</Message>

### Prerequisites

<Message type="tip">
We recommend starting with a variation of a supported model from the Scaleway catalog.
For example, you can deploy a [quantized (4-bit) version of Llama 3.3](https://huggingface.co/unsloth/Llama-3.3-70B-Instruct-bnb-4bit).
If deploying a fine-tuned version of Llama 3.3, make sure your file structure matches the example linked above.
</Message>

To deploy a custom model via Hugging Face, ensure the following:

#### Access requirements

* You must have access to the model using your Hugging Face credentials.
* For gated models, request access through your Hugging Face account.
* Your credentials are not stored; we nevertheless recommend using [read or fine-grained access tokens](https://huggingface.co/docs/hub/security-tokens).

#### Required files

Your model repository must include the following (a quick check is sketched after this list):

* A `config.json` file containing:
  * An `architectures` array (see [supported architectures](#supported-model-architectures) for the exact list of supported values).
  * `max_position_embeddings`
* Model weights in the [`.safetensors`](https://huggingface.co/docs/safetensors/index) format
* A chat template included in either:
  * `tokenizer_config.json` as a `chat_template` field, or
  * `chat_template.json` as a `chat_template` field
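
As an illustration, here is a minimal sketch that checks a local copy of a model repository against the requirements above. It uses only the Python standard library; the `./my-model` path is a placeholder, and the chat template check applies to chat-capable models.

```python
import json
from pathlib import Path


def check_model_repo(repo: Path) -> list[str]:
    """Return a list of problems found against the required-files checklist."""
    problems = []

    config_path = repo / "config.json"
    if not config_path.is_file():
        problems.append("missing config.json")
    else:
        config = json.loads(config_path.read_text())
        if not config.get("architectures"):
            problems.append("config.json is missing the architectures array")
        if "max_position_embeddings" not in config:
            problems.append("config.json is missing max_position_embeddings")

    # Weights must be in the .safetensors format.
    if not list(repo.glob("*.safetensors")):
        problems.append("no .safetensors weight files found")

    # The chat template may live in either tokenizer_config.json or chat_template.json.
    has_template = any(
        (repo / name).is_file() and "chat_template" in json.loads((repo / name).read_text())
        for name in ("tokenizer_config.json", "chat_template.json")
    )
    if not has_template:
        problems.append("no chat_template field found")

    return problems


print(check_model_repo(Path("./my-model")) or "all required files present")
```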

#### Supported model types

Your model must be one of the following types:

* `chat`
* `vision`
* `multimodal` (chat + vision)
* `embedding`

<Message type="important">
**Security Notice**<br />
Models using formats that allow arbitrary code execution, such as Python [`pickle`](https://docs.python.org/3/library/pickle.html), are **not supported**.
</Message>

## API support

The endpoints and features available depend on the model type.

### Chat models

The Chat API is exposed for these models under the `/v1/chat/completions` endpoint.
**Structured outputs** and **function calling** are not yet supported for custom models.
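
Since deployments expose an OpenAI-compatible API, a chat request can look like the following sketch; the endpoint URL, API key, and model name are placeholders to replace with your deployment's values.

```python
from openai import OpenAI

# Placeholders: use your deployment's endpoint URL and a valid API key.
client = OpenAI(
    base_url="https://<your-deployment-endpoint>/v1",
    api_key="<your-api-key>",
)

response = client.chat.completions.create(
    model="<your-model-name>",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```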

### Vision models

The Chat API is exposed for these models under the `/v1/chat/completions` endpoint.
**Structured outputs** and **function calling** are not yet supported for custom models.
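
For vision models, images are passed as content parts of a chat message, in the same OpenAI-compatible format; the endpoint, key, model name, and image URL below are placeholders.

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://<your-deployment-endpoint>/v1",
    api_key="<your-api-key>",
)

response = client.chat.completions.create(
    model="<your-model-name>",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
        ],
    }],
)
print(response.choices[0].message.content)
```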

### Multimodal models

These models will be treated similarly to both Chat and Vision models.

### Embedding models

The Embeddings API is exposed for these models under the `/v1/embeddings` endpoint.
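
A sketch of an embeddings request in the same OpenAI-compatible style, again with placeholder endpoint, key, and model name:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://<your-deployment-endpoint>/v1",
    api_key="<your-api-key>",
)

result = client.embeddings.create(
    model="<your-model-name>",
    input="Managed Inference can serve custom embedding models.",
)
print(len(result.data[0].embedding))  # dimensionality of the returned vector
```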


## Custom model lifecycle

Custom model deployments are currently considered stable for the long term, and we will ensure that updates or changes to Managed Inference do not impact existing deployments.
In case of breaking changes that would leave some custom models unsupported, we will notify you **at least 3 months beforehand**.

## Licensing

When deploying custom models, **you remain responsible** for complying with any license requirements set by the model provider, just as you would when running the model on self-provisioned GPUs.

## Supported model architectures

Custom models must conform to one of the architectures listed below. Click to expand the full list.

<Concept>
## Supported custom model architectures
Custom model deployment currently supports the following model architectures:
* `AquilaModel`
* `AquilaForCausalLM`
* `ArcticForCausalLM`
* `BaiChuanForCausalLM`
* `BaichuanForCausalLM`
* `BloomForCausalLM`
* `CohereForCausalLM`
* `Cohere2ForCausalLM`
* `DbrxForCausalLM`
* `DeciLMForCausalLM`
* `DeepseekForCausalLM`
* `DeepseekV2ForCausalLM`
* `DeepseekV3ForCausalLM`
* `ExaoneForCausalLM`
* `FalconForCausalLM`
* `Fairseq2LlamaForCausalLM`
* `GemmaForCausalLM`
* `Gemma2ForCausalLM`
* `GlmForCausalLM`
* `GPT2LMHeadModel`
* `GPTBigCodeForCausalLM`
* `GPTJForCausalLM`
* `GPTNeoXForCausalLM`
* `GraniteForCausalLM`
* `GraniteMoeForCausalLM`
* `GritLM`
* `InternLMForCausalLM`
* `InternLM2ForCausalLM`
* `InternLM2VEForCausalLM`
* `InternLM3ForCausalLM`
* `JAISLMHeadModel`
* `JambaForCausalLM`
* `LlamaForCausalLM`
* `LLaMAForCausalLM`
* `MambaForCausalLM`
* `FalconMambaForCausalLM`
* `MiniCPMForCausalLM`
* `MiniCPM3ForCausalLM`
* `MistralForCausalLM`
* `MixtralForCausalLM`
* `QuantMixtralForCausalLM`
* `MptForCausalLM`
* `MPTForCausalLM`
* `NemotronForCausalLM`
* `OlmoForCausalLM`
* `Olmo2ForCausalLM`
* `OlmoeForCausalLM`
* `OPTForCausalLM`
* `OrionForCausalLM`
* `PersimmonForCausalLM`
* `PhiForCausalLM`
* `Phi3ForCausalLM`
* `Phi3SmallForCausalLM`
* `PhiMoEForCausalLM`
* `Qwen2ForCausalLM`
* `Qwen2MoeForCausalLM`
* `RWForCausalLM`
* `StableLMEpochForCausalLM`
* `StableLmForCausalLM`
* `Starcoder2ForCausalLM`
* `SolarForCausalLM`
* `TeleChat2ForCausalLM`
* `XverseForCausalLM`
* `BartModel`
* `BartForConditionalGeneration`
* `Florence2ForConditionalGeneration`
* `BertModel`
* `RobertaModel`
* `RobertaForMaskedLM`
* `XLMRobertaModel`
* `DeciLMForCausalLM`
* `Gemma2Model`
* `GlmForCausalLM`
* `GritLM`
* `InternLM2ForRewardModel`
* `JambaForSequenceClassification`
* `LlamaModel`
* `MistralModel`
* `Phi3ForCausalLM`
* `Qwen2Model`
* `Qwen2ForCausalLM`
* `Qwen2ForRewardModel`
* `Qwen2ForProcessRewardModel`
* `TeleChat2ForCausalLM`
* `LlavaNextForConditionalGeneration`
* `Phi3VForCausalLM`
* `Qwen2VLForConditionalGeneration`
* `Qwen2ForSequenceClassification`
* `BertForSequenceClassification`
* `RobertaForSequenceClassification`
* `XLMRobertaForSequenceClassification`
* `AriaForConditionalGeneration`
* `Blip2ForConditionalGeneration`
* `ChameleonForConditionalGeneration`
* `ChatGLMModel`
* `ChatGLMForConditionalGeneration`
* `DeepseekVLV2ForCausalLM`
* `FuyuForCausalLM`
* `H2OVLChatModel`
* `InternVLChatModel`
* `Idefics3ForConditionalGeneration`
* `LlavaForConditionalGeneration`
* `LlavaNextForConditionalGeneration`
* `LlavaNextVideoForConditionalGeneration`
* `LlavaOnevisionForConditionalGeneration`
* `MantisForConditionalGeneration`
* `MiniCPMO`
* `MiniCPMV`
* `MolmoForCausalLM`
* `NVLM_D`
* `PaliGemmaForConditionalGeneration`
* `Phi3VForCausalLM`
* `PixtralForConditionalGeneration`
* `QWenLMHeadModel`
* `Qwen2VLForConditionalGeneration`
* `Qwen2_5_VLForConditionalGeneration`
* `Qwen2AudioForConditionalGeneration`
* `UltravoxModel`
* `MllamaForConditionalGeneration`
* `WhisperForConditionalGeneration`
* `EAGLEModel`
* `MedusaModel`
* `MLPSpeculatorPreTrainedModel`
</Concept>