Scaleway Managed Inference allows you to deploy various AI models, either from:
- Scaleway catalog: A curated set of ready-to-deploy models available through the Scaleway console or the Managed Inference models API
- Custom models: Models that you import, typically from sources like Hugging Face.
The following chat and vision models are available in the Scaleway catalog:
| Provider | Model identifier | Documentation | License |
|---|---|---|---|
| Allen AI | `molmo-72b-0924` | View Details | Apache 2.0 license |
| Deepseek | `deepseek-r1-distill-llama-70b` | View Details | MIT license |
| Deepseek | `deepseek-r1-distill-llama-8b` | View Details | MIT license |
| Meta | `llama-3-70b-instruct` | View Details | Llama 3 license |
| Meta | `llama-3-8b-instruct` | View Details | Llama 3 license |
| Meta | `llama-3.1-70b-instruct` | View Details | Llama 3.1 community license |
| Meta | `llama-3.1-8b-instruct` | View Details | Llama 3.1 license |
| Meta | `llama-3.3-70b-instruct` | View Details | Llama 3.3 license |
| Nvidia | `llama-3.1-nemotron-70b-instruct` | View Details | Llama 3.1 community license |
| Mistral | `mixtral-8x7b-instruct-v0.1` | View Details | Apache 2.0 license |
| Mistral | `mistral-7b-instruct-v0.3` | View Details | Apache 2.0 license |
| Mistral | `mistral-nemo-instruct-2407` | View Details | Apache 2.0 license |
| Mistral | `mistral-small-24b-instruct-2501` | View Details | Apache 2.0 license |
| Mistral | `pixtral-12b-2409` | View Details | Apache 2.0 license |
| Qwen | `qwen2.5-coder-32b-instruct` | View Details | Apache 2.0 license |
The following embedding models are available in the Scaleway catalog:
| Provider | Model identifier | Documentation | License |
|---|---|---|---|
| BAAI | `bge-multilingual-gemma2` | View Details | Gemma Terms of Use |
| Sentence Transformers | `sentence-t5-xxl` | View Details | Apache 2.0 license |
To deploy a custom model via Hugging Face, ensure the following:
- You must have access to the model using your Hugging Face credentials.
- For gated models, request access through your Hugging Face account.
- Credentials are not stored, but we recommend using read or fine-grained access tokens. A quick way to check token access is sketched below.
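To verify up front that your token can read a gated repository, you could use the `huggingface_hub` library. This is a minimal sketch, not part of the import flow itself; the repository ID is an example only:

```python
# Minimal sketch: verify that a token can read a (possibly gated) repository.
# The repo ID below is an example; substitute the model you want to import.
from huggingface_hub import HfApi
from huggingface_hub.utils import GatedRepoError, RepositoryNotFoundError

api = HfApi(token="hf_...")  # a read or fine-grained access token

try:
    info = api.model_info("meta-llama/Meta-Llama-3-8B-Instruct")
    print(f"Access OK: {info.id}")
except GatedRepoError:
    print("Gated model: request access through your Hugging Face account first.")
except RepositoryNotFoundError:
    print("Repository not found, or the token lacks read access.")
```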
Your model repository must include:
- A `config.json` file containing:
  - An `architectures` array (see the supported custom model architectures below for the exact list of supported values)
  - A `max_position_embeddings` field
- Model weights in the `.safetensors` format
- A chat template, included in either:
  - `tokenizer_config.json`, as a `chat_template` field, or
  - `chat_template.json`, as a `chat_template` field
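As a sanity check before importing, you could validate a local copy of the repository against these requirements. The following is a minimal sketch, not an official tool; the directory path is a placeholder, and the chat template check only applies to chat-capable models:

```python
# Minimal sketch: check a local model directory against the repository
# requirements above. The path is a placeholder; embedding models do not
# need a chat template.
import json
from pathlib import Path

def check_model_repo(repo: Path) -> list[str]:
    problems = []

    config_path = repo / "config.json"
    if not config_path.is_file():
        return ["missing config.json"]
    config = json.loads(config_path.read_text())

    if not config.get("architectures"):
        problems.append("config.json has no 'architectures' array")
    if "max_position_embeddings" not in config:
        problems.append("config.json has no 'max_position_embeddings' field")

    if not any(repo.glob("*.safetensors")):
        problems.append("no .safetensors weight files found")

    # The chat template may live in tokenizer_config.json or chat_template.json.
    if not any(
        (repo / name).is_file() and "chat_template" in json.loads((repo / name).read_text())
        for name in ("tokenizer_config.json", "chat_template.json")
    ):
        problems.append("no 'chat_template' field found (required for chat models)")

    return problems

print(check_model_repo(Path("./my-model")) or "Repository looks complete.")
```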
Your model must be one of the following types:
- `chat`
- `vision`
- `multimodal` (chat + vision)
- `embedding`
Models using formats that allow arbitrary code execution, such as Python [`pickle`](https://docs.python.org/3/library/pickle.html), are **not supported**.
Depending on the model type, specific endpoints and features will be supported.

### Chat models

The Chat API will be exposed for this model under the `/v1/chat/completions` endpoint.

Structured outputs and function calling are not yet supported for custom models.
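Since the path matches the OpenAI Chat Completions format, a standard `openai` client can usually be pointed at the deployment. A minimal sketch, where the base URL, API key, and model name are placeholders for your deployment's own values:

```python
# Minimal sketch, assuming the deployment exposes an OpenAI-compatible
# Chat Completions endpoint. All identifiers below are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://<your-deployment-endpoint>/v1",  # your deployment's endpoint URL
    api_key="<your-iam-api-key>",
)

response = client.chat.completions.create(
    model="<your-model-name>",
    messages=[{"role": "user", "content": "Summarize what Managed Inference does."}],
)
print(response.choices[0].message.content)
```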
### Vision models

The Chat API will be exposed for this model under the `/v1/chat/completions` endpoint.

Structured outputs and function calling are not yet supported for custom models.
### Multimodal models

These models will be treated similarly to both chat and vision models.
### Embedding models

The Embeddings API will be exposed for this model under the `/v1/embeddings` endpoint.
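As with chat, this follows the OpenAI Embeddings format, so the same client works. Base URL, API key, and model name are again placeholders:

```python
# Minimal sketch for the /v1/embeddings endpoint, assuming OpenAI-compatible
# behavior. All identifiers below are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://<your-deployment-endpoint>/v1",
    api_key="<your-iam-api-key>",
)

result = client.embeddings.create(
    model="<your-model-name>",
    input="Managed Inference supports custom embedding models.",
)
print(len(result.data[0].embedding))  # dimensionality of the returned vector
```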
Custom model deployments are currently considered stable in the long term, and we will ensure that updates or changes to Managed Inference do not impact existing deployments. If a breaking change ever means that a custom model can no longer be supported, we will notify you at least 3 months beforehand.
When deploying custom models, you remain responsible for complying with any license requirements from the model provider, just as you would when running the model on your own provisioned GPUs.
Custom models must conform to one of the architectures listed below.

## Supported custom model architectures

Custom model deployment currently supports the following model architectures:

* `AquilaModel`
* `AquilaForCausalLM`
* `ArcticForCausalLM`
* `BaiChuanForCausalLM`
* `BaichuanForCausalLM`
* `BloomForCausalLM`
* `CohereForCausalLM`
* `Cohere2ForCausalLM`
* `DbrxForCausalLM`
* `DeciLMForCausalLM`
* `DeepseekForCausalLM`
* `DeepseekV2ForCausalLM`
* `DeepseekV3ForCausalLM`
* `ExaoneForCausalLM`
* `FalconForCausalLM`
* `Fairseq2LlamaForCausalLM`
* `GemmaForCausalLM`
* `Gemma2ForCausalLM`
* `GlmForCausalLM`
* `GPT2LMHeadModel`
* `GPTBigCodeForCausalLM`
* `GPTJForCausalLM`
* `GPTNeoXForCausalLM`
* `GraniteForCausalLM`
* `GraniteMoeForCausalLM`
* `GritLM`
* `InternLMForCausalLM`
* `InternLM2ForCausalLM`
* `InternLM2VEForCausalLM`
* `InternLM3ForCausalLM`
* `JAISLMHeadModel`
* `JambaForCausalLM`
* `LlamaForCausalLM`
* `LLaMAForCausalLM`
* `MambaForCausalLM`
* `FalconMambaForCausalLM`
* `MiniCPMForCausalLM`
* `MiniCPM3ForCausalLM`
* `MistralForCausalLM`
* `MixtralForCausalLM`
* `QuantMixtralForCausalLM`
* `MptForCausalLM`
* `MPTForCausalLM`
* `NemotronForCausalLM`
* `OlmoForCausalLM`
* `Olmo2ForCausalLM`
* `OlmoeForCausalLM`
* `OPTForCausalLM`
* `OrionForCausalLM`
* `PersimmonForCausalLM`
* `PhiForCausalLM`
* `Phi3ForCausalLM`
* `Phi3SmallForCausalLM`
* `PhiMoEForCausalLM`
* `Qwen2ForCausalLM`
* `Qwen2MoeForCausalLM`
* `RWForCausalLM`
* `StableLMEpochForCausalLM`
* `StableLmForCausalLM`
* `Starcoder2ForCausalLM`
* `SolarForCausalLM`
* `TeleChat2ForCausalLM`
* `XverseForCausalLM`
* `BartModel`
* `BartForConditionalGeneration`
* `Florence2ForConditionalGeneration`
* `BertModel`
* `RobertaModel`
* `RobertaForMaskedLM`
* `XLMRobertaModel`
* `Gemma2Model`
* `InternLM2ForRewardModel`
* `JambaForSequenceClassification`
* `LlamaModel`
* `MistralModel`
* `Qwen2Model`
* `Qwen2ForRewardModel`
* `Qwen2ForProcessRewardModel`
* `LlavaNextForConditionalGeneration`
* `Phi3VForCausalLM`
* `Qwen2VLForConditionalGeneration`
* `Qwen2ForSequenceClassification`
* `BertForSequenceClassification`
* `RobertaForSequenceClassification`
* `XLMRobertaForSequenceClassification`
* `AriaForConditionalGeneration`
* `Blip2ForConditionalGeneration`
* `ChameleonForConditionalGeneration`
* `ChatGLMModel`
* `ChatGLMForConditionalGeneration`
* `DeepseekVLV2ForCausalLM`
* `FuyuForCausalLM`
* `H2OVLChatModel`
* `InternVLChatModel`
* `Idefics3ForConditionalGeneration`
* `LlavaForConditionalGeneration`
* `LlavaNextVideoForConditionalGeneration`
* `LlavaOnevisionForConditionalGeneration`
* `MantisForConditionalGeneration`
* `MiniCPMO`
* `MiniCPMV`
* `MolmoForCausalLM`
* `NVLM_D`
* `PaliGemmaForConditionalGeneration`
* `PixtralForConditionalGeneration`
* `QWenLMHeadModel`
* `Qwen2_5_VLForConditionalGeneration`
* `Qwen2AudioForConditionalGeneration`
* `UltravoxModel`
* `MllamaForConditionalGeneration`
* `WhisperForConditionalGeneration`
* `EAGLEModel`
* `MedusaModel`
* `MLPSpeculatorPreTrainedModel`