feat(genapi): add billing free tiers example to faq #4819

Merged 2 commits on Apr 11, 2025
38 changes: 37 additions & 1 deletion pages/generative-apis/faq.mdx
@@ -21,7 +21,43 @@ Our Generative APIs support a range of popular models, including:

## How does the free tier work?
The free tier allows you to process up to 1,000,000 tokens without incurring any costs. After reaching this limit, you will be charged per million tokens processed. Free tier usage is calculated by adding all input and output tokens consumed from all models used.
For more information, refer to our [pricing page](https://www.scaleway.com/en/pricing/model-as-a-service/#generative-apis), or review your bills broken down by token type and model in the [billing section of the Scaleway console](https://console.scaleway.com/billing/payment) (both past bills and the provisional bill for the current month).

Note that once your consumption exceeds the free tier, you will be billed for every additional token consumed, per model and token type. The minimum billing unit is 1 million tokens. Here are two examples of low-volume consumption:

Example 1: Free Tier only

| Model | Token type | Tokens consumed | Price | Bill |
|-----------------|-----------------|-----------------|-----------------|-----------------|
| `llama-3.3-70b-instruct` | Input | 500k | 0.90€/million tokens | 0.00€ |
| `llama-3.3-70b-instruct` | Output | 200k | 0.90€/million tokens | 0.00€ |
| `mistral-small-3.1-24b-instruct-2503` | Input | 100k | 0.15€/million tokens | 0.00€ |
| `mistral-small-3.1-24b-instruct-2503` | Output | 100k | 0.35€/million tokens | 0.00€ |

Total tokens consumed: `900k`
Total bill: `0.00€`

Example 2: Exceeding Free Tier

| Model | Token type | Tokens consumed | Price | Billed consumption | Bill |
|-----------------|-----------------|-----------------|-----------------|-----------------|-----------------|
| `llama-3.3-70b-instruct` | Input | 800k | 0.90€/million tokens | 1 million tokens | 0.00€ (Free Tier application)|
| `llama-3.3-70b-instruct` | Output | 2 500k | 0.90€/million tokens | 3 million tokens | 2.70€ |
| `mistral-small-3.1-24b-instruct-2503` | Input | 100k | 0.15€/million tokens | 1 million tokens | 0.15€ |
| `mistral-small-3.1-24b-instruct-2503` | Output | 100k | 0.35€/million tokens | 1 million tokens | 0.35€ |

Total tokens consumed: `3 500k`
Total billed consumption: `6 million tokens`
Total bill: `3.20€`

Note that in this example, the first line, where the free tier applies, will not be displayed under the model's name in your current Scaleway bill; it will instead be listed under `Generative APIs Free Tier - First 1M tokens for free`.
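The billing logic of the two examples above can be sketched in Python. This is an unofficial sketch, not the authoritative billing implementation: it assumes the 1M-token free tier is deducted from raw consumption (in billing-line order) before each remaining line is rounded up to whole millions, which is one consistent reading of both examples. Model names and prices are taken from the tables above.

```python
import math

# €/1M tokens, taken from the pricing in the examples above
PRICES = {
    ("llama-3.3-70b-instruct", "input"): 0.90,
    ("llama-3.3-70b-instruct", "output"): 0.90,
    ("mistral-small-3.1-24b-instruct-2503", "input"): 0.15,
    ("mistral-small-3.1-24b-instruct-2503", "output"): 0.35,
}

FREE_TIER_TOKENS = 1_000_000   # first 1M tokens are free, across all models
BILLING_UNIT = 1_000_000       # minimum billing unit: 1 million tokens

def estimate_bill(usage):
    """usage: list of (model, token_type, tokens_consumed) tuples, in billing order."""
    remaining_free = FREE_TIER_TOKENS
    total = 0.0
    for model, token_type, tokens in usage:
        free = min(remaining_free, tokens)   # free tier absorbs raw tokens first
        remaining_free -= free
        billable = tokens - free
        if billable > 0:
            # round remaining consumption up to whole millions
            units = math.ceil(billable / BILLING_UNIT)
            total += units * PRICES[(model, token_type)]
    return round(total, 2)

# Example 2 from the table above:
usage = [
    ("llama-3.3-70b-instruct", "input", 800_000),
    ("llama-3.3-70b-instruct", "output", 2_500_000),
    ("mistral-small-3.1-24b-instruct-2503", "input", 100_000),
    ("mistral-small-3.1-24b-instruct-2503", "output", 100_000),
]
print(estimate_bill(usage))  # → 3.2
```

Running the same function on Example 1's consumption returns `0.0`, since the 900k tokens fit entirely within the free tier.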

## What is a token and how are they counted?
A token is the minimum unit of content that is seen and processed by a model. Hence, token definitions depend on input types:
- For text, `1` token corresponds on average to `~4` characters, and thus to about `0.75` words (since words are on average five characters long)
- For images, `1` token corresponds to a square patch of pixels. For example, the [pixtral-12b-2409 model](https://www.scaleway.com/en/docs/managed-inference/reference-content/pixtral-12b-2409/#frequently-asked-questions) uses image tokens of `16x16` pixels (16 pixels high and 16 pixels wide, hence `256` pixels in total).

The exact token count and definition depend on the [tokenizer](https://huggingface.co/learn/llm-course/en/chapter2/4) used by each model. When the difference is significant (such as for image processing), you can find detailed information in each model's documentation (for instance, in the [`pixtral-12b-2409` size limit documentation](https://www.scaleway.com/en/docs/managed-inference/reference-content/pixtral-12b-2409/#frequently-asked-questions)). Otherwise, when the model is open, you can find this information in the model files on platforms such as Hugging Face, usually in the `tokenizer_config.json` file.
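As a quick back-of-the-envelope check, the ~4-characters-per-token rule above can be turned into a tiny estimator. This is a heuristic only; exact counts always come from the specific model's tokenizer.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate for text, using the ~4-characters-per-token
    rule of thumb. Exact counts depend on each model's tokenizer."""
    return max(1, round(len(text) / 4))

print(estimate_tokens("Hello, world!"))  # → 3
```

For exact counts on open models, load the model's own tokenizer instead (for example, via `AutoTokenizer.from_pretrained(...)` from the Hugging Face `transformers` library) and count the IDs it produces.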

## How can I monitor my token consumption?
You can see your token consumption in [Scaleway Cockpit](/cockpit/). You can access it from the Scaleway console under the [Metrics tab](https://console.scaleway.com/generative-api/metrics).