
Allow specifying database connection character set and collation #7883

@melton-jason


Is your feature request related to a problem? Please describe.
Currently, our version of Django (4.2.27) implicitly sets the connection's character set to utf8mb3, regardless of the character set and collation defined for the database, tables, or columns. As a result, the connection only supports UTF-8 characters of up to 3 bytes.

This causes errors like the following, even though the database and table use the utf8mb4 character set:

utf8_4_bytes.mov

😀 is a 4-byte UTF-8 character, which triggers the error.

Below is a video showing the same behavior with ⏏, a 3-byte UTF-8 character:

utf8_3_bytes.mov

Examples of 3-byte and 4-byte UTF-8 characters can be found at the following sites:
3 Byte UTF-8: https://design215.com/toolbox/utf8-3byte-characters.php
4 Byte UTF-8: https://design215.com/toolbox/utf8-4byte-characters.php
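To make the 3-byte vs 4-byte distinction concrete, here is a small standalone Python check (the helper names are hypothetical, not part of the project). It relies only on the fact that utf8mb3 covers code points up to U+FFFF, which encode to at most 3 bytes, while anything above that needs 4 bytes and therefore utf8mb4.

```python
def utf8_byte_length(ch: str) -> int:
    """Number of bytes a single character occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


def fits_utf8mb3(text: str) -> bool:
    """True if every character fits in MySQL's 3-byte utf8mb3 character set."""
    # utf8mb3 stores at most 3 bytes per character (code points U+0000..U+FFFF);
    # characters above U+FFFF, such as most emoji, need 4 bytes and utf8mb4.
    return all(utf8_byte_length(ch) <= 3 for ch in text)


for sample in ("⏏", "😀"):
    # Print the code point in ASCII form so the output is safe on any console.
    print(f"U+{ord(sample):04X}", utf8_byte_length(sample), fits_utf8mb3(sample))
# prints:
# U+23CF 3 True
# U+1F600 4 False
```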

Describe the solution you'd like
It would be nice to let users specify the database's character set and collation through environment variables that are then passed to Django.
Specifically, Django controls the database connection's character set through the charset key in the database OPTIONS setting.

This would require passing a charset to the connection's OPTIONS setting:

'OPTIONS': DATABASE_OPTIONS,

For instance, to use the more modern utf8mb4 character set:

'OPTIONS': {
    'charset': 'utf8mb4'
},
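A minimal settings.py sketch of the proposal follows. It assumes hypothetical environment variable names DATABASE_CHARSET and DATABASE_COLLATION and an assumed default of utf8mb4; neither is defined by the project. The charset goes through the charset option described above. For the collation, it uses the MySQL backend's init_command option to run MySQL's SET NAMES ... COLLATE ... statement on each new connection, which is one possible approach, not the only one. Because the names are interpolated into SQL, they are restricted to letters, digits, and underscores.

```python
import os
import re
from collections.abc import Mapping

# Character set and collation names are restricted to [A-Za-z0-9_] because
# they are interpolated into the SET NAMES statement below.
_NAME_RE = re.compile(r"[A-Za-z0-9_]+")


def build_database_options(env: Mapping[str, str]) -> dict[str, str]:
    """Build MySQL OPTIONS from hypothetical DATABASE_CHARSET/DATABASE_COLLATION variables."""
    charset = env.get("DATABASE_CHARSET") or "utf8mb4"  # assumed default
    collation = env.get("DATABASE_COLLATION") or None   # empty string means "not set"
    for value in (charset, collation):
        if value is not None and not _NAME_RE.fullmatch(value):
            raise ValueError(f"invalid character set or collation name: {value!r}")

    options = {"charset": charset}
    if collation is not None:
        # init_command is executed by the MySQL client once per new connection.
        options["init_command"] = f"SET NAMES {charset} COLLATE {collation}"
    return options


DATABASE_OPTIONS = build_database_options(os.environ)
```

The resulting dict plugs into the existing `'OPTIONS': DATABASE_OPTIONS,` line unchanged, and unset variables fall back to a plain utf8mb4 connection.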

Note that Django changed the default connection character set to utf8mb4 starting in version 5.2:
https://docs.djangoproject.com/en/6.0/releases/5.2/#database-backends
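Since the default depends on the Django version, a small helper can make the effective connection character set explicit. This is a sketch with hypothetical function names; it mirrors the defaults described above ("utf8", an alias for utf8mb3, before 5.2 and "utf8mb4" from 5.2 on) and assumes, as in Django's MySQL backend, that a charset given in OPTIONS overrides the default.

```python
def default_mysql_charset(django_version: tuple) -> str:
    """Default MySQL connection charset for a given Django version tuple."""
    # Django 5.2 switched the default from "utf8" (utf8mb3) to "utf8mb4".
    return "utf8mb4" if tuple(django_version) >= (5, 2) else "utf8"


def effective_charset(options: dict, django_version: tuple) -> str:
    """Charset a connection will use: an explicit OPTIONS value wins over the default."""
    return options.get("charset", default_mysql_charset(django_version))
```

For example, `effective_charset(DATABASE_OPTIONS, django.VERSION)` in settings would report "utf8" on Django 4.2 unless a charset is set explicitly.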

Metadata


Assignees

No one assigned

Labels

1 - Enhancement: Improvements or extensions to existing behavior

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests