Allow specifying database connection character set and collation #7883
Description
Is your feature request related to a problem? Please describe.
Currently, our version of Django (4.2.27) implicitly sets the connection's character set to utf8mb3, regardless of the character set and collation defined for the database, tables, or columns. As a result, Django effectively supports only UTF-8 characters of up to 3 bytes.
This results in errors like the following, where the database and table support the utf8mb4 character set:
utf8_4_bytes.mov
😀 is a 4-byte UTF-8 character, resulting in the error.
Below is a video demonstrating the behavior with a 3-byte UTF-8 character, ⏏:
utf8_3_bytes.mov
You can review three and four byte UTF-8 characters at the following sites:
3 Byte UTF-8: https://design215.com/toolbox/utf8-3byte-characters.php
4 Byte UTF-8: https://design215.com/toolbox/utf8-4byte-characters.php
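To check locally which group a character falls into, a quick sketch like the following can help. The helper name `utf8_length` is made up for illustration; it only relies on Python's built-in `str.encode`.

```python
def utf8_length(char: str) -> int:
    """Return how many bytes one character occupies when encoded as UTF-8."""
    return len(char.encode("utf-8"))


# U+23CF (the eject symbol) fits in utf8mb3; U+1F600 (grinning face) does not.
for char in ("\u23cf", "\U0001F600"):
    print(f"U+{ord(char):04X} needs {utf8_length(char)} bytes")
# prints:
# U+23CF needs 3 bytes
# U+1F600 needs 4 bytes
```

Any character that reports 4 bytes here should be rejected over a utf8mb3 connection.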
Describe the solution you'd like
It would be nice if we allowed the user to specify the database's character set and collation via environment variables, which can be passed to Django.
Specifically, Django controls the database connection character set through the charset option in the database options.
This would require passing a charset to the connection's OPTIONS setting:
specify7/specifyweb/settings/__init__.py
Line 50 in 576e02b
    'OPTIONS': DATABASE_OPTIONS,
To support the more modern utf8mb4 character set for instance:
    'OPTIONS': {
        'charset': 'utf8mb4'
    },
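As a rough sketch of how the environment variables could feed into this, something like the following might work. The variable names `DATABASE_CHARSET` and `DATABASE_COLLATION` and the helper `build_database_options` are hypothetical, not existing Specify 7 settings. Setting the collation through `init_command` with `SET NAMES ... COLLATE ...` is one possible approach and is an assumption here; it should be checked against the MySQL/MariaDB driver in use.

```python
import os
import re

# Character set and collation names are plain identifiers; reject anything else
# so an environment value can never inject SQL into init_command.
_IDENTIFIER = re.compile(r"[A-Za-z0-9_]+")


def build_database_options(env=None, base_options=None):
    """Merge optional charset/collation settings into the connection OPTIONS.

    DATABASE_CHARSET and DATABASE_COLLATION are hypothetical variable names.
    """
    env = os.environ if env is None else env
    options = dict(base_options or {})

    charset = env.get("DATABASE_CHARSET", "").strip()
    collation = env.get("DATABASE_COLLATION", "").strip()

    for value in (charset, collation):
        if value and not _IDENTIFIER.fullmatch(value):
            raise ValueError(f"Invalid character set or collation name: {value!r}")

    if charset:
        options["charset"] = charset

    # Assumption: the collation is applied per connection via init_command.
    # An init_command that is already configured is left untouched.
    if charset and collation and "init_command" not in options:
        options["init_command"] = f"SET NAMES {charset} COLLATE {collation}"

    return options
```

In the settings module this could then be used as `'OPTIONS': build_database_options(base_options=DATABASE_OPTIONS)`. When neither variable is set, the options are returned unchanged, so existing deployments would keep their current behavior.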
I will note that Django changed the default connection character set to utf8mb4 starting in 5.2:
https://docs.djangoproject.com/en/6.0/releases/5.2/#database-backends