Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add tests for UTF8 characters in table definition and errors #467

Merged
merged 1 commit into from
Apr 7, 2025

Conversation

michal-josef-spacek
Copy link
Contributor

@michal-josef-spacek michal-josef-spacek commented Mar 18, 2025

Add tests for UTF8 characters in table definition and errors
The tests are taken from DBD::MariaDB and adopted to DBD::mysql.

Testing is provided on Japanese and Chinese characters.

There are two test files:

  • one for testing of utf8/utf8mb4 identifiers
  • one for testing of utf8 error messages

There are three runs of all tests in both test files:

  • one without mysql_enable_utf8 option
  • one with enabled mysql_enable_utf8 option
  • one with enabled mysql_enable_utf8mb4 option

There are XXX comments for situations which not follow
mysql_enable_utf8/mb4 option.

@michal-josef-spacek
Copy link
Contributor Author

Still WIP

@michal-josef-spacek michal-josef-spacek force-pushed the add_utf8_jp_test branch 2 times, most recently from 02a3357 to fe8cae9 Compare March 31, 2025 12:18
@michal-josef-spacek michal-josef-spacek changed the title Add test for Japanese characters in table definition and errors Add tests for UTF8 characters in table definition and errors Mar 31, 2025
The tests are taken from DBD::MariaDB and adopted to DBD::mysql.

Testing is provided on Japanese and Chinese characters.

There are two test files:
* one for testing of utf8/utf8mb4 identifiers
* one for testing of utf8 error messages

There are three runs of all tests in both test files:
* one without mysql_enable_utf8 option
* one with enabled mysql_enable_utf8 option
* one with enabled mysql_enable_utf8mb4 option

There are XXX comments for situations which not follow
mysql_enable_utf8/mb4 option.
@michal-josef-spacek
Copy link
Contributor Author

PR is finished, any feedback?

Copy link
Collaborator

@dveeden dveeden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks ok to me on first glance. Is there a bigger plan or are you only adding tests?

@dveeden dveeden added tests utf8 Unicode and UTF-8 handling labels Apr 3, 2025
@michal-josef-spacek
Copy link
Contributor Author

@dveeden I have a bigger plan, because the situation isn't clean for me.

  1. This test is about testing identifiers and errors. Actual sitution.
  2. I think that we need to improve other two tests about utf8. If is this PR ok, I could update them to same way.
  3. I created analysis of documentation about unicode_utf8 flags. I am requeting a review and I am planning to create issue about it. I don't like this piece of doc. It is misleading from my side.
  4. Question is what about commits, which were reverted in 4.043. The two commits about fixing of this issue are IMHO independent to the issue which was in 4.041. But I don't understand root cause of the issue in 4.041 still.
  5. I am going step by step in utf8 issues in DBD::mysql. Actually I am looking to JSON type.
    ....

btw: I am for creating releases after some changes to check situation in real world.

@dveeden
Copy link
Collaborator

dveeden commented Apr 3, 2025

@michal-josef-spacek

Things to consider when working on UTF-8:

  • Compatibility with existing scripts etc is important. This might need a feature flag and a reasonable upgrade path for users.
  • In MySQL there is utf8mb3 and utf8mb4. And there is utf8 which is currently a alias for utf8mb3. This is confusing as outside of MySQL nobody really uses 3-byte UTF-8. The utf8 alias might change at some point to point to utf8mb4 and utf8mb3 might be removed at some point. Where possible I would like to have a single flag to enable UTF-8 in DBD::mysql instead of multiple.

@dveeden
Copy link
Collaborator

dveeden commented Apr 3, 2025

cc @Grinnz

@michal-josef-spacek
Copy link
Contributor Author

@dveeden

I read this. It's misleading from my side, too. The usage of utf8::upgrade and utf8::downgrade is a very problematic thing at all.

  • I'm all for updating tests and documentation

yes, I am for. Working on it.

I know.

  • My understanding about the big revert is that the changes made in 4.041 were breaking existing scripts for too many people and then basically the whole release was reverted including things that might have been unrelated. This then lead to the DBD::MariaDB / DBD::mysql split etc.

Yes, I know. Too many changes at one release are the biggest issue.
Still, I don't see a report that defines the issue (still in progress from my side). My actual understanding is the usage of bad techniques to create utf8 characters. sic!

Things to consider when working on UTF-8:

  • Compatibility with existing scripts etc is important. This might need a feature flag and a reasonable upgrade path for users.

Yes, I know.

  • In MySQL there is utf8mb3 and utf8mb4. And there is utf8 which is currently a alias for utf8mb3. This is confusing as outside of MySQL nobody really uses 3-byte UTF-8. The utf8 alias might change at some point to point to utf8mb4 and utf8mb3 might be removed at some point. Where possible I would like to have a single flag to enable UTF-8 in DBD::mysql instead of multiple.

Yes, I know. This is a bit confusing, but I think this is possible.
There are bigger issues, in code and doc too.

@dveeden dveeden merged commit 045fd5f into perl5-dbi:master Apr 7, 2025
8 of 9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
tests utf8 Unicode and UTF-8 handling
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants