

@lachlangh

Fixes #960

This PR changes varchar() to measure the input length in bytes (nchar(x, type = "bytes")) rather than characters.
This matches SQL Server's definition of VARCHAR(n), where n is the column size in bytes rather than characters, and prevents truncation when inserting multibyte UTF-8 strings.
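To make the difference concrete, here is a minimal sketch comparing the two `nchar()` counts for a few UTF-8 strings. It also includes a sizing helper. `varchar_width()` is a hypothetical name used only for illustration and is not odbc's actual `varchar()` implementation. The floor of 1 for empty input is also an assumption.

```r
# Multibyte strings are written as \u escapes so they are UTF-8 regardless of
# the source file's encoding: "café" and "日本".
x <- c("abc", "caf\u00e9", "\u65e5\u672c")

nchar(x, type = "chars")  # 3 4 2
nchar(x, type = "bytes")  # 3 5 6

# Hypothetical helper (not odbc's actual varchar()): size a VARCHAR column
# from the longest value measured in bytes.
varchar_width <- function(x) {
  x <- as.character(x)
  # Drop NAs first: nchar(NA_character_, type = "bytes") returns 2, not NA.
  x <- x[!is.na(x)]
  # Assumed floor of 1 when there are no values, since VARCHAR(0) is invalid.
  if (length(x) == 0) return(1L)
  max(nchar(x, type = "bytes"))
}

paste0("VARCHAR(", varchar_width(x), ")")  # "VARCHAR(6)"
```

Counting characters would give VARCHAR(4) here. That column is too narrow for the 6-byte string "日本".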

Because a string's byte count is never smaller than its character count, counting bytes can only widen the computed column, so it should not adversely affect other database backends.

Includes a minimal test verifying the multibyte case.

@detule (Collaborator) commented Oct 29, 2025

Thanks again for your submission.

I see our varchar method is used for NetezzaSQL as well. Off the top of my head, I can't imagine your change impacting their workflow adversely, so it feels fine to merge without doing extra work to check there.

LGTM.

Linked issue: varchar() measures characters instead of bytes, causing truncation with multibyte UTF-8 strings (SQL Server)