Describe the bug
DataFusion's initcap behaves differently from Spark's. Both upper-case the first letter of each word and lower-case the rest, but they disagree on what counts as a word: Spark treats only whitespace (' ') as a word separator, while DataFusion treats every character that is not ASCII alphanumeric as a separator. (DataFusion's code would also fail to upper-case or lower-case non-ASCII characters, but that does not surface as a separate issue because it already treats those characters as separators.)
#1051 shows the problem by adding two cases to the test, one using a dash and one using non-ASCII letters (from Finnish).
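To make the difference concrete, here is a minimal sketch of the two word-splitting rules as described above. These are simplified models written for illustration, not the actual Spark or DataFusion implementations, and the function names `initcap_space_delimited` and `initcap_ascii_alnum_delimited` are hypothetical.

```rust
/// Simplified model of the Spark-style rule described in this issue:
/// only a space (' ') separates words. Not Spark's actual code.
fn initcap_space_delimited(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    let mut at_word_start = true;
    for c in input.chars() {
        if c == ' ' {
            out.push(c);
            at_word_start = true;
        } else if at_word_start {
            // Unicode-aware upper-casing, so 'ä' becomes 'Ä'.
            out.extend(c.to_uppercase());
            at_word_start = false;
        } else {
            out.extend(c.to_lowercase());
        }
    }
    out
}

/// Simplified model of the DataFusion-style rule described in this issue:
/// every character that is not ASCII alphanumeric separates words and is
/// copied through unchanged. Not DataFusion's actual code.
fn initcap_ascii_alnum_delimited(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    let mut at_word_start = true;
    for c in input.chars() {
        if c.is_ascii_alphanumeric() {
            if at_word_start {
                out.push(c.to_ascii_uppercase());
                at_word_start = false;
            } else {
                out.push(c.to_ascii_lowercase());
            }
        } else {
            // Separator: left as-is, and the next letter starts a new word.
            out.push(c);
            at_word_start = true;
        }
    }
    out
}
```

Under these models, the input "hello-world" becomes "Hello-world" with the space-delimited rule and "Hello-World" with the ASCII-alphanumeric rule. The Finnish input "äiti söi" becomes "Äiti Söi" with the first rule and "äIti SöI" with the second, because 'ä' and 'ö' are treated as separators.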
Steps to reproduce
Call initcap with an input that contains characters that are neither ASCII alphanumeric nor whitespace, such as a dash or a non-ASCII letter.
Expected behavior
initcap should match Spark's behavior.
Additional context
No response