As documented in [this pandas issue](pandas-dev/pandas#15585), pandas' `is_string_dtype` is not strict and classifies many non-string types as strings. This is a problem for us because essentially all subclasses of `ExtensionDtype` are classified as strings by that function. To fix this, I modified our version of `is_string_dtype` to explicitly reject all of our extension dtypes (previously it only excluded categorical types). I'm not 100% confident that no other part of the code base relies on the current (erroneous) behavior, but the cudf tests all passed for me locally, and every call to `utils.is_string_dtype` that I traced looks like a place where the change gives more correct behavior, so I think our best bet is to move forward with this change. Any future problems caused by other code relying on the old behavior should probably be treated as bugs in the calling code and fixed there. The same goes for external code that relied on this behavior; this change is potentially breaking for it as well, but again it is something those projects should address.
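To illustrate the idea, here is a minimal sketch of a stricter check written against plain pandas and NumPy. It is not the cudf implementation: the name `is_string_dtype_strict` is hypothetical, pandas' built-in extension dtypes stand in for cudf's own, and the choice to still accept `pd.StringDtype` is an assumption made for this example.

```python
import numpy as np
import pandas as pd
from pandas.api.extensions import ExtensionDtype


def is_string_dtype_strict(obj):
    """Hypothetical strict string-dtype check (illustrative only).

    Accepts a dtype or any object with a ``.dtype`` attribute.
    """
    dtype = getattr(obj, "dtype", obj)

    # Assumption for this sketch: pandas' dedicated string dtype counts as a string.
    if isinstance(dtype, pd.StringDtype):
        return True

    # Explicitly reject every other extension dtype (categorical, interval,
    # period, ...) instead of falling through to a kind-based check.
    if isinstance(dtype, ExtensionDtype):
        return False

    try:
        np_dtype = np.dtype(dtype)
    except (TypeError, ValueError):
        # Not something NumPy can interpret as a dtype.
        return False

    # Object, bytes, and unicode kinds are treated as string-like.
    return np_dtype.kind in ("O", "S", "U")
```

With this sketch, `is_string_dtype_strict(pd.CategoricalDtype())` and `is_string_dtype_strict(pd.IntervalDtype("int64"))` are rejected, while `np.dtype("O")` and a `pd.Series` of Python strings are accepted. Rejecting extension dtypes up front, before any kind-based check, is what prevents them from being misclassified.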
Authors:
- Vyas Ramasubramani (@vyasr)
Approvers:
- Keith Kraus (@kkraus14)
URL: #7710