BUG: Can only compare identically-labeled Series objects (string vs. object) #61099

Open
3 tasks done
wahsmail opened this issue Mar 10, 2025 · 3 comments · May be fixed by #61199
Labels: Bug, Needs Discussion, Strings

Comments


wahsmail commented Mar 10, 2025

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

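import pandas as pd

# Same labels and values, but s2's index is cast to the "string" extension dtype
s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([4, 5, 6], index=['a', 'b', 'c'])
s2.index = s2.index.astype('string')

s1 < s2  # fails: ValueError: Can only compare identically-labeled Series objects

s1, s2 = s1.align(s2)
s1 < s2  # also fails

s1 = s1.reindex(s2.index)
s1 < s2  # succeeds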

Issue Description

When two Series (or DataFrames) whose indexes hold identical labels are compared, but one index has dtype object and the other has dtype string, the element-wise comparison fails. In the debugger, it looks like the ExtensionArray method StringArray.equals returns False when compared against a Python list of strings, which causes Series._indexed_same to return False.
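To inspect the mismatch without stepping through a debugger, a minimal sketch like the one below may help. `describe_index_pair` is a hypothetical helper name, not part of pandas; it relies only on public attributes and `Index.equals`. The two `equals` results depend on the installed pandas version, so no expected output is given for them.

```python
import pandas as pd


def describe_index_pair(left, right):
    """Hypothetical helper: summarize how the indexes of two Series differ."""
    return {
        "left_dtype": str(left.index.dtype),
        "right_dtype": str(right.index.dtype),
        # Compare the labels as plain Python objects, ignoring the dtype.
        "same_labels": list(left.index) == list(right.index),
        # Check Index.equals in both directions in case the result
        # is not symmetric between the two index types.
        "left_equals_right": bool(left.index.equals(right.index)),
        "right_equals_left": bool(right.index.equals(left.index)),
    }


s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([4, 5, 6], index=["a", "b", "c"])
s2.index = s2.index.astype("string")

report = describe_index_pair(s1, s2)
for key, value in report.items():
    print(f"{key}: {value}")
```

If `same_labels` is True while either `equals` entry is False, the indexes differ only by dtype, which is the situation this issue describes.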

Expected Behavior

Ideally, indexes with string and object dtypes would be comparable. This in-between state for pandas dtypes has been quite awkward: some libraries are moving to the numpy-nullable / pyarrow dtype backends, while pandas itself does not use them by default yet.
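Until the two dtypes compare cleanly, one possible workaround is to cast both indexes to a common dtype before comparing. The sketch below is only an illustration under that assumption, not the fix proposed for pandas; `less_than_ignoring_index_dtype` is a hypothetical helper name.

```python
import pandas as pd


def less_than_ignoring_index_dtype(left, right):
    """Hypothetical helper: compare two Series after casting both indexes to object."""
    # Work on copies so the callers' Series keep their original index dtype.
    left = left.copy()
    right = right.copy()
    # Cast both indexes to the same dtype so they are compared like for like.
    left.index = left.index.astype(object)
    right.index = right.index.astype(object)
    return left < right


s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([4, 5, 6], index=["a", "b", "c"])
s2.index = s2.index.astype("string")

result = less_than_ignoring_index_dtype(s1, s2)
print(result)
```

Casting to object loses the string extension dtype on the result's index, so this only suits cases where the index dtype of the result does not matter.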

Installed Versions

Replace this line with the output of pd.show_versions()

@wahsmail added the Bug and Needs Triage labels Mar 10, 2025
@rhshadrach added the Strings and Needs Discussion labels and removed the Needs Triage label Mar 11, 2025
@jorisvandenbossche added this to the 2.3 milestone Mar 14, 2025
@sanggon6107

Hi @wahsmail,
I think this should work, since the Index.equals() docstring states that the dtype is not compared.

The dtype is *not* compared
>>> int64_idx = pd.Index([1, 2, 3], dtype="int64")
>>> int64_idx
Index([1, 2, 3], dtype='int64')
>>> uint64_idx = pd.Index([1, 2, 3], dtype="uint64")
>>> uint64_idx
Index([1, 2, 3], dtype='uint64')
>>> int64_idx.equals(uint64_idx)
True
"""

I also confirmed that the comparison doesn't raise when the Index.equals() call inside Series._indexed_same() returns True.
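The relationship between `Index.equals()` and the comparison error can be checked from the outside with public API only. This is a rough sketch that assumes the comparison raises exactly when `Index.equals()` returns False, as described above; `comparison_raises` is a hypothetical helper name, and the private `Series._indexed_same()` is deliberately not called.

```python
import pandas as pd


def comparison_raises(left, right):
    """Hypothetical helper: True if `left < right` raises ValueError."""
    try:
        left < right
    except ValueError:
        return True
    return False


s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([4, 5, 6], index=["a", "b", "c"])
s2.index = s2.index.astype("string")

index_equal = s1.index.equals(s2.index)
raised = comparison_raises(s1, s2)
print(f"Index.equals: {index_equal}, comparison raised: {raised}")
```

If the assumption holds, `raised` is the negation of `index_equal` regardless of which pandas version is installed.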

@sanggon6107

take

MayurKishorKumar added a commit to MayurKishorKumar/pandas that referenced this issue Mar 29, 2025
@MayurKishorKumar

take

5 participants