[Data] Found Bug add_column #51758

Closed
ivanthewebber opened this issue Mar 27, 2025 · 1 comment · Fixed by #51794
Labels
bug: Something that is supposed to be working; but isn't
data: Ray Data-related issues
triage: Needs triage (eg: priority, bug/not-bug, and owning component)

Comments

@ivanthewebber
Contributor

ivanthewebber commented Mar 27, 2025

What happened + What you expected to happen

The reproduction below broke when I updated Ray from 2.40.0 to 2.44.0. It seems ray.data.Dataset.add_column does not set the new column's value for every row: with more than one block, some rows come back as NaN.

Versions / Dependencies

Ray 2.44.0

Reproduction script

```python
import unittest
import ray
import ray.data
import pandas as pd

class MyTest(unittest.TestCase):
    def test_add_column(self):
        def set_x(df: pd.DataFrame) -> pd.Series:
            return pd.Series([1] * len(df))

        df = pd.DataFrame({"a": [1, 2, 3]})

        # With override_num_blocks=1 the error does not appear.
        # With override_num_blocks=3 the second and third values are NaN.
        ds = ray.data.from_pandas(df, override_num_blocks=2)

        for row in ds.add_column("x", set_x, batch_format="pandas").take(3):
            self.assertEqual(row["x"], 1) # last one is nan and fails

if __name__ == "__main__":
    unittest.main()
```
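The NaN pattern in the comments above (last row with 2 blocks, last two rows with 3 blocks) is what pandas index alignment would produce if each block kept its slice of the original index while `set_x` returned a Series with a fresh `0..n-1` index. The following is a minimal pandas-only sketch of that hypothesis. It does not use Ray, and `add_column_naive` and `set_x_aligned` are hypothetical helpers written for illustration; whether Ray 2.44.0 actually splits blocks this way is an assumption, not something confirmed in this issue.

```python
import pandas as pd

def set_x(df: pd.DataFrame) -> pd.Series:
    # Same as the reproduction: the Series gets a default index 0..len(df)-1.
    return pd.Series([1] * len(df))

def set_x_aligned(df: pd.DataFrame) -> pd.Series:
    # Possible workaround: reuse the frame's own index so labels always match.
    return pd.Series([1] * len(df), index=df.index)

def add_column_naive(block: pd.DataFrame, name: str, fn) -> pd.DataFrame:
    # Hypothetical stand-in for a per-block add_column; NOT Ray's implementation.
    out = block.copy()
    out[name] = fn(out)  # pandas aligns the Series to out.index by label
    return out

df = pd.DataFrame({"a": [1, 2, 3]})

# Assumption: two blocks that keep the original index labels [0, 1] and [2].
blocks = [df.iloc[:2], df.iloc[2:]]

naive = pd.concat([add_column_naive(b, "x", set_x) for b in blocks])
print(naive["x"].tolist())  # [1.0, 1.0, nan]

fixed = pd.concat([add_column_naive(b, "x", set_x_aligned) for b in blocks])
print(fixed["x"].tolist())  # [1, 1, 1]
```

In the second block the frame's only label is `2`, while the returned Series only has label `0`, so the assignment finds no matching value and fills NaN. If this is the cause, returning a Series built on `df.index` (or a plain NumPy array) from the user function may work around the problem until the fix lands.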

Issue Severity

High: It blocks me from completing my task.

@ivanthewebber ivanthewebber added bug Something that is supposed to be working; but isn't triage Needs triage (eg: priority, bug/not-bug, and owning component) labels Mar 27, 2025
@Bye-legumes
Contributor

I can reproduce on my side

```
FAIL: test_add_column (__main__.MyTest.test_add_column)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/zhilong/test_issue.py", line 18, in test_add_column
    self.assertEqual(row["x"], 1) # last one is nan and fails
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: nan != 1

----------------------------------------------------------------------
Ran 1 test in 3.777s

FAILED (failures=1)
```
