Skip to content

Conversation

albertvillanova
Copy link
Member

@albertvillanova albertvillanova commented Oct 26, 2022

This PR:

  • Unpins dill to allow installing dill>=0.3.6
  • Removes the fix on dill for >=0.3.6 because they implemented a deterministic mode (to be confirmed by @anivegesana)
  • Pins dill<0.3.7 to allow latest dill 0.3.6
  • Implements a fix for dill save_function for dill 0.3.6
  • Additionally had to implement a fix for dill save_code and _save_regex for dill 0.3.6
  • Fixes the CI so that the latest dill version is tested (besides the minimum 0.3.1.1 required by apache-beam 2.42.0)

Fix #5162.

@HuggingFaceDocBuilderDev
Copy link

HuggingFaceDocBuilderDev commented Oct 26, 2022

The documentation is not available anymore as the PR was closed or merged.

@albertvillanova albertvillanova marked this pull request as ready for review October 26, 2022 08:42
@lhoestq
Copy link
Member

lhoestq commented Oct 26, 2022

I think it hasn't been merged ? uqfoundation/dill#501

Though I can see that the CI is green because it uses dill 0.3.1.1 - we should probably fix the dill version in both CIs:

  • use 0.3.1.1 for the CI with the minimum requirements
  • use latest for the CI with the latest requirements

@albertvillanova
Copy link
Member Author

I have noticed our CI uses dill-0.3.1.1, so not really testing dill 0.3.6...

@albertvillanova albertvillanova marked this pull request as draft October 26, 2022 09:38
@albertvillanova
Copy link
Member Author

albertvillanova commented Oct 26, 2022

The dill version in our CI is due to apache-beam...

@albertvillanova
Copy link
Member Author

albertvillanova commented Oct 26, 2022

I've tested locally: we need a specific fix for 0.3.6 (different from the previous ones)...

@lhoestq
Copy link
Member

lhoestq commented Oct 26, 2022

I think we can force the version of dill to be whatever we want in the CI - no matter what beam says. The alternative would be to run beam tests separately but it's more work

@albertvillanova
Copy link
Member Author

albertvillanova commented Oct 27, 2022

@lhoestq I tried the easiest solution: force dill==0.3.6 ignoring the requirement of apache-beam. But it doesn't work:

  • For example, for tests/test_builder.py::test_beam_based_builder_download_and_prepare_as_parquet:
    @dill.dill.register(dill.dill.ModuleType)
    def save_module(pickler, obj):
      if dill.dill.is_dill(pickler) and obj is pickler._main:
        return old_save_module(pickler, obj)
      else:
>       dill.dill.log.info('M2: %s' % obj)
E       AttributeError: module 'dill._dill' has no attribute 'log'

venv/lib/python3.9/site-packages/apache_beam/internal/dill_pickler.py:170: AttributeError
  • Apache Beam registers some dill functions (save_module) which are incompatible with dill 0.3.6 (in 0.3.6 'dill._dill' has no attribute 'log' but 'logger')
  • This has an impact in CI tests using either Apache Beam or multiprocess (even without using Apache Beam!):
FAILED tests/test_beam.py::BeamBuilderTest::test_download_and_prepare - AttributeError: module 'dill._dill' has no attribute 'log'
FAILED tests/test_beam.py::BeamBuilderTest::test_nested_features - AttributeError: module 'dill._dill' has no attribute 'log'
FAILED tests/test_arrow_dataset.py::BaseDatasetTest::test_filter_multiprocessing_in_memory - AttributeError: module 'dill._dill' has no attribute 'log'
FAILED tests/test_arrow_dataset.py::BaseDatasetTest::test_filter_multiprocessing_on_disk - AttributeError: module 'dill._dill' has no attribute 'log'
FAILED tests/test_builder.py::test_beam_based_download_and_prepare - AttributeError: module 'dill._dill' has no attribute 'log'
FAILED tests/test_arrow_dataset.py::BaseDatasetTest::test_map_caching_in_memory - AttributeError: module 'dill._dill' has no attribute 'log'
FAILED tests/test_arrow_dataset.py::BaseDatasetTest::test_map_caching_on_disk - AttributeError: module 'dill._dill' has no attribute 'log'
FAILED tests/test_builder.py::test_beam_based_as_dataset - AttributeError: module 'dill._dill' has no attribute 'log'
FAILED tests/test_arrow_dataset.py::BaseDatasetTest::test_map_multiprocessing_in_memory - AttributeError: module 'dill._dill' has no attribute 'log'
FAILED tests/test_arrow_dataset.py::BaseDatasetTest::test_map_multiprocessing_on_disk - AttributeError: module 'dill._dill' has no attribute 'log'
FAILED tests/test_builder.py::test_beam_based_builder_download_and_prepare_as_parquet - AttributeError: module 'dill._dill' has no attribute 'log'

I guess we should implement the other option: run beam tests separately.

I'm opening another PR for the CI refactoring.

@lhoestq
Copy link
Member

lhoestq commented Oct 27, 2022

Ah crap >< maybe only install apache_beam for the "minimum requirements" CI

@albertvillanova
Copy link
Member Author

albertvillanova commented Oct 27, 2022

@lhoestq if we install apache-beam only in the "minimum requirements" CI, then this other PR should be merged first:

Otherwise, our CI for "latest" will fail because it will try to run the beam tests (because PyTorch is installed but indeed apache-beam is not installed).

@albertvillanova albertvillanova changed the title Unpin dill and remove fix for dill>=0.3.6 Support dill 0.3.6 Oct 27, 2022
@lhoestq
Copy link
Member

lhoestq commented Oct 27, 2022

One of the test is failing because we set

# google colab doesn't allow to pickle loggers
# so we want to make sure each tests passes without pickling the logger
def reduce_ex(self):
    raise pickle.PicklingError()

datasets.arrow_dataset.logger.__reduce_ex__ = reduce_ex

in test_arrow_dataset.py to avoid pickling the logger because it used to fail on google colab.

Now pickling the logger seems to be working on google colab again - so you can remove it, and it should fix some tests

@albertvillanova
Copy link
Member Author

For the other 2 errors:

  • FAILED tests/test_arrow_dataset.py::BaseDatasetTest::test_map_caching_in_memory - _pickle.PicklingError: Can't pickle <class 'unittest.mock.MagicMock'>: it's not the same object as unittest.mock.MagicMock
  • FAILED tests/test_arrow_dataset.py::BaseDatasetTest::test_map_caching_on_disk - _pickle.PicklingError: Can't pickle <class 'unittest.mock.MagicMock'>: it's not the same object as unittest.mock.MagicMock

I have implemented a pickable MagicMock.

@albertvillanova albertvillanova marked this pull request as ready for review October 27, 2022 14:36
Copy link
Member

@lhoestq lhoestq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome thank you !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Pip-compile: Could not find a version that matches dill<0.3.6,>=0.3.6

3 participants