Skip to content

Caching does not work when using python3.14 #725

@intexcor

Description

@intexcor

Describe the bug
Traceback (most recent call last):
File "/workspace/ctn.py", line 8, in
ds = load_dataset(f"naver-clova-ix/synthdog-{lang}") # или "synthdog-zh" для китайского
File "/workspace/.venv/lib/python3.14/site-packages/datasets/load.py", line 1397, in load_dataset
builder_instance = load_dataset_builder(
path=path,
...<10 lines>...
**config_kwargs,
)
File "/workspace/.venv/lib/python3.14/site-packages/datasets/load.py", line 1185, in load_dataset_builder
builder_instance._use_legacy_cache_dir_if_possible(dataset_module)

File "/workspace/.venv/lib/python3.14/site-packages/datasets/builder.py", line 612, in _use_legacy_cache_dir_if_possible
self._check_legacy_cache2(dataset_module) or self._check_legacy_cache() or None
~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^
File "/workspace/.venv/lib/python3.14/site-packages/datasets/builder.py", line 485, in _check_legacy_cache2
config_id = self.config.name + "-" + Hasher.hash({"data_files": self.config.data_files})
~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/.venv/lib/python3.14/site-packages/datasets/fingerprint.py", line 188, in hash
return cls.hash_bytes(dumps(value))
~~~~~^^^^^^^
File "/workspace/.venv/lib/python3.14/site-packages/datasets/utils/_dill.py", line 120, in dumps
dump(obj, file)
~~~~^^^^^^^^^^^
File "/workspace/.venv/lib/python3.14/site-packages/datasets/utils/_dill.py", line 114, in dump
Pickler(file, recurse=True).dump(obj)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^
File "/workspace/.venv/lib/python3.14/site-packages/dill/_dill.py", line 428, in dump
StockPickler.dump(self, obj)
~~~~~~~~~~~~~~~~~^^^^^^^^^^^
File "/usr/lib/python3.14/pickle.py", line 498, in dump
self.save(obj)
~~~~~~~~~^^^^^
File "/workspace/.venv/lib/python3.14/site-packages/datasets/utils/_dill.py", line 70, in save
dill.Pickler.save(self, obj, save_persistent_id=save_persistent_id)
~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/.venv/lib/python3.14/site-packages/dill/_dill.py", line 422, in save
StockPickler.save(self, obj, save_persistent_id)
~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.14/pickle.py", line 572, in save
f(self, obj) # Call unbound method with explicit self
~^^^^^^^^^^^
File "/workspace/.venv/lib/python3.14/site-packages/dill/_dill.py", line 1262, in save_module_dict
StockPickler.save_dict(pickler, obj)
~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^
File "/usr/lib/python3.14/pickle.py", line 1064, in save_dict
self._batch_setitems(obj.items(), obj)
~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^
TypeError: Pickler._batch_setitems() takes 2 positional arguments but 3 were given

Steps to reproduce the bug
ds_train = ds["train"].map(lambda x: {**x, "lang": lang})

Expected behavior
Fixed bugs

Environment info
datasets version: 4.2.0
Platform: Linux-6.8.0-85-generic-x86_64-with-glibc2.39
Python version: 3.14.0
huggingface_hub version: 0.35.3
PyArrow version: 21.0.0
Pandas version: 2.3.3
fsspec version: 2025.9.0

https://github.com/huggingface/datasets/issues/7813

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions