Trying to fine-tune, I get EOFError: Ran out of input #285

@fmuntean

Description

I am using the training sample provided in this repo to fine-tune the model on some historical data I have.

Describe the bug
I get the following error when running the fine-tuning:

C:\GIT\MyPython\training>python train.py --config chronos-bolt-tiny.yaml --model-id amazon/chronos-bolt-tiny --no-random-init --max-steps 1000 --learning-rate 0.001
2025-02-25 09:47:32,958 - C:\GIT\MyPython\training\train.py - INFO - Using SEED: 1733703252
2025-02-25 09:47:33,007 - C:\GIT\MyPython\training\train.py - INFO - Logging dir: output\run-0
2025-02-25 09:47:33,007 - C:\GIT\MyPython\training\train.py - INFO - Loading and filtering 4 datasets for training: ['series1-monthly.arrow', 'series1-weekly.arrow', 'series2-monthly.arrow', 'series2-weekly.arrow']
2025-02-25 09:47:33,007 - C:\GIT\MyPython\training\train.py - INFO - Mixing probabilities: [0.9, 0.7, 0.5, 0.1]
2025-02-25 09:47:33,012 - C:\GIT\MyPython\training\train.py - INFO - Initializing model
2025-02-25 09:47:33,012 - C:\GIT\MyPython\training\train.py - INFO - Using pretrained initialization from amazon/chronos-bolt-tiny
The new embeddings will be initialized from a multivariate normal distribution that has old embeddings' mean and covariance. As described in this article: https://nlp.stanford.edu/~johnhew/vocab-expansion.html. To disable this, use `mean_resizing=False`
2025-02-25 09:47:39,536 - C:\GIT\MyPython\training\train.py - INFO - Training
  0%|                                                                                                            | 0/1000 [00:00<?, ?it/s]Traceback (most recent call last):
  File "C:\GIT\MyPython\training\train.py", line 702, in <module>
    app()
  File "C:\GIT\MyPython\.venv\Lib\site-packages\typer\main.py", line 340, in __call__
    raise e
  File "C:\GIT\MyPython\.venv\Lib\site-packages\typer\main.py", line 323, in __call__
    return get_command(self)(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\GIT\MyPython\.venv\Lib\site-packages\click\core.py", line 1161, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\GIT\MyPython\.venv\Lib\site-packages\typer\core.py", line 680, in main
    return _main(
           ^^^^^^
  File "C:\GIT\MyPython\.venv\Lib\site-packages\typer\core.py", line 198, in _main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "C:\GIT\MyPython\.venv\Lib\site-packages\click\core.py", line 1443, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\GIT\MyPython\.venv\Lib\site-packages\click\core.py", line 788, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\GIT\MyPython\.venv\Lib\site-packages\typer\main.py", line 698, in wrapper
    return callback(**use_params)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "C:\GIT\MyPython\.venv\Lib\site-packages\typer_config\decorators.py", line 96, in wrapped
    return cmd(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^
  File "C:\GIT\MyPython\training\train.py", line 689, in main
    trainer.train()
  File "C:\GIT\MyPython\.venv\Lib\site-packages\transformers\trainer.py", line 2241, in train
    return inner_training_loop(
           ^^^^^^^^^^^^^^^^^^^^
  File "C:\GIT\MyPython\.venv\Lib\site-packages\transformers\trainer.py", line 2500, in _inner_training_loop
    batch_samples, num_items_in_batch = self.get_batch_samples(epoch_iterator, num_batches)
                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\GIT\MyPython\.venv\Lib\site-packages\transformers\trainer.py", line 5180, in get_batch_samples
    batch_samples += [next(epoch_iterator)]
                      ^^^^^^^^^^^^^^^^^^^^
  File "C:\GIT\MyPython\.venv\Lib\site-packages\accelerate\data_loader.py", line 792, in __iter__
    main_iterator = self.base_dataloader.__iter__()
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\GIT\MyPython\.venv\Lib\site-packages\torch\utils\data\dataloader.py", line 491, in __iter__
    return self._get_iterator()
           ^^^^^^^^^^^^^^^^^^^^
  File "C:\GIT\MyPython\.venv\Lib\site-packages\torch\utils\data\dataloader.py", line 422, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\GIT\MyPython\.venv\Lib\site-packages\torch\utils\data\dataloader.py", line 1146, in __init__
    w.start()
  File "C:\Python311\Lib\multiprocessing\process.py", line 121, in start
    self._popen = self._Popen(self)
                  ^^^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\multiprocessing\context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\multiprocessing\context.py", line 336, in _Popen
    return Popen(process_obj)
           ^^^^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\multiprocessing\popen_spawn_win32.py", line 94, in __init__
    reduction.dump(process_obj, to_child)
  File "C:\Python311\Lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
  File "<stringsource>", line 2, in pyarrow.lib._RecordBatchFileReader.__reduce_cython__      
TypeError: no default __reduce__ due to non-trivial __cinit__
  0%|                                                                                                            | 0/1000 [00:05<?, ?it/s]
PS C:\GIT\MyPython\training> Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Python311\Lib\multiprocessing\spawn.py", line 120, in spawn_main
    exitcode = _main(fd, parent_sentinel)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\multiprocessing\spawn.py", line 130, in _main
    self = reduction.pickle.load(from_parent)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
EOFError: Ran out of input
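
For context on what the traceback shows: on Windows, DataLoader workers are started with the spawn method, so the dataset object has to be pickled into each child process. The dataset apparently holds an open pyarrow._RecordBatchFileReader, which cannot be pickled (the TypeError in the parent process), so the spawned child finds an empty pipe and dies with EOFError. Below is a minimal, repo-independent sketch of the same failure mode; all names in it are illustrative and not taken from train.py.

```python
# Minimal illustration of the failure above: on Windows, DataLoader workers
# are created with the "spawn" start method, so the dataset must be picklable;
# keeping an open pyarrow RecordBatchFileReader on it breaks that.
import pyarrow as pa
import pyarrow.ipc as ipc
import torch
from torch.utils.data import DataLoader, Dataset


class ArrowBackedDataset(Dataset):
    def __init__(self, path):
        # The open reader stored as an attribute is what cannot be pickled.
        self.reader = ipc.open_file(path)

    def __len__(self):
        return self.reader.num_record_batches

    def __getitem__(self, idx):
        batch = self.reader.get_batch(idx)
        return torch.tensor(batch.column(0).to_numpy(zero_copy_only=False))


if __name__ == "__main__":
    # Write a tiny Arrow file so the example is self-contained.
    table = pa.table({"x": [1.0, 2.0, 3.0]})
    with ipc.new_file("tiny.arrow", table.schema) as writer:
        writer.write_table(table)

    loader = DataLoader(ArrowBackedDataset("tiny.arrow"), num_workers=2)
    for item in loader:  # num_workers > 0 triggers the pickling that fails
        print(item)
```

The same sketch runs with num_workers=0 (nothing needs to be pickled), and on Linux, where fork is the default start method, which is why the failure only shows up on Windows.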

Expected behavior
I expect training to run to completion.

To reproduce

I generated 4 files using the sample code provided.
Each file contains 4 series of numbers extracted from a DataFrame:
data.to_numpy().T

and converted them to Arrow files using the convert_to_arrow method.
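
For completeness, this is roughly how the files were produced: the DataFrame is transposed into a list of 1-D series and written out with the convert_to_arrow helper from this repo's training README. The DataFrame shape, column names, and file name below are placeholders, and the helper body is reproduced here under the assumption that it uses GluonTS' ArrowWriter as shown in the README.

```python
# Sketch of the data preparation described above; df and the file name are
# placeholders, and convert_to_arrow follows the helper in the training README.
import numpy as np
import pandas as pd
from gluonts.dataset.arrow import ArrowWriter


def convert_to_arrow(path, time_series, compression="lz4"):
    # Each series becomes one record with an arbitrary start timestamp.
    start = np.datetime64("2000-01-01 00:00", "s")
    dataset = [{"start": start, "target": ts} for ts in time_series]
    ArrowWriter(compression=compression).write_to_file(dataset, path=path)


# DataFrame with one column per series; transposing gives (n_series, n_steps).
df = pd.DataFrame(np.random.randn(120, 4), columns=["a", "b", "c", "d"])
series = list(df.to_numpy().T)  # 4 series of numbers, as described above

convert_to_arrow("series1-monthly.arrow", series)
```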

Environment description
Operating system: Windows
Python version: 3.11.3
CUDA version: 12.6
PyTorch version: 2.6.0
HuggingFace transformers version: 4.49.0
HuggingFace accelerate version:


Labels: bug (Something isn't working), windows (Concerns running code on Windows)
