[BUG] encounter an ArrowInvalid error while saving experiment tracker #660

DIaacKr · 2025-04-06T10:38:18Z

Describe the bug

I encounter an ArrowInvalid error while saving the experiment tracker.
Most of the evaluation completes, but the error occurs when saving.
The error output is as follows:

[2025-04-06 18:27:14,942] [    INFO]: Saving experiment tracker (evaluation_tracker.py:180)
|    Task     |Version|Metric|Value |   |Stderr|
|-------------|------:|------|-----:|---|-----:|
|all          |       |em    |0.0102|±  |0.0020|
|             |       |qem   |0.0110|±  |0.0021|
|             |       |pem   |0.1925|±  |0.0078|
|             |       |pqem  |0.3937|±  |0.0097|
|             |       |acc   |0.2578|±  |0.0087|
|helm:med_qa:0|      0|em    |0.0102|±  |0.0020|
|             |       |qem   |0.0110|±  |0.0021|
|             |       |pem   |0.1925|±  |0.0078|
|             |       |pqem  |0.3937|±  |0.0097|
|             |       |acc   |0.2578|±  |0.0087|
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /home/lc/.conda/envs/lighteval/lib/python3.11/site-packages/lighteval/main_v │
│ llm.py:163 in vllm                                                           │
│                                                                              │
│   160 │                                                                      │
│   161 │   results = pipeline.get_results()                                   │
│   162 │                                                                      │
│ ❱ 163 │   pipeline.save_and_push_results()                                   │
│   164 │                                                                      │
│   165 │   return results                                                     │
│   166                                                                        │
│                                                                              │
│ /home/lc/.conda/envs/lighteval/lib/python3.11/site-packages/lighteval/pipeli │
│ ne.py:536 in save_and_push_results                                           │
│                                                                              │
│   533 │   def save_and_push_results(self):                                   │
│   534 │   │   logger.info("--- SAVING AND PUSHING RESULTS ---")              │
│   535 │   │   if self.is_main_process():                                     │
│ ❱ 536 │   │   │   self.evaluation_tracker.save()                             │
│   537 │                                                                      │
│   538 │   def _init_final_dict(self):                                        │
│   539 │   │   if self.is_main_process():                                     │
│                                                                              │
│ /home/lc/.conda/envs/lighteval/lib/python3.11/site-packages/lighteval/loggin │
│ g/evaluation_tracker.py:201 in save                                          │
│                                                                              │
│   198 │   │   details_datasets: dict[str, Dataset] = {}                      │
│   199 │   │   for task_name, task_details in self.details_logger.details.ite │
│   200 │   │   │   # Create a dataset from the dictionary - we force cast to  │
│ ❱ 201 │   │   │   dataset = Dataset.from_list([asdict(detail) for detail in  │
│   202 │   │   │                                                              │
│   203 │   │   │   # We don't keep 'id' around if it's there                  │
│   204 │   │   │   column_names = dataset.column_names                        │
│                                                                              │
│ /home/lc/.conda/envs/lighteval/lib/python3.11/site-packages/datasets/arrow_d │
│ ataset.py:986 in from_list                                                   │
│                                                                              │
│    983 │   │   """                                                           │
│    984 │   │   # for simplicity and consistency wrt OptimizedTypedSequence w │
│    985 │   │   mapping = {k: [r.get(k) for r in mapping] for k in mapping[0] │
│ ❱  986 │   │   return cls.from_dict(mapping, features, info, split)          │
│    987 │                                                                     │
│    988 │   @staticmethod                                                     │
│    989 │   def from_csv(                                                     │
│                                                                              │
│ /home/lc/.conda/envs/lighteval/lib/python3.11/site-packages/datasets/arrow_d │
│ ataset.py:940 in from_dict                                                   │
│                                                                              │
│    937 │   │   │   │   )                                                     │
│    938 │   │   │   arrow_typed_mapping[col] = data                           │
│    939 │   │   mapping = arrow_typed_mapping                                 │
│ ❱  940 │   │   pa_table = InMemoryTable.from_pydict(mapping=mapping)         │
│    941 │   │   if info is None:                                              │
│    942 │   │   │   info = DatasetInfo()                                      │
│    943 │   │   info.features = features                                      │
│                                                                              │
│ /home/lc/.conda/envs/lighteval/lib/python3.11/site-packages/datasets/table.p │
│ y:758 in from_pydict                                                         │
│                                                                              │
│    755 │   │   Returns:                                                      │
│    756 │   │   │   `datasets.table.Table`                                    │
│    757 │   │   """                                                           │
│ ❱  758 │   │   return cls(pa.Table.from_pydict(*args, **kwargs))             │
│    759 │                                                                     │
│    760 │   @classmethod                                                      │
│    761 │   def from_pylist(cls, mapping, *args, **kwargs):                   │
│                                                                              │
│ in pyarrow.lib._Tabular.from_pydict:1968                                     │
│                                                                              │
│ in pyarrow.lib._from_pydict:6337                                             │
│                                                                              │
│ in pyarrow.lib.asarray:402                                                   │
│                                                                              │
│ in pyarrow.lib.array:252                                                     │
│                                                                              │
│ in pyarrow.lib._handle_arrow_array_protocol:114                              │
│                                                                              │
│ /home/lc/.conda/envs/lighteval/lib/python3.11/site-packages/datasets/arrow_w │
│ riter.py:229 in __arrow_array__                                              │
│                                                                              │
│   226 │   │   │   │   out = list_of_np_array_to_pyarrow_listarray(data)      │
│   227 │   │   │   else:                                                      │
│   228 │   │   │   │   trying_cast_to_python_objects = True                   │
│ ❱ 229 │   │   │   │   out = pa.array(cast_to_python_objects(data, only_1d_fo │
│   230 │   │   │   # use smaller integer precisions if possible               │
│   231 │   │   │   if self.trying_int_optimization:                           │
│   232 │   │   │   │   if pa.types.is_int64(out.type):                        │
│                                                                              │
│ in pyarrow.lib.array:372                                                     │
│                                                                              │
│ in pyarrow.lib._sequence_to_array:42                                         │
│                                                                              │
│ in pyarrow.lib.pyarrow_internal_check_status:155                             │
│                                                                              │
│ in pyarrow.lib.check_status:92                                               │
╰──────────────────────────────────────────────────────────────────────────────╯
ArrowInvalid: cannot mix list and non-list, non-null values
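The message indicates that pyarrow's type inference found a column in which some rows hold a list and other rows hold a non-list, non-null value. As a minimal stdlib-only sketch, the helper below can be run on the records that reach Dataset.from_list to show which columns trigger the error. It assumes those records are plain dicts like the asdict(detail) output in the traceback; the name find_mixed_list_columns is hypothetical and is not part of lighteval or datasets.

```python
from collections.abc import Iterable, Mapping
from typing import Any


def find_mixed_list_columns(records: Iterable[Mapping[str, Any]]) -> dict[str, set[str]]:
    """Return the columns whose non-null values mix list-like and non-list values.

    Hypothetical diagnostic helper: columns like these are what pyarrow rejects
    with "cannot mix list and non-list, non-null values".
    """
    kinds: dict[str, set[str]] = {}
    for record in records:
        for key, value in record.items():
            if value is None:
                continue  # nulls may sit next to either kind of value
            # Treat lists and tuples as list-like; everything else is "non-list".
            kind = "list" if isinstance(value, (list, tuple)) else "non-list"
            kinds.setdefault(key, set()).add(kind)
    # Keep only the columns that saw both kinds of value.
    return {key: seen for key, seen in kinds.items() if len(seen) > 1}
```

For example, on two hypothetical records {"prediction": ["A"], "score": 1} and {"prediction": "B", "score": 0}, it returns {"prediction": {"list", "non-list"}}, pointing at the column that pyarrow cannot type.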

To Reproduce

I executed the following command to evaluate Qwen2.5-0.5B-Instruct on the med_qa benchmark, but got the error.

NAMESPACE=Qwen
MODEL_NAME=Qwen2.5-0.5B-Instruct #
MODEL=$NAMESPACE/$MODEL_NAME
MODEL_ARGS="pretrained=$MODEL,dtype=bfloat16,max_model_length=2048,gpu_memory_utilization=0.8,generation_parameters={max_new_tokens:2048,temperature:0.6,top_p:0.95}"
OUTPUT_DIR=data/evals/$MODEL

TASK=med_qa
LOG_FILE=logs/evals/${TASK}_${MODEL_NAME}.log
CUDA_VISIBLE_DEVICES=0 nohup lighteval vllm $MODEL_ARGS "helm|$TASK|0|0" \
    --use-chat-template \
    --output-dir $OUTPUT_DIR \
    > ${LOG_FILE} 2>&1 &

Expected behavior

Save results of evaluation successfully.

Version info

Ubuntu 24.04
lighteval 0.8.1
cuda 12.4
vllm 0.8.3
torch 2.6.0


alvin319 · 2025-04-15T01:48:05Z

Ran into a similar issue as well!

RH-MikkelWerling · 2025-04-23T13:18:44Z

I think the problem comes from "version = 0". Removing it from the task's specification in "default_tasks.py" worked for me.
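If editing the task specification is not an option, another possible workaround is to make each mixed column uniformly list-typed before the dataset is built. This is a sketch under stated assumptions, not the change made in PR #715: the helper name wrap_scalars_in_mixed_columns is hypothetical, it assumes the records are plain dicts, and it wraps every non-null, non-list value into a one-element list only in columns that also contain lists, leaving all other columns untouched.

```python
from typing import Any


def wrap_scalars_in_mixed_columns(records: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return copies of records in which mixed list/non-list columns become all-list.

    Hypothetical workaround sketch: in any column holding both list-like values
    and non-null non-list values, each non-list value is wrapped as [value].
    """
    # Record which columns contain list-like values and which contain non-list ones.
    has_list: set[str] = set()
    has_scalar: set[str] = set()
    for record in records:
        for key, value in record.items():
            if value is None:
                continue
            if isinstance(value, (list, tuple)):
                has_list.add(key)
            else:
                has_scalar.add(key)
    mixed = has_list & has_scalar

    normalized = []
    for record in records:
        row = dict(record)  # shallow copy so the caller's records are left untouched
        for key in mixed:
            value = row.get(key)
            if value is not None and not isinstance(value, (list, tuple)):
                row[key] = [value]  # wrap the scalar so the whole column is list-typed
        normalized.append(row)
    return normalized
```

Wrapping changes the stored shape of those values, so if downstream code expects scalars, converting the affected column to strings may be the safer choice.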

alvin319 · 2025-05-04T23:14:05Z

Bumping this again, since this bug is quite disruptive when running LightEval with the newest version. @NathanHB

alvin319 · 2025-05-11T04:25:17Z

With the new LightEval 0.9.2 release, running the following command still produces the same error:

NAMESPACE=Qwen
MODEL_NAME=Qwen2.5-0.5B-Instruct #
MODEL=$NAMESPACE/$MODEL_NAME
MODEL_ARGS="model_name=$MODEL,dtype=bfloat16,generation_parameters={max_new_tokens:2048,temperature:0.6,top_p:0.95}"
OUTPUT_DIR=data/evals/$MODEL

TASK=med_qa
LOG_FILE=logs/evals/${TASK}_${MODEL_NAME}.log
CUDA_VISIBLE_DEVICES=0 uv run lighteval accelerate $MODEL_ARGS "helm|$TASK|0|0" \
    --use-chat-template \
    --output-dir $OUTPUT_DIR

I'm using uv here, and running uv pip list yields:

Package                  Version     Editable project location
------------------------ ----------- -------------------------------------------------
absl-py                  2.2.2
accelerate               1.6.0
aenum                    3.1.15
aiobotocore              2.21.1
aiohappyeyeballs         2.6.1
aiohttp                  3.11.18
aioitertools             0.12.0
aiosignal                1.3.2
annotated-types          0.7.0
antlr4-python3-runtime   4.9.3
anyio                    4.9.0
attrs                    25.3.0
boto3                    1.37.1
botocore                 1.37.1
certifi                  2025.4.26
chardet                  5.2.0
charset-normalizer       3.4.1
click                    8.1.8
colorama                 0.4.6
colorlog                 6.9.0
dataproperty             1.1.0
datasets                 3.5.1
dill                     0.3.8
docker-pycreds           0.4.0
docstring-parser         0.16
fasteners                0.19
filelock                 3.18.0
frozenlist               1.6.0
fsspec                   2025.3.0
gitdb                    4.0.12
gitpython                3.1.44
h11                      0.16.0
hf-xet                   1.1.0
httpcore                 1.0.9
httpx                    0.27.2
huggingface-hub          0.30.2
idna                     3.10
importlib-resources      6.5.2
jinja2                   3.1.6
jmespath                 1.0.1
joblib                   1.4.2
jsonargparse             4.32.1
latex2sympy2-extended    1.10.1
lighteval                0.9.2
lightning                2.5.1.post0
lightning-utilities      0.14.3
litdata                  0.2.45
litgpt                   0.5.5
lxml                     5.4.0
markdown-it-py           3.0.0
markupsafe               3.0.2
mbstrdecoder             1.1.4
mdurl                    0.1.2
mpmath                   1.3.0
multidict                6.4.3
multiprocess             0.70.16
networkx                 3.4.2
nltk                     3.9.1
numpy                    1.26.4
nvidia-cublas-cu12       12.1.3.1
nvidia-cuda-cupti-cu12   12.1.105
nvidia-cuda-nvrtc-cu12   12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12        9.1.0.70
nvidia-cufft-cu12        11.0.2.54
nvidia-curand-cu12       10.3.2.106
nvidia-cusolver-cu12     11.4.5.107
nvidia-cusparse-cu12     12.1.0.106
nvidia-nccl-cu12         2.21.5
nvidia-nvjitlink-cu12    12.8.93
nvidia-nvtx-cu12         12.1.105
omegaconf                2.3.0
packaging                24.2
pandas                   2.2.3
pathvalidate             3.2.3
pillow                   11.2.1
platformdirs             4.3.7
portalocker              3.1.1
propcache                0.3.1
protobuf                 3.20.3
psutil                   7.0.0
pyarrow                  20.0.0
pycountry                24.6.1
pydantic                 2.11.4
pydantic-core            2.33.2
pygments                 2.19.1
pytablewriter            1.2.1
python-dateutil          2.9.0.post0
pytorch-lightning        2.5.1.post0
pytz                     2025.2
pyyaml                   6.0.2
regex                    2024.11.6
requests                 2.32.3
rich                     14.0.0
rouge-score              0.1.2
s3fs                     2025.3.0
s3transfer               0.11.3
sacrebleu                2.5.1
safetensors              0.5.3
scikit-learn             1.6.1
scipy                    1.15.2
sentencepiece            0.2.0
sentry-sdk               2.27.0
setproctitle             1.3.6
setuptools               80.1.0
six                      1.17.0
smmap                    5.0.2
sniffio                  1.3.1
sympy                    1.13.1
tabledata                1.3.4
tabulate                 0.9.0
tcolorpy                 0.1.7
termcolor                2.3.0
threadpoolctl            3.6.0
tifffile                 2025.3.30
tokenizers               0.21.1
torch                    2.5.1+cu121
torchao                  0.10.0
torchmetrics             1.7.1
torchvision              0.20.1
tqdm                     4.67.1
transformers             4.51.3
triton                   3.1.0
typepy                   1.3.4
typer                    0.9.4
typeshed-client          2.7.0
typing-extensions        4.13.2
typing-inspection        0.4.0
tzdata                   2025.2
urllib3                  2.4.0
wandb                    0.19.10
wrapt                    1.17.2
xxhash                   3.5.0
yarl                     1.20.0
zstd                     1.5.6.7
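Because the failure is raised inside datasets and pyarrow, a short report of just the packages that appear in the traceback may be quicker to compare across environments than the full list. This is a minimal sketch using only importlib.metadata from the standard library; the package selection and the relevant_versions name are assumptions, not something lighteval provides.

```python
from importlib.metadata import PackageNotFoundError, version

# Packages whose code appears in the traceback (assumed to be the most relevant here).
RELEVANT_PACKAGES = ("lighteval", "datasets", "pyarrow")


def relevant_versions(packages: tuple[str, ...] = RELEVANT_PACKAGES) -> dict[str, str]:
    """Map each package name to its installed version, or "not installed"."""
    found: dict[str, str] = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = "not installed"
    return found


if __name__ == "__main__":
    # Print one aligned "name version" line per package.
    for name, ver in relevant_versions().items():
        print(f"{name:<10} {ver}")
```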

DIaacKr added the bug label ("Something isn't working") on Apr 6, 2025.

alvin319 linked a pull request on May 11, 2025 that will close this issue:

Revert the task detail serialization to be compliant with PyArrow #715 (Open)

NathanHB linked the same pull request (#715) on May 15, 2025.
