[BUG] encounter an ArrowInvalid error while saving experiment tracker #660

Open
DIaacKr opened this issue Apr 6, 2025 · 4 comments · May be fixed by #715
Labels
bug Something isn't working

DIaacKr commented Apr 6, 2025

Describe the bug

I encounter an ArrowInvalid error while saving the experiment tracker.
Most of the evaluation completes, but the error occurs at the saving step.
The error output is as follows:

[2025-04-06 18:27:14,942] [    INFO]: Saving experiment tracker (evaluation_tracker.py:180)
|    Task     |Version|Metric|Value |   |Stderr|
|-------------|------:|------|-----:|---|-----:|
|all          |       |em    |0.0102|±  |0.0020|
|             |       |qem   |0.0110|±  |0.0021|
|             |       |pem   |0.1925|±  |0.0078|
|             |       |pqem  |0.3937|±  |0.0097|
|             |       |acc   |0.2578|±  |0.0087|
|helm:med_qa:0|      0|em    |0.0102|±  |0.0020|
|             |       |qem   |0.0110|±  |0.0021|
|             |       |pem   |0.1925|±  |0.0078|
|             |       |pqem  |0.3937|±  |0.0097|
|             |       |acc   |0.2578|±  |0.0087|
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /home/lc/.conda/envs/lighteval/lib/python3.11/site-packages/lighteval/main_v │
│ llm.py:163 in vllm                                                           │
│                                                                              │
│   160 │                                                                      │
│   161 │   results = pipeline.get_results()                                   │
│   162 │                                                                      │
│ ❱ 163 │   pipeline.save_and_push_results()                                   │
│   164 │                                                                      │
│   165 │   return results                                                     │
│   166                                                                        │
│                                                                              │
│ /home/lc/.conda/envs/lighteval/lib/python3.11/site-packages/lighteval/pipeli │
│ ne.py:536 in save_and_push_results                                           │
│                                                                              │
│   533 │   def save_and_push_results(self):                                   │
│   534 │   │   logger.info("--- SAVING AND PUSHING RESULTS ---")              │
│   535 │   │   if self.is_main_process():                                     │
│ ❱ 536 │   │   │   self.evaluation_tracker.save()                             │
│   537 │                                                                      │
│   538 │   def _init_final_dict(self):                                        │
│   539 │   │   if self.is_main_process():                                     │
│                                                                              │
│ /home/lc/.conda/envs/lighteval/lib/python3.11/site-packages/lighteval/loggin │
│ g/evaluation_tracker.py:201 in save                                          │
│                                                                              │
│   198 │   │   details_datasets: dict[str, Dataset] = {}                      │
│   199 │   │   for task_name, task_details in self.details_logger.details.ite │
│   200 │   │   │   # Create a dataset from the dictionary - we force cast to  │
│ ❱ 201 │   │   │   dataset = Dataset.from_list([asdict(detail) for detail in  │
│   202 │   │   │                                                              │
│   203 │   │   │   # We don't keep 'id' around if it's there                  │
│   204 │   │   │   column_names = dataset.column_names                        │
│                                                                              │
│ /home/lc/.conda/envs/lighteval/lib/python3.11/site-packages/datasets/arrow_d │
│ ataset.py:986 in from_list                                                   │
│                                                                              │
│    983 │   │   """                                                           │
│    984 │   │   # for simplicity and consistency wrt OptimizedTypedSequence w │
│    985 │   │   mapping = {k: [r.get(k) for r in mapping] for k in mapping[0] │
│ ❱  986 │   │   return cls.from_dict(mapping, features, info, split)          │
│    987 │                                                                     │
│    988 │   @staticmethod                                                     │
│    989 │   def from_csv(                                                     │
│                                                                              │
│ /home/lc/.conda/envs/lighteval/lib/python3.11/site-packages/datasets/arrow_d │
│ ataset.py:940 in from_dict                                                   │
│                                                                              │
│    937 │   │   │   │   )                                                     │
│    938 │   │   │   arrow_typed_mapping[col] = data                           │
│    939 │   │   mapping = arrow_typed_mapping                                 │
│ ❱  940 │   │   pa_table = InMemoryTable.from_pydict(mapping=mapping)         │
│    941 │   │   if info is None:                                              │
│    942 │   │   │   info = DatasetInfo()                                      │
│    943 │   │   info.features = features                                      │
│                                                                              │
│ /home/lc/.conda/envs/lighteval/lib/python3.11/site-packages/datasets/table.p │
│ y:758 in from_pydict                                                         │
│                                                                              │
│    755 │   │   Returns:                                                      │
│    756 │   │   │   `datasets.table.Table`                                    │
│    757 │   │   """                                                           │
│ ❱  758 │   │   return cls(pa.Table.from_pydict(*args, **kwargs))             │
│    759 │                                                                     │
│    760 │   @classmethod                                                      │
│    761 │   def from_pylist(cls, mapping, *args, **kwargs):                   │
│                                                                              │
│ in pyarrow.lib._Tabular.from_pydict:1968                                     │
│                                                                              │
│ in pyarrow.lib._from_pydict:6337                                             │
│                                                                              │
│ in pyarrow.lib.asarray:402                                                   │
│                                                                              │
│ in pyarrow.lib.array:252                                                     │
│                                                                              │
│ in pyarrow.lib._handle_arrow_array_protocol:114                              │
│                                                                              │
│ /home/lc/.conda/envs/lighteval/lib/python3.11/site-packages/datasets/arrow_w │
│ riter.py:229 in __arrow_array__                                              │
│                                                                              │
│   226 │   │   │   │   out = list_of_np_array_to_pyarrow_listarray(data)      │
│   227 │   │   │   else:                                                      │
│   228 │   │   │   │   trying_cast_to_python_objects = True                   │
│ ❱ 229 │   │   │   │   out = pa.array(cast_to_python_objects(data, only_1d_fo │
│   230 │   │   │   # use smaller integer precisions if possible               │
│   231 │   │   │   if self.trying_int_optimization:                           │
│   232 │   │   │   │   if pa.types.is_int64(out.type):                        │
│                                                                              │
│ in pyarrow.lib.array:372                                                     │
│                                                                              │
│ in pyarrow.lib._sequence_to_array:42                                         │
│                                                                              │
│ in pyarrow.lib.pyarrow_internal_check_status:155                             │
│                                                                              │
│ in pyarrow.lib.check_status:92                                               │
╰──────────────────────────────────────────────────────────────────────────────╯
ArrowInvalid: cannot mix list and non-list, non-null values
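
The message indicates that at least one column handed to `Dataset.from_list` holds a list in some rows and a non-list, non-null value in others. A minimal diagnostic sketch, assuming the per-sample records are plain dicts such as those produced by `asdict(detail)`, can point at the offending column. The helper name `find_mixed_list_columns` and the field names `gold` and `pred` are hypothetical and only illustrate the idea; the sketch inspects top-level fields only, not nested structures.

```python
from collections import defaultdict


def find_mixed_list_columns(records):
    """Return {column: sorted type names} for columns mixing lists with other non-null values."""
    kinds = defaultdict(set)
    for record in records:
        for key, value in record.items():
            if value is None:
                continue  # nulls are compatible with any Arrow type
            is_list = isinstance(value, (list, tuple))
            kinds[key].add("list" if is_list else type(value).__name__)
    # A column is a problem only if it holds lists AND at least one other type.
    return {k: sorted(v) for k, v in kinds.items() if "list" in v and len(v) > 1}


# Hypothetical records standing in for [asdict(detail) for detail in task_details]
records = [
    {"example": "q1", "gold": ["A"], "pred": "A"},
    {"example": "q2", "gold": "B", "pred": None},
]
mixed = find_mixed_list_columns(records)
print(mixed)  # {'gold': ['list', 'str']}
```

Running such a check on the records just before the failing call would show which field needs attention, without changing how the results are saved.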

To Reproduce

I ran the following command to evaluate Qwen2.5-0.5B-Instruct on the med_qa benchmark and got the error above.

NAMESPACE=Qwen
MODEL_NAME=Qwen2.5-0.5B-Instruct #
MODEL=$NAMESPACE/$MODEL_NAME
MODEL_ARGS="pretrained=$MODEL,dtype=bfloat16,max_model_length=2048,gpu_memory_utilization=0.8,generation_parameters={max_new_tokens:2048,temperature:0.6,top_p:0.95}"
OUTPUT_DIR=data/evals/$MODEL

TASK=med_qa
LOG_FILE=logs/evals/${TASK}_${MODEL_NAME}.log
CUDA_VISIBLE_DEVICES=0 nohup lighteval vllm $MODEL_ARGS "helm|$TASK|0|0" \
    --use-chat-template \
    --output-dir $OUTPUT_DIR \
    > ${LOG_FILE} 2>&1 &

Expected behavior

The evaluation results should be saved successfully.

Version info

Ubuntu 24.04
lighteval 0.8.1
cuda 12.4
vllm 0.8.3
torch 2.6.0

DIaacKr added the bug label Apr 6, 2025

alvin319 (Contributor) commented

Ran into a similar issue as well!

RH-MikkelWerling commented

I think the problem comes from "version = 0". Removing that from the task specification in "default_tasks.py" worked for me.
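
Whether the version field is really the offending column is not confirmed in this thread. As an alternative that avoids editing the task definition, here is a hedged sketch of normalizing the records before they are turned into a dataset: any column that holds a list in some row gets its scalar values wrapped in a one-element list. The helper name `wrap_scalars_in_mixed_columns` and the field names are hypothetical, and this is not the maintainers' fix. Wrapping changes the stored shape of those values, and lists with differing element types could still fail to convert.

```python
def wrap_scalars_in_mixed_columns(records):
    """Return copies of records where list-typed columns hold a list in every non-null row."""
    # Columns that contain a list in at least one record.
    list_columns = {
        key
        for record in records
        for key, value in record.items()
        if isinstance(value, list)
    }
    normalized = []
    for record in records:
        fixed = dict(record)  # shallow copy so the input is not mutated
        for key in list_columns:
            value = fixed.get(key)
            if value is not None and not isinstance(value, list):
                fixed[key] = [value]  # wrap the scalar so the column is uniformly a list
        normalized.append(fixed)
    return normalized


# Hypothetical records standing in for the per-sample details
records = [
    {"gold": ["A"], "pred": "A"},
    {"gold": "B", "pred": None},
]
normalized = wrap_scalars_in_mixed_columns(records)
print(normalized)
# [{'gold': ['A'], 'pred': 'A'}, {'gold': ['B'], 'pred': None}]
```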

alvin319 (Contributor) commented May 4, 2025

Bumping this again, since it is quite disruptive when running LightEval with the newest version. @NathanHB

alvin319 (Contributor) commented

With the new release of LightEval 0.9.2, running the following command still produces the same error:

NAMESPACE=Qwen
MODEL_NAME=Qwen2.5-0.5B-Instruct #
MODEL=$NAMESPACE/$MODEL_NAME
MODEL_ARGS="model_name=$MODEL,dtype=bfloat16,generation_parameters={max_new_tokens:2048,temperature:0.6,top_p:0.95}"
OUTPUT_DIR=data/evals/$MODEL

TASK=med_qa
LOG_FILE=logs/evals/${TASK}_${MODEL_NAME}.log
CUDA_VISIBLE_DEVICES=0 uv run lighteval accelerate $MODEL_ARGS "helm|$TASK|0|0" \
    --use-chat-template \
    --output-dir $OUTPUT_DIR

I'm using uv here; running "uv pip list" yields:

Package                  Version     Editable project location
------------------------ ----------- -------------------------------------------------
absl-py                  2.2.2
accelerate               1.6.0
aenum                    3.1.15
aiobotocore              2.21.1
aiohappyeyeballs         2.6.1
aiohttp                  3.11.18
aioitertools             0.12.0
aiosignal                1.3.2
annotated-types          0.7.0
antlr4-python3-runtime   4.9.3
anyio                    4.9.0
attrs                    25.3.0
boto3                    1.37.1
botocore                 1.37.1
certifi                  2025.4.26
chardet                  5.2.0
charset-normalizer       3.4.1
click                    8.1.8
colorama                 0.4.6
colorlog                 6.9.0
dataproperty             1.1.0
datasets                 3.5.1
dill                     0.3.8
docker-pycreds           0.4.0
docstring-parser         0.16
fasteners                0.19
filelock                 3.18.0
frozenlist               1.6.0
fsspec                   2025.3.0
gitdb                    4.0.12
gitpython                3.1.44
h11                      0.16.0
hf-xet                   1.1.0
httpcore                 1.0.9
httpx                    0.27.2
huggingface-hub          0.30.2
idna                     3.10
importlib-resources      6.5.2
jinja2                   3.1.6
jmespath                 1.0.1
joblib                   1.4.2
jsonargparse             4.32.1
latex2sympy2-extended    1.10.1
lighteval                0.9.2
lightning                2.5.1.post0
lightning-utilities      0.14.3
litdata                  0.2.45
litgpt                   0.5.5
lxml                     5.4.0
markdown-it-py           3.0.0
markupsafe               3.0.2
mbstrdecoder             1.1.4
mdurl                    0.1.2
mpmath                   1.3.0
multidict                6.4.3
multiprocess             0.70.16
networkx                 3.4.2
nltk                     3.9.1
numpy                    1.26.4
nvidia-cublas-cu12       12.1.3.1
nvidia-cuda-cupti-cu12   12.1.105
nvidia-cuda-nvrtc-cu12   12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12        9.1.0.70
nvidia-cufft-cu12        11.0.2.54
nvidia-curand-cu12       10.3.2.106
nvidia-cusolver-cu12     11.4.5.107
nvidia-cusparse-cu12     12.1.0.106
nvidia-nccl-cu12         2.21.5
nvidia-nvjitlink-cu12    12.8.93
nvidia-nvtx-cu12         12.1.105
omegaconf                2.3.0
packaging                24.2
pandas                   2.2.3
pathvalidate             3.2.3
pillow                   11.2.1
platformdirs             4.3.7
portalocker              3.1.1
propcache                0.3.1
protobuf                 3.20.3
psutil                   7.0.0
pyarrow                  20.0.0
pycountry                24.6.1
pydantic                 2.11.4
pydantic-core            2.33.2
pygments                 2.19.1
pytablewriter            1.2.1
python-dateutil          2.9.0.post0
pytorch-lightning        2.5.1.post0
pytz                     2025.2
pyyaml                   6.0.2
regex                    2024.11.6
requests                 2.32.3
rich                     14.0.0
rouge-score              0.1.2
s3fs                     2025.3.0
s3transfer               0.11.3
sacrebleu                2.5.1
safetensors              0.5.3
scikit-learn             1.6.1
scipy                    1.15.2
sentencepiece            0.2.0
sentry-sdk               2.27.0
setproctitle             1.3.6
setuptools               80.1.0
six                      1.17.0
smmap                    5.0.2
sniffio                  1.3.1
sympy                    1.13.1
tabledata                1.3.4
tabulate                 0.9.0
tcolorpy                 0.1.7
termcolor                2.3.0
threadpoolctl            3.6.0
tifffile                 2025.3.30
tokenizers               0.21.1
torch                    2.5.1+cu121
torchao                  0.10.0
torchmetrics             1.7.1
torchvision              0.20.1
tqdm                     4.67.1
transformers             4.51.3
triton                   3.1.0
typepy                   1.3.4
typer                    0.9.4
typeshed-client          2.7.0
typing-extensions        4.13.2
typing-inspection        0.4.0
tzdata                   2025.2
urllib3                  2.4.0
wandb                    0.19.10
wrapt                    1.17.2
xxhash                   3.5.0
yarl                     1.20.0
zstd                     1.5.6.7
