Description
Your current environment
The output of python collect_env.py
Your output of `python collect_env.py` here
Models to be tested
meta-llama/Llama-3.1-8B-Instruct GSM8k Eval Accuracy
mistralai/Mistral-7B-Instruct-v0.3 GSM8k Eval Accuracy
deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct GSM8k Eval Accuracy
google/gemma-2-27b-it GSM8k Eval Accuracy
meta-llama/Llama-3.1-70B-Instruct GSM8k Eval Accuracy
mistralai/Mixtral-8x7B-Instruct-v0.1 GSM8k Eval Accuracy
Qwen/Qwen2-57B-A14B-Instruct GSM8k Eval Accuracy
neuralmagic/Meta-Llama-3.1-8B-Instruct-FP8 GSM8k Eval Accuracy
neuralmagic/Mistral-7B-Instruct-v0.3-FP8 GSM8k Eval Accuracy
neuralmagic/DeepSeek-Coder-V2-Lite-Instruct-FP8 GSM8k Eval Accuracy
neuralmagic/gemma-2-2b-it-FP8 GSM8k Eval Accuracy
neuralmagic/Meta-Llama-3.1-70B-Instruct-FP8 GSM8k Eval Accuracy
neuralmagic/Mixtral-8x7B-Instruct-v0.1-FP8 GSM8k Eval Accuracy
neuralmagic/Qwen2-72B-Instruct-FP8 GSM8k Eval Accuracy
neuralmagic/Qwen2-57B-A14B-Instruct-FP8 GSM8k Eval Accuracy
🐛 Describe the bug
RTX 4090
# VLLM_USE_V1=1 \
# VLLM_RPC_TIMEOUT=18000 \
# SAFETENSORS_FAST_GPU=1 \
# lm_eval --model vllm --model_args pretrained=meta-llama/Llama-3.1-8B-Instruct,tensor_parallel_size=1,max_model_len=10000 --trust_remote_code --tasks gsm8k --num_fewshot 5 --batch_size auto \
# >> pr_gsm8k-meta-llama_Llama-3.1-8B-Instruct-v1-aiter.log 2>&1
vllm (pretrained=meta-llama/Llama-3.1-8B-Instruct,tensor_parallel_size=1,max_model_len=10000,trust_remote_code=True), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: auto
|Tasks|Version| Filter |n-shot| Metric | |Value | |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k| 3|flexible-extract| 5|exact_match|↑ |0.7763|± |0.0115|
| | |strict-match | 5|exact_match|↑ |0.7536|± |0.0119|
# VLLM_USE_V1=1 \
# VLLM_RPC_TIMEOUT=18000 \
# SAFETENSORS_FAST_GPU=1 \
# lm_eval --model vllm --model_args pretrained=meta-llama/Llama-3.1-8B-Instruct,tensor_parallel_size=1,max_model_len=10000 --trust_remote_code --apply_chat_template --fewshot_as_multiturn --tasks gsm8k_cot_llama --batch_size 4 \
# >> pr_gsm8k-meta-llama_Llama-3.1-8B-Instruct-v1-aiter.log 2>&1
vllm (pretrained=meta-llama/Llama-3.1-8B-Instruct,tensor_parallel_size=1,max_model_len=10000,trust_remote_code=True), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 4
| Tasks |Version| Filter |n-shot| Metric | |Value | |Stderr|
|---------------|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k_cot_llama| 3|flexible-extract| 8|exact_match|↑ |0.8544|± |0.0097|
| | |strict-match | 8|exact_match|↑ |0.8522|± |0.0098|
vllm (pretrained=mistralai/Mistral-7B-Instruct-v0.3,tensor_parallel_size=1,max_model_len=10000,trust_remote_code=True), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: auto
|Tasks|Version| Filter |n-shot| Metric | |Value | |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k| 3|flexible-extract| 5|exact_match|↑ |0.4845|± |0.0138|
| | |strict-match | 5|exact_match|↑ |0.4822|± |0.0138|
vllm (pretrained=neuralmagic/Meta-Llama-3.1-8B-Instruct-FP8,tensor_parallel_size=1,max_model_len=10000,trust_remote_code=True), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: auto
|Tasks|Version| Filter |n-shot| Metric | |Value | |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k| 3|flexible-extract| 5|exact_match|↑ |0.7513|± |0.0119|
| | |strict-match | 5|exact_match|↑ |0.7400|± |0.0121|
vllm (pretrained=neuralmagic/Mistral-7B-Instruct-v0.3-FP8,tensor_parallel_size=1,max_model_len=10000,trust_remote_code=True), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: auto
|Tasks|Version| Filter |n-shot| Metric | |Value | |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k| 3|flexible-extract| 5|exact_match|↑ |0.4837|± |0.0138|
| | |strict-match | 5|exact_match|↑ |0.4822|± |0.0138|
vllm (pretrained=neuralmagic/gemma-2-2b-it-FP8,tensor_parallel_size=1,trust_remote_code=True), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: auto
|Tasks|Version| Filter |n-shot| Metric | |Value | |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k| 3|flexible-extract| 5|exact_match|↑ |0.4390|± |0.0137|
| | |strict-match | 5|exact_match|↑ |0.4306|± |0.0136|
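The per-model commands above differ only in the `pretrained=` value and the log file name, so the remaining models in the list can be run with a small shell loop. This is a sketch, not part of the original report: it assumes `lm_eval` (lm-evaluation-harness) with the vLLM backend is installed, and it only echoes the invocations as a dry run instead of executing them.

```shell
#!/bin/sh
# Dry-run sketch: print the lm_eval command for each model still to be tested.
# Replace `echo` with `eval` (or remove it) to actually launch the runs.
MODELS="meta-llama/Llama-3.1-8B-Instruct \
mistralai/Mistral-7B-Instruct-v0.3 \
neuralmagic/Meta-Llama-3.1-8B-Instruct-FP8"

for model in $MODELS; do
  # Log file naming follows the pattern used above (slashes -> underscores).
  log="pr_gsm8k-$(echo "$model" | tr '/' '_').log"
  echo "lm_eval --model vllm" \
       "--model_args pretrained=$model,tensor_parallel_size=1,max_model_len=10000" \
       "--trust_remote_code --tasks gsm8k --num_fewshot 5 --batch_size auto" \
       ">> $log 2>&1"
done
```

The `VLLM_USE_V1=1`, `VLLM_RPC_TIMEOUT=18000`, and `SAFETENSORS_FAST_GPU=1` environment variables from the original commands can be exported once before the loop rather than repeated per invocation.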
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.