This guide covers evaluating Ettin decoder models on generative language tasks using the EleutherAI evaluation harness (commit 867413f8677f00f6a817262727cbb041bf36192a).
Ettin decoder models excel at generative tasks and should be evaluated with the standard EleutherAI lm-evaluation-harness, which provides comprehensive coverage of language understanding and generation benchmarks.
```bash
# Clone the specific commit of lm-evaluation-harness
git clone https://github.com/EleutherAI/lm-evaluation-harness.git
cd lm-evaluation-harness
git checkout 867413f8677f00f6a817262727cbb041bf36192a
pip install -e .
```
```bash
# Evaluate Ettin decoder on core tasks
lm_eval --model hf \
  --model_args "pretrained=jhu-clsp/ettin-decoder-150m,add_bos_token=True" \
  --tasks hellaswag,arc_easy,arc_challenge,winogrande \
  --device cuda:0 \
  --batch_size 8 \
  --output_path results/ettin-decoder-150m
```
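Each run writes a JSON results file under `--output_path`. A minimal sketch of pulling the accuracy numbers back out, assuming the harness's `results` → task → `"metric,filter"` key layout (verify the exact keys against your harness version):

```python
# Sample dict standing in for json.load() of an lm_eval results file;
# the "acc,none" keys follow the harness's metric,filter naming.
sample = {
    "results": {
        "hellaswag": {"acc,none": 0.289, "acc_norm,none": 0.312},
        "arc_easy": {"acc,none": 0.471, "acc_norm,none": 0.442},
    }
}

def summarize(results: dict) -> dict:
    """Map each task to its plain accuracy ("acc,none") score."""
    return {task: metrics["acc,none"] for task, metrics in results["results"].items()}

print(summarize(sample))
```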
To evaluate every decoder size in one pass, use a batch script:

```bash
#!/bin/bash
# evaluate_all_decoders.sh

MODELS=(
  "jhu-clsp/ettin-decoder-17m"
  "jhu-clsp/ettin-decoder-32m"
  "jhu-clsp/ettin-decoder-68m"
  "jhu-clsp/ettin-decoder-150m"
  "jhu-clsp/ettin-decoder-400m"
  "jhu-clsp/ettin-decoder-1b"
)

TASKS="hellaswag,arc_easy,arc_challenge,winogrande,piqa,boolq"

for model in "${MODELS[@]}"; do
  echo "Evaluating $model..."
  output_dir="results/$(basename "$model")"
  lm_eval --model hf \
    --model_args "pretrained=$model,add_bos_token=True" \
    --tasks "$TASKS" \
    --device cuda:0 \
    --batch_size 8 \
    --output_path "$output_dir" \
    --log_samples
done
```
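Once the batch run finishes, the per-model numbers can be collapsed into one comparison table. A sketch with labeled placeholder scores (real values would come from the per-model results files, not these made-up numbers):

```python
# Placeholder scores standing in for numbers parsed from results/<model>/;
# the model names match the MODELS array above.
scores = {
    "ettin-decoder-17m": {"hellaswag": 0.27, "arc_easy": 0.40},
    "ettin-decoder-150m": {"hellaswag": 0.34, "arc_easy": 0.52},
    "ettin-decoder-1b": {"hellaswag": 0.45, "arc_easy": 0.62},
}

def to_table(scores: dict, tasks: list) -> str:
    """Render model-by-task scores as a simple aligned text table."""
    width = max(len(m) for m in scores)
    lines = [f"{'model':<{width}}  " + "  ".join(f"{t:>10}" for t in tasks)]
    for model, s in scores.items():
        lines.append(f"{model:<{width}}  " + "  ".join(f"{s[t]:>10.3f}" for t in tasks))
    return "\n".join(lines)

print(to_table(scores, ["hellaswag", "arc_easy"]))
```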
```bash
# Evaluate specific training checkpoints
lm_eval --model hf \
  --model_args "pretrained=jhu-clsp/ettin-decoder-400m,revision=step590532,add_bos_token=True" \
  --tasks hellaswag,arc_easy \
  --device cuda:0 \
  --batch_size 8 \
  --output_path results/ettin-decoder-400m-step590532
```

Evaluating the cross-objective models (encoders converted to decoders) is done in the same way as above, since they are decoders now.
- Evaluation Harness: EleutherAI/lm-evaluation-harness
- Specific Commit: 867413f8677f00f6a817262727cbb041bf36192a
- Model Collection: jhu-clsp on HuggingFace
- Documentation: lm-eval docs
For issues with decoder evaluation, please refer to the EleutherAI evaluation harness documentation or open an issue in the Ettin repository.