feat: add inference and evaluation script with dataset transformations #733
base: main
Conversation
```python
tokenizer_vocab_file_path="/mnt/input_data_dir/pretrained_models/OPT/dependencies/gpt2-vocab.json",
tokenizer_merges_file_path="/mnt/input_data_dir/pretrained_models/OPT/dependencies/gpt2-merges.txt",
```
If Metaseq has a standardized path for the vocab and merges files then we'll need to use it here :) If not, we might need to remove the default values.
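For the second option, a minimal sketch of what dropping the defaults could look like; `load_tokenizer` and its parameter layout are illustrative only, not the PR's actual code:

```python
# Hypothetical sketch: the reviewer's second option, i.e. dropping the
# hard-coded defaults so every caller must supply the tokenizer files
# instead of inheriting a machine-specific /mnt/... path.
def load_tokenizer(
    tokenizer_vocab_file_path: str,   # was defaulted to .../gpt2-vocab.json
    tokenizer_merges_file_path: str,  # was defaulted to .../gpt2-merges.txt
):
    ...
```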
tupini07 left a comment:
left some comments :)
```dockerfile
RUN pip install \
    aim==3.16.2 \
    py-rouge==1.1 \
    rouge_score==0.1.2 \
    parlai==1.7.1 \
    evaluate==0.4.0

ENV NLTK_DATA="/usr/share/nltk_data"
RUN python -c "import nltk; nltk.download('punkt', download_dir='${NLTK_DATA}')"
```
This likely isn't the correct place to make this change.
It is only a snippet from our whole Dockerfile; this is the part that adds the evaluation libraries.
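For context, a hedged sketch of how the libraries installed above might be exercised during evaluation; this uses the public `evaluate` and `nltk` APIs (with `rouge_score` as the ROUGE backend), and the sample strings are made up:

```python
import nltk
import evaluate  # backed by rouge_score for the ROUGE metric

nltk.download("punkt")  # mirrors the NLTK_DATA step in the Dockerfile

rouge = evaluate.load("rouge")
scores = rouge.compute(
    predictions=["the cat sat on the mat"],
    references=["a cat was sitting on the mat"],
)
print(scores)  # rouge1 / rouge2 / rougeL / rougeLsum scores
```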
```python
from metaseq.data.datasets.types import CommonDatasetConfiguration, DatasetConfiguration, DatasetConfigurationTeacherGenerated, DatasetModelConfig, DatasetModelHooks, DatasetTeacherGeneratedDataHooks, IdentityDict

# Visual diagram of where hooks/functions are called during inference or data generation
# https://excalidraw.com/#json=zoAk_TdynBHQnP9vZufGm,ekcVg_HqiF79cAp58_HKRQ
```
This visualization may be important for understanding how the hooks fit together.
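Since the linked diagram may go stale, here is a rough, illustrative-only sketch of the call order it depicts. The real hook signatures live in `metaseq.data.datasets.types` (`DatasetModelHooks` etc.); everything below is a simplified stand-in:

```python
# Simplified stand-in for the hook flow: pre-inference transform -> model ->
# post-inference transform. Names and signatures are assumptions, not the PR's API.
from typing import Callable, Dict, List

def run_pipeline(
    samples: List[Dict],
    pre_inference_hook: Callable[[Dict], Dict],       # e.g. build the few-shot prompt
    model_fn: Callable[[Dict], str],                  # model generates a completion
    post_inference_hook: Callable[[Dict, str], str],  # e.g. strip prompt, normalize
) -> List[str]:
    outputs = []
    for sample in samples:
        sample = pre_inference_hook(sample)
        raw_generation = model_fn(sample)
        outputs.append(post_inference_hook(sample, raw_generation))
    return outputs

if __name__ == "__main__":
    result = run_pipeline(
        samples=[{"question": "2 + 2?"}],
        pre_inference_hook=lambda s: {**s, "prompt": f"Q: {s['question']} A:"},
        model_fn=lambda s: s["prompt"] + " 4",  # stand-in for the real model
        post_inference_hook=lambda s, raw: raw.removeprefix(s["prompt"]).strip(),
    )
    print(result)  # ['4']
```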
Issue
Solutions
- Add a script for model inference and evaluation
- Add mappings between datasets and the pipeline configuration of eval libraries, metrics, and transformation functions (a sketch of such a mapping follows this list)
- Add the necessary evaluation libraries and re-implement some metrics
- Add PromptGenerator to create few-shot prompts based on configuration, using Jinja templates (a sketch follows as well)
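A hedged sketch of what the dataset-to-pipeline mapping above could look like; the names (`DATASET_EVAL_CONFIG`, `lowercase_and_strip`) and the dataset keys are illustrative, not the PR's actual API:

```python
from typing import Dict

def lowercase_and_strip(text: str) -> str:
    # Example transformation applied to model output before scoring
    return text.lower().strip()

# Maps a dataset name to the metrics and output transformations its
# evaluation pipeline should use.
DATASET_EVAL_CONFIG: Dict[str, dict] = {
    "summarization_dataset": {
        "metrics": ["rouge1", "rouge2", "rougeL"],
        "transformations": [lowercase_and_strip],
    },
    "dialogue_dataset": {
        "metrics": ["f1", "bleu"],
        "transformations": [lowercase_and_strip],
    },
}
```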
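And a minimal, illustrative-only sketch of few-shot prompt construction with Jinja, in the spirit of the PromptGenerator mentioned above; the template layout and field names are assumptions:

```python
from jinja2 import Template

# Assumed template shape: few-shot examples followed by the unanswered query
FEW_SHOT_TEMPLATE = Template(
    "{% for shot in shots %}"
    "Question: {{ shot.question }}\n"
    "Answer: {{ shot.answer }}\n\n"
    "{% endfor %}"
    "Question: {{ query }}\n"
    "Answer:"
)

def build_prompt(shots, query):
    # Renders each few-shot example, then the query awaiting a completion
    return FEW_SHOT_TEMPLATE.render(shots=shots, query=query)

print(build_prompt(
    shots=[{"question": "2 + 2?", "answer": "4"}],
    query="3 + 3?",
))
```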
This PR is quite large, so it may be hard to make sense of. It was originally only going to be inference.py and a few other modifications, but I kept bringing in missing dependencies to avoid gaps and it grew a lot 🤔
Testing
Did not test 😔
Related to: #726
Much of this work was done by @sahajgg, @tupini07, and @anselmwang 🙏