Releases: NVIDIA-NeMo/Evaluator

NVIDIA Evaluator 0.3.0

03 Jun 16:12
d2f7b17

Release: NVIDIA Evaluator 0.3.0

NVIDIA NeMo Evaluator Launcher 0.2.6

21 May 11:54
ab491cb

Changelog Details
  • fix(export): add large Gym artifacts to excluded files by @marta-sd :: PR: #907
  • fix(nel): fall back to main when version tag not found in skills add by @piojanu :: PR: #906
  • fix(launcher): re-raise unrelated ModuleNotFoundError in container_metadata init by @wprazuch :: PR: #909
  • feat: add export_mounts and export_image to auto-export config by @agronskiy :: PR: #911
  • fix: move auto-export to separate CPU-only sbatch job by @AdamRajfer :: PR: #901
  • fix: make SWEbench live progress script restart-safe by @piojanu :: PR: #872
  • fix: force base-10 in bash _walltime_to_seconds by @agronskiy :: PR: #916
  • docs: remove internal GitLab URL from launching-evals skill by @piojanu :: PR: #920
  • feat: improve launching-evals and nel-assistant skills by @piojanu :: PR: #910
  • fix(exporters): socket name too long for long hostnames by @prokotg :: PR: #921
  • feat(launcher): expose GPUs to eval container for compute-eval by @wprazuch :: PR: #912
  • chore(launcher): add exclude_patterns to mlflow exporter (EVAL-632) by @agronskiy :: PR: #940
  • feat(mlflow/exporter): default to proxied multipart upload by @agronskiy :: PR: #952
  • fix: use docker cp to make sure artifacts are populated in DinD scenarios by @marta-sd :: PR: #1019
  • fix: disable xtrace before setting up env vars holding secrets by @marta-sd :: PR: #1021
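
The base-10 fix listed above addresses a common bash pitfall: inside `$(( ))`, a zero-padded field such as `08` is read as octal and rejected unless it is prefixed with `10#`. The launcher's actual `_walltime_to_seconds` helper is not shown on this page, so the following is only a minimal Python sketch of the same conversion, assuming a Slurm-style `[D-]HH:MM:SS` walltime. The function name and the accepted format are assumptions made for illustration.

```python
def walltime_to_seconds(walltime: str) -> int:
    """Convert a walltime string of the form "[D-]HH:MM:SS" to seconds.

    Hypothetical helper for illustration; not the launcher's implementation.
    """
    days = 0
    if "-" in walltime:
        day_part, walltime = walltime.split("-", 1)
        days = int(day_part, 10)

    parts = walltime.split(":")
    if len(parts) != 3:
        raise ValueError(f"expected [D-]HH:MM:SS, got {walltime!r}")

    # Parse every field explicitly as base 10, so zero-padded values such as
    # "08" or "09" are never treated as (invalid) octal numbers.
    hours, minutes, seconds = (int(p, 10) for p in parts)
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


print(walltime_to_seconds("00:08:09"))    # 489
print(walltime_to_seconds("1-02:00:00"))  # 93600
```

In bash, the equivalent safeguard is to write `$((10#$field))` for each zero-padded field.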

0.3.0 dev builds (latest: 0.3.0.dev26)

13 May 15:09
7a44667

Pre-release

Rolling dev builds from the dev/0.3.0 branch.

Latest: 0.3.0.dev26

Install

pip install https://github.com/NVIDIA-NeMo/Evaluator/releases/download/v0.3.0-dev/nemo_evaluator-0.3.0.dev26-py3-none-any.whl

Versioning

Branch      Version format                   Published
dev/0.3.0   0.3.0.devN (N = commit count)    here (GitHub pre-release)
main        0.3.0                            PyPI

Auto-updated on every push to dev/0.3.0 and on manual workflow dispatch. Not for production use.
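
As a rough illustration of the version scheme above, the sketch below splits a dev version into its base release and commit count, and rebuilds the wheel URL in the shape of the install command. It assumes that other dev builds follow the same URL pattern as 0.3.0.dev26, which this page does not guarantee. The helper names `parse_dev_version` and `dev_wheel_url` are hypothetical.

```python
import re

# Assumption: every dev wheel is attached to the same "v0.3.0-dev" pre-release.
RELEASE_BASE = (
    "https://github.com/NVIDIA-NeMo/Evaluator/releases/download/v0.3.0-dev"
)
DEV_VERSION = re.compile(r"^(\d+\.\d+\.\d+)\.dev(\d+)$")


def parse_dev_version(version: str) -> tuple[str, int]:
    """Split "0.3.0.devN" into the base release and N (the commit count)."""
    match = DEV_VERSION.match(version)
    if match is None:
        raise ValueError(f"not a dev version: {version!r}")
    return match.group(1), int(match.group(2))


def dev_wheel_url(version: str) -> str:
    """Build the wheel URL for a dev version, mirroring the install command."""
    parse_dev_version(version)  # validate the format before building the URL
    return f"{RELEASE_BASE}/nemo_evaluator-{version}-py3-none-any.whl"


print(parse_dev_version("0.3.0.dev26"))  # ('0.3.0', 26)
print(dev_wheel_url("0.3.0.dev26"))
```

The resulting URL can be passed to `pip install` as in the Install section.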

NVIDIA NeMo Evaluator 0.2.8

08 May 06:51
76c865c

Changelog Details
  • feat(byob): add completions_logprob endpoint and extend scorers/datasets by @kanishks-23 :: PR: #953
  • feat(byob): add explicit few-shot dataset support by @kanishks-23 :: PR: #993

NVIDIA NeMo Evaluator 0.2.7

29 Apr 15:07
b93f1f4

Changelog Details
  • feat: [EVAL-878] allow custom HTTP headers in payload_modifier by @wprazuch :: PR: #945

NVIDIA NeMo Evaluator 0.2.6

16 Apr 09:49
cb5e2f8

Changelog Details
  • fix: write BYOB results to per-benchmark subdirectory to avoid data overwriting by @laszkiewiczp :: PR: #856
  • fix: use normalized name in BYOB FDF evaluation entry by @laszkiewiczp :: PR: #855
  • feat: BYOB add output_parser parameter to judge_score() by @laszkiewiczp :: PR: #859
  • fix: byob readme example by @laszkiewiczp :: PR: #854
  • fix: remove obsolete run_eval from __all__ by @marta-sd :: PR: #886
  • feat(per-sample-score): per sample score by @AWarno :: PR: #888
  • feat: replace Werkzeug dev server with waitress for high-concurrency adapter by @agronskiy :: PR: #896
  • fix: move logger creation in ProgressTrackingInterceptor to the top by @marta-sd :: PR: #900
  • fix(evaluator): distinguish interrupted and failed sigterm exits by @ngoncharenko :: PR: #882
  • fix: use poll() and disable IPv6 in waitress adapter server by @agronskiy :: PR: #905

NVIDIA NeMo Evaluator Launcher 0.2.5

16 Apr 09:49
cb5e2f8

Changelog Details
  • feat: support arbitrary sbatch flags via sbatch_extra_flags by @gchlebus :: PR: #864
  • feat(extra-params): export extra params by @AWarno :: PR: #873
  • docs: skill cleanups and fixes by @piojanu :: PR: #878
  • docs: add auxiliary deployments example and documentation by @AdamRajfer :: PR: #875
  • feat: allow duplicate task names in nel by @laszkiewiczp :: PR: #874
  • fix: add missing task_idx arg to TestSbatchExtraFlags by @laszkiewiczp :: PR: #885
  • feat: syntactic sugar overrides for tasks by @anowaczynski-nvidia :: PR: #759
  • feat: add watch mode for continuous checkpoint evaluation by @marta-sd :: PR: #857
  • feat: expose invocation ID as NEL_INVOCATION_ID env var by @agronskiy :: PR: #894
  • feat: replace Werkzeug dev server with waitress for high-concurrency adapter by @agronskiy :: PR: #896
  • feat: mount results for deployment by @AdamRajfer :: PR: #899
  • fix: raise error when execution.env_vars is used in config by @marta-sd :: PR: #898
  • fix(evaluator): distinguish interrupted and failed sigterm exits by @ngoncharenko :: PR: #882

NVIDIA NeMo Evaluator Launcher 0.2.4

19 Mar 08:32
26f45ea

Changelog Details
  • feat: deploy auxiliary endpoints by @wprazuch :: PR: #830
  • feat: add launching-evals and accessing-mlflow skills by @piojanu :: PR: #865
  • feat: rename to nel skills add and add marketplace entries by @piojanu :: PR: #868

NVIDIA NeMo Evaluator 0.2.5

18 Mar 01:35
a8c6072

Changelog Details
  • feat: add --platform flag for BYOB container builds by @laszkiewiczp :: PR: #832
  • chore: Remove duplicated skill for byob, add it to readme and marketplace by @wprazuch :: PR: #845
  • fix: remove deprecated api_key field from ApiEndpoint by @gchlebus :: PR: #850

NVIDIA NeMo Evaluator Launcher 0.2.3

18 Mar 01:35
a8c6072

Changelog Details
  • docs(nemotron-3-super): reproducible configs by @prokotg :: PR: #840
  • docs(SKILL.md): add ARM64 and non-standard GPU compatibility note by @himorishige :: PR: #818
  • fix(deprecated-multiple-instances-flag): fix deprecated multiple instances by @AWarno :: PR: #838
  • fix(nel-assistant): correct --model-type to --model_type in SKILL.md by @himorishige :: PR: #813
  • feat(malformed-configs-validation): validation of malformed configs by @AWarno :: PR: #811
  • fix: fixes for user-reported bugs after 0.2 release by @marta-sd :: PR: #837
  • docs(post_cmd): add post_cmd documentation by @e-dobrowolska :: PR: #841
  • feat: add configurable health check timeout for local executor by @laszkiewiczp :: PR: #844
  • chore: Simplify launcher evaluation templates and skill guidance by @piojanu :: PR: #846
  • chore: Remove duplicated skill for byob, add it to readme and marketplace by @wprazuch :: PR: #845
  • chore: Update for 26.03 by @wprazuch :: PR: #852
  • chore: VLMEvalkit bump by @wprazuch :: PR: #853
  • fix: bypass unlisted-task safeguard for local .sqsh by @gchlebus :: PR: #849