Missing confidence interval #1650

Open
yoavkatz opened this issue Mar 6, 2025 · 1 comment · May be fixed by #1660

yoavkatz commented Mar 6, 2025

Even when running on multiple examples, the confidence interval is sometimes missing:

You can remove this warning by passing 'verification_mode=all_checks' instead.
warnings.warn(
LiteLLM Inference (watsonx/meta-llama/llama-3-2-1b-instruct): 100%|█████████████████████████████████████████| 10/10 [00:02<00:00, 4.27it/s]
/Users/yoavkatz/miniforge3/envs/fme/lib/python3.10/site-packages/scipy/stats/_resampling.py:144: RuntimeWarning: invalid value encountered in divide
a_hat = 1/6 * sum(nums) / sum(dens)**(3/2)
/Users/yoavkatz/miniforge3/envs/fme/lib/python3.10/site-packages/scipy/stats/_resampling.py:144: RuntimeWarning: invalid value encountered in divide
a_hat = 1/6 * sum(nums) / sum(dens)**(3/2)
/Users/yoavkatz/miniforge3/envs/fme/lib/python3.10/site-packages/scipy/stats/_resampling.py:144: RuntimeWarning: invalid value encountered in divide
a_hat = 1/6 * sum(nums) / sum(dens)**(3/2)
Sample input and output for template 'templates.my_entailment_as_fields' and num_demos '3':
   source                                             prediction     processed_prediction
0  [{'role': 'system', 'content': 'Indicate wheth...  neutral        neutral
1  [{'role': 'system', 'content': 'Indicate wheth...  neutral        neutral
2  [{'role': 'system', 'content': 'Indicate wheth...  neutral        neutral
3  [{'role': 'system', 'content': 'Indicate wheth...  neutral        neutral
4  [{'role': 'system', 'content': 'Indicate wheth...  neutral        neutral
5  [{'role': 'system', 'content': 'Indicate wheth...  contradiction  contradiction
6  [{'role': 'system', 'content': 'Indicate wheth...  neutral        neutral
7  [{'role': 'system', 'content': 'Indicate wheth...  neutral        neutral
8  [{'role': 'system', 'content': 'Indicate wheth...  neutral        neutral
9  [{'role': 'system', 'content': 'Indicate wheth...  neutral        neutral

   template                             num_demos  f1_micro  ci_low  ci_high
0  templates.my_entailment_as_question  0          0.12      nan     nan
1  templates.my_entailment_as_question  3          0.22      nan     nan
2  templates.my_entailment_as_fields    0          0         nan     nan
3  templates.my_entailment_as_fields    3          0.4       nan     nan
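
For context on the RuntimeWarning above, here is a minimal standard-library sketch of one way the `a_hat` expression can become NaN. It assumes the leave-one-out (jackknife) estimates that feed the expression are all identical, which makes both sums zero. Whether that is what happens in this run is not confirmed; rows 0, 1 and 3 have non-zero scores, so something else may be involved. The helper names `jackknife_means` and `bca_acceleration` are hypothetical and are not scipy or unitxt functions.

```python
import math


def jackknife_means(scores):
    """Leave-one-out means of a list of per-instance scores (hypothetical helper)."""
    n = len(scores)
    total = sum(scores)
    return [(total - s) / (n - 1) for s in scores]


def bca_acceleration(jackknife_estimates):
    """Evaluate 1/6 * sum(nums) / sum(dens)**(3/2), the expression from the warning.

    Assumption: nums and dens are cubed and squared deviations of the
    jackknife estimates from their mean. A zero denominator is returned
    as NaN here to mimic NumPy's "invalid value encountered in divide".
    """
    center = sum(jackknife_estimates) / len(jackknife_estimates)
    nums = [(center - t) ** 3 for t in jackknife_estimates]
    dens = [(center - t) ** 2 for t in jackknife_estimates]
    denominator = sum(dens) ** (3 / 2)
    if denominator == 0.0:
        # 0 / 0 case: no spread among the jackknife estimates
        return math.nan
    return 1 / 6 * sum(nums) / denominator


# All instance scores equal (for example an overall score of 0): degenerate case
degenerate = bca_acceleration(jackknife_means([0.0] * 10))

# Mixed instance scores: the acceleration is a finite number
mixed = bca_acceleration(jackknife_means([0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0]))

print(degenerate, mixed)
```

If the degenerate case applies, a NaN acceleration would propagate into the interval endpoints, which would be consistent with `ci_low` and `ci_high` showing `nan`.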

yoavkatz commented Mar 6, 2025

To replicate:

python examples/evaluate_different_templates.py
