
HuggingFace Gemma-2-9b/27b incorrect #53

Open

cinjon opened this issue Oct 2, 2024 · 3 comments

Comments


cinjon commented Oct 2, 2024

Hi, I was able to verify the MMLU score for HuggingFace gemma-2-9b-it to within 0.2 points. However, for gemma-2-27b-it, the score (52.3% on "all") is way off. Is there some mistake in the repo there? Or is it particularly sensitive to bfloat16?

@Gopi-Uppari

Hi @cinjon,

The MMLU scores for the Gemma-2-9B-IT and Gemma-2-27B-IT models are 71.3% and 75.2%, respectively. For further reference, please refer to this paper. The performance degradation observed in the Gemma-2-27B-IT model is likely due to its sensitivity to bfloat16 precision settings, which can impact inference quality if not handled properly.
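As background on why bfloat16 can matter here: bfloat16 keeps float32's 8-bit exponent but only 8 mantissa bits (versus 23 in float32), so logits and accumulated activations are quantized much more coarsely, which can flip close answer choices in a multiple-choice benchmark. A minimal pure-Python sketch of the rounding (the `to_bfloat16` helper is hypothetical, written for illustration; bfloat16 is the top 16 bits of the IEEE-754 float32 encoding):

```python
import struct

def to_bfloat16(x: float) -> float:
    # Reinterpret the float32 bit pattern as an unsigned 32-bit integer.
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    # Round-to-nearest-even on the 16 low bits being discarded.
    rounding = 0x7FFF + ((bits >> 16) & 1)
    bits = (bits + rounding) & 0xFFFF0000
    # Reinterpret the truncated pattern back as a float.
    return struct.unpack(">f", struct.pack(">I", bits))[0]

# With only 8 mantissa bits, values near 1.0 are spaced ~1/256 apart,
# so small differences between candidate-answer logits can be lost.
print(to_bfloat16(1.001))  # -> 1.0
```

This is only meant to illustrate the precision gap, not to claim it is the cause of the 27B discrepancy.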

For more detailed insights and related discussions, please check the following references:
ref1
ref2

Thank you.


cinjon commented Oct 9, 2024

Hi again. I am struggling with this and made a reproduction for you to look at: https://gist.github.com/cinjon/de9a22f57cfa0dc9ccb2afc255a8093e.

The main problem is the results below, which show a rough reproduction on gemma-2-27b, slight degradation on gemma-2-27b-it and gemma-2-9b, and a terrible result on gemma-2-9b-it. What am I doing wrong? Thanks.

```
1. python -m huggingface_test_gemma_base_mmlu --model_name="google/gemma-2-9b"
   --> all 0.7057399230878793
2. python -m huggingface_test_gemma_base_mmlu --model_name="google/gemma-2-9b-it"
   --> all 0.6387266771115225
3. python -m huggingface_test_gemma_base_mmlu --model_name="google/gemma-2-27b-it"
   --> all 0.7518159806295399
4. python -m huggingface_test_gemma_base_mmlu --model_name="google/gemma-2-27b"
   --> all 0.7517447657028913
```


cinjon commented Oct 11, 2024

To be clear, it's not the bfloat16 in the gist either: the results are roughly the same with float32 too.
