Description
Thanks for open-sourcing this amazing work! I tried to reproduce the results using Llama3.1-8B-Instruct and achieved 91.5% on the 7,473 samples of the GSM8K training set. However, when I used LLMLingua.py to filter the formatted data with this line, it returned only 847 samples.
Is this the expected behavior when trying to reproduce the results?
It seems the filtering relies on a specific answer format, such as "The final answer is...", as shown in this line. However, there are other valid formats, such as `\n\n\boxed{}` and `\n\boxed`, that the current filtering logic might exclude.
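To illustrate the point, here is a minimal sketch of a more permissive answer extractor that accepts both the "The final answer is..." phrase and a `\boxed{}` format. The function name `extract_answer` and the exact regexes are hypothetical, not the repo's actual filtering code:

```python
import re

def extract_answer(output: str):
    """Hypothetical extractor accepting several common answer formats.

    The repo's current filter appears to match only the
    'The final answer is' phrase, dropping otherwise-valid samples.
    """
    # Format 1: "... The final answer is 18."
    m = re.search(r"The final answer is\s*\$?([\-0-9.,/]+)", output)
    if m:
        return m.group(1).rstrip(".")
    # Format 2: "... \boxed{72}"
    m = re.search(r"\\boxed\{([^}]*)\}", output)
    if m:
        return m.group(1)
    return None
```

With something along these lines, samples ending in `\boxed{}` would survive the filtering step instead of being discarded.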
Lastly, I was wondering why you didn't perform the same filtering and CoT/answer split for the Qwen model, and why it is necessary for LLaMA.
The processed Qwen dataset has the answer repeated at the end of each output.