Why was the number of tokens reduced for these Chronos models compared to the T5 models? #124
-
Hello there,
Replies: 1 comment
-
@CoCoNuTeK Please follow the issue guidelines in the repo and use Discussions for Q&A; Issues are intended for problems (such as bugs) in the code.
The vocab size in Chronos relates to precision. 4096 was a reasonable choice: larger values may improve precision, but you don't want the bins to be too fine, because then very few values fall into each bin and the model may not learn the distribution over tokens properly. Please check out the paper for a discussion of such design choices: https://arxiv.org/abs/2403.07815
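
For intuition, here is a minimal sketch of the kind of uniform binning the paper describes. The mean scaling, the `[-15, 15]` clipping range, and the `tokenize` helper are simplified assumptions for illustration, not the exact Chronos tokenizer:

```python
import numpy as np

def tokenize(series: np.ndarray, n_bins: int = 4096,
             low: float = -15.0, high: float = 15.0) -> np.ndarray:
    """Mean-scale a series and map each value to one of n_bins uniform bins.

    Simplified illustration; edge handling and special tokens in the
    real Chronos tokenizer are omitted.
    """
    scale = np.abs(series).mean()
    scaled = np.clip(series / (scale if scale > 0 else 1.0), low, high)
    edges = np.linspace(low, high, n_bins + 1)
    # digitize against the interior edges gives token ids in [0, n_bins - 1]
    return np.digitize(scaled, edges[1:-1])

# 256 observations: the finer the bins, the fewer samples land in each one,
# so the model sees too few examples per token to learn its probability.
series = 50 + 10 * np.sin(np.linspace(0, 8 * np.pi, 256))
for n in (256, 4096, 65536):
    occupied = np.unique(tokenize(series, n_bins=n)).size
    print(f"{n:>6} bins -> {occupied} occupied by 256 observations")
```

Doubling the bin count doubles nominal precision, but the same training data is spread across twice as many token ids, which is the trade-off behind choosing 4096.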