You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
File "/transformers/generation/configuration_utils.py", line 726, in validate
raise ValueError(
ValueError: Invalid `cache_implementation` (offloaded). Choose one of: ['static', 'offloaded_static', 'sliding_window', 'hybrid', 'mamba', 'quantized', 'static']
Expected behavior
I expected that cache_implementation="offloaded" is a valid option taken by model.generate(). After enabling KV cache offloading, the peak memory usage should go down and inference time should go up.
The text was updated successfully, but these errors were encountered:
System Info
transformers
version: 4.46.2Who can help?
No response
Information
Tasks
examples
folder (such as GLUE/SQuAD, ...)Reproduction
I am following the official example to enable KV cache offloading. https://huggingface.co/docs/transformers/en/kv_cache#offloaded-cache
And I got the error message:
Expected behavior
I expected that
cache_implementation="offloaded"
is a valid option taken bymodel.generate()
. After enabling KV cache offloading, the peak memory usage should go down and inference time should go up.The text was updated successfully, but these errors were encountered: