Issues: huggingface/text-generation-inference
Different inference results and speed between /generate and OpenAI endpoint
#2747 opened Nov 14, 2024 by jegork
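Issue #2747 concerns output and latency differences between the two routes. A minimal sketch of such a comparison, assuming a TGI server on localhost:8080 (the base URL, prompt, and sampling values are illustrative); one known source of divergence is that the OpenAI-compatible route applies the model's chat template before generation, while /generate sends the raw prompt:

```python
# Send the same prompt with the same sampling settings through both
# TGI routes and compare the outputs. Assumes a TGI server listening
# on localhost:8080; adjust BASE for your deployment.
import requests

BASE = "http://localhost:8080"
PROMPT = "What is the capital of France?"

# Raw /generate endpoint: no chat template is applied to the input.
gen = requests.post(
    f"{BASE}/generate",
    json={
        "inputs": PROMPT,
        "parameters": {
            "max_new_tokens": 64,
            "do_sample": True,
            "temperature": 0.7,
            "seed": 42,
        },
    },
).json()

# OpenAI-compatible endpoint: the model's chat template is applied to
# the messages before generation, so the effective prompt differs.
chat = requests.post(
    f"{BASE}/v1/chat/completions",
    json={
        "model": "tgi",
        "messages": [{"role": "user", "content": PROMPT}],
        "max_tokens": 64,
        "temperature": 0.7,
        "seed": 42,
    },
).json()

print(gen["generated_text"])
print(chat["choices"][0]["message"]["content"])
```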
ssm models have been deprecated in favor of mamba models
#2739 opened Nov 10, 2024 by mokeddembillel
Local installation: weight backbone.embeddings.weight does not exist (Mamba)
#2737 opened Nov 10, 2024 by mokeddembillel
In dev mode, server is stuck at Server started at unix:///tmp/text-generation-server-0
#2735 opened Nov 10, 2024 by mokeddembillel
launch TGI with the argument --max-input-tokens smaller than sliding_window=4096 (got here max_input_tokens=16384)
#2730 opened Nov 7, 2024 by ashwincv0112
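The error quoted in #2730 is TGI's launcher rejecting a --max-input-tokens value larger than the model's sliding_window. A small sketch for checking that value before launching, with the model id as a placeholder for whatever checkpoint is being served:

```python
# Read a model's sliding_window from its Hugging Face config so that
# --max-input-tokens can be set below it. The model id is a placeholder;
# substitute the checkpoint you are serving.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("mistralai/Mistral-7B-v0.1")
print(getattr(config, "sliding_window", None))  # e.g. 4096; None if unset
```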
device-side assert triggered when trying to use LLaMA 3.2 Vision with grammar
#2729 opened Nov 6, 2024 by SokolAnn
Python client: Pydantic protected namespace "model_"
#2722 opened Nov 4, 2024 by Simon-Stone
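The warning behind #2722 comes from Pydantic v2 reserving the "model_" prefix as a protected namespace, so client classes with fields such as model_id warn at class-definition time. A minimal sketch of the conflict and the usual per-model workaround (the field name is illustrative):

```python
# Any field name starting with "model_" collides with Pydantic v2's
# protected "model_" namespace and emits a UserWarning when the class
# is defined.
from pydantic import BaseModel, ConfigDict

class Warns(BaseModel):
    model_id: str  # UserWarning: Field "model_id" has conflict with protected namespace "model_"

class Quiet(BaseModel):
    # Clearing protected_namespaces suppresses the warning for this model.
    model_config = ConfigDict(protected_namespaces=())
    model_id: str
```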
FlashLlamaForCausalLM's use of the name dense for its mlp submodule causes an error when using a LoRA adapter
#2715 opened Nov 2, 2024 by sadra-barikbin
CUDA Error: No kernel image is available for execution on the device
#2703 opened Oct 28, 2024 by shubhamgajbhiye1994
Complex response format leads the container to run forever on CPU
#2681 opened Oct 23, 2024 by Rictus
PREFIX_CACHING=0 does not disable prefix caching in v2.3.1
#2676 opened Oct 21, 2024 by sam-ulrich1
(Prefill) KV cache indexing error when multiple TGI servers are started concurrently
#2675 opened Oct 21, 2024 by nathan-az
Prefix caching causes two different responses from the same HTTP call with seed set, depending on which machine calls
#2670 opened Oct 18, 2024 by sam-ulrich1
OpenAI client format + chat template for a single call
#2644 opened Oct 14, 2024 by vitalyshalumov