Issues: huggingface/text-generation-inference
Different inference results and speed between /generate and OpenAI endpoint
#2747 opened Nov 14, 2024 by jegork
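Issue #2747 concerns output and latency differences between the two routes. A minimal sketch of such a comparison, assuming a TGI server on localhost:8080 (the base URL, prompt, and sampling values are illustrative); one known source of divergence is that the OpenAI-compatible route applies the model's chat template before generation, while /generate sends the raw prompt:

```python
# Send the same prompt with the same sampling settings through both
# TGI routes and compare the outputs. Assumes a TGI server listening
# on localhost:8080; adjust BASE for your deployment.
import requests

BASE = "http://localhost:8080"
PROMPT = "What is the capital of France?"

# Raw /generate endpoint: no chat template is applied to the input.
gen = requests.post(
    f"{BASE}/generate",
    json={
        "inputs": PROMPT,
        "parameters": {
            "max_new_tokens": 64,
            "do_sample": True,
            "temperature": 0.7,
            "seed": 42,
        },
    },
).json()

# OpenAI-compatible endpoint: the model's chat template is applied to
# the messages before generation, so the effective prompt differs.
chat = requests.post(
    f"{BASE}/v1/chat/completions",
    json={
        "model": "tgi",
        "messages": [{"role": "user", "content": PROMPT}],
        "max_tokens": 64,
        "temperature": 0.7,
        "seed": 42,
    },
).json()

print(gen["generated_text"])
print(chat["choices"][0]["message"]["content"])
```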
ssm models have been deprecated in favor of mamba models
#2739 opened Nov 10, 2024 by mokeddembillel
Local installation: weight backbone.embeddings.weight does not exist (Mamba)
#2737 opened Nov 10, 2024 by mokeddembillel
In dev mode, server is stuck at Server started at unix:///tmp/text-generation-server-0
#2735 opened Nov 10, 2024 by mokeddembillel
launch TGI with the argument --max-input-tokens smaller than sliding_window=4096 (got here max_input_tokens=16384)
#2730 opened Nov 7, 2024 by ashwincv0112
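The error quoted in #2730 is TGI's launcher rejecting a --max-input-tokens value larger than the model's sliding_window. A small sketch for checking that value before launching, with the model id as a placeholder for whatever checkpoint is being served:

```python
# Read a model's sliding_window from its Hugging Face config so that
# --max-input-tokens can be set below it. The model id is a placeholder;
# substitute the checkpoint you are serving.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("mistralai/Mistral-7B-v0.1")
print(getattr(config, "sliding_window", None))  # e.g. 4096; None if unset
```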
device-side assert triggered when trying to use LLaMA 3.2 Vision with grammar
#2729 opened Nov 6, 2024 by SokolAnn
Python client: Pydantic protected namespace "model_"
#2722 opened Nov 4, 2024 by Simon-Stone
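The warning behind #2722 comes from Pydantic v2 reserving the "model_" prefix as a protected namespace, so client classes with fields such as model_id warn at class-definition time. A minimal sketch of the conflict and the usual per-model workaround (the field name is illustrative):

```python
# Any field name starting with "model_" collides with Pydantic v2's
# protected "model_" namespace and emits a UserWarning when the class
# is defined.
from pydantic import BaseModel, ConfigDict

class Warns(BaseModel):
    model_id: str  # UserWarning: Field "model_id" has conflict with protected namespace "model_"

class Quiet(BaseModel):
    # Clearing protected_namespaces suppresses the warning for this model.
    model_config = ConfigDict(protected_namespaces=())
    model_id: str
```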
FlashLlamaForCausalLM's use of the name dense for its mlp submodule causes an error when using a LoRA adapter
#2715 opened Nov 2, 2024 by sadra-barikbin
CUDA Error: No kernel image is available for execution on the device
#2703 opened Oct 28, 2024 by shubhamgajbhiye1994
Complex response format leads the container to run forever on CPU
#2681 opened Oct 23, 2024 by Rictus
PREFIX_CACHING=0 does not disable prefix caching in v2.3.1
#2676 opened Oct 21, 2024 by sam-ulrich1
(Prefill) KV cache indexing error when multiple TGI servers are started concurrently
#2675 opened Oct 21, 2024 by nathan-az
Prefix caching causes two different responses from the same HTTP call with seed set, depending on which machine calls
#2670 opened Oct 18, 2024 by sam-ulrich1
OpenAI client format + chat template for a single call
#2644 opened Oct 14, 2024 by vitalyshalumov