System Info
System Specifications
2024-11-10T21:20:44.880890Z INFO text_generation_launcher: Runtime environment:
Target: x86_64-unknown-linux-gnu
Cargo version: 1.80.1
Commit sha: 97f7a22
Docker label: N/A
nvidia-smi:
Sun Nov 10 21:20:43 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.127.05 Driver Version: 550.127.05 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA L40S On | 00000000:9E:00.0 Off | 0 |
| N/A 26C P8 32W / 350W | 1MiB / 46068MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 1 NVIDIA L40S On | 00000000:A0:00.0 Off | 0 |
| N/A 25C P8 32W / 350W | 1MiB / 46068MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 2 NVIDIA L40S On | 00000000:A2:00.0 Off | 0 |
| N/A 27C P8 32W / 350W | 1MiB / 46068MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 3 NVIDIA L40S On | 00000000:A4:00.0 Off | 0 |
| N/A 27C P8 31W / 350W | 1MiB / 46068MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 4 NVIDIA L40S On | 00000000:C6:00.0 Off | 0 |
| N/A 26C P8 32W / 350W | 1MiB / 46068MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 5 NVIDIA L40S On | 00000000:C8:00.0 Off | 0 |
| N/A 26C P8 30W / 350W | 1MiB / 46068MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 6 NVIDIA L40S On | 00000000:CA:00.0 Off | 0 |
| N/A 29C P8 33W / 350W | 1MiB / 46068MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 7 NVIDIA L40S On | 00000000:CC:00.0 Off | 0 |
| N/A 26C P8 30W / 350W | 1MiB / 46068MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
Reproducing Steps and Traceback
~/Desktop/Code/text-generation-inference/server$ SAFETENSORS_FAST_GPU=1 python text_generation_server/cli.py serve state-spaces/mamba-130m
2024-11-10 21:18:24.957 | INFO | text_generation_server.utils.import_utils:<module>:80 - Detected system cuda
/home/ubuntu/Desktop/Code/text-generation-inference/server/text_generation_server/utils/sgmv.py:18: UserWarning: Could not import SGMV kernel from Punica, falling back to loop.
warnings.warn("Could not import SGMV kernel from Punica, falling back to loop.")
Using prefix caching = True
Using Attention = flashinfer
Could not import Flash Attention enabled models: /opt/conda/envs/tgi/lib/python3.11/site-packages/moe_kernels/_moe_kernels_ops.cpython-311-x86_64-linux-gnu.so: undefined symbol: _ZNK3c105Error4whatEv
/opt/conda/envs/tgi/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:658: UserWarning: You are using a Backend <class 'text_generation_server.utils.dist.FakeGroup'> as a ProcessGroup. This usage is deprecated since PyTorch 2.0. Please use a public API of PyTorch Distributed instead.
warnings.warn(
Error when initializing model
Traceback (most recent call last):
  File "/home/ubuntu/Desktop/Code/text-generation-inference/server/text_generation_server/models/custom_modeling/mamba_modeling.py", line 213, in __init__
    self.lm_head = SpeculativeHead.load(config, f"{prefix}.embeddings", weights)
  File "/home/ubuntu/Desktop/Code/text-generation-inference/server/text_generation_server/layers/speculative.py", line 40, in load
    lm_head = TensorParallelHead.load(config, prefix, weights)
  File "/home/ubuntu/Desktop/Code/text-generation-inference/server/text_generation_server/layers/tensor_parallel.py", line 66, in load
    weight = weights.get_tensor(f"{prefix}.weight")
  File "/home/ubuntu/Desktop/Code/text-generation-inference/server/text_generation_server/utils/weights.py", line 213, in get_tensor
    filename, tensor_name = self.get_filename(tensor_name)
  File "/home/ubuntu/Desktop/Code/text-generation-inference/server/text_generation_server/utils/weights.py", line 192, in get_filename
    raise RuntimeError(f"weight {tensor_name} does not exist")
RuntimeError: weight backbone.embeddings.weight does not exist
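The traceback shows the lm_head being loaded from `f"{prefix}.embeddings"` (the tied embedding), and the lookup in `weights.py` failing because no shard contains a tensor named `backbone.embeddings.weight`. A minimal stdlib-only sketch of that lookup, assuming the checkpoint stores the embedding under a different key (the singular `backbone.embedding.weight` here is an illustrative placeholder, not read from the real state-spaces/mamba-130m files):

```python
# Minimal sketch of the failing lookup in weights.py. Weights keeps a
# routing table {tensor_name: shard_file}; get_filename raises when the
# requested name is absent. Key names below are illustrative only.

routing = {
    # Suppose the checkpoint stores the tied embedding under another name:
    "backbone.embedding.weight": "model-00001.safetensors",
}

def get_filename(tensor_name: str) -> str:
    filename = routing.get(tensor_name)
    if filename is None:
        raise RuntimeError(f"weight {tensor_name} does not exist")
    return filename

# The lm_head load path asks for "{prefix}.embeddings.weight" with
# prefix="backbone", reproducing the error in the traceback:
try:
    get_filename("backbone.embeddings.weight")
except RuntimeError as e:
    print(e)  # weight backbone.embeddings.weight does not exist
```

If the real checkpoint does use a differently spelled key, a name-mapping (or alias fallback) in the Mamba loader would resolve the error; listing the keys of the downloaded shards would confirm which spelling is actually present.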
Information
Tasks
Reproduction
SAFETENSORS_FAST_GPU=1 python text_generation_server/cli.py serve state-spaces/mamba-130m
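Incidentally, the `undefined symbol: _ZNK3c105Error4whatEv` warning earlier in the log can be decoded with `c++filt` (ships with binutils); the symbol demangles to a `c10::Error` method, which typically means the `moe_kernels` wheel was built against a different libtorch ABI than the installed PyTorch build:

```shell
# Demangle the unresolved symbol from the Flash Attention import warning.
c++filt _ZNK3c105Error4whatEv
# -> c10::Error::what() const
```

This warning is likely unrelated to the missing-weight error, but reinstalling a `moe_kernels` build matching the installed torch version usually clears it.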
Expected behavior
Web server starting