Local installation: weight backbone.embeddings.weight does not exist (Mamba) #2737

mokeddembillel opened this issue Nov 10, 2024 · 1 comment · May be fixed by #2738
System Info

System Specifications

2024-11-10T21:20:44.880890Z INFO text_generation_launcher: Runtime environment:
Target: x86_64-unknown-linux-gnu
Cargo version: 1.80.1
Commit sha: 97f7a22
Docker label: N/A
nvidia-smi:
Sun Nov 10 21:20:43 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.127.05 Driver Version: 550.127.05 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA L40S On | 00000000:9E:00.0 Off | 0 |
| N/A 26C P8 32W / 350W | 1MiB / 46068MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 1 NVIDIA L40S On | 00000000:A0:00.0 Off | 0 |
| N/A 25C P8 32W / 350W | 1MiB / 46068MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 2 NVIDIA L40S On | 00000000:A2:00.0 Off | 0 |
| N/A 27C P8 32W / 350W | 1MiB / 46068MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 3 NVIDIA L40S On | 00000000:A4:00.0 Off | 0 |
| N/A 27C P8 31W / 350W | 1MiB / 46068MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 4 NVIDIA L40S On | 00000000:C6:00.0 Off | 0 |
| N/A 26C P8 32W / 350W | 1MiB / 46068MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 5 NVIDIA L40S On | 00000000:C8:00.0 Off | 0 |
| N/A 26C P8 30W / 350W | 1MiB / 46068MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 6 NVIDIA L40S On | 00000000:CA:00.0 Off | 0 |
| N/A 29C P8 33W / 350W | 1MiB / 46068MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 7 NVIDIA L40S On | 00000000:CC:00.0 Off | 0 |
| N/A 26C P8 30W / 350W | 1MiB / 46068MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+

Reproducing Steps and Traceback

~/Desktop/Code/text-generation-inference/server$ SAFETENSORS_FAST_GPU=1 python text_generation_server/cli.py serve state-spaces/mamba-130m
2024-11-10 21:18:24.957 | INFO | text_generation_server.utils.import_utils:<module>:80 - Detected system cuda
/home/ubuntu/Desktop/Code/text-generation-inference/server/text_generation_server/utils/sgmv.py:18: UserWarning: Could not import SGMV kernel from Punica, falling back to loop.
warnings.warn("Could not import SGMV kernel from Punica, falling back to loop.")
Using prefix caching = True
Using Attention = flashinfer
Could not import Flash Attention enabled models: /opt/conda/envs/tgi/lib/python3.11/site-packages/moe_kernels/_moe_kernels_ops.cpython-311-x86_64-linux-gnu.so: undefined symbol: _ZNK3c105Error4whatEv
/opt/conda/envs/tgi/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:658: UserWarning: You are using a Backend <class 'text_generation_server.utils.dist.FakeGroup'> as a ProcessGroup. This usage is deprecated since PyTorch 2.0. Please use a public API of PyTorch Distributed instead.
warnings.warn(
Error when initializing model
Traceback (most recent call last):
File "/home/ubuntu/Desktop/Code/text-generation-inference/server/text_generation_server/models/custom_modeling/mamba_modeling.py", line 213, in __init__
self.lm_head = SpeculativeHead.load(config, f"{prefix}.embeddings", weights)
File "/home/ubuntu/Desktop/Code/text-generation-inference/server/text_generation_server/layers/speculative.py", line 40, in load
lm_head = TensorParallelHead.load(config, prefix, weights)
File "/home/ubuntu/Desktop/Code/text-generation-inference/server/text_generation_server/layers/tensor_parallel.py", line 66, in load
weight = weights.get_tensor(f"{prefix}.weight")
File "/home/ubuntu/Desktop/Code/text-generation-inference/server/text_generation_server/utils/weights.py", line 213, in get_tensor
filename, tensor_name = self.get_filename(tensor_name)
File "/home/ubuntu/Desktop/Code/text-generation-inference/server/text_generation_server/utils/weights.py", line 192, in get_filename
raise RuntimeError(f"weight {tensor_name} does not exist")
RuntimeError: weight backbone.embeddings.weight does not exist
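The traceback fails inside `TensorParallelHead.load`, which asks `weights.get_tensor` for a single hard-coded name, `backbone.embeddings.weight`, and raises as soon as that exact key is absent. Mamba checkpoints have been published under more than one naming scheme (for example, a singular `backbone.embedding.weight` spelling), so a checkpoint can be perfectly valid and still miss the expected key. The sketch below is a hypothetical illustration of tolerating key variants, not the actual fix in #2738; `resolve_tensor_name` and the example key set are invented for this example and are not part of text-generation-inference:

```python
# Hypothetical sketch: probe a short list of candidate tensor names
# instead of failing on the first hard-coded one. `resolve_tensor_name`
# is an illustrative helper, not TGI API.

def resolve_tensor_name(available, candidates):
    """Return the first candidate name present in the checkpoint's key set."""
    for name in candidates:
        if name in available:
            return name
    # Mirror the error raised in weights.py (get_filename)
    raise RuntimeError(f"weight {candidates[0]} does not exist")

# Keys as they might appear in a checkpoint that uses the singular
# `embedding` spelling (assumed for illustration):
checkpoint_keys = {
    "backbone.embedding.weight",
    "backbone.layers.0.mixer.in_proj.weight",
}

name = resolve_tensor_name(
    checkpoint_keys,
    ["backbone.embeddings.weight", "backbone.embedding.weight"],
)
print(name)  # backbone.embedding.weight

# To list the real keys of a downloaded checkpoint (requires `safetensors`):
# from safetensors import safe_open
# with safe_open("model.safetensors", framework="pt") as f:
#     print(sorted(f.keys()))
```

Listing the checkpoint's actual keys (the commented `safe_open` snippet) is the quickest way to confirm which spelling the downloaded `state-spaces/mamba-130m` weights use before changing any loader code.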

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

SAFETENSORS_FAST_GPU=1 python text_generation_server/cli.py serve state-spaces/mamba-130m

Expected behavior

The web server should start.

@mokeddembillel (Author) commented:

I solved the issue and will submit a pull request.

@mokeddembillel mokeddembillel linked a pull request Nov 10, 2024 that will close this issue