[Bug] Logprobs overflow to -3.4e+38 #4876

Open
5 tasks done
zhc7 opened this issue Mar 29, 2025 · 0 comments
Labels
bug Something isn't working

zhc7 commented Mar 29, 2025

Checklist

  • 1. I have searched related issues but cannot get the expected help.
  • 2. The bug has not been fixed in the latest version.
  • 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
  • 4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose. Otherwise, it will be closed.
  • 5. Please use English, otherwise it will be closed.

Describe the bug

Returned logprobs overflow to -3.4e+38, the most negative finite fp32 value.
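For reference, -3.4e+38 is the lowest finite float32 value (what PyTorch reports as `torch.finfo(torch.float32).min`). The standard-library sketch below reconstructs it from its bit pattern and shows one plausible way it could end up in a logprob: if a token's logit is masked to that minimum before log-softmax, that token's logprob comes out as exactly the same number. The masking step is an assumption for illustration only; this report does not establish where in sglang the value originates, and `log_softmax` here is a hypothetical helper.

```python
import math
import struct

# Lowest finite float32: sign bit set, exponent 0xFE, mantissa all ones (0xFF7FFFFF).
FP32_MIN = struct.unpack("<f", struct.pack("<I", 0xFF7FFFFF))[0]


def log_softmax(logits):
    """Numerically stable log-softmax over a list of Python floats (hypothetical helper)."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


# Assumption for illustration: the third token's logit has been masked with FP32_MIN.
logits = [2.0, 1.0, FP32_MIN]
logprobs = log_softmax(logits)

print(FP32_MIN)                 # -3.4028234663852886e+38
print(logprobs[2] == FP32_MIN)  # True: subtracting log-sum-exp (~2.31) is below float precision at this magnitude
```

The masked entry contributes `exp(...) == 0.0` to the normalizer, so the remaining tokens still form a valid distribution while the masked token's logprob stays pinned at the float32 minimum.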

Reproduction

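I'm using Qwen2.5-14B-Instruct.

Code:

```python
# Runs inside an async method: self.engine is an sglang Engine serving
# Qwen2.5-14B-Instruct, and ids is the tokenized prompt (a list of token IDs).
sampling_params = {
    "temperature": 0.9,
    "top_p": 0.9,
    "skip_special_tokens": False,
    "stop": "<|im_end|>",
}

ret = await self.engine.async_generate(
    input_ids=ids,
    sampling_params=sampling_params,
    return_logprob=True,  # request per-token logprobs in the output
)
```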
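To check a response programmatically, a small scan over the returned per-token logprobs can flag entries at or near the float32 minimum. This sketch assumes `ret["meta_info"]["output_token_logprobs"]` is a list of `(logprob, token_id, text)` tuples; the exact shape may differ between releases, so verify it against 0.4.4.post1. `find_overflowed_logprobs`, the `-1e30` cutoff and the sample token IDs are hypothetical.

```python
import math
import struct

# Lowest finite float32 (about -3.4e+38).
FP32_MIN = struct.unpack("<f", struct.pack("<I", 0xFF7FFFFF))[0]


def find_overflowed_logprobs(token_logprobs, threshold=-1e30):
    """Return (position, token_id, logprob) for entries that look overflowed.

    Assumption: token_logprobs is a list of (logprob, token_id, text) tuples,
    e.g. ret["meta_info"]["output_token_logprobs"]. No real logprob comes
    anywhere near -1e30, so anything at or below it (or NaN) is flagged.
    """
    bad = []
    for pos, entry in enumerate(token_logprobs):
        logprob, token_id = entry[0], entry[1]
        if logprob is None:
            continue  # some positions may carry no logprob
        if math.isnan(logprob) or logprob <= threshold:
            bad.append((pos, token_id, logprob))
    return bad


# Hypothetical sample shaped like the assumed output format.
sample = [(-0.12, 11, None), (FP32_MIN, 22, None), (-1.5, 33, None)]
print(find_overflowed_logprobs(sample))  # [(1, 22, -3.4028234663852886e+38)]
```

A fixed cutoff is used rather than an exact equality check so that values produced from fp16/bf16 minima or `-inf` are caught as well.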

Environment

Python: 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0]
CUDA available: True
GPU 0,1,2,3,4,5,6,7: NVIDIA H800
GPU 0,1,2,3,4,5,6,7 Compute Capability: 9.0
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 12.3, V12.3.107
CUDA Driver Version: 535.161.08
PyTorch: 2.6.0+cu124
sglang: 0.4.4.post1
sgl_kernel: 0.0.5.post3
flashinfer: 0.2.3+cu124torch2.5
triton: 3.2.0
transformers: 4.48.3
torchao: 0.9.0
numpy: 1.26.4
aiohttp: 3.9.1
fastapi: 0.115.5
hf_transfer: 0.1.9
huggingface_hub: 0.26.2
interegular: 0.3.3
modelscope: 1.23.1
orjson: 3.10.11
packaging: 23.2
psutil: 5.9.4
pydantic: 2.10.5
multipart: 0.0.18
zmq: 25.1.2
uvicorn: 0.22.0
uvloop: 0.21.0
vllm: 0.7.3.dev68+g9cf47594.d20250213
openai: 1.59.6
anthropic: 0.49.0
decord: 0.6.0
NVIDIA Topology: 
        GPU0    GPU1    GPU2    GPU3    GPU4    GPU5    GPU6    GPU7    NIC0    NIC1    NIC2    NIC3    NIC4    NIC5    NIC6    NIC7    NIC8    CPU Affinity    NUMA Affinity GPU NUMA ID
GPU0     X      NV8     NV8     NV8     NV8     NV8     NV8     NV8     PIX     PIX     SYS     SYS     SYS     SYS     SYS     SYS     SYS     0,8,16,24,34    0    N/A
GPU1    NV8      X      NV8     NV8     NV8     NV8     NV8     NV8     SYS     SYS     PIX     SYS     SYS     SYS     SYS     SYS     SYS     2,10,18,30      2    N/A
GPU2    NV8     NV8      X      NV8     NV8     NV8     NV8     NV8     SYS     SYS     SYS     PIX     SYS     SYS     SYS     SYS     SYS     6,14,22,28      3    N/A
GPU3    NV8     NV8     NV8      X      NV8     NV8     NV8     NV8     SYS     SYS     SYS     SYS     PIX     SYS     SYS     SYS     SYS     4,12,20,26      1    N/A
GPU4    NV8     NV8     NV8     NV8      X      NV8     NV8     NV8     SYS     SYS     SYS     SYS     SYS     PIX     SYS     SYS     SYS     1,9,19,27,33    4    N/A
GPU5    NV8     NV8     NV8     NV8     NV8      X      NV8     NV8     SYS     SYS     SYS     SYS     SYS     SYS     PIX     SYS     SYS     3,11,15,21      6    N/A
GPU6    NV8     NV8     NV8     NV8     NV8     NV8      X      NV8     SYS     SYS     SYS     SYS     SYS     SYS     SYS     PIX     SYS     7,25,31,39      7    N/A
GPU7    NV8     NV8     NV8     NV8     NV8     NV8     NV8      X      SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     PIX     5,13,17,23      5    N/A
NIC0    PIX     SYS     SYS     SYS     SYS     SYS     SYS     SYS      X      PIX     SYS     SYS     SYS     SYS     SYS     SYS     SYS
NIC1    PIX     SYS     SYS     SYS     SYS     SYS     SYS     SYS     PIX      X      SYS     SYS     SYS     SYS     SYS     SYS     SYS
NIC2    SYS     PIX     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS      X      SYS     SYS     SYS     SYS     SYS     SYS
NIC3    SYS     SYS     PIX     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS      X      SYS     SYS     SYS     SYS     SYS
NIC4    SYS     SYS     SYS     PIX     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS      X      SYS     SYS     SYS     SYS
NIC5    SYS     SYS     SYS     SYS     PIX     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS      X      SYS     SYS     SYS
NIC6    SYS     SYS     SYS     SYS     SYS     PIX     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS      X      SYS     SYS
NIC7    SYS     SYS     SYS     SYS     SYS     SYS     PIX     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS      X      SYS
NIC8    SYS     SYS     SYS     SYS     SYS     SYS     SYS     PIX     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS      X 

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

NIC Legend:

  NIC0: mlx5_200
  NIC1: mlx5_400
  NIC2: mlx5_401
  NIC3: mlx5_402
  NIC4: mlx5_403
  NIC5: mlx5_404
  NIC6: mlx5_405
  NIC7: mlx5_406
  NIC8: mlx5_407


ulimit soft: 1048576

cc @Qiaolin-Yu, thanks!

@Qiaolin-Yu Qiaolin-Yu self-assigned this Mar 29, 2025
@Qiaolin-Yu Qiaolin-Yu added the bug Something isn't working label Mar 29, 2025