Support Sp token Function Call Token Implementation #13339


Open

wants to merge 1 commit into base: master

Conversation

@glide-the commented May 6, 2025

Support <|observation|> for function-call behavior by adding it to the EOG detection logic in src/llama-vocab.cpp#L1976-L1977.
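
For context, a paraphrased sketch of the kind of change involved (the surrounding loop is reconstructed from llama-vocab.cpp, not the exact diff):

// Special-token EOG detection in src/llama-vocab.cpp (paraphrased):
// tokens whose text matches a known end-of-generation marker are
// collected into special_eog_ids.
for (const auto & t : token_to_id) {
    if (false
            || t.first == "<|eot_id|>"
            || t.first == "<|im_end|>"
            || t.first == "<|end|>"
            || t.first == "<|endoftext|>"
            || t.first == "<|observation|>"  // GLM-4 function-call boundary, added by this PR
       ) {
        special_eog_ids.insert(t.second);
    }
}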

Verifying the PR

Checkout ref: #13058

1. Build

cmake llama.cpp -B llama.cpp/build \
  -DBUILD_SHARED_LIBS=OFF \
  -DGGML_CUDA=ON \
  -DLLAMA_CURL=ON \
  -DCMAKE_C_COMPILER=gcc-13 \
  -DCMAKE_CXX_COMPILER=g++-13 \
  -DCMAKE_CUDA_COMPILER=/usr/local/cuda-12.1/bin/nvcc

cmake --build llama.cpp/build --config Release -j --clean-first --target llama-quantize llama-cli llama-gguf-split llama-server

2. Convert HF Weights

python convert_hf_to_gguf.py THUDM/glm-4-9b-chat-hf \
  --outfile glm-4-9b.gguf \
  --outtype q8_0

3. Run Inference

The server was launched under gdb via the following VS Code launch.json entry:

{
    "name": "C++ Server Launch",
    "type": "cppdbg",
    "request": "launch",
    "program": "${workspaceFolder}/build/bin/llama-server",
    "args": [
        "--jinja",
        "-m",
        "/mnt/ceph/develop/jiawei/model_checkpoint/glm-4-9b-chat-hf.gguf",
        "--port",
        "8000"
    ],
    "stopAtEntry": false,
    "cwd": "${workspaceFolder}",
    "environment": [],
    "externalConsole": false,
    "MIMode": "gdb",
    "setupCommands": [
        {
            "description": "Enable pretty-printing for gdb",
            "text": "-enable-pretty-printing",
            "ignoreFailures": true
        }
    ],
    "miDebuggerPath": "/usr/bin/gdb"
}
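
Outside the debugger, the equivalent direct launch (binary path per the build step above, flags taken from the args array) is:

llama.cpp/build/bin/llama-server --jinja \
  -m /mnt/ceph/develop/jiawei/model_checkpoint/glm-4-9b-chat-hf.gguf \
  --port 8000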

4. Function Call Test

{
    "max_tokens": 1000,

    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_current_weather",
                "description": "Get the current weather in a given location",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "The city and state, e.g. San Francisco, CA"
                        },
                        "unit": {
                            "type": "string",
                            "enum": [
                                "celsius",
                                "fahrenheit"
                            ]
                        }
                    },
                    "required": [
                        "location"
                    ]
                }
            }
        }
    ], 
    "stream": false,
    "temperature": 0.5,
    "messages": [
        {
            "role": "user",
            "content": "北京天气怎么样"
        } 
    ],
    
    "model": "glm-4",
    "request_id": "mycompany-1713755521779"
}
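
Assuming the request body above is saved as request.json (an illustrative filename), it can be sent to llama-server's OpenAI-compatible chat endpoint:

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d @request.json

With the patch applied, generation is expected to stop at <|observation|>, and the response should contain a tool_calls entry for get_current_weather rather than running past the function-call boundary.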

@ngxson (Collaborator) commented May 6, 2025

This should be done when converting HF --> GGUF. Please update convert_hf_to_gguf.py instead
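
A hypothetical sketch of that direction (paths, the arch name, and the choice of the "eot" special-token type are assumptions, not code from this PR): resolve the <|observation|> id at conversion time and record it in the GGUF metadata via gguf-py, so llama.cpp flags it as EOG at load time without a hard-coded string match.

# Hypothetical sketch, not from this PR: stage "<|observation|>" as the
# EOT special token in the GGUF metadata during HF -> GGUF conversion.
from pathlib import Path
import gguf
from transformers import AutoTokenizer

model_dir = Path("THUDM/glm-4-9b-chat-hf")  # illustrative local checkout
tokenizer = AutoTokenizer.from_pretrained(model_dir)
observation_id = tokenizer.convert_tokens_to_ids("<|observation|>")

writer = gguf.GGUFWriter("glm-4-9b.gguf", arch="glm4")  # arch name illustrative
special_vocab = gguf.SpecialVocab(model_dir, load_merges=True)
if observation_id is not None:
    special_vocab._set_special_token("eot", observation_id)  # -> tokenizer.ggml.eot_token_id
special_vocab.add_to_gguf(writer)
# (a real conversion then writes the header, KV data, and tensors)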
