Support Sp token Function Call Token Implementation #13339

Open
glide-the wants to merge 1 commit into ggml-org:master from glide-the:glm_obs_sp_token

@glide-the glide-the commented May 6, 2025

Support the <|observation|> special token for function-call behavior by adding it to the EOG (end-of-generation) detection logic in src/llama-vocab.cpp#L1976-L1977.
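To illustrate the idea, here is a minimal Python sketch of text-based EOG detection. It is not the code in this PR (the real check is C++ in src/llama-vocab.cpp); the helper names `collect_eog_ids` and `generate`, the example vocabulary, and every token other than `<|observation|>` are assumptions made for illustration.

```python
# Illustrative sketch only; the real logic is C++ in src/llama-vocab.cpp.
# The set below is an assumed example, not the list llama.cpp actually uses.
EOG_TOKEN_TEXTS = {"<|endoftext|>", "<|observation|>"}


def collect_eog_ids(vocab: dict[str, int]) -> set[int]:
    """Return the IDs of all vocabulary entries whose text marks end of generation."""
    return {tok_id for text, tok_id in vocab.items() if text in EOG_TOKEN_TEXTS}


def generate(token_stream, eog_ids):
    """Yield tokens until the first end-of-generation token is seen."""
    for tok in token_stream:
        if tok in eog_ids:
            break  # stop here so the caller can run the requested tool
        yield tok
```

With `<|observation|>` treated as EOG, generation stops right after the model emits its tool call, instead of continuing into a made-up tool result.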

Verifying the PR

Check out the branch. Ref: #13058

1. Build

cmake llama.cpp -B llama.cpp/build \
  -DBUILD_SHARED_LIBS=OFF \
  -DGGML_CUDA=ON \
  -DLLAMA_CURL=ON \
  -DCMAKE_C_COMPILER=gcc-13 \
  -DCMAKE_CXX_COMPILER=g++-13 \
  -DCMAKE_CUDA_COMPILER=/usr/local/cuda-12.1/bin/nvcc

cmake --build llama.cpp/build --config Release -j --clean-first --target llama-quantize llama-cli llama-gguf-split llama-server

2. Convert HF Weights

python convert_hf_to_gguf.py THUDM/glm-4-9b-chat-hf \
  --outfile glm-4-9b.gguf \
  --outtype q8_0

3. Run Inference

{
    "name": "C++ Server Launch",
    "type": "cppdbg",
    "request": "launch",
    "program": "${workspaceFolder}/build/bin/llama-server",
    "args": [
        "--jinja",
        "-m",
        "/mnt/ceph/develop/jiawei/model_checkpoint/glm-4-9b-chat-hf.gguf",
        "--port",
        "8000"
    ],
    "stopAtEntry": false,
    "cwd": "${workspaceFolder}",
    "environment": [],
    "externalConsole": false,
    "MIMode": "gdb",
    "setupCommands": [
        {
            "description": "Enable pretty-printing for gdb",
            "text": "-enable-pretty-printing",
            "ignoreFailures": true
        }
    ],
    "miDebuggerPath": "/usr/bin/gdb"
}

4. Function Call Test

{
    "max_tokens": 1000,
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_current_weather",
                "description": "Get the current weather in a given location",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "The city and state, e.g. San Francisco, CA"
                        },
                        "unit": {
                            "type": "string",
                            "enum": [
                                "celsius",
                                "fahrenheit"
                            ]
                        }
                    },
                    "required": [
                        "location"
                    ]
                }
            }
        }
    ], 
    "stream": false,
    "temperature": 0.5,
    "messages": [
        {
            "role": "user",
            "content": "北京天气怎么样"
        } 
    ],
    "model": "glm-4",
    "request_id": "mycompany-1713755521779"
}
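One way to send this request and inspect the result is sketched below. It assumes the server from step 3 is reachable at `http://localhost:8000` (the host is an assumption; the port comes from the launch configuration), that the request goes to llama-server's OpenAI-compatible `/v1/chat/completions` endpoint, and that the response follows the OpenAI chat-completion shape. The helper names are hypothetical.

```python
import json
import urllib.request


def build_chat_request(payload: dict, base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Wrap a chat-completion payload in a POST request for llama-server."""
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_tool_calls(response: dict) -> list[dict]:
    """Return the tool calls of the first choice, or [] if there are none."""
    choices = response.get("choices") or []
    if not choices:
        return []
    message = choices[0].get("message") or {}
    return message.get("tool_calls") or []


def send(payload: dict) -> list[dict]:
    """POST the payload and return any tool calls (needs a running server)."""
    with urllib.request.urlopen(build_chat_request(payload), timeout=120) as resp:
        return extract_tool_calls(json.load(resp))
```

Calling `send(payload)` with the JSON body above loaded as a dict should return a non-empty list naming `get_current_weather` if the model stops at `<|observation|>` and the server parses the tool call.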

Contributor

ngxson commented May 6, 2025

This should be handled when converting from HF to GGUF. Please update convert_hf_to_gguf.py instead.

2 participants