Support Sp token Function Call Token Implementation #13339


Open

wants to merge 1 commit into base: master

Conversation

@glide-the commented May 6, 2025

Support <|observation|> for function-call behavior by adding it to the EOG detection logic in src/llama-vocab.cpp#L1976-L1977.
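
For context, a paraphrased sketch of the kind of change involved (the surrounding loop is reconstructed from llama-vocab.cpp, not the exact diff):

// Special-token EOG detection in src/llama-vocab.cpp (paraphrased):
// tokens whose text matches a known end-of-generation marker are
// collected into special_eog_ids.
for (const auto & t : token_to_id) {
    if (false
            || t.first == "<|eot_id|>"
            || t.first == "<|im_end|>"
            || t.first == "<|end|>"
            || t.first == "<|endoftext|>"
            || t.first == "<|observation|>"  // GLM-4 function-call boundary, added by this PR
       ) {
        special_eog_ids.insert(t.second);
    }
}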

Verifying the PR

Checkout ref: #13058

1. Build

cmake llama.cpp -B llama.cpp/build \
  -DBUILD_SHARED_LIBS=OFF \
  -DGGML_CUDA=ON \
  -DLLAMA_CURL=ON \
  -DCMAKE_C_COMPILER=gcc-13 \
  -DCMAKE_CXX_COMPILER=g++-13 \
  -DCMAKE_CUDA_COMPILER=/usr/local/cuda-12.1/bin/nvcc

cmake --build llama.cpp/build --config Release -j --clean-first --target llama-quantize llama-cli llama-gguf-split llama-server

2. Convert HF Weights

python convert_hf_to_gguf.py THUDM/glm-4-9b-chat-hf \
  --outfile glm-4-9b.gguf \
  --outtype q8_0

3. Run Inference

The server was launched under gdb via the following VS Code launch.json entry:

{
    "name": "C++ Server Launch",
    "type": "cppdbg",
    "request": "launch",
    "program": "${workspaceFolder}/build/bin/llama-server",
    "args": [
        "--jinja",
        "-m",
        "/mnt/ceph/develop/jiawei/model_checkpoint/glm-4-9b-chat-hf.gguf",
        "--port",
        "8000"
    ],
    "stopAtEntry": false,
    "cwd": "${workspaceFolder}",
    "environment": [],
    "externalConsole": false,
    "MIMode": "gdb",
    "setupCommands": [
        {
            "description": "Enable pretty-printing for gdb",
            "text": "-enable-pretty-printing",
            "ignoreFailures": true
        }
    ],
    "miDebuggerPath": "/usr/bin/gdb"
}
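
Outside the debugger, the equivalent direct launch (binary path per the build step above, flags taken from the args array) is:

llama.cpp/build/bin/llama-server --jinja \
  -m /mnt/ceph/develop/jiawei/model_checkpoint/glm-4-9b-chat-hf.gguf \
  --port 8000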

4. Function Call Test

{
    "max_tokens": 1000,

    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_current_weather",
                "description": "Get the current weather in a given location",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "The city and state, e.g. San Francisco, CA"
                        },
                        "unit": {
                            "type": "string",
                            "enum": [
                                "celsius",
                                "fahrenheit"
                            ]
                        }
                    },
                    "required": [
                        "location"
                    ]
                }
            }
        }
    ], 
    "stream": false,
    "temperature": 0.5,
    "messages": [
        {
            "role": "user",
            "content": "北京天气怎么样"
        } 
    ],
    
    "model": "glm-4",
    "request_id": "mycompany-1713755521779"
}
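
Assuming the request body above is saved as request.json (an illustrative filename), it can be sent to llama-server's OpenAI-compatible chat endpoint:

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d @request.json

With the patch applied, generation is expected to stop at <|observation|>, and the response should contain a tool_calls entry for get_current_weather rather than running past the function-call boundary.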

@ngxson (Collaborator) commented May 6, 2025

This should be done when converting HF --> GGUF. Please update convert_hf_to_gguf.py instead
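
A hypothetical sketch of that direction (paths, the arch name, and the choice of the "eot" special-token type are assumptions, not code from this PR): resolve the <|observation|> id at conversion time and record it in the GGUF metadata via gguf-py, so llama.cpp flags it as EOG at load time without a hard-coded string match.

# Hypothetical sketch, not from this PR: stage "<|observation|>" as the
# EOT special token in the GGUF metadata during HF -> GGUF conversion.
from pathlib import Path
import gguf
from transformers import AutoTokenizer

model_dir = Path("THUDM/glm-4-9b-chat-hf")  # illustrative local checkout
tokenizer = AutoTokenizer.from_pretrained(model_dir)
observation_id = tokenizer.convert_tokens_to_ids("<|observation|>")

writer = gguf.GGUFWriter("glm-4-9b.gguf", arch="glm4")  # arch name illustrative
special_vocab = gguf.SpecialVocab(model_dir, load_merges=True)
if observation_id is not None:
    special_vocab._set_special_token("eot", observation_id)  # -> tokenizer.ggml.eot_token_id
special_vocab.add_to_gguf(writer)
# (a real conversion then writes the header, KV data, and tensors)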
