gguf-py: Optimize `GGUFReader` read-only mode performance #13378

Isotr0py · 2025-05-08T07:54:53Z


Refines gguf-py: Improve GGUFReader read-only mode performance #10159 to keep compatibility with NumPy 2.0.

Original description

This PR aims to optimize GGUFReader read-only performance with the following modifications:

Use native Python file I/O to build fields instead of a memmap array.
Optimize the _get_str and _get functions in read-only mode with np.frombuffer.
Avoid computing offsets from arrays, which creates intermediate data; use tell() on the native Python file object to update offsets instead.
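
The three points above can be sketched on a small in-memory stream. This is an illustrative sketch only, not the code from this PR: the helper names `read_scalar` and `read_string` are hypothetical, and the layout assumed here is a GGUF-style string (a little-endian uint64 length followed by UTF-8 bytes).

```python
import io
import struct

import numpy as np


def read_scalar(f, dtype):
    """Read one scalar at the current file position (hypothetical helper)."""
    dt = np.dtype(dtype)
    # np.frombuffer wraps the bytes returned by read() without copying them.
    return np.frombuffer(f.read(dt.itemsize), dtype=dt)[0]


def read_string(f):
    """Read a length-prefixed string and report where it started."""
    start = f.tell()  # the offset comes from the file object, not from array math
    length = int(read_scalar(f, "<u8"))  # little-endian uint64 length prefix
    text = f.read(length).decode("utf-8")
    return start, text


# Build a tiny stream holding two length-prefixed strings.
buf = io.BytesIO()
for s in (b"general.architecture", b"qwen2"):
    buf.write(struct.pack("<Q", len(s)) + s)
buf.seek(0)

first_offset, first = read_string(buf)
second_offset, second = read_string(buf)
print(first_offset, first)
print(second_offset, second)
```

Because every read advances the file position, the next offset is available from `tell()` and no intermediate arrays are needed to track it.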

Performance Comparison

Benchmark script

#!/usr/bin/env python3
import logging
import sys
import time
from pathlib import Path

import psutil

# Make the local gguf package importable before importing from it
sys.path.insert(0, str(Path(__file__).parent.parent))

from gguf.gguf_reader import GGUFReader  # noqa: E402

logger = logging.getLogger("reader")


def read_gguf_file(gguf_file_path):
    """
    Reads and prints key-value pairs and tensor information from a GGUF file in an improved format.

    Parameters:
    - gguf_file_path: Path to the GGUF file.
    """

    # Baseline readings; note that psutil.virtual_memory() is system-wide, not per-process
    time0 = time.time()
    ram_init1 = psutil.virtual_memory().percent
    ram_init2 = psutil.virtual_memory().used / 1000000000

    reader = GGUFReader(gguf_file_path)

    # List all key-value pairs in a columnized format
    print("Key-Value Pairs:") # noqa: NP100
    max_key_length = max(len(key) for key in reader.fields.keys())
    for key, field in reader.fields.items():
        value = field.parts[field.data[0]]
        print(f"{key:{max_key_length}} : {value}") # noqa: NP100
    print("----") # noqa: NP100

    # List all tensors
    print("Tensors:") # noqa: NP100
    tensor_info_format = "{:<30} | Shape: {:<15} | Size: {:<12} | Quantization: {}"
    print(tensor_info_format.format("Tensor Name", "Shape", "Size", "Quantization")) # noqa: NP100
    print("-" * 80) # noqa: NP100
    for tensor in reader.tensors:
        shape_str = "x".join(map(str, tensor.shape))
        size_str = str(tensor.n_elements)
        quantization_str = tensor.tensor_type.name
        print(tensor_info_format.format(tensor.name, shape_str, size_str, quantization_str)) # noqa: NP100

    # Report elapsed time and the change in system-wide memory since the baseline
    print('Time (s):', time.time() - time0)
    print('RAM memory % used:', psutil.virtual_memory().percent - ram_init1)
    print('RAM Used (GB):', psutil.virtual_memory().used / 1000000000 - ram_init2)


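if __name__ == '__main__':
    # Without a configured handler, INFO-level messages are silently dropped
    logging.basicConfig(level=logging.INFO)
    if len(sys.argv) < 2:
        logger.info("Usage: reader.py <path_to_gguf_file>")
        sys.exit(1)
    gguf_file_path = sys.argv[1]
    read_gguf_file(gguf_file_path)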
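
The script above samples system-wide memory through psutil, so other processes can influence its readings. As a hedged alternative, the sketch below measures only the current process using the standard library. The `measure` helper is hypothetical and not part of gguf-py, and the `bytearray` is a stand-in for the `GGUFReader(...)` call. Note that `tracemalloc` only sees allocations made through Python's memory allocators, so pages mapped with mmap are not counted.

```python
import time
import tracemalloc
from contextlib import contextmanager


@contextmanager
def measure(label, results):
    """Record elapsed seconds and peak traced bytes for the wrapped block."""
    tracemalloc.start()
    t0 = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - t0
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        results[label] = {"seconds": elapsed, "peak_bytes": peak}


results = {}
with measure("demo", results):
    data = bytearray(1_000_000)  # stand-in for GGUFReader(gguf_file_path)

print(results["demo"])
```

Running the same block against both branches would give per-process numbers that are easier to compare than system-wide deltas.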

Comparison Results

File: qwen2-0_5b-instruct-q2_k.gguf
CPU: Intel(R) Core(TM) i5-9400F CPU @ 2.90GHz
RAM: 16GB

Master

Time (s): 12.987974643707275
RAM memory % used: 1.7999999999999972
RAM Used (GB): 0.31249203199999975

This PR

Time (s): 4.433131456375122
RAM memory % used: 0.7999999999999972
RAM Used (GB): 0.1335459839999995

Signed-off-by: Isotr0py <[email protected]>

Isotr0py · 2025-05-08T07:56:32Z

@compilade Sorry for the very late update. Could you please take another look at this optimization PR? Thanks!

Isotr0py added 2 commits May 8, 2025 15:28

refactor

206672f

Signed-off-by: Isotr0py <[email protected]>

fix type check

84e5e6a

Signed-off-by: Isotr0py <[email protected]>

Isotr0py mentioned this pull request May 8, 2025

gguf-py: Improve GGUFReader read-only mode performance #10159

Closed

4 tasks

github-actions bot added the python python script changes label May 8, 2025
