gguf-py: Optimize GGUFReader read-only mode performance #13378

Status: Open. Isotr0py wants to merge 2 commits into master.


@Isotr0py Isotr0py commented May 8, 2025

Make sure to read the contributing guidelines before submitting a PR

Original description

This PR optimizes GGUFReader performance in read-only mode with the following changes:

  • Build fields with native Python file I/O instead of slicing a memmap array.
  • Speed up _get_str and _get in read-only mode by decoding with np.frombuffer.
  • Track offsets with tell() on the native Python file object instead of computing them from arrays, which created intermediate data.
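
The three changes above can be illustrated with a minimal sketch. This is not the PR's actual code: read_string is a hypothetical helper, the in-memory payload stands in for a real file, and the layout assumed here is a little-endian uint64 length prefix followed by UTF-8 bytes.

```python
import io
import struct

import numpy as np


def read_string(f):
    """Read one length-prefixed UTF-8 string; return (text, next_offset)."""
    # Decode the 8-byte little-endian length prefix straight from the bytes
    # returned by a plain read(), with no memmap slice involved.
    (length,) = np.frombuffer(f.read(8), dtype="<u8")
    text = f.read(int(length)).decode("utf-8")
    # tell() already reflects every byte consumed, so the next offset does
    # not have to be summed from the sizes of intermediate arrays.
    return text, f.tell()


# Hypothetical stand-in for a file: two length-prefixed strings back to back.
payload = b"".join(
    struct.pack("<Q", len(s)) + s for s in (b"general.name", b"qwen2")
)
f = io.BytesIO(payload)

key, offset_after_key = read_string(f)
value, offset_after_value = read_string(f)
print(key, offset_after_key)
print(value, offset_after_value)
```

The same pattern should work on a file opened with open(path, "rb"), since it only relies on read() and tell().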

Performance Comparison

Benchmark script
#!/usr/bin/env python3
import logging
import sys
import time
from pathlib import Path

import psutil

logger = logging.getLogger("reader")

# Make the local gguf package importable; this must run before the gguf import.
sys.path.insert(0, str(Path(__file__).parent.parent))

from gguf.gguf_reader import GGUFReader  # noqa: E402


def read_gguf_file(gguf_file_path):
    """
    Reads and prints key-value pairs and tensor information from a GGUF file in an improved format.

    Parameters:
    - gguf_file_path: Path to the GGUF file.
    """

    time0 = time.time()
    # psutil reports system-wide memory: percent in use, and bytes used (converted to GB).
    ram_init1 = psutil.virtual_memory().percent
    ram_init2 = psutil.virtual_memory().used / 1000000000

    reader = GGUFReader(gguf_file_path)

    # List all key-value pairs in a columnized format
    print("Key-Value Pairs:") # noqa: NP100
    max_key_length = max(len(key) for key in reader.fields.keys())
    for key, field in reader.fields.items():
        value = field.parts[field.data[0]]
        print(f"{key:{max_key_length}} : {value}") # noqa: NP100
    print("----") # noqa: NP100

    # List all tensors
    print("Tensors:") # noqa: NP100
    tensor_info_format = "{:<30} | Shape: {:<15} | Size: {:<12} | Quantization: {}"
    print(tensor_info_format.format("Tensor Name", "Shape", "Size", "Quantization")) # noqa: NP100
    print("-" * 80) # noqa: NP100
    for tensor in reader.tensors:
        shape_str = "x".join(map(str, tensor.shape))
        size_str = str(tensor.n_elements)
        quantization_str = tensor.tensor_type.name
        print(tensor_info_format.format(tensor.name, shape_str, size_str, quantization_str)) # noqa: NP100

    # Report elapsed time and the change in system-wide memory since the start.
    print('Time (s):', time.time() - time0)
    print('RAM memory % used:', psutil.virtual_memory().percent - ram_init1)
    print('RAM Used (GB):', psutil.virtual_memory().used / 1000000000 - ram_init2)


if __name__ == '__main__':
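    if len(sys.argv) < 2:
        # Print the usage directly: logger.info() is dropped when logging is not configured.
        print("Usage: reader.py <path_to_gguf_file>", file=sys.stderr)  # noqa: NP100
        sys.exit(1)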
    gguf_file_path = sys.argv[1]
    read_gguf_file(gguf_file_path)
Comparison Results

File: qwen2-0_5b-instruct-q2_k.gguf
CPU: Intel(R) Core(TM) i5-9400F CPU @ 2.90GHz
RAM: 16GB

Master

Time (s): 12.987974643707275
RAM memory % used: 1.7999999999999972
RAM Used (GB): 0.31249203199999975

This PR

Time (s): 4.433131456375122
RAM memory % used: 0.7999999999999972
RAM Used (GB): 0.1335459839999995
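
To put the two runs side by side, the ratios can be computed directly from the figures above. This is only a convenience sketch: the numbers come from a single run each, and the memory values are system-wide deltas, so they are noisy.

```python
# Figures copied from the "Master" and "This PR" results above.
master = {"time_s": 12.987974643707275, "ram_gb": 0.31249203199999975}
pr = {"time_s": 4.433131456375122, "ram_gb": 0.1335459839999995}

# How many times faster the PR run finished.
speedup = master["time_s"] / pr["time_s"]
# Fractional drop in the measured RAM increase.
ram_reduction = 1 - pr["ram_gb"] / master["ram_gb"]

print(f"Speedup: {speedup:.2f}x")
print(f"Reduction in RAM increase: {ram_reduction:.1%}")
```

On these figures the PR run is a little under 3x faster and the measured RAM increase is less than half of master's.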

Isotr0py added 2 commits May 8, 2025 15:28
Signed-off-by: Isotr0py <[email protected]>
Signed-off-by: Isotr0py <[email protected]>
@github-actions github-actions bot added the python python script changes label May 8, 2025

Isotr0py commented May 8, 2025

@compilade Sorry for the very late update. Could you please take another look at this optimization PR? Thanks!
