Overhead introduced by constructing buffers for bfloat16 dtypes on read #260

@dfm-anthropic

I've been running some benchmarks to test bandwidth on tensorstore reads, and I've found significant overhead for non-standard dtypes (e.g. bfloat16) where tensorstore seems to initialize the array to zero before overwriting it with the data. Here's the benchmark that I've been running:

import time
import tensorstore as ts

dataset = ts.open(
    {
        "driver": "zarr3",
        "kvstore": {
            "driver": "gcs",
            "bucket": "...",
            "path": "...",
        },
        "open": True,
        "create": False,
    }
).result()

start = time.perf_counter()
array = dataset.read().result()
duration = time.perf_counter() - start

gb = array.nbytes / 1024**3
bandwidth = gb / duration
print(f"{gb:.2f} GB loaded in {duration:.2f} s ({bandwidth:.2f} GB/s)")

where the tensor that I'm loading is about 55 GB of bfloat16 data.

If I run this benchmark using the current master branch of tensorstore, I see:

54.93 GB loaded in 58.35 s (0.94 GB/s)

but, if I disable zero initialization on reads (more details below), I get:

54.93 GB loaded in 31.47 s (1.75 GB/s)

which suggests that the current implementation's construction of the 55 GB array introduces roughly 27 s of overhead (58.35 s vs. 31.47 s), nearly half of the total read time!
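To make the size of the overhead concrete, here is the arithmetic on the two runs above as a small sketch. The only inputs are the figures printed by the benchmark; the "implied construction rate" is my own derived quantity and assumes the entire difference between the runs is attributable to the construction step.

```python
# Figures copied from the two benchmark runs above.
size_gb = 54.93
with_zero_init_s = 58.35      # current master
without_zero_init_s = 31.47   # zero initialization disabled

# Time attributable to constructing (zeroing) the array, assuming the two
# runs are otherwise comparable.
overhead_s = with_zero_init_s - without_zero_init_s
overhead_fraction = overhead_s / with_zero_init_s

# Derived quantity: how fast the array would have to be constructed for the
# construction step alone to account for the difference.
implied_init_rate = size_gb / overhead_s

print(f"overhead: {overhead_s:.2f} s ({overhead_fraction:.0%} of the slower run)")
print(f"implied construction rate: {implied_init_rate:.2f} GB/s")
```

This works out to 26.88 s of overhead, about 46% of the slower run, or an implied construction rate of about 2.04 GB/s.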

I've narrowed the offending line down to:

r->construct(n, ptr.get());

I can avoid this overhead either by hacking tensorstore to skip that r->construct(...) call, or by changing the BFloat16 default constructor so that it no longer zeroes its storage:

-   constexpr BFloat16() : rep_(0) {}
+   BFloat16() = default;

for bfloat16 specifically.
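For anyone who wants to see the general effect outside of tensorstore, here is a rough NumPy sketch of the pattern I believe is at play: zero-filling a destination buffer and then overwriting every element, versus writing into an uninitialized buffer. This is only an analogy and not tensorstore's actual code path. The function names are hypothetical, uint16 stands in for bfloat16's 16-bit storage, and the timings will vary by machine and allocator.

```python
import time

import numpy as np


def fill_with_zero_init(src):
    """Zero the destination first, then overwrite it (analogous to construct + read)."""
    dst = np.empty_like(src)
    dst.fill(0)     # explicit zeroing pass over the whole buffer
    dst[...] = src  # every element is then overwritten with the real data
    return dst


def fill_without_zero_init(src):
    """Write the data straight into an uninitialized destination."""
    dst = np.empty_like(src)
    dst[...] = src
    return dst


# 2**24 16-bit elements (32 MiB); values wrap around, which is fine here.
src = np.arange(1 << 24, dtype=np.uint32).astype(np.uint16)

for fn in (fill_with_zero_init, fill_without_zero_init):
    start = time.perf_counter()
    out = fn(src)
    elapsed = time.perf_counter() - start
    # Both variants must produce identical contents; only the cost differs.
    assert np.array_equal(out, src)
    print(f"{fn.__name__}: {elapsed * 1e3:.1f} ms")
```

Both variants end up with the same contents, which is why the zeroing pass is pure overhead whenever the read is guaranteed to overwrite the whole array.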

I wanted to ask here if anyone had advice about what would be the preferred approach for safely avoiding this overhead. I'd be very happy to open a PR if we decide that there's something general to do here. Thanks!!
