Description
I've been running some benchmarks to measure tensorstore read bandwidth, and I've found significant overhead for non-standard dtypes (e.g. bfloat16), where tensorstore seems to zero-initialize the destination array before overwriting it with the data. Here's the benchmark that I've been running:
import time
import tensorstore as ts
dataset = ts.open(
    {
        "driver": "zarr3",
        "kvstore": {
            "driver": "gcs",
            "bucket": "...",
            "path": "...",
        },
        "open": True,
        "create": False,
    }
).result()
start = time.perf_counter()
array = dataset.read().result()
duration = time.perf_counter() - start
gb = array.nbytes / 1024**3
bandwidth = gb / duration
print(f"{gb:.2f} GB loaded in {duration:.2f} s ({bandwidth:.2f} GB/s)")where the tensor that I'm loading is about 55 GB of bfloat16 data.
If I run this benchmark using the current master branch of tensorstore, I see:
54.93 GB loaded in 58.35 s (0.94 GB/s)
but, if I disable zero initialization on reads (more details below), I get:
54.93 GB loaded in 31.47 s (1.75 GB/s)
which suggests that the current implementation's construction of the 55 GB array introduces roughly 27 s of overhead!
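As a rough sanity check on that magnitude, here's a minimal standalone C++ sketch, independent of tensorstore, that times value-initialized vs default-initialized allocation of a trivially-constructible buffer (the 4 GiB size is an arbitrary stand-in for the 55 GB array; adjust to taste):

#include <chrono>
#include <cstddef>
#include <cstdint>
#include <cstdio>

int main() {
  constexpr std::size_t n = std::size_t{4} << 30;  // 4 GiB stand-in for 55 GB

  auto t0 = std::chrono::steady_clock::now();
  auto* zeroed = new std::uint8_t[n]();  // value-initialization: writes n zero bytes
  auto t1 = std::chrono::steady_clock::now();
  auto* raw = new std::uint8_t[n];  // default-initialization: no writes for trivial types
  auto t2 = std::chrono::steady_clock::now();

  // Touch both buffers so the compiler can't optimize the allocations away.
  zeroed[n - 1] = 1;
  raw[n - 1] = 1;
  volatile std::uint8_t sink = zeroed[n - 1] + raw[n - 1];
  (void)sink;

  std::printf("value-initialized:   %.3f s\n", std::chrono::duration<double>(t1 - t0).count());
  std::printf("default-initialized: %.3f s\n", std::chrono::duration<double>(t2 - t1).count());

  delete[] zeroed;
  delete[] raw;
}

My expectation is that the first timing scales with the buffer size while the second is near-instant, though allocator and OS page-zeroing behavior can shift the exact numbers.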
I've narrowed the offending line down to tensorstore/data_type.cc, line 86 at commit 367ef7d:

r->construct(n, ptr.get());
I can avoid this overhead by hacking tensorstore to skip that r->construct(...) call, or, for bfloat16 specifically, by replacing the default constructor with:

- constexpr BFloat16() : rep_(0) {}
+ BFloat16() = default;
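For what it's worth, my understanding of why the one-line change works (sketched with stand-in types, not the real BFloat16): a defaulted constructor makes the type trivially default-constructible, so default-initialization stops writing anything, while explicit value-initialization still yields zero:

#include <cstdint>
#include <type_traits>

struct ZeroingBF16 {  // stand-in for the current definition
  constexpr ZeroingBF16() : rep_(0) {}
  std::uint16_t rep_;
};

struct DefaultedBF16 {  // stand-in for the proposed definition
  DefaultedBF16() = default;
  std::uint16_t rep_;
};

static_assert(!std::is_trivially_default_constructible_v<ZeroingBF16>);
static_assert(std::is_trivially_default_constructible_v<DefaultedBF16>);

int main() {
  DefaultedBF16 a;   // default-initialized: rep_ is indeterminate, no write
  DefaultedBF16 b{};  // value-initialized: rep_ == 0, as before
  (void)a;
  (void)b;
}

The flip side is that any code relying on BFloat16 x; being zero would silently break, which is why I'm unsure this is safe as-is.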
I wanted to ask here whether anyone has advice on the preferred approach for safely avoiding this overhead; one hypothetical shape for a general fix is sketched below. I'd be very happy to open a PR if we decide there's something general to do here. Thanks!!
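Purely as a hypothetical (the names below are made up, not tensorstore's actual API), one general shape could be to gate the construct step on a triviality trait, so that every trivially default-constructible dtype skips the pass automatically:

#include <cstddef>
#include <memory>
#include <type_traits>

// Hypothetical: skip construction entirely when it would be a no-op anyway.
template <typename T>
void MaybeConstructElements(std::size_t n, void* ptr) {
  if constexpr (!std::is_trivially_default_constructible_v<T>) {
    std::uninitialized_default_construct_n(static_cast<T*>(ptr), n);
  }
  // With BFloat16's constructor defaulted, it would qualify as trivial and
  // take the no-op path; non-trivial types keep their current behavior.
}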