feat(io_uring): Add support for registered buffers#72
Conversation
vadimskipin
left a comment
There was a problem hiding this comment.
Cool! I thought about this optimization but have not try. Results are really impressive.
There was a problem hiding this comment.
Pull request overview
Adds io_uring “fixed/registered buffer” support to FiberScheduler and integrates it into the file-perf benchmark to avoid per-IO buffer pinning/allocation overhead.
Changes:
- Added
FiberScheduler::registerBuffers(),readFixed(), andwriteFixed()APIs backed by liburing helpers. - Updated
file-perfto optionally register per-job buffers and issueIORING_OP_READ_FIXED/IORING_OP_WRITE_FIXEDvia a new--fixed-buffersflag. - Extended the
bbperf runner to pass through--fixed-buffersfor file-perf runs.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| src/perf/file-perf.cpp | Adds a --fixed-buffers mode that registers per-job buffers and switches IO submission to fixed-buffer ops. |
| src/fibers/fiber.cpp | Implements readFixed, writeFixed, and registerBuffers on the scheduler’s per-CPU rings. |
| include/silk/fibers/fiber.h | Exposes and documents the new fixed-buffer APIs on the public scheduler interface. |
| bb | Adds CLI plumbing to enable fixed-buffer mode for file-perf via bb perf and bb file-perf. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| { | ||
| continue; | ||
| } | ||
| int r = ::io_uring_register_buffers(&processor->ring, iovecs, count); |
There was a problem hiding this comment.
What is about NUMA awareness here? Allocate once and use on any CPU does not look optimal. Should we maintain separate buffers per-node (per-cpu)?
There was a problem hiding this comment.
Fair point 👍 This may require some changes on all three new apis (read_fixed/write_fixed/register_buffers) I think. Currently the buffers can physically live on one node and cores on other nodes have to pay remote-memory cost.
I'm thinking of having something simple struct
struct FixedBuf
{
void * base[SILK_MAX_NUMA_NODES]; // node-local pinned bases
uint32_t index; // node-relative index
uint32_t len;
};and make read_fixed and write_fixed apis accepts this FixedBuffer along with offset instead of plain void * pointer.
what do you think? May be it's complex? open to other ideas if you got any simpler approach (I'm not super familiar with NUMA in general :) )
There was a problem hiding this comment.
It seems silk just need to expose raw register-buffer API. It would be better to write client code first and then decide what can be pushed into silk.
There was a problem hiding this comment.
After bit of playing I think it's better to keep the NUMA aware part, out of the silk's registerBuffer api.
Here are the rationale.
-
Basically what
io_uring_register_bufferdoes is just register the given buffer (block of memory address) to be used on kernel side. It doesn't do anymallocormemset. It's up to the client side that does those. Whatever the (CPU, NODE) touches that memory address first (viamemsetfor example) it get resident on that node. For example [how we do allign_maloc and memset on file-perf itself.](https://github.com/kavirajk/silk/blob/c8ee4a7c3189c27b4cac905016cd8a4421e1cfbe/src/perf/file-p
erf.cpp?plain=1#L209-L211) -
The Nature of work-stealing in silk itself is a tension here. With "buffer on specific node", I think no amount of carefulness on making sure the buffer becomes resident on local node, doesn't guarantee once the fiber is stolen from other CPU (paying the remote node cost anyway). We shouldn't complicate with
move_pagesto move those address to local node during such work stealing cases at this point I think. -
I ran few tests with
numactl --membind and --cpubindto understand the remote node cost infile-perf. I think if user want's to really take advantage of "local node buffer register", they can bind the node vianumactl --cpunodebind=Nto avoid work-stealing from CPU of different node. Even that doesn't need any api changes on silk side.
My honest take is, we should leave the registerBuffers api as is and document it's "shared nature of the buffer and how work-stealing can add remote node latency".
Curious to know your opinion @vadimskipin
|
@vadimskipin curious to get your thoughts here. |
vadimskipin
left a comment
There was a problem hiding this comment.
So client is responsible to correctly pick a buffer index. OK, looks good.
There was a problem hiding this comment.
probably, would be better to rename this file into fiber-io-fixed-test.cpp
vadimskipin
left a comment
There was a problem hiding this comment.
Need to squash commits
|
Please rebase your changes. Main branch must have a clean linear history of commits. No merge commits! |
Add support for new apis to scheduler 1. register_buffers 2. read_fixed() 3. write_fixe() This let us register the pre-allocated buffers that iouring can use during IO operations rather then allocating it per-io. This is mainly based on best practices learned from TUM DBMS paper https://arxiv.org/pdf/2512.04859 Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>
Add a flag to run file-perf with register buffer iouring api ``` ./bb -b release perf --duration 60s --warmup 10s file --fixed-buffers ``` The numbers looks super interesting. So worth adding it to upstream Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>
`./bb fmt` Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>
Changes 1. Make sure the readFixed api on the registered buffer is checked by msan for uninitialized memory (similar to readv api) 2. Fix the nbytes len field (uint64_t -> uint32_t) because that's the underlying io_uring_* api expects 3. Add a round trip test for new api Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>
Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>
Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>
Document the new apis in corresponding docs Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>
Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>
ed7cffd to
3944ece
Compare
Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>
|
thanks for the review @vadimskipin. Rebased with clean commit history. |
vadimskipin
left a comment
There was a problem hiding this comment.
Please squash commits when merging!
|
@vadimskipin I don't have permission to merge :) |
Changes
Added following new apis to the scheduler
Which internally using liburing helpers io_uring_register_buffers(), io_uring_prep_read_fixed(), io_uring_prep_write_fixed().
This let us register the pre-allocated buffers that iouring can use during IO operations rather then allocating it per-io.
This is mainly based on best practices learned from TUM DBMS paper
https://arxiv.org/pdf/2512.04859
I also integrated with
file-perfbenchmark. And the numbers looked promising. See below for the actual improvement numbersPerformance
My setup is m8id.8xlarge EC2 instance
Normal
Fixed Buffers
Throughput diff (normal vs fixed buffers)
Latency diff (normal vs fixed buffers)