From 6795f7be944e259648c58fb6f5a31da882e7e61b Mon Sep 17 00:00:00 2001 From: cjsha Date: Fri, 26 Sep 2025 14:44:08 -0400 Subject: [PATCH 1/3] jpn technical edits --- articles/tutorials/tune-readsize.md | 140 ++++++++++------------------ 1 file changed, 50 insertions(+), 90 deletions(-) diff --git a/articles/tutorials/tune-readsize.md b/articles/tutorials/tune-readsize.md index 15a09693..9b58736d 100644 --- a/articles/tutorials/tune-readsize.md +++ b/articles/tutorials/tune-readsize.md @@ -19,96 +19,56 @@ times can be achieved. > - GPU: NVIDIA GTX 1070 8GB > - OS: Windows 11 -## Data Transmission from ONIX Hardware to Host Computer - -ONIX is capable of transferring data directly from production to the -host computer. However, if the host is busy when ONIX starts -producing data, ONIX will temporarily store this new data in its hardware buffer -while it waits for the host to be ready to accept new data. - -Key details about this process: - -- The size of hardware-to-host data transfers is determined by the - property of the - operator which is in every Bonsai - workflow that uses to acquire data from ONIX. -- Increasing `ReadSize` allows the host to read larger chunks of data from - ONIX per read operation without significantly increasing the duration of the - read operation, therefore increasing the maximum rate at which data can be - read. -- If the host is busy or cannot perform read operations rapidly enough to keep - up with the rate at which ONIX produces data, the ONIX hardware buffer will - start to accumulate excessive data. -- Accumulation of excess data in the hardware buffer collapses real-time - performance and risks hardware buffer overflow which would prematurely - terminate the acquisition session. `ReadSize` can be increased to avoid this - situation. -- As long as this situation is avoided, decreasing `ReadSize` means that ONIX - doesn't need to produce as much data before the host can access it. 
This, - in effect, means software can start operating on data closer to the time - that the data was produced, thus achieving lower-latency feedback-loops. - -In other words, a small `ReadSize` can help the host access data sooner to when -that data was created. However, each data transfer incurs overhead. If -`ReadSize` is so small that ONIX produces a `ReadSize` amount of data faster -than the average time it takes the host computer to perform a read operation, -the hardware buffer will accumulate excessive data. This will destroy real-time -performance and eventually cause the hardware buffer to overflow, terminating -acquisition. The goal of this tutorial is to tune StartAcquisition's `ReadSize` -so that data flows from production to the software running on the host as -quickly as possible by minimizing the amount of time that it sits idly in both -the ONIX hardware buffer and the host computer's buffer. This provides software -access to the data as close to when the data was produced as possible which -helps achieve lower latency closed-loop feedback. - -### Technical Details - -> [!NOTE] -> This section explains more in-depth how data is transferred from ONIX to the -> host computer. Although these details provide additional context about ONIX, -> they are more technical and are not required for following the rest of the -> tutorial. - -When the host computer reads data from the ONIX -hardware, it retrieves a **ReadSize**-bytes sized chunk of data using the -following procedure: - -1. A `ReadSize`-bytes long block of memory is allocated on the host computer's - RAM by the host API for the purpose of holding incoming data from ONIX. -1. 
A pointer to that memory is provided to the - [RIFFA](https://open-ephys.github.io/ONI/v1.0/api/liboni/driver-translators/riffa.html) - driver (the PCIe backend/kernel driver for the ONIX system) which moves the - allocated memory block into a more privileged state known as kernel mode so - that it can initiate a [DMA - transfer](https://en.wikipedia.org/wiki/Direct_memory_access). DMA allows - data transfer to be performed by ONIX hardware without additional CPU - intervention. -1. The data transfer completes once this block of data has been populated with - `ReadSize` bytes of data from ONIX. -1. The RIFFA driver moves the memory block from kernel mode to user mode so - that it can be accessed by software. The API function returns with a pointer - to the filled buffer. - -During this process, memory is allocated only once by the API, and the transfer -is [zero-copy](https://en.wikipedia.org/wiki/Zero-copy). The API-allocated -buffer is written autonomously by ONIX hardware using minimal resources from -the host computer. - -So far, all this occurs on the host-side. Meanwhile, on the ONIX-side: - -- If ONIX produces new data before the host is able to consume the data in the - API-allocated buffer, this new data is added to the back of ONIX hardware - buffer FIFO. The ONIX hardware buffer consists of 2GB of RAM that belongs to - the acquisition hardware (it is _not_ RAM in the host computer) dedicated to - temporarily storing data that is waiting to be transferred to the host. Data - is removed from the front of the hardware buffer and transferred to the host - once it's ready to accept more data. -- If the memory is allocated on the host-side and the data transfer is - initiated by the host API before any data is produced, ONIX transfers new - data directly to the host bypassing the hardware buffer. In this case, ONIX - is literally streaming data to the host _the moment it is produced_. 
This - data becomes available for reading by the host once ONIX transfers the full - `ReadSize` bytes. +## Hardware Buffer and ReadSize + +The ONIX **Hardware Buffer** consists of 2GB of dedicated RAM +that belongs to the acquisition hardware (it is _not_ RAM in the host computer). +The hardware buffer temporarily stores data that has not yet been transferred to +the host. When the host software is consuming data optimally, the hardware +buffer is bypassed entirely and data flows directly from production to the +host RAM, minimizing the latency between data collection and processing. + +Each time the host software reads data from the hardware, it obtains +**ReadSize** bytes of data using the following procedure: + +1. A block of memory that is `ReadSize` bytes long is allocated by the API +2. A pointer to that memory is provided to the kernel driver, which locks it + into kernel mode and initiates a [DMA + transfer](https://en.wikipedia.org/wiki/Direct_memory_access) from the + hardware. +3. The transfer is performed by the ONIX hardware without CPU intervention and + completes once `ReadSize` bytes have been produced. +4. Upon transfer completion, the buffer is passed back to user mode and the API + function returns with a pointer to the filled buffer. + +There are a couple of things to note about this process: + +1. Memory is allocated only once by the API, and the transfer is + [zero-copy](https://en.wikipedia.org/wiki/Zero-copy). ONIX hardware writes + directly into the API-allocated buffer autonomously without using the host + computer's resources. Within this process, `ReadSize` determines the amount + of data that is transferred each time the API reads data from the hardware. +2. If the buffer is allocated and the transfer initiated by the host API before + data is produced by the hardware, the data is transferred directly into the + buffer and completely bypasses the Hardware Buffer. 
In this case, hardware is + literally streaming data to the software buffer _the moment it is produced_. + It is physically impossible to achieve lower latencies than this situation. + The goal of this tutorial is to allow your system to operate in this regime. + +The size of hardware to host data transfers is determined by the + property of the +StartAcquisition operator, which is necessary for every workflow that uses + to acquire data from ONIX. Choosing an optimal `ReadSize` +value balances the tradeoff between latency and overall bandwidth. Smaller +`ReadSize` values mean that less data needs to accumulate before the kernel +driver relinquishes control of the buffer to software. This, in effect, means +less time needs to pass before software can start operating on data, and thus +lower-latency feedback loops can be achieved. However, because each transfer +requires calls to the kernel driver, they incur significant overhead. If +`ReadSize` is so low that the average time it takes to perform a data transfer +is longer than the time it takes the hardware to produce data, data will +accumulate in the Hardware Buffer. This will destroy real-time performance and +eventually cause the hardware buffer to overflow, terminating acquisition. ## Tuning `ReadSize` to Optimize Closed Loop Performance From db30566be17ba664a1b28834b3dfb96dacd58091 Mon Sep 17 00:00:00 2001 From: cjsha Date: Fri, 26 Sep 2025 15:58:05 -0400 Subject: [PATCH 2/3] cs edits to jpn's technical description - Add summary at start of section - Add a third bullet point describing the condition where data is streamed to hardware buffer instead of host RAM. - Move description of ONIX hardware buffer from beginning of section to the next mention of the hardware buffer. - Add brief clauses to describe what kernel mode and user mode do in this context (I feel like readers could get lost here). 
- Add qualification regarding the statement "it is physically impossible to achieve lower latencies than this" (line ~60) - Add a few sentences to the last paragraph about the perils of a `ReadSize` value that is too large. --- articles/tutorials/tune-readsize.md | 84 +++++++++++++++++------------ 1 file changed, 51 insertions(+), 33 deletions(-) diff --git a/articles/tutorials/tune-readsize.md b/articles/tutorials/tune-readsize.md index 9b58736d..60f5a7b2 100644 --- a/articles/tutorials/tune-readsize.md +++ b/articles/tutorials/tune-readsize.md @@ -21,24 +21,30 @@ times can be achieved. ## Hardware Buffer and ReadSize -The ONIX **Hardware Buffer** consists of 2GB of dedicated RAM -that belongs to the acquisition hardware (it is _not_ RAM in the host computer). -The hardware buffer temporarily stores data that has not yet been transferred to -the host. When the host software is consuming data optimally, the hardware -buffer is bypassed entirely and data flows directly from production to the -host RAM, minimizing the latency between data collection and processing. - -Each time the host software reads data from the hardware, it obtains -**ReadSize** bytes of data using the following procedure: - -1. A block of memory that is `ReadSize` bytes long is allocated by the API +Data is transferred in `ReadSize`-byte chunks from ONIX to the host computer. +This `ReadSize` value can be set by the user. If `ReadSize` is so small that +ONIX produces `ReadSize` bytes of data faster than the host computer can perform +a read operation, newly produced data is streamed to ONIX's hardware buffer +instead of directly to the host's RAM. If this persists, closed-loop +feedback performance suffers and the likelihood of hardware buffer overflow +increases. However, if `ReadSize` is so large that it takes a long time for ONIX +to produce a `ReadSize` amount of data, a single `ReadSize` chunk contains data +from a larger span of time.
This increases the average closed-loop latency. The +goal is to set a `ReadSize` that balances these considerations. The rest of this +section describes ONIX-to-host data transfers in greater technical detail to +help explain this balancing act. + +Each time the host software reads data from the hardware, it obtains `ReadSize` +bytes of data using the following procedure: + +1. A block of memory that is `ReadSize` bytes long is allocated by the API. 2. A pointer to that memory is provided to the kernel driver, which locks it - into kernel mode and initiates a [DMA - transfer](https://en.wikipedia.org/wiki/Direct_memory_access) from the - hardware. -3. The transfer is performed by the ONIX hardware without CPU intervention and - completes once `ReadSize` bytes have been produced. -4. Upon transfer completion, the buffer is passed back to user mode and the API + into kernel mode wherein ONIX can directly access that block of memory. + The kernel driver initiates a [DMA transfer](https://en.wikipedia.org/wiki/Direct_memory_access). +3. The transfer is performed by ONIX hardware without additional CPU + intervention and completes once `ReadSize` bytes have been transferred. +4. Upon transfer completion, the buffer is passed from kernel mode back to user + mode, which relinquishes control of the memory block to software. The API function returns with a pointer to the filled buffer. There are a couple of things to note about this process: @@ -50,25 +56,37 @@ There are a couple of things to note about this process: of data that is transferred each time the API reads data from the hardware. 2. If the buffer is allocated and the transfer initiated by the host API before data is produced by the hardware, the data is transferred directly into the - buffer and completely bypasses the Hardware Buffer. In this case, hardware is - literally streaming data to the software buffer _the moment it is produced_.
- It is physically impossible to achieve lower latencies than this situation. - The goal of this tutorial is to allow your system to operate in this regime. + buffer. In this case, hardware is literally streaming data to the software + buffer _the moment it is produced_. Given the constraint that the entire + buffer must be filled with `ReadSize` bytes before software can access it, it + is physically impossible to achieve lower latencies than this. The goal of + this tutorial is to allow your system to operate in this regime. +3. If ONIX produces data while data is being transferred to or waiting to be + consumed by the host, this stream of new data is redirected to the ONIX + hardware buffer. This buffer consists of 2GB of dedicated RAM + that belongs to the acquisition hardware (it is _not_ RAM in the host + computer). The hardware buffer temporarily stores data that has not yet been + transferred to the host. The size of hardware to host data transfers is determined by the property of the -StartAcquisition operator, which is necessary for every workflow that uses - to acquire data from ONIX. Choosing an optimal `ReadSize` -value balances the tradeoff between latency and overall bandwidth. Smaller -`ReadSize` values mean that less data needs to accumulate before the kernel -driver relinquishes control of the buffer to software. This, in effect, means -less time needs to pass before software can start operating on data, and thus -lower-latency feedback loops can be achieved. However, because each transfer -requires calls to the kernel driver, they incur significant overhead. If -`ReadSize` is so low that the average time it takes to perform a data transfer -is longer than the time it takes the hardware to produce data, data will -accumulate in the Hardware Buffer. This will destroy real-time performance and -eventually cause the hardware buffer to overflow, terminating acquisition.
+ operator, which is included in every +workflow that uses to acquire data from ONIX. Choosing an +optimal `ReadSize` value balances the tradeoff between latency and overall +bandwidth. Smaller `ReadSize` values mean that less data needs to accumulate +before the kernel driver relinquishes control of the buffer to software. This, +in effect, means less time needs to pass before software can start operating on +data, and thus lower-latency feedback loops can be achieved. However, each +transfer requires calls to the kernel driver and therefore incurs significant +overhead. If `ReadSize` is so small that the average time it takes to perform a +data transfer is longer than the time it takes the hardware to produce a +`ReadSize` amount of data, data will accumulate in the hardware buffer. This +will destroy real-time performance and eventually cause the hardware buffer to +overflow, terminating acquisition. Larger `ReadSize` values mean that more data +needs to accumulate before the kernel driver relinquishes control of the buffer +to software. This means more time needs to pass before software can start +operating on data. This increases average latency but reduces the risk of +accumulating data in the ONIX hardware buffer.
## Tuning `ReadSize` to Optimize Closed Loop Performance From 11961e2d0ee22f5c2bc56e7d4d2586075a075a8c Mon Sep 17 00:00:00 2001 From: cjsha <36574350+cjsha@users.noreply.github.com> Date: Wed, 29 Oct 2025 18:15:08 -0400 Subject: [PATCH 3/3] Correction about ReadSize values rounding up to the next 32-bit aligned value --- articles/tutorials/tune-readsize.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/articles/tutorials/tune-readsize.md b/articles/tutorials/tune-readsize.md index 60f5a7b2..4ffb1fac 100644 --- a/articles/tutorials/tune-readsize.md +++ b/articles/tutorials/tune-readsize.md @@ -359,7 +359,7 @@ shows that the hardware buffer does not accumulate data: > - **As of OpenEphys.Onix1 0.7.0:** As long as you stay above the minimum > mentioned in the previous bullet point, `ReadSize` can be set to any value > by the user. The OpenEphys.Onix1 Bonsai package will round this `ReadSize` -> to the nearest multiple of four and uses that value instead. For example, +> up to the next multiple of four and use that value instead. For example, > if you try to set `ReadSize` to 887, the software will use the value 888 > instead. > - If you are using a data I/O operator that has capacity to produce data at @@ -412,4 +412,4 @@ necessarily increases the total data throughput of the latency measurements will not reflect the latencies you will experience during the actual experiment. - \ No newline at end of file +
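The `ReadSize` trade-off described in this patch series can be checked with a back-of-the-envelope model. The sketch below is not part of OpenEphys.Onix1, and it rests on several assumptions. ONIX produces data at a constant rate. Each read takes a fixed average time that does not depend on `ReadSize`. All helper names (`effective_read_size`, `fill_time_s`, `accumulates_in_hardware_buffer`) are hypothetical. The rounding helper mirrors the documented 0.7.0 behavior of rounding `ReadSize` up to the next multiple of four, so 887 becomes 888.

```python
"""Back-of-the-envelope model of the ReadSize trade-off.

Hypothetical helpers for illustration only; not an OpenEphys.Onix1 API.
Assumes a constant production rate and a fixed average per-read time.
"""


def effective_read_size(requested: int) -> int:
    """Round a requested ReadSize up to the next multiple of four bytes,
    mirroring the OpenEphys.Onix1 0.7.0 behavior described in the patch."""
    return ((requested + 3) // 4) * 4


def fill_time_s(read_size: int, data_rate_bps: float) -> float:
    """Seconds ONIX needs to produce `read_size` bytes at `data_rate_bps`
    bytes per second. This is also roughly how long the oldest sample in a
    block waits before software can access it."""
    return read_size / data_rate_bps


def accumulates_in_hardware_buffer(read_size: int, data_rate_bps: float,
                                   transfer_time_s: float) -> bool:
    """True if the average time per read (assumed constant, including
    kernel-driver overhead) exceeds the time ONIX takes to produce one
    ReadSize block, i.e. data would pile up in the hardware buffer."""
    return transfer_time_s > fill_time_s(read_size, data_rate_bps)


if __name__ == "__main__":
    rate = 50e6        # assumed production rate: 50 MB/s (illustrative, not measured)
    per_read = 20e-6   # assumed average time per read: 20 us (illustrative)
    for requested in (887, 4096, 16384):
        n = effective_read_size(requested)
        print(f"ReadSize {requested} -> {n} B: "
              f"fill {fill_time_s(n, rate) * 1e6:.1f} us, "
              f"accumulates={accumulates_in_hardware_buffer(n, rate, per_read)}")
```

The rate and per-read time above are placeholders, not measurements. In practice, both depend on the devices in the workflow and on the host computer, so the model only shows the shape of the trade-off. A suitable `ReadSize` still has to be found by measuring on the actual system.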