162 changes: 70 additions & 92 deletions articles/tutorials/tune-readsize.md
@@ -19,96 +19,74 @@ times can be achieved.
> - GPU: NVIDIA GTX 1070 8GB
> - OS: Windows 11

## Data Transmission from ONIX Hardware to Host Computer

ONIX is capable of transferring data directly from production to the
host computer. However, if the host is busy when ONIX starts
producing data, ONIX will temporarily store this new data in its hardware buffer
while it waits for the host to be ready to accept new data.

Key details about this process:

- The size of hardware-to-host data transfers is determined by the
<xref:OpenEphys.Onix1.StartAcquisition.ReadSize> property of the
<xref:OpenEphys.Onix1.StartAcquisition> operator which is in every Bonsai
workflow that uses <xref:OpenEphys.Onix1> to acquire data from ONIX.
- Increasing `ReadSize` allows the host to read larger chunks of data from
ONIX per read operation without significantly increasing the duration of the
read operation, therefore increasing the maximum rate at which data can be
read.
- If the host is busy or cannot perform read operations rapidly enough to keep
up with the rate at which ONIX produces data, the ONIX hardware buffer will
start to accumulate excessive data.
- Accumulation of excess data in the hardware buffer degrades real-time
performance and risks a hardware buffer overflow, which would prematurely
terminate the acquisition session. `ReadSize` can be increased to avoid this
situation.
- As long as this situation is avoided, decreasing `ReadSize` means that ONIX
doesn't need to produce as much data before the host can access it. This,
in effect, means software can start operating on data closer to the time
that the data was produced, thus achieving lower-latency feedback-loops.

In other words, a small `ReadSize` can help the host access data closer to when
that data was created. However, each data transfer incurs overhead. If
`ReadSize` is so small that ONIX produces a `ReadSize` amount of data faster
than the average time it takes the host computer to perform a read operation,
the hardware buffer will accumulate excessive data. This will destroy real-time
performance and eventually cause the hardware buffer to overflow, terminating
acquisition. The goal of this tutorial is to tune StartAcquisition's `ReadSize`
so that data flows from production to the software running on the host as
quickly as possible by minimizing the amount of time that it sits idly in both
the ONIX hardware buffer and the host computer's buffer. This provides software
access to the data as close to when the data was produced as possible which
helps achieve lower latency closed-loop feedback.

### Technical Details

> [!NOTE]
> This section explains more in-depth how data is transferred from ONIX to the
> host computer. Although these details provide additional context about ONIX,
> they are more technical and are not required for following the rest of the
> tutorial.

When the host computer reads data from the ONIX
hardware, it retrieves a chunk of `ReadSize` bytes using the
following procedure:

1. A `ReadSize`-bytes long block of memory is allocated on the host computer's
RAM by the host API for the purpose of holding incoming data from ONIX.
1. A pointer to that memory is provided to the
[RIFFA](https://open-ephys.github.io/ONI/v1.0/api/liboni/driver-translators/riffa.html)
driver (the PCIe backend/kernel driver for the ONIX system) which moves the
allocated memory block into a more privileged state known as kernel mode so
that it can initiate a [DMA
transfer](https://en.wikipedia.org/wiki/Direct_memory_access). DMA allows
data transfer to be performed by ONIX hardware without additional CPU
intervention.
1. The data transfer completes once this block of data has been populated with
`ReadSize` bytes of data from ONIX.
1. The RIFFA driver moves the memory block from kernel mode to user mode so
that it can be accessed by software. The API function returns with a pointer
to the filled buffer.

During this process, memory is allocated only once by the API, and the transfer
is [zero-copy](https://en.wikipedia.org/wiki/Zero-copy). The API-allocated
buffer is written autonomously by ONIX hardware using minimal resources from
the host computer.

So far, all this occurs on the host-side. Meanwhile, on the ONIX-side:

- If ONIX produces new data before the host is able to consume the data in the
API-allocated buffer, this new data is added to the back of the ONIX hardware
buffer FIFO. The ONIX hardware buffer consists of 2GB of RAM that belongs to
the acquisition hardware (it is _not_ RAM in the host computer) dedicated to
temporarily storing data that is waiting to be transferred to the host. Data
is removed from the front of the hardware buffer and transferred to the host
once it's ready to accept more data.
- If the memory is allocated on the host-side and the data transfer is
initiated by the host API before any data is produced, ONIX transfers new
data directly to the host bypassing the hardware buffer. In this case, ONIX
is literally streaming data to the host _the moment it is produced_. This
data becomes available for reading by the host once ONIX transfers the full
`ReadSize` bytes.
## Hardware Buffer and ReadSize

Data is transferred from ONIX to the host computer in chunks of `ReadSize`
bytes. This `ReadSize` value can be set by the user. If `ReadSize` is so small
that ONIX produces `ReadSize` bytes of data faster than the host computer can
perform a read operation, newly produced data is streamed to ONIX's hardware
buffer instead of directly to the host's RAM. If this happens persistently,
closed-loop feedback performance suffers and the likelihood of hardware buffer
overflow increases. However, if `ReadSize` is so large that it takes a long time
for ONIX to produce `ReadSize` bytes of data, a single `ReadSize` chunk contains
data from a larger span of time. This increases the average closed-loop latency.
The goal is to set a `ReadSize` that balances these considerations. The rest of
this section describes ONIX-to-host data transfers in greater technical detail
to help better understand this balancing act.

Each time the host software reads data from the hardware, it obtains `ReadSize`
bytes of data using the following procedure:

1. A block of memory that is `ReadSize` bytes long is allocated by the API.
2. A pointer to that memory is provided to the kernel driver, which locks it
into kernel mode wherein ONIX can directly access that block of memory.
The kernel driver initiates a [DMA transfer](https://en.wikipedia.org/wiki/Direct_memory_access).
3. The transfer is performed by ONIX hardware without additional CPU
intervention and completes once `ReadSize` bytes have been transferred.
4. Upon transfer completion, the buffer is passed from kernel mode back to user
mode which relinquishes control of the memory block to software. The API
function returns with a pointer to the filled buffer.

There are a couple of things to note about this process:

1. Memory is allocated only once by the API, and the transfer is
[zero-copy](https://en.wikipedia.org/wiki/Zero-copy). ONIX hardware writes
directly into the API-allocated buffer autonomously without using the host
computer's resources. Within this process, `ReadSize` determines the amount
of data that is transferred each time the API reads data from the hardware.
2. If the buffer is allocated and the transfer initiated by the host API before
data is produced by the hardware, the data is transferred directly into the
buffer. In this case, hardware is literally streaming data to the software
buffer _the moment it is produced_. With the constraint that the entire
buffer must be filled with `ReadSize` bytes before software can access it, it
is physically impossible to achieve lower latencies than this. The goal of
this tutorial is to allow your system to operate in this regime.
3. If ONIX produces data while data is being transferred to or waiting to be
consumed by the host, this stream of new data is redirected to the ONIX
hardware buffer. The ONIX hardware buffer consists of 2GB of dedicated RAM
that belongs to the acquisition hardware (it is _not_ RAM in the host
computer). The hardware buffer temporarily stores data that has not yet been
transferred to the host.
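
The interplay between the notes above can be illustrated with a toy model. The
sketch below is not part of the ONIX API. It assumes a constant data production
rate and a fixed, hypothetical per-read duration (`read_duration_s`), and tracks
how many bytes would sit in the hardware buffer after each read. All function
names, parameter names, and numbers are made up for illustration.

```python
HARDWARE_BUFFER_BYTES = 2 * 1024**3  # 2GB hardware buffer, per the text above


def simulate_hardware_buffer(read_size, data_rate, read_duration_s, n_reads):
    """Toy model: return hardware buffer occupancy (bytes) after each read.

    read_size        -- bytes per transfer (stands in for ReadSize)
    data_rate        -- bytes per second produced by the hardware (assumed constant)
    read_duration_s  -- hypothetical minimum time the host needs per read
    n_reads          -- number of read operations to simulate
    """
    occupancy = 0.0
    history = []
    for _ in range(n_reads):
        # Time until read_size bytes are available, counting buffered data.
        wait_for_data_s = max(0.0, (read_size - occupancy) / data_rate)
        # A read cannot finish faster than the host's per-read duration.
        cycle_s = max(read_duration_s, wait_for_data_s)
        # Data produced during the cycle minus the chunk handed to the host.
        occupancy = max(0.0, occupancy + data_rate * cycle_s - read_size)
        if occupancy > HARDWARE_BUFFER_BYTES:
            raise OverflowError("hardware buffer overflow in toy model")
        history.append(occupancy)
    return history


# Hypothetical numbers: 1 MB/s of data, 500 microseconds per read.
large = simulate_hardware_buffer(1000, 1_000_000, 500e-6, 10)
small = simulate_hardware_buffer(100, 1_000_000, 500e-6, 10)
print(f"read_size 1000: {large[-1]:.0f} bytes buffered after 10 reads")
print(f"read_size 100:  {small[-1]:.0f} bytes buffered after 10 reads")
```

With these made-up numbers, the larger read size leaves the hardware buffer
empty because the host is always waiting on the hardware, while the smaller read
size accumulates roughly 400 bytes per read because the hardware outpaces the
host. Real read durations vary from read to read, so this is only a qualitative
picture.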

The size of hardware-to-host data transfers is determined by the
<xref:OpenEphys.Onix1.StartAcquisition.ReadSize> property of the
<xref:OpenEphys.Onix1.StartAcquisition> operator, which is included in every
workflow that uses <xref:OpenEphys.Onix1> to acquire data from ONIX. Choosing an
optimal `ReadSize` value balances the tradeoff between latency and overall
bandwidth. Smaller `ReadSize` values mean that less data needs to accumulate
before the kernel driver relinquishes control of the buffer to software. This,
in effect, means less time needs to pass before software can start operating on
data, and thus lower-latency feedback loops can be achieved. However, each
transfer requires calls to the kernel driver and therefore incurs significant
overhead. If `ReadSize` is so small that the average time it takes to perform a
data transfer is longer than the time it takes the hardware to produce
`ReadSize` bytes of data, data will accumulate in the hardware buffer. This
will destroy real-time performance and eventually cause the hardware buffer to
overflow, terminating acquisition. Larger `ReadSize` values mean that more data
needs to accumulate before the kernel driver relinquishes control of the buffer
to software. This means more time needs to pass before software can start
operating on data. This increases average latency but reduces the risk of
accumulating data in the ONIX hardware buffer.
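
The tradeoff described above comes down to comparing two durations: how long the
hardware takes to produce `ReadSize` bytes, and how long the host takes on
average to perform one read. The following is a rough sketch of that arithmetic,
assuming a constant data rate and an average read duration that you have
measured yourself. The function names and example values are hypothetical and do
not come from the OpenEphys.Onix1 package.

```python
import math


def fill_time_us(read_size, data_rate):
    """Microseconds the hardware needs to produce read_size bytes.

    data_rate is in bytes per second and is assumed to be constant.
    """
    return read_size * 1e6 / data_rate


def min_read_size(data_rate, avg_read_duration_us, multiple=4):
    """Smallest read size, rounded up to a multiple of `multiple`, whose fill
    time is at least the average duration of one read operation.

    Below this value the hardware produces data faster than the host reads it,
    so data would accumulate in the hardware buffer (in this simple model).
    """
    needed_bytes = data_rate * avg_read_duration_us / 1e6
    return max(multiple, math.ceil(needed_bytes / multiple) * multiple)


# Illustrative values only: 1 MB/s data rate, 500 microsecond average read.
print(fill_time_us(1024, 1_000_000))
print(min_read_size(1_000_000, 500))
```

A value from `min_read_size` is only a lower bound under these assumptions. In
practice some headroom above it is advisable, since individual reads can take
longer than the average.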

## Tuning `ReadSize` to Optimize Closed Loop Performance

@@ -381,7 +359,7 @@ shows that the hardware buffer does not accumulate data:
> - **As of OpenEphys.Onix1 0.7.0:** As long as you stay above the minimum
> mentioned in the previous bullet point, `ReadSize` can be set to any value
> by the user. The OpenEphys.Onix1 Bonsai package will round this `ReadSize`
> to the nearest multiple of four and uses that value instead. For example,
> up to the next multiple of four and uses that value instead. For example,
> if you try to set `ReadSize` to 887, the software will use the value 888
> instead.
> - If you are using a data I/O operator that has capacity to produce data at
@@ -434,4 +412,4 @@ necessarily increases the total data throughput of
the latency measurements will not reflect the latencies you will experience
during the actual experiment.

<!-- ## Tuning `ReadSize` with Real-Time Computation -->
<!-- ## Tuning `ReadSize` with Real-Time Computation -->