### Is there an existing issue for this?

### Platform/Architecture

Linux aarch64 (also reproducible on x86_64)

### Fast DDS version

v2.6.x (Humble)

### Transport layer

SHM (data-sharing)

### Description
When all matched readers for a `StatefulWriter` use data-sharing (SHM), the writer still routes every sample through the flow controller and constructs an `RTPSMessageGroup`, even though there is no network delivery to perform. This adds significant latency that could be avoided.

The delivery path today:

```
unsent_change_added_to_history()
  → prepare_datasharing_delivery()            // writes data to SHM ✓
  → flow_controller_->add_new_sample()        // queues for delivery
      → deliver_sample_nts()
          → deliver_sample_to_intraprocesses()  // if local readers
          → deliver_sample_to_datasharing()     // notifies SHM readers
          → deliver_sample_to_network()         // no-op (no remote readers)
          → check_acked_status()
```
The `flow_controller_->add_new_sample()` path involves:

- Acquiring the `LocatorSelectorSender` lock
- Constructing an `RTPSMessageGroup` (buffer allocation, locator selection)
- Calling `deliver_sample_nts()`, which then branches on reader type

When there are no remote (network) readers, the `RTPSMessageGroup` construction and locator selection are pure overhead: `deliver_sample_to_network()` would be a no-op anyway.
### Proposed optimization

Call `deliver_sample_to_intraprocesses()` and `deliver_sample_to_datasharing()` directly (they are already standalone methods), then only enter the flow controller if there are network readers:
```cpp
// In unsent_change_added_to_history(), inside the `if (should_be_sent)` block:
if (there_are_local_readers_)
{
    deliver_sample_to_intraprocesses(change);
}
if (there_are_datasharing_readers_)
{
    deliver_sample_to_datasharing(change);
}
if (there_are_remote_readers_)
{
    // Flow controller handles network delivery; intraprocess/datasharing
    // are no-ops inside deliver_sample_nts() due to change_is_unsent() guard.
    flow_controller_->add_new_sample(this, change, max_blocking_time);
}
else
{
    check_acked_status();
}
```
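To make the two topologies of interest concrete, here is a standalone sketch of the branching above. `FakeWriter` and its counters are hypothetical stand-ins whose names only mirror the writer's flags; none of this is Fast DDS code:

```cpp
// Simplified model of the proposed unsent_change_added_to_history() branching.
// Counters record which delivery paths a single sample would take.
struct FakeWriter
{
    bool there_are_local_readers_ = false;
    bool there_are_datasharing_readers_ = true;
    bool there_are_remote_readers_ = false;

    int intraprocess_deliveries = 0;
    int datasharing_deliveries = 0;
    int flow_controller_samples = 0;
    int acked_checks = 0;

    void unsent_change_added()
    {
        if (there_are_local_readers_)
        {
            ++intraprocess_deliveries;       // deliver_sample_to_intraprocesses()
        }
        if (there_are_datasharing_readers_)
        {
            ++datasharing_deliveries;        // deliver_sample_to_datasharing()
        }
        if (there_are_remote_readers_)
        {
            ++flow_controller_samples;       // flow controller: network path only
        }
        else
        {
            ++acked_checks;                  // check_acked_status() bookkeeping
        }
    }
};
```

With the default flags (SHM-only), a sample is delivered via data-sharing and never touches the flow controller; flipping `there_are_remote_readers_` restores the flow-controller path and skips the eager acked check.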
### Why this is safe

- **Idempotent delivery**: Both `deliver_sample_to_intraprocesses()` and `deliver_sample_to_datasharing()` check `change_is_unsent()` before delivering (lines 643 and 678 in v2.6.x). If the flow controller later calls `deliver_sample_nts()`, these are safe no-ops.
- **SHM data is ready**: `prepare_datasharing_delivery()` runs before this code, so the shared memory payload is already committed.
- **Thread safety**: The entire block runs under `mp_mutex` (acquired at the top of `unsent_change_added_to_history()`).
- **Mixed topologies**: When remote readers exist, the flow controller still runs. The pre-delivered intraprocess/datasharing samples are skipped via the `change_is_unsent()` guard in `deliver_sample_nts()`, and `deliver_sample_to_network()` proceeds normally. Intra-host latency improves without affecting cross-host behavior.
- **`check_acked_status()`**: Called in the else-branch to match what `deliver_sample_nts()` normally does (needed for best-effort readers that are immediately acked and for the `on_writer_change_received_by_all()` callback).
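The idempotence argument can be sketched in isolation. Below, `FakeChange` and `deliver_to_datasharing` are illustrative stand-ins that model the `change_is_unsent()` guard, not the actual Fast DDS types:

```cpp
// Hypothetical stand-in for a cache change's delivery state.
struct FakeChange
{
    bool datasharing_sent = false;
};

// Delivery happens at most once: an eager call followed by the flow
// controller's later call through deliver_sample_nts() is a safe no-op.
// Returns true only if this call actually delivered the sample.
bool deliver_to_datasharing(FakeChange& change)
{
    if (change.datasharing_sent)
    {
        return false;  // already delivered: no-op, as behind change_is_unsent()
    }
    change.datasharing_sent = true;
    return true;
}
```

Calling it twice on the same change delivers exactly once, which is what makes the eager pre-delivery safe even when the flow controller path runs afterwards.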
### Benchmark results

8 MB `sensor_msgs::Image` at 10 Hz, data-sharing enabled, SHM-only transport, intra-host (single machine):

| Metric | Before (flow controller) | After (bypass) | Improvement |
|---|---|---|---|
| Publish call (p50) | ~1,345 us | ~412 us | 3.3x |
| E2E latency (p50) | ~2,282 us | ~961 us | 2.4x |

Measured on aarch64 Linux. The improvement comes entirely from skipping the `RTPSMessageGroup` construction and flow controller overhead when there is no network delivery to perform.
### Additional context

This also benefits the mixed (intra-host + cross-host) case: intraprocess and datasharing readers receive data immediately instead of waiting for the flow controller to schedule delivery. The flow controller only handles the network path.