### Is there an existing issue for this?

### Platform/Architecture

Linux aarch64 (also reproducible on x86_64)

### Fast DDS version

v2.6.x (Humble)

### Transport layer

SHM (data-sharing)

### Description
When all matched readers for a `StatefulWriter` use data-sharing (SHM), the writer still routes every sample through the flow controller and constructs an `RTPSMessageGroup`, even though there is no network delivery to perform. This adds significant latency that could be avoided.

The delivery path today:

```
unsent_change_added_to_history()
  → prepare_datasharing_delivery()            // writes data to SHM ✓
  → flow_controller_->add_new_sample()        // queues for delivery
      → deliver_sample_nts()
          → deliver_sample_to_intraprocesses()  // if local readers
          → deliver_sample_to_datasharing()     // notifies SHM readers
          → deliver_sample_to_network()         // no-op (no remote readers)
          → check_acked_status()
```
The `flow_controller_->add_new_sample()` path involves:

- Acquiring the `LocatorSelectorSender` lock
- Constructing an `RTPSMessageGroup` (buffer allocation, locator selection)
- Calling `deliver_sample_nts()`, which then branches on reader type

When there are no remote (network) readers, the `RTPSMessageGroup` construction and locator selection are pure overhead: `deliver_sample_to_network()` would be a no-op anyway.
### Proposed optimization

Call `deliver_sample_to_intraprocesses()` and `deliver_sample_to_datasharing()` directly (they are already standalone methods), then only enter the flow controller if there are network readers:
```cpp
// In unsent_change_added_to_history(), inside the `if (should_be_sent)` block:
if (there_are_local_readers_)
{
    deliver_sample_to_intraprocesses(change);
}
if (there_are_datasharing_readers_)
{
    deliver_sample_to_datasharing(change);
}
if (there_are_remote_readers_)
{
    // Flow controller handles network delivery; intraprocess/datasharing
    // are no-ops inside deliver_sample_nts() due to change_is_unsent() guard.
    flow_controller_->add_new_sample(this, change, max_blocking_time);
}
else
{
    check_acked_status();
}
```
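To make the two topologies of interest concrete, here is a standalone sketch of the branching above. `FakeWriter` and its counters are hypothetical stand-ins whose names only mirror the writer's flags; none of this is Fast DDS code:

```cpp
// Simplified model of the proposed unsent_change_added_to_history() branching.
// Counters record which delivery paths a single sample would take.
struct FakeWriter
{
    bool there_are_local_readers_ = false;
    bool there_are_datasharing_readers_ = true;
    bool there_are_remote_readers_ = false;

    int intraprocess_deliveries = 0;
    int datasharing_deliveries = 0;
    int flow_controller_samples = 0;
    int acked_checks = 0;

    void unsent_change_added()
    {
        if (there_are_local_readers_)
        {
            ++intraprocess_deliveries;       // deliver_sample_to_intraprocesses()
        }
        if (there_are_datasharing_readers_)
        {
            ++datasharing_deliveries;        // deliver_sample_to_datasharing()
        }
        if (there_are_remote_readers_)
        {
            ++flow_controller_samples;       // flow controller: network path only
        }
        else
        {
            ++acked_checks;                  // check_acked_status() bookkeeping
        }
    }
};
```

With the default flags (SHM-only), a sample is delivered via data-sharing and never touches the flow controller; flipping `there_are_remote_readers_` restores the flow-controller path and skips the eager acked check.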
### Why this is safe

- **Idempotent delivery**: Both `deliver_sample_to_intraprocesses()` and `deliver_sample_to_datasharing()` check `change_is_unsent()` before delivering (lines 643 and 678 in v2.6.x). If the flow controller later calls `deliver_sample_nts()`, these are safe no-ops.
- **SHM data is ready**: `prepare_datasharing_delivery()` runs before this code, so the shared memory payload is already committed.
- **Thread safety**: The entire block runs under `mp_mutex` (acquired at the top of `unsent_change_added_to_history()`).
- **Mixed topologies**: When remote readers exist, the flow controller still runs. The pre-delivered intraprocess/datasharing samples are skipped via the `change_is_unsent()` guard in `deliver_sample_nts()`, and `deliver_sample_to_network()` proceeds normally. Intra-host latency improves without affecting cross-host behavior.
- **`check_acked_status()`**: Called in the else-branch to match what `deliver_sample_nts()` normally does (needed for best-effort readers that are immediately acked and for the `on_writer_change_received_by_all()` callback).
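The idempotence argument can be sketched in isolation. Below, `FakeChange` and `deliver_to_datasharing` are illustrative stand-ins that model the `change_is_unsent()` guard, not the actual Fast DDS types:

```cpp
// Hypothetical stand-in for a cache change's delivery state.
struct FakeChange
{
    bool datasharing_sent = false;
};

// Delivery happens at most once: an eager call followed by the flow
// controller's later call through deliver_sample_nts() is a safe no-op.
// Returns true only if this call actually delivered the sample.
bool deliver_to_datasharing(FakeChange& change)
{
    if (change.datasharing_sent)
    {
        return false;  // already delivered: no-op, as behind change_is_unsent()
    }
    change.datasharing_sent = true;
    return true;
}
```

Calling it twice on the same change delivers exactly once, which is what makes the eager pre-delivery safe even when the flow controller path runs afterwards.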
### Benchmark results

8 MB `sensor_msgs::Image` at 10 Hz, data-sharing enabled, SHM-only transport, intra-host (single machine):

| Metric | Before (flow controller) | After (bypass) | Improvement |
|---|---|---|---|
| Publish call (p50) | ~1,345 us | ~412 us | 3.3x |
| E2E latency (p50) | ~2,282 us | ~961 us | 2.4x |

Measured on aarch64 Linux. The improvement comes entirely from skipping the `RTPSMessageGroup` construction and flow controller overhead when there is no network delivery to perform.
### Additional context

This also benefits the mixed (intra-host + cross-host) case: intraprocess and datasharing readers receive data immediately instead of waiting for the flow controller to schedule delivery. The flow controller only handles the network path.