
Data race in TopicPayloadPool detected by ThreadSanitizer #6340

@PavelGuzenfeld

Description


ThreadSanitizer detects data races in TopicPayloadPool during DDS intra-process communication. This affects ROS 2 applications using spin_some on multiple threads with the same DDS domain.

Reported from ROS 2: ros2/rclcpp#2941

TSan Report Summary

The race is between:

  • Write by publisher thread: fastcdr::Cdr::serialize_array → writing to payload buffer
  • Read by subscriber thread: fastcdr::Cdr::deserialize_array → reading from same payload buffer

Both access the same memory block allocated via TopicPayloadPool::do_get_payload.

Root Causes

1. Mutex released before reference increment in do_get_payload()

At TopicPayloadPool.cpp:83, the pool mutex is released before payload_node->reference() is called. In this window, a concurrent release_payload() can observe ref_count == 0 and recycle the node that is about to be handed out.

2. PayloadNode::reference() uses memory_order_relaxed

The sharing path (get_payload(const SerializedPayload_t&)) increments the reference count with memory_order_relaxed, which establishes no happens-before relationship with the writes to the payload buffer. A subscriber that observes the incremented count may therefore still not see the publisher's writes.

Reproducer

From https://gist.github.com/muji-4ok/6b02f0de0a23a6a5cbf7ef07f4ead4fc — a ROS 2 program with one publisher and two subscribers on separate threads, built with -fsanitize=thread.

Proposed Fix

PR #6339 addresses both issues:

  1. Holds the lock through reference() and metadata reads in do_get_payload()
  2. Strengthens memory ordering from relaxed to acq_rel in PayloadNode::reference()
