agents: fix infinite recursion on ~CommandStream #412

Open
santigimeno wants to merge 1 commit into node-v24.x-nsolid-v6.x from santi/fix_grpc_race_cond

@santigimeno santigimeno commented Jan 15, 2026

Resetting a CommandStream, for example when the gRPC endpoint changes dynamically, can trigger infinite recursion in GrpcAgent::reset_command_stream(): ~CommandStream() may cause the OnDone() callback to notify the observer (the GrpcAgent instance), which in turn calls GrpcAgent::reset_command_stream() again. Avoid this recursion by adding a new cancelling_for_destruction_ member variable to CommandStream, so that OnDone() skips the observer notification when it is triggered by the destructor.
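
The guard described above can be illustrated with a minimal, single-threaded sketch. It does not use gRPC and is not the code from this PR: `StreamObserver`, `Agent`, `CountingObserver`, `Cancel()`, `done_`, and `notifications_after_resets()` are hypothetical names introduced for illustration, and the sketch assumes cancellation completes synchronously. Only `CommandStream`, `OnDone()`, `reset_command_stream()`, and `cancelling_for_destruction_` come from the description.

```cpp
#include <cassert>
#include <memory>

// Hypothetical observer interface; the real one in the PR is not shown here.
class StreamObserver {
 public:
  virtual ~StreamObserver() = default;
  virtual void OnStreamDone() = 0;
};

// Simplified, single-threaded model of CommandStream (no gRPC involved).
class CommandStream {
 public:
  explicit CommandStream(StreamObserver* observer) : observer_(observer) {}

  ~CommandStream() {
    // Set the flag first so OnDone() knows the destructor caused the cancel.
    cancelling_for_destruction_ = true;
    Cancel();
  }

  // Assumption: cancelling completes synchronously and fires OnDone() once.
  void Cancel() {
    if (done_) return;
    done_ = true;
    OnDone();
  }

 private:
  void OnDone() {
    // Skip the observer during teardown; it may be the one destroying us.
    if (cancelling_for_destruction_) return;
    observer_->OnStreamDone();
  }

  StreamObserver* observer_;
  bool done_ = false;
  bool cancelling_for_destruction_ = false;
};

// Stand-in for GrpcAgent: it reacts to a finished stream by resetting it.
class Agent : public StreamObserver {
 public:
  void reset_command_stream() {
    ++resets_;
    // Assigning a new stream destroys the previous one, if any.
    stream_ = std::make_unique<CommandStream>(this);
  }

  void OnStreamDone() override {
    ++notifications_;
    reset_command_stream();
  }

  int resets() const { return resets_; }
  int notifications() const { return notifications_; }

 private:
  int resets_ = 0;
  int notifications_ = 0;
  std::unique_ptr<CommandStream> stream_;
};

// Observer that only counts; shows that normal completion still notifies.
class CountingObserver : public StreamObserver {
 public:
  void OnStreamDone() override { ++count; }
  int count = 0;
};

// Replaces the agent's stream `n` times and returns the notifications seen.
int notifications_after_resets(int n) {
  assert(n >= 0);
  Agent agent;
  for (int i = 0; i < n; ++i) agent.reset_command_stream();
  return agent.notifications();
}
```

In this model, removing the flag check from `OnDone()` would make every destructor notify the agent, which would then create and destroy another stream without end. With the check, replacing the stream any number of times produces no observer notifications, while an explicit `Cancel()` on a live stream still notifies once. The real callbacks may run on other threads, which this sketch ignores.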

Summary by CodeRabbit

  • Bug Fixes

    • Suppress observer notifications during object destruction and ensure cancellation completes cleanly to improve shutdown reliability.
  • Tests

    • Added a test covering reconnection and configuration recovery after an invalid initial setup, verifying command delivery and a clean client shutdown.

@santigimeno santigimeno requested a review from RafaelGSS January 15, 2026 15:47
@santigimeno santigimeno self-assigned this Jan 15, 2026
coderabbitai bot commented Jan 15, 2026

Walkthrough

Adds a destruction-aware flag to CommandStream to suppress observer notifications during teardown and updates cancellation ordering; also adds a test validating agent reconnection after an initial invalid NSOLID_GRPC and subsequent CommandStream behavior.

Changes

| Cohort / File(s) | Summary |
|---|---|
| CommandStream Destruction Handling: agents/grpc/src/command_stream.h, agents/grpc/src/command_stream.cc | Added private cancelling_for_destruction_ flag (default false). Destructor sets the flag before invoking cancellation. OnDone now skips observer notification when the flag is set, preventing callbacks during destruction. |
| Agent Reconnection Test: test/agents/test-grpc-basic.mjs | New test "should reconnect after initial invalid NSOLID_GRPC" that starts with an invalid env, applies valid TLS/insecure settings, constructs a client, obtains agentId, builds dynamic config, verifies server→client CommandStream command handling, and asserts clean client shutdown and server close. |

Sequence Diagram(s)

(omitted — changes are localized and do not introduce a multi-component sequential flow warranting a diagram)

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

I hop through shutdowns, flag in paw, 🐇
Quiet the callers, avoid the maw.
Reconnect hums after a shaky start,
Commands settle softly, play their part.
A rabbit twitches—everything restarts.

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly identifies the specific issue being fixed (infinite recursion in CommandStream destructor) and is directly aligned with the main change in the PR. |

@santigimeno santigimeno force-pushed the santi/fix_grpc_race_cond branch from 1820255 to c4554c9 Compare January 16, 2026 11:11
@santigimeno santigimeno requested a review from EHortua January 16, 2026 11:23

@EHortua EHortua left a comment

Working fine.

Without the patch:
[image]

With the patch:
[image]
[image]

@santigimeno santigimeno force-pushed the santi/fix_grpc_race_cond branch from c4554c9 to fa2433d Compare February 5, 2026 12:03