Background
PR #4245 fixed the production telemetry bug where hop_count was None on ~99% of terminal GET events. Root cause was op_manager.get_current_hop(id) being called at log time, which returns None once the operation has been cleaned up.
The fix threaded hop_count through the wire (GetMsg::Response) so the originator has the populated value when constructing GET log events.
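The failure mode can be shown with a toy model. This is a minimal sketch, not the real op manager: `OpTable`, `record_hop`, and `complete` are hypothetical names, and only `get_current_hop` echoes a name from the codebase. It assumes operation state lives in a map and is dropped when the operation finishes.

```rust
use std::collections::HashMap;

/// Toy stand-in for per-operation state (hypothetical; not the real op manager).
struct OpTable {
    hops: HashMap<u64, usize>,
}

impl OpTable {
    fn new() -> Self {
        Self { hops: HashMap::new() }
    }

    /// Record the current hop for an in-flight operation.
    fn record_hop(&mut self, id: u64, hop: usize) {
        self.hops.insert(id, hop);
    }

    /// Clean up the operation once it reaches a terminal state.
    fn complete(&mut self, id: u64) {
        self.hops.remove(&id);
    }

    /// Lookup by id; returns None once the operation has been cleaned up.
    fn get_current_hop(&self, id: u64) -> Option<usize> {
        self.hops.get(&id).copied()
    }
}
```

If the log event is built after `complete` has run, the lookup has nothing left to return. That is why the fix carries the value on the response message instead.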
The remaining gap
PutSuccess, UpdateSuccess, SubscribeSuccess, and SubscribeNotFound events still carry hop_count: Option<usize>, and none of them gets a value from the wire: they rely on the same broken op_manager.get_current_hop() stub or on a hardcoded None. They therefore have the same bug, with hop_count being None on ~99% of events.
Specifically:
- crates/core/src/tracing.rs:1271: the PUT call site still uses the stub.
- crates/core/src/tracing.rs, Subscribe paths: hop_count: None, // TODO: Track hop count from operation state (two sites).
- The op_state_manager::get_current_hop stub at crates/core/src/node/op_state_manager.rs:615 always returns None.
What needs doing
Same approach as #4245: add a positional hop_count field to:
- PutMsg::Response
- UpdateMsg::Response
- SubscribeMsg::Response
Thread the field through storer / relay / exhaustion paths, bump min-compatible-version for each, update tracing-side extraction with the same min(max_htl) clamp pattern, and add roundtrip + classifier-preservation unit tests.
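The tracing-side extraction might look roughly like the sketch below. Everything here is illustrative: `ResponseSketch`, `clamp_hop_count`, and `hop_count_for_log` are hypothetical names, the real message layout from #4245 is not reproduced, and the clamp is assumed to mean "never report more hops than max_htl".

```rust
/// Hypothetical stand-in for a *Msg::Response variant that carries hop_count
/// on the wire. The real field layout is not shown here.
#[derive(Debug, Clone, PartialEq)]
struct ResponseSketch {
    id: u64,
    /// None models a response that carries no hop count.
    hop_count: Option<usize>,
}

/// Clamp a wire-supplied hop count to max_htl (assumed meaning of the
/// min(max_htl) pattern): a value above the limit is capped, None stays None.
fn clamp_hop_count(wire: Option<usize>, max_htl: usize) -> Option<usize> {
    wire.map(|hops| hops.min(max_htl))
}

/// Build the value for the log event from the message itself rather than
/// from operation state that may already be gone.
fn hop_count_for_log(msg: &ResponseSketch, max_htl: usize) -> Option<usize> {
    clamp_hop_count(msg.hop_count, max_htl)
}
```

Clamping on the receiving side means a bogus or hostile value from a remote peer cannot push the logged hop count past the configured limit.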
Why not done in #4245
GET-only scope: GET is the operation the dashboard breakage was most visible for. Three additional wire-format-change PRs would have tripled review surface area.
[AI-assisted - Claude]