
Routing log never captures input/output tokens — dream cost-aware tuning is blind #452

@rockfordlhotka

Description


Summary

tier-routing-log.jsonl entries are written with inputTokens / outputTokens always null, so the dream's tier-routing review has no per-entry cost data. This silently disables two pieces of cost-aware machinery: the High-tier cost floor added in #451 and the lowOutputAtHigh flagged-cluster rule. Both depend on per-entry token counts that are never present.

Evidence

On the live cluster, 0 of 261 entries in /data/agent/tier-routing-log.jsonl have inputTokens/outputTokens populated. Subagent entries do carry latencyMs (e.g. 485678) but leave tokens null, which is what tipped this off.
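The count above is easy to reproduce. The sketch below is a minimal audit, assuming only what the issue states: the log is JSON Lines at /data/agent/tier-routing-log.jsonl and each entry uses the field names inputTokens and outputTokens. The helper names count_tokenized and audit are hypothetical.

```python
import json
from pathlib import Path


def count_tokenized(lines):
    """Return (tokenized, total) for an iterable of JSONL lines.

    An entry counts as tokenized only when BOTH inputTokens and
    outputTokens are present and non-null.
    """
    tokenized = total = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the log
        entry = json.loads(line)
        total += 1
        if entry.get("inputTokens") is not None and entry.get("outputTokens") is not None:
            tokenized += 1
    return tokenized, total


def audit(path="/data/agent/tier-routing-log.jsonl"):
    # Path taken from this issue; adjust for other environments.
    with Path(path).open(encoding="utf-8") as f:
        tokenized, total = count_tokenized(f)
    print(f"{tokenized} of {total} entries have inputTokens/outputTokens populated")
    return tokenized, total
```

Calling audit() on the cluster should print "0 of 261 ..." today, and a rising numerator is a quick way to confirm the first acceptance criterion after a fix ships.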

Because the tokens are null, TierRoutingAnalyzer.BuildScan skips every entry in its cost-delta loop (if (e.InputTokens is not long it || e.OutputTokens is not long ot) continue;), so projectedCostDelta is null on every threshold scan, even after #451 correctly wired tierModelMap through. The ceiling is currently climbing back off its floor purely on the LLM's quality judgment (the directive's "≥5 flips OR quality reason" path), not on the cost signal #451 was built to provide. In other words, the cost floor has never executed against a real number in production.
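To make the failure mode concrete, here is a simplified model of that loop. It is a sketch, not the real BuildScan: the RoutingEntry and Price types, per-1K-token pricing, and the rule "return None when no entry was counted" are all assumptions chosen to mirror the observed behavior (a null projectedCostDelta whenever every entry lacks tokens).

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class RoutingEntry:
    # Hypothetical shape mirroring the log fields named in this issue.
    input_tokens: Optional[int]
    output_tokens: Optional[int]


@dataclass
class Price:
    # Hypothetical per-1K-token prices for one model (e.g. looked up via tierModelMap).
    input_per_1k: float
    output_per_1k: float

    def cost(self, it: int, ot: int) -> float:
        return it / 1000 * self.input_per_1k + ot / 1000 * self.output_per_1k


def projected_cost_delta(entries: Iterable[RoutingEntry],
                         current: Price, proposed: Price) -> Optional[float]:
    """Sum the cost change of re-pricing `entries` from `current` to `proposed`.

    Entries missing either token count are skipped, like the `continue`
    guard in BuildScan. If every entry is skipped there is nothing to
    project, so the result is None rather than 0.0.
    """
    delta = 0.0
    counted = 0
    for e in entries:
        if e.input_tokens is None or e.output_tokens is None:
            continue
        it, ot = e.input_tokens, e.output_tokens
        delta += proposed.cost(it, ot) - current.cost(it, ot)
        counted += 1
    return delta if counted else None
```

With today's log every entry takes the `continue` branch, so the scan returns None no matter how accurate the price map is. A single tokenized entry is enough to produce a number.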

Root cause: three write sites, none of which persist usable tokens

  • src/RockBot.Subagent/SubagentRunner.cs:311 — sets LatencyMs but omits InputTokens/OutputTokens entirely. This is the dominant source of High-tier routing decisions, so it's the most important to fix.
  • src/RockBot.Agent/UserMessageHandler.cs:461 — omits InputTokens/OutputTokens (and LatencyMs) entirely.
  • src/RockBot.Agent/UserMessageHandler.cs:248 attempts InputTokens = firstResponse.Usage?.InputTokenCount / OutputTokens = firstResponse.Usage?.OutputTokenCount, but both land null, which means firstResponse.Usage isn't populated on the response object the handler holds at that point.

Suggested direction

AgentLoopRunner already aggregates per-iteration usage internally (src/RockBot.Host/AgentLoopRunner.cs:893,985 sum response.Usage?.InputTokenCount across the loop). The cleanest fix is to surface that aggregate on the loop result and have all three write sites persist it, rather than relying on a single firstResponse.Usage that may be empty. Threading the aggregated UsageDetails through is also what lets ModelId-based pricing joins work end-to-end.
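A minimal sketch of that direction follows, modeled in Python rather than C#. Usage stands in for the InputTokenCount/OutputTokenCount pair on a response; LoopResult, to_log_entry, and the "tier" and "modelId" log keys are hypothetical names, not existing RockBot APIs. The one deliberate design choice is that "no usage reported" stays None instead of collapsing to 0, so the analyzer can still tell missing data from a genuinely empty response.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Usage:
    # Mirrors InputTokenCount/OutputTokenCount; either may be None
    # when the provider does not report it.
    input_tokens: Optional[int] = None
    output_tokens: Optional[int] = None


def add_counts(a: Optional[int], b: Optional[int]) -> Optional[int]:
    # None + None stays None so "not reported" is distinguishable from 0.
    if a is None:
        return b
    if b is None:
        return a
    return a + b


def aggregate_usage(per_iteration: Iterable[Optional[Usage]]) -> Optional[Usage]:
    """Sum usage across every loop iteration, not just the first response."""
    total = Usage()
    for u in per_iteration:
        if u is None:
            continue  # this iteration's response carried no usage at all
        total.input_tokens = add_counts(total.input_tokens, u.input_tokens)
        total.output_tokens = add_counts(total.output_tokens, u.output_tokens)
    if total.input_tokens is None and total.output_tokens is None:
        return None
    return total


@dataclass
class LoopResult:
    # Hypothetical loop result that carries the aggregate to every write site.
    text: str
    usage: Optional[Usage]
    model_id: Optional[str] = None


def to_log_entry(result: LoopResult, tier: str, latency_ms: int) -> dict:
    # All three write sites would build their entry from the same aggregate.
    usage = result.usage or Usage()
    return {
        "tier": tier,
        "modelId": result.model_id,  # enables ModelId-based pricing joins
        "inputTokens": usage.input_tokens,
        "outputTokens": usage.output_tokens,
        "latencyMs": latency_ms,
    }
```

Because the subagent and both user-message paths would all call the same to_log_entry-style helper, a fix in one place covers every write site.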

Impact / why it matters

  • The High-tier cost floor from #451 (Add routing cost floor to stop the dream ratcheting balancedCeiling to its floor) and the tierModelMap wiring are inert until this lands. The dream can't weigh cost against quality, which was the original driver of the balancedCeiling ratchet-to-floor.
  • This matters most for the future case where High stops sharing Balanced's model and becomes a genuinely premium tier: the dream would still be blind to the cost difference and could over-route to High again, with no cost signal to correct it.
  • The lowOutputAtHigh detection rule (over-routing signal) also can't fire without output tokens.
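The issue does not define lowOutputAtHigh, so the following is only a hypothetical stand-in showing why any output-based over-routing check is dead without token data. The median statistic, the 200-token threshold, and the 5-sample minimum are invented for illustration and should not be read as the real rule.

```python
from statistics import median
from typing import Optional, Sequence


def low_output_at_high(output_tokens: Sequence[Optional[int]],
                       max_median_output: int = 200,
                       min_samples: int = 5) -> Optional[bool]:
    """Hypothetical over-routing check for a cluster of High-tier entries.

    Returns True when the cluster's median output is small enough to
    suggest a cheaper tier would have sufficed, False when it is not,
    and None when too few entries carry outputTokens to judge at all.
    """
    known = [t for t in output_tokens if t is not None]
    if len(known) < min_samples:
        return None  # with an all-null log the check always lands here
    return median(known) <= max_median_output
```

Whatever the real thresholds are, the shape is the same: a cluster whose outputTokens are all null never reaches the comparison, so the rule cannot fire.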

Acceptance criteria

  • New tier-routing-log.jsonl entries from subagent and user-message paths carry non-null inputTokens/outputTokens when the provider returns usage.
  • TierRoutingAnalyzer threshold scans produce a non-null projectedCostDelta once tokenized entries exist.
  • Verify on the live cluster that the dream's routing notes begin citing cost deltas (not just flip counts).

Related: #451 (cost floor), and the routing-tuning observability gap.

Metadata


Labels

bug (Something isn't working)

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
