Summary
tier-routing-log.jsonl entries are written with inputTokens/outputTokens always null, so the dream's tier-routing review has no per-entry cost data. This silently disables the cost-aware machinery added in #451 (the High-tier cost floor) and the lowOutputAtHigh flagged-cluster rule, both of which depend on per-entry tokens that are never present.
Evidence
On the live cluster, 0 of 261 entries in /data/agent/tier-routing-log.jsonl have inputTokens/outputTokens populated. Subagent entries do carry latencyMs (e.g. 485678) but leave tokens null, which is what tipped this off.
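To reproduce that count, or to re-check the first acceptance criterion once a fix ships, it is enough to scan the log line by line. Below is a minimal C# sketch using System.Text.Json. It assumes each line is one JSON object with camelCase inputTokens/outputTokens properties (matching the names above). CountTokenized and HasNumber are hypothetical helpers, not existing RockBot code.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.Json;

// Hypothetical helper: true when `name` exists on the object and holds a JSON number.
// A JSON null (what the write sites produce today) does not count.
static bool HasNumber(JsonElement obj, string name) =>
    obj.ValueKind == JsonValueKind.Object
    && obj.TryGetProperty(name, out var value)
    && value.ValueKind == JsonValueKind.Number;

// Hypothetical helper: counts non-blank JSONL lines, and how many of them carry
// both inputTokens and outputTokens as numbers.
(int Total, int Tokenized) CountTokenized(IEnumerable<string> lines)
{
    int total = 0, tokenized = 0;
    foreach (var line in lines)
    {
        if (string.IsNullOrWhiteSpace(line)) continue; // ignore blank lines
        total++;
        using var doc = JsonDocument.Parse(line);
        if (HasNumber(doc.RootElement, "inputTokens") && HasNumber(doc.RootElement, "outputTokens"))
            tokenized++;
    }
    return (total, tokenized);
}

// Path from the evidence above; the check is skipped when the file is not present.
const string LogPath = "/data/agent/tier-routing-log.jsonl";
if (File.Exists(LogPath))
{
    var (count, populated) = CountTokenized(File.ReadLines(LogPath));
    Console.WriteLine($"{populated} of {count} entries have inputTokens/outputTokens populated");
}
```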
Because the tokens are null, TierRoutingAnalyzer.BuildScan skips every entry in its cost-delta loop (if (e.InputTokens is not long it || e.OutputTokens is not long ot) continue;), so projectedCostDelta is null on every threshold scan, even after #451 correctly wired tierModelMap through. The balancedCeiling is currently climbing back off its floor purely on the LLM's quality judgment (the directive's "≥5 flips OR quality reason" path), not on the cost signal #451 was built to provide. In other words, the cost floor has never executed against a real number in production.
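The effect of that guard can be sketched as follows. This is not the actual BuildScan code. ProjectedCostDelta is a hypothetical stand-in that keeps the same is-not-long skip. The per-million-token prices and the "cost at target tier minus cost at current tier" formula are assumptions for illustration. The point it shows is that when every entry lacks tokens, nothing is ever accumulated, so the result stays null rather than becoming 0.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for BuildScan's cost-delta loop, NOT the real implementation.
// Prices are per 1M tokens; the delta is (cost at target tier) - (cost at current tier).
decimal? ProjectedCostDelta(
    IEnumerable<(long? InputTokens, long? OutputTokens)> entries,
    (decimal In, decimal Out) fromPrice,
    (decimal In, decimal Out) toPrice)
{
    decimal? delta = null; // stays null unless at least one entry has both counts
    foreach (var e in entries)
    {
        // Same guard as the quoted loop: entries missing either count are skipped.
        if (e.InputTokens is not long it || e.OutputTokens is not long ot) continue;
        decimal fromCost = (it * fromPrice.In + ot * fromPrice.Out) / 1_000_000m;
        decimal toCost = (it * toPrice.In + ot * toPrice.Out) / 1_000_000m;
        delta = (delta ?? 0m) + (toCost - fromCost);
    }
    return delta;
}

// Today's log: every entry has null tokens, so the scan has nothing to price.
var todaysLog = new (long?, long?)[] { (null, null), (null, null), (null, null) };
Console.WriteLine(ProjectedCostDelta(todaysLog, (3m, 15m), (1m, 5m)) is null); // True
```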
Root cause: three write sites, none of which persist usable tokens
src/RockBot.Subagent/SubagentRunner.cs:311 sets LatencyMs but omits InputTokens/OutputTokens entirely. This is the dominant source of High-tier routing decisions, so it's the most important to fix.
src/RockBot.Agent/UserMessageHandler.cs:461 omits InputTokens/OutputTokens (and LatencyMs) entirely.
src/RockBot.Agent/UserMessageHandler.cs:248 attempts InputTokens = firstResponse.Usage?.InputTokenCount / OutputTokens = firstResponse.Usage?.OutputTokenCount, but these land null, i.e., firstResponse.Usage isn't populated on the response object the handler holds at that point.
Suggested direction
AgentLoopRunner already aggregates per-iteration usage internally (src/RockBot.Host/AgentLoopRunner.cs:893,985 sum response.Usage?.InputTokenCount across the loop). The cleanest fix is to surface that aggregate on the loop result and have all three write sites persist it, rather than relying on a single firstResponse.Usage that may be empty. Threading the aggregated UsageDetails through is also what lets ModelId-based pricing joins work end-to-end.
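A minimal sketch of the suggested aggregation follows. It assumes each iteration's usage is reduced to a pair of nullable counts named after UsageDetails' InputTokenCount/OutputTokenCount, with (null, null) when response.Usage itself was null. AggregateUsage and the anonymous log entry are hypothetical, not the real AgentLoopRunner or log-entry types. The key design choice is that a total stays null only when no iteration reported that count. That way, "provider returned no usage" is never silently persisted as 0.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical helper (not the real AgentLoopRunner API). Each tuple stands in for
// one iteration's response.Usage; use (null, null) when Usage itself was null.
// A total stays null only if no iteration reported that count.
(long? InputTokens, long? OutputTokens) AggregateUsage(
    IEnumerable<(long? InputTokenCount, long? OutputTokenCount)> perIteration)
{
    long? input = null, output = null;
    foreach (var usage in perIteration)
    {
        if (usage.InputTokenCount is long i) input = (input ?? 0) + i;
        if (usage.OutputTokenCount is long o) output = (output ?? 0) + o;
    }
    return (input, output);
}

// Usage: three loop iterations, one of which came back without usage.
var iterations = new (long?, long?)[] { (1200, 300), (null, null), (800, 150) };
var (inTok, outTok) = AggregateUsage(iterations);

// A write site would persist the aggregate rather than firstResponse.Usage.
// Hypothetical entry shape; only the field names come from the log.
var entry = new { inputTokens = inTok, outputTokens = outTok, latencyMs = 485678L };
Console.WriteLine(JsonSerializer.Serialize(entry));
```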
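Impact / why it matters
The #451 cost floor and tierModelMap wiring are inert until this lands: the dream can't weigh cost against quality, which was the original driver of the balancedCeiling ratchet-to-floor.
This matters most for the future case where High stops sharing Balanced's model and becomes a genuinely premium tier: the dream would still be blind to the cost difference and could over-route to High again, with no cost signal to correct it.
The lowOutputAtHigh detection rule (over-routing signal) also can't fire without output tokens.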
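For illustration, here is a hypothetical lowOutputAtHigh-style check. The tier string, the 200-token threshold, and the minimum hit count are assumptions, not the rule's real parameters. Because the predicate can only count entries whose OutputTokens is a number, a log with null output tokens can never trigger it, however many High-tier entries it contains.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical lowOutputAtHigh-style check, NOT the real rule: flags a cluster when
// at least minHits High-tier entries produced <= maxOutputTokens of output.
// Tier names as strings and both thresholds are illustrative assumptions.
bool LowOutputAtHigh(
    IEnumerable<(string Tier, long? OutputTokens)> cluster,
    long maxOutputTokens = 200,
    int minHits = 3)
{
    int hits = cluster.Count(e =>
        e.Tier == "High" && e.OutputTokens is long ot && ot <= maxOutputTokens);
    return hits >= minHits; // an entry with null OutputTokens can never be a hit
}

// Ten High-tier entries with null output tokens, as in today's log: never flagged.
var untokenized = Enumerable.Repeat(("High", (long?)null), 10);
Console.WriteLine(LowOutputAtHigh(untokenized)); // False
```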
Acceptance criteria
New tier-routing-log.jsonl entries from subagent and user-message paths carry non-null inputTokens/outputTokens when the provider returns usage.
TierRoutingAnalyzer threshold scans produce a non-null projectedCostDelta once tokenized entries exist.
Verify on the live cluster that the dream's routing notes begin citing cost deltas (not just flip counts).
Related: #451 (cost floor) and the routing-tuning observability gap.