Description
I'm trying to squeeze as much performance as I can from my poor laptop running Omes against Temporal with ScyllaDB / Cassandra, and I'm unsure where the bottleneck is. Among other things, I noticed this issue (well, AI and I noticed it). Here's the AI description, which I think is reasonable:
Setting history.persistenceMaxQPS: 0 in dynamic config (intended to mean "unlimited") causes all queue reader host-level rate limiters to be created with rate=0, burst=0. This results in every loadAndSubmitTasks call failing its Wait() and logging an unthrottled error with a full stacktrace.
Root cause:
NewHostRateLimiterRateFn in service/history/queue_factory_base.go:224-233 falls back to persistenceMaxRPS() * ratio when MaxPollHostRPS=0. If persistenceMaxQPS is also 0, the effective rate becomes 0 * 0.3 = 0, creating a rate limiter with burst=0. Go's rate.Limiter.ReserveN(now, 1) returns OK()=false when burst < tokens, triggering the error path at service/history/queues/reader.go:433.
Impact:
- 317K error log lines in a 5-minute run (99.2% of all server log output)
- Each log line includes a full JSON-serialized stacktrace
- Affects all queue processors: transfer (106K), timer (105K), archival (104K), visibility (22K), outbound (2K)
- All 128 shards affected (~2500 errors per shard)
- Significant CPU and I/O overhead from log serialization
Secondary issue:
The error log at reader.go:433 has no rate limiting despite being in a hot loop. Even when triggered legitimately, it should use a throttled logger.
Suggested fixes:
- In NewHostRateLimiterRateFn, handle persistenceMaxRPS() <= 0 by using the default value (9000) or returning math.MaxFloat64
- Add rate limiting to the error log at reader.go:433
Reproduction: Set history.persistenceMaxQPS: 0 in dynamic config, start server with 128 shards, observe log output.
I can, of course, work on fixing these.