
stats: remove storm of refcount bumping during metrics snapshot. #43958

Draft
jmarantz wants to merge 10 commits into envoyproxy:main from jmarantz:refcount-storm-on-metrics-snapshot

Conversation

@jmarantz
Contributor

@jmarantz jmarantz commented Mar 14, 2026

Commit Message:

Changes Envoy’s stats snapshotting/lifetime strategy to avoid per-metric refcount churn during a metrics flush by retaining scope references instead of taking a refcounted handle to every individual stat.

Changes:

Update MetricSnapshotImpl to keep metrics alive by capturing ConstScopeSharedPtrs for all scopes, rather than incrementing refcounts on every stat during snapshot creation.
Restrict Allocator/AllocatorImpl stat-construction APIs (makeCounter/makeGauge/makeTextReadout) to protected, pushing call sites toward Scope APIs and enabling the snapshot optimization assumption (“stats are scope-owned”).
Update related tests/benchmarks to use scope-based stat construction and adjust benchmark ranges.

This change makes the existing bmFlushToSinks benchmark in test/server/server_stats_flush_benchmark_test.cc run about 5x faster. The effect is especially visible when the benchmark is extended to 10M stats (done in this PR), which highlights a problematic metrics flush duration of >5s when the flush interval is 5s. I will also note that the excessive ref-count churn is likely much worse on ARM, though I did not measure that.

Before:

Benchmark                                         Time             CPU   Iterations
-----------------------------------------------------------------------------------
bmFlushToSinks/10                             0.002 ms        0.002 ms       299435
bmFlushToSinks/100                            0.011 ms        0.011 ms        61284
bmFlushToSinks/1000                           0.109 ms        0.109 ms         6353
bmFlushToSinks/10000                           1.12 ms         1.12 ms          616
bmFlushToSinks/100000                          27.0 ms         27.0 ms           23
bmFlushToSinks/1000000                          394 ms          394 ms            2
bmFlushToSinks/10000000                        4841 ms         4840 ms            1

After:

Benchmark                                         Time             CPU   Iterations
-----------------------------------------------------------------------------------
bmFlushToSinks/10                             0.002 ms        0.002 ms       436476
bmFlushToSinks/100                            0.004 ms        0.004 ms       202299
bmFlushToSinks/1000                           0.030 ms        0.030 ms        23043
bmFlushToSinks/10000                          0.376 ms        0.375 ms         1879
bmFlushToSinks/100000                          6.73 ms         6.73 ms          103
bmFlushToSinks/1000000                          100 ms          100 ms            7
bmFlushToSinks/10000000                        1065 ms         1065 ms            1

Additional Description:
Risk Level: medium -- changes the mechanism for keeping stats alive during a metrics sink flush
Testing: //test/... but, man, Envoy test has gotten super flaky since I last did dev. Things eventually pass when I re-run 5 or so integration tests one at a time, on a clean client. IDK what's up with CI.
Docs Changes: n/a
Release Notes: n/a
Platform Specific Features: n/a
Fixes #43836

Signed-off-by: Joshua Marantz <jmarantz@google.com>
@repokitteh-read-only

As a reminder, PRs marked as draft will not be automatically assigned reviewers,
or be handled by maintainer-oncall triage.

Please mark your PR as ready when you want it to be reviewed!


@jmarantz
Contributor Author

/retest

Contributor

Copilot AI left a comment


Pull request overview

This PR changes Envoy’s stats snapshotting/lifetime strategy to avoid per-metric refcount churn during a metrics flush, primarily by retaining scope references instead of taking a refcounted handle to every individual stat.

Changes:

  • Update MetricSnapshotImpl to keep metrics alive by capturing ConstScopeSharedPtrs for all scopes, rather than incrementing refcounts on every stat during snapshot creation.
  • Restrict Allocator/AllocatorImpl stat-construction APIs (makeCounter/makeGauge/makeTextReadout) to protected, pushing call sites toward Scope APIs and enabling the snapshot optimization assumption (“stats are scope-owned”).
  • Update related tests/benchmarks to use scope-based stat construction and adjust benchmark ranges.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.

Summary per file:

  • test/server/server_stats_flush_benchmark_test.cc: Expands benchmark ranges for sink flush performance testing.
  • test/server/admin/prometheus_stats_test.cc: Migrates test metric creation to ThreadLocalStoreImpl/Scope APIs.
  • test/common/stats/thread_local_store_test.cc: Avoids repeated allocator-based counter creation for use_count() assertions by reusing a store-owned counter.
  • test/common/stats/metric_impl_test.cc: Adds fixture wrappers for allocator stat creation (now protected) and updates call sites.
  • test/common/stats/allocator_impl_test.cc: Adds fixture wrappers for allocator stat creation (now protected) and updates call sites.
  • source/server/server.h: Replaces per-stat "snapped_*" holders with scopes_ to retain metric lifetimes via scopes.
  • source/server/server.cc: Implements the new snapshot behavior: capture scopes first, then collect sinked metric references.
  • source/common/stats/allocator_impl.h: Moves allocator construction methods to protected and adds friend access for tests/isolated store.
  • envoy/stats/allocator.h: Makes allocator construction methods protected and documents the rationale (scope ownership + snapshot performance).

Comment on lines 63 to 65
void addCounter(const std::string& name, Stats::StatNameTagVector cluster_tags) {
Stats::StatNameManagedStorage name_storage(baseName(name, cluster_tags), *symbol_table_);
Stats::StatNameManagedStorage tag_extracted_name_storage(name, *symbol_table_);
counters_.push_back(alloc_.makeCounter(name_storage.statName(),
tag_extracted_name_storage.statName(), cluster_tags));
counters_.push_back(&scope_->counterFromStatNameWithTags(pool_.add(name), cluster_tags));
}
Contributor Author


Copilot is wrong, here and below; this compiles fine.

Comment on lines 67 to 71
void addGauge(const std::string& name, Stats::StatNameTagVector cluster_tags,
Stats::Gauge::ImportMode import_mode = Stats::Gauge::ImportMode::Accumulate) {
Stats::StatNameManagedStorage name_storage(baseName(name, cluster_tags), *symbol_table_);
Stats::StatNameManagedStorage tag_extracted_name_storage(name, *symbol_table_);
gauges_.push_back(alloc_.makeGauge(
name_storage.statName(), tag_extracted_name_storage.statName(), cluster_tags, import_mode));
gauges_.push_back(
&scope_->gaugeFromStatNameWithTags(pool_.add(name), cluster_tags, import_mode));
}
Contributor Author


Copilot is incorrect; this compiles fine.

Comment on lines +75 to +76
Stats::TextReadoutSharedPtr textReadout =
&scope_->textReadoutFromStatNameWithTags(pool_.add(name), cluster_tags);
Contributor Author


Copilot is incorrect; this compiles fine.

@jmarantz jmarantz marked this pull request as ready for review March 14, 2026 14:50
@jmarantz jmarantz changed the title WIP stats: remove storm of refcount bumping during metrics snapshot. stats: remove storm of refcount bumping during metrics snapshot. Mar 14, 2026
@jmarantz
Contributor Author

jmarantz commented Mar 14, 2026

@pradeepcrao @ggreenway I want to get feedback on this as a whole, but I think it would be reasonable to ask that I make one refactor before this merges.

I tried to ensure we don't directly generate stats from the Stats::Allocator by making its makeCounter (etc) methods protected. I couldn't make them private because of some integration test infrastructure around notifying counters.

However I think what I should do is:

  • merge Stats::AllocatorImpl and Stats::Allocator into one class, and make the methods non-virtual.
  • change the NotifyingCounter mechanism in the integration tests to use a delegate at the Scope level instead of at the Allocator level.

I think what I have here is safe today but someone could work around the restricted access to makeCounter by subclassing; it would be better for that to be private.

But I thought I'd get high level comments on this PR first. I think this may go a long way to improving Envoy performance on ARM, at least with large numbers of stats.

@agrawroh
Member

Amazing! <3

/assign @ggreenway

@jmarantz
Contributor Author

I think #43968 should merge first and then this PR will be more obviously correct -- will make makeCounter et al private instead of protected.

@jmarantz
Contributor Author

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a significant performance optimization by changing how metric snapshots are handled to avoid reference counting churn. The core idea is to hold references to scopes instead of individual stats. This is enforced by making the stat creation methods on the Allocator protected, forcing usage of Scope APIs. The changes are well-implemented, and the necessary updates to tests and benchmarks are included. The benchmark results clearly demonstrate the performance gain. I have kept the minor suggestion to improve a comment that has become misleading due to the refactoring, as it does not contradict any existing rules.

// less than a second, and in fastbuild it takes less than 5.
CounterSharedPtr counter =
alloc_.makeCounter(my_counter_scoped_name_, StatName(), StatNameTagVector{});
uint32_t use_count = counter->use_count() - 1; // Subtract off this instance.
Contributor


medium

The comment // Subtract off this instance. is a leftover from the previous implementation where the test held a CounterSharedPtr instance. Now that counter is a raw pointer, this comment is misleading. The -1 is to subtract the reference held by the central allocator, so that use_count reflects only the references from the TLS caches. A more accurate comment would be // Subtract off the central allocator's reference..

Suggested change
uint32_t use_count = counter->use_count() - 1; // Subtract off this instance.
uint32_t use_count = counter->use_count() - 1; // Subtract off the central allocator's reference.

@ggreenway
Member

I haven't looked at any code yet, but one thing immediately came to mind: in the stats access logger (it's new), for a matched add/subtract pair (for something like an active/concurrency stat), it holds a reference to a gauge separate from any scope, so holding a count on all the scopes doesn't necessarily keep all the metrics in the store alive (although in nearly all cases it would). I'm not sure the best way to handle that. It's currently being worked on in #43812.

@jmarantz
Contributor Author

How would a metric be added to the store without being held by a scope? Would it call Allocator::makeCounter directly (as this PR attempts to prevent by making makeCounter protected, soon to be private when #43968 lands)?

@jmarantz
Contributor Author

I haven't read about stats access loggers yet but can't they own a scope for the stats they create/manage?

@kyessenov
Contributor

Looks nice, thanks for re-working the snapshotting. The stats logger has its own long-living scope, so I think we're fine there. We can now also defer-delete scopes and extend their lifetime to the next flush, which solves the long-standing problem of stats going missing when a scope is added and deleted within a single flush period.

@ggreenway
Member

How would a metric be added to the store without being held by a scope? Would it call Allocator::makeCounter directly (as this PR attempts to prevent by putting makeCounter in protected section (soon to be private section when #43968 lands)?

It's initially created via a scope, but the scope has eviction enabled, and during an operation the logger holds an additional reference/sharedptr to the metric, so it could be removed from the only scope that holds it, but there could still be a reference/sharedptr.

@jmarantz
Contributor Author

I think I need to catch up on what scope eviction is. I hadn't thought about the possibility of someone holding onto a reference to a stat for longer than its scope lives.

If that's the case in the code today for stats access loggers, then I think this PR is not safe :(

Do we need stats access logger's stats to be sinked?

@kyessenov
Contributor

I think I need to catch up on what scope eviction is. I hadn't thought about the possibility of someone holding onto a reference to a stat for longer than its scope lives.

This is possible without eviction. I think it's probably not happening in the tree; all stats are allocated off a scope, which should outlast them, but I don't know how to verify it.

If that's the case in the code today for stats access loggers, then I think this PR is not safe :(

Eviction actually requires not holding a reference to a stat. Gauge is a special case, and they are tricky.

ggreenway pushed a commit that referenced this pull request Mar 17, 2026
… implementation from Stats::AllocatorImpl to Stats::Allocator (#43968)

A long time ago, all Envoy stats were stored in a large shared memory
block, significantly limiting how many you could get, and the lengths of
their names. This facilitated keeping stat continuity across
parent/child hot-restart. The fixed shared-mem block size was very
limiting and a bit inefficient, so an alternative mechanism of keeping
stats on the heap was added for when hot-restart was not needed. To
switch between them, we used the Allocator interface, and kept two
implementations of that.

Then we removed the shared-memory version in favor of using message
passing to update counters from parent to child during hot restart.

Now it's time to remove that interface; it's making it a bit harder to
reason about guaranteeing that stats users don't call the allocator
directly; it's really an implementation detail of the stats system. So
this is a pure refactor (no functional change) which can help that, and
make it easier to reason about the safety of #43958 .

As removing headers and class names can be a breaking change, temporary
forwarding headers are left behind; out-of-repo references should be
changed by June.

Signed-off-by: Joshua Marantz <jmarantz@google.com>
@jmarantz
Contributor Author

I'm having some trouble understanding the lifecycle of a stats-access-logger stat and how it interacts with scopes. Also, why are gauges different?

Is there some doc or design or detailed PR description, describing the stats-access-logger feature and how it works?

@kyessenov
Contributor

I'm having some trouble understanding the lifecycle of a stats-access-logger stat and how it interacts with scopes. Also, why are gauges different?

Is there some doc or design or detailed PR description, describing the stats-access-logger feature and how it works?

Gauges are different because we actually want UpDownCounter (which is just like two counters that you take a diff of). So we're effectively overlaying the semantics of UpDownCounter over the Gauge implementation, and there's some friction.

@jmarantz
Contributor Author

I'm not really following how the semantics of adjusting and interpreting Gauge values impact their lifetime.

I think the centerpiece of this PR is that I was trying to lighten the cost of lifetime management for individual stats by holding onto the scopes and assuming that would be sufficient. This premise is defeated by:

{ // on main thread
  ScopePtr scope = createScope();
  CounterSharedPtr counter = &scope->counter(...); // bumps ref count
  scope.reset(); // counter still exists and is reachable from stats allocator iteration
}                // (assume `counter` has been handed off to a worker)
{ // on worker thread
  counter.reset(); // frees the counter, which could race with a metrics snapshot
}

I'm mulling over another refactor to try to limit the appearance of CounterSharedPtr et al outside the stats system itself. I think only by creating that boundary (if it's not too late) can I make the leap to saying "if you hold onto a scope, all the stats you found in the Allocator are safe".

Another option I am considering, which might be easier, would be to avoid looping over the allocator during metrics collection; I would only loop over scopes and the stats held inside them. I could use a flag bit in the stat to de-dup. This would exclude from the metrics snapshot any stats that are not in scopes. That might be OK.

@ggreenway
Member

All the crazy stuff in the stats access logger is new (hasn't made it into a release yet) and isn't working correctly, and the solutions have all felt convoluted. If we can get a huge efficiency gain for an operation that happens all the time, I'd be tempted to completely re-work how that's done in the stats access logger, or delete the feature from it.

@kyessenov
Contributor

@ggreenway The problem that Josh highlighted is present regardless of the log stats extension. We need some way to couple scope with stat lifetimes throughout the code base, maybe by returning some struct that shares ScopeSharedPtr with raw pointer to the stat to the workers. I don't think it's safe to merge this change without doing that first, since Josh's example may very well lurk somewhere, and the general deletion order with xDS is highly complex.

@jmarantz
Contributor Author

The discussion about the new access-logger stats only helped highlight that this is too dependent on the patterns in which stats are typically used, and leaves time bombs for anyone straying outside them.

Summarizing my plan above, it will be one of:

  1. Remove CounterSharedPtr et al from the codebase outside stats impl and tests.
  2. Iterate over stats inside scopes while holding a reference to the scopes. This will mean we sink only stats in scopes, which is probably ok.

@pradeepcrao
Contributor

  1. Remove CounterSharedPtr et al from the codebase outside stats impl and tests.
  2. Iterate over stats inside scopes while holding a reference to the scopes. This will mean we sink only stats in scopes, which is probably ok.

This sounds like the right thing to do.

@ggreenway
Member

  1. Remove CounterSharedPtr et al from the codebase outside stats impl and tests.

I agree this seems like the right plan. However, if this is the plan, now is the time to chime in on #43812

@jmarantz
Contributor Author

I'm leaning toward option 2 (iterate over scopes) rather than trying to hide CounterSharedPtr from the masses. It's pretty ingrained at this point, and for some use-cases it's convenient to have the stat kept alive after the return.

@pradeepcrao
Contributor

I'm leaning toward option 2 (iterate over scopes) rather than trying to hide CounterSharedPtr from the masses. It's pretty ingrained at this point, and for some use-cases it's convenient to have the stat kept alive after the return.

I think we will need both: use scopes to ensure the stats are not deleted during the flush operation, but still iterate over stats in the allocator. We improved the flush speed by an order of magnitude by not iterating over scopes and deduping stats using a StatNameHashSet in #17909

@jmarantz
Contributor Author

jmarantz commented Mar 18, 2026

Oh, thanks for the benchmark data from earlier. It seems things got a lot slower between the "with change" numbers on that PR and the "before" numbers on this one, degrading from 100ms for num_stats=1M to 304ms measured this week. That 3x performance loss might have happened in #18144, a few weeks after #17909 was merged.

Let me get a read on how fast we can make the scope-based iteration; I think we can de-dup with a spare flag bit in the stat rather than a hash-set, which might help.

I think there's going to be huge churn, and probably multiple refactoring PRs, if we try to remove CounterSharedPtr from the codebase generally.

@ggreenway
Member

ggreenway commented Mar 18, 2026

A quick grep shows not too many hits, but the dynamic modules ABI might be hard to fix:

edit: nevermind, I don't think that'll be hard to fix. It's a use of stats_store.iterate() and that's in the signature of the required callback.

./server/admin/stats_handler.cc:54:  for (const Stats::CounterSharedPtr& counter : server_.stats().counters()) {
./server/admin/prometheus_stats.cc:883:    const std::vector<Stats::CounterSharedPtr>& counters,
./server/admin/prometheus_stats.cc:950:    const std::vector<Stats::CounterSharedPtr>& counters,
./server/admin/prometheus_stats.cc:963:    const std::vector<Stats::CounterSharedPtr>& counters,
./server/admin/prometheus_stats.cc:981:    const std::vector<Stats::CounterSharedPtr>& counters,
./server/admin/stats_request.cc:141:      renderStat<Stats::CounterSharedPtr>(iter->first, response, variant);
./server/admin/prometheus_stats.h:77:  static uint64_t statsAsPrometheus(const std::vector<Stats::CounterSharedPtr>& counters,
./server/admin/prometheus_stats.h:88:  statsAsPrometheusText(const std::vector<Stats::CounterSharedPtr>& counters,
./server/admin/prometheus_stats.h:97:  statsAsPrometheusProtobuf(const std::vector<Stats::CounterSharedPtr>& counters,
./server/admin/prometheus_stats.h:107:  generateWithOutputFormat(const std::vector<Stats::CounterSharedPtr>& counters,
./server/admin/stats_request.h:18:  using StatOrScopes = absl::variant<ScopeVec, Stats::TextReadoutSharedPtr, Stats::CounterSharedPtr,
./server/server.cc:180:        snapped_counters_.push_back(Stats::CounterSharedPtr(&counter));
./server/server.h:494:  std::vector<Stats::CounterSharedPtr> snapped_counters_;
./extensions/bootstrap/dynamic_modules/abi_impl.cc:99:      [&name_view, &found, value_ptr](const Envoy::Stats::CounterSharedPtr& counter) -> bool {
./common/stats/thread_local_store.cc:85:    removeRejectedStats<CounterSharedPtr>(central_cache->counters_,
./common/stats/thread_local_store.cc:86:                                          [this](const CounterSharedPtr& counter) mutable {
./common/stats/thread_local_store.cc:153:std::vector<CounterSharedPtr> ThreadLocalStoreImpl::counters() const {
./common/stats/thread_local_store.cc:155:  std::vector<CounterSharedPtr> ret;
./common/stats/thread_local_store.cc:157:                 [&ret](Counter& counter) { ret.emplace_back(CounterSharedPtr(&counter)); });
./common/stats/thread_local_store.cc:601:         const StatNameTagVector& tags) -> CounterSharedPtr {
./common/stats/thread_local_store.cc:1105:  StatNameHashMap<CounterSharedPtr> counters_;
./common/stats/isolated_store_impl.cc:28:                       StatNameTagVectorOptConstRef tags) -> CounterSharedPtr {
./common/stats/allocator_impl.h:27:  CounterSharedPtr makeCounter(StatName name, StatName tag_extracted_name,
./common/stats/allocator_impl.h:62:  void markCounterForDeletion(const CounterSharedPtr& counter) override;
./common/stats/allocator_impl.h:105:  // to exist to hold it as (e.g.) a CounterSharedPtr rather than a Counter&
./common/stats/allocator_impl.h:107:  std::vector<CounterSharedPtr> deleted_counters_ ABSL_GUARDED_BY(mutex_);
./common/stats/thread_local_store.h:184:  std::vector<CounterSharedPtr> counters() const override;
./common/stats/thread_local_store.h:282:    StatNameHashMap<CounterSharedPtr> counters_;
./common/stats/thread_local_store.h:602:  // to exist to hold it as (e.g.) a CounterSharedPtr rather than a Counter&
./common/stats/thread_local_store.h:619:  CounterSharedPtr counters_overflow_;
./common/stats/thread_local_store.h:620:  CounterSharedPtr gauges_overflow_;
./common/stats/thread_local_store.h:621:  CounterSharedPtr histograms_overflow_;
./common/stats/allocator_impl.cc:300:CounterSharedPtr AllocatorImpl::makeCounter(StatName name, StatName tag_extracted_name,
./common/stats/allocator_impl.cc:309:  auto counter = CounterSharedPtr(makeCounterInternal(name, tag_extracted_name, stat_name_tags));
./common/stats/allocator_impl.cc:470:void AllocatorImpl::markCounterForDeletion(const CounterSharedPtr& counter) {
./common/stats/isolated_store_impl.h:43:  std::vector<CounterSharedPtr> counters() const override { return counters_.toVector(); }

@pradeepcrao
Contributor

I think we can de-dup with a spare flag bit in the stat rather than a hash-set, which might help.

Neat!

@kyessenov
Contributor

You could also use the "used" bit, since it's cleared anyway during the flush. So that could serve as the marker if you combine the loops.

@ggreenway
Member

Is this ready for review, or is this waiting on further changes?

@jmarantz
Contributor Author

No, sorry, I haven't checked in anything yet to address the race I described yesterday. Working on it.

I can shift to draft-mode to disambiguate.

@jmarantz jmarantz marked this pull request as draft March 19, 2026 01:55


Development

Successfully merging this pull request may close these issues.

stats: excessive ref-count inc/dec causing performance bottleneck

6 participants