
Conversation


@kgeckhart kgeckhart commented Nov 19, 2025

PR Description

Removes unnecessary usage of labelstore from the prometheus.interceptor used in prometheus pipelines. Beyond simplifying the code, there's a performance gain from reducing duplicate staleness tracking through the labelstore:

goos: darwin
goarch: arm64
pkg: github.com/grafana/alloy/internal/component/prometheus
cpu: Apple M3 Pro
                                  │   baseline   │                new                 │
                                  │    sec/op    │   sec/op     vs base               │
Pipelines/default/2-metrics-11      5.638µ ±  7%   5.297µ ± 3%   -6.05% (p=0.009 n=6)
Pipelines/relabel/2-metrics-11      6.267µ ±  1%   5.819µ ± 2%   -7.14% (p=0.002 n=6)
Pipelines/default/10-metrics-11     27.78µ ±  6%   25.79µ ± 1%   -7.17% (p=0.002 n=6)
Pipelines/relabel/10-metrics-11     32.76µ ±  7%   29.71µ ± 2%   -9.29% (p=0.002 n=6)
Pipelines/default/100-metrics-11    309.6µ ± 10%   271.5µ ± 3%  -12.31% (p=0.002 n=6)
Pipelines/relabel/100-metrics-11    350.0µ ±  2%   308.3µ ± 5%  -11.92% (p=0.002 n=6)
Pipelines/default/1000-metrics-11   3.099m ±  8%   2.772m ± 3%  -10.57% (p=0.002 n=6)
Pipelines/relabel/1000-metrics-11   3.644m ±  4%   3.228m ± 3%  -11.44% (p=0.002 n=6)
geomean                             118.7µ         107.4µ        -9.52%

                                  │   baseline   │                 new                 │
                                  │     B/op     │     B/op      vs base               │
Pipelines/default/2-metrics-11      1.926Ki ± 0%   1.644Ki ± 0%  -14.66% (p=0.002 n=6)
Pipelines/relabel/2-metrics-11      2.505Ki ± 0%   2.082Ki ± 0%  -16.90% (p=0.002 n=6)
Pipelines/default/10-metrics-11     9.629Ki ± 0%   8.217Ki ± 0%  -14.66% (p=0.002 n=6)
Pipelines/relabel/10-metrics-11     12.52Ki ± 0%   10.41Ki ± 0%  -16.88% (p=0.002 n=6)
Pipelines/default/100-metrics-11    98.36Ki ± 0%   84.27Ki ± 0%  -14.33% (p=0.002 n=6)
Pipelines/relabel/100-metrics-11    126.6Ki ± 0%   105.5Ki ± 0%  -16.68% (p=0.002 n=6)
Pipelines/default/1000-metrics-11   997.1Ki ± 0%   856.4Ki ± 0%  -14.11% (p=0.002 n=6)
Pipelines/relabel/1000-metrics-11   1.249Mi ± 0%   1.043Mi ± 0%  -16.50% (p=0.002 n=6)
geomean                             41.76Ki        35.24Ki       -15.60%

                                  │  baseline   │                new                 │
                                  │  allocs/op  │  allocs/op   vs base               │
Pipelines/default/2-metrics-11       46.00 ± 0%    42.00 ± 0%   -8.70% (p=0.002 n=6)
Pipelines/relabel/2-metrics-11       58.00 ± 0%    52.00 ± 0%  -10.34% (p=0.002 n=6)
Pipelines/default/10-metrics-11      230.0 ± 0%    210.0 ± 0%   -8.70% (p=0.002 n=6)
Pipelines/relabel/10-metrics-11      290.0 ± 0%    260.0 ± 0%  -10.34% (p=0.002 n=6)
Pipelines/default/100-metrics-11    2.300k ± 0%   2.100k ± 0%   -8.70% (p=0.002 n=6)
Pipelines/relabel/100-metrics-11    2.900k ± 0%   2.600k ± 0%  -10.34% (p=0.002 n=6)
Pipelines/default/1000-metrics-11   24.49k ± 0%   22.49k ± 0%   -8.17% (p=0.002 n=6)
Pipelines/relabel/1000-metrics-11   30.49k ± 0%   27.49k ± 0%   -9.84% (p=0.002 n=6)
geomean                              985.0         892.5        -9.40%

The results were generated with the "headless" Prometheus pipeline tests I added to track end-to-end scenarios around WAL functionality, which will be important as the labelstore continues to evolve.
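
A comparison like the one above can be reproduced with benchstat. This is a sketch of the commands, assuming the benchmark function is BenchmarkPipelines in the package shown in the output header:

go test -run '^$' -bench 'Pipelines' -benchmem -count 6 ./internal/component/prometheus/ > baseline.txt   # on main
go test -run '^$' -bench 'Pipelines' -benchmem -count 6 ./internal/component/prometheus/ > new.txt        # on this branch
benchstat baseline.txt new.txt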

Notes to the Reviewer

There's no need for the interceptor to also handle labelstore interactions: everything that uses the interceptor also uses prometheus.Fanout, which is the ideal entry point for labelstore interactions.

PR Checklist

  • CHANGELOG.md updated
  • Tests updated

@kgeckhart kgeckhart requested a review from a team as a code owner November 19, 2025 22:06

	return &Fanout{
		children:    children,
		componentID: componentID,
Contributor Author

Drive-by: This wasn't referenced anywhere

@kgeckhart kgeckhart force-pushed the kgeckhart/remove-labelstore-from-interceptor branch from 50b5eca to a27d37a Compare November 20, 2025 18:27
@kgeckhart kgeckhart force-pushed the kgeckhart/remove-labelstore-from-interceptor branch from a27d37a to 83fb65a Compare November 20, 2025 19:50
"github.com/grafana/alloy/syntax"
)

func TestRelabelThroughAppend(t *testing.T) {
Contributor Author

This is covered by the relabel pipeline test.

Comment on lines 89 to 97
if len(res.Timeseries) == 1 {
	results = append(results, res.Timeseries[0])
} else if len(res.Timeseries) == 2 {
	results = res.Timeseries
}
// When we have two results make sure they match what we expect
if len(results) == 2 {
	require.Equal(t, expect, results)
}
Contributor Author

Drive-by: This test failed, so I thought I broke it, but it's actually just flaky, since it's possible for one metric to be remote written without the other.

go test ./internal/component/prometheus/remotewrite/... -run '^Test$' -count 20

used to break it locally

Contributor

Don't we want to have a for loop to ensure we eventually write both of the time series?

Contributor

Or we can change the remote write config to make sure both metrics are remote written at the same time? I'm not sure if that's possible though. Maybe we can set max_samples_per_send to 2 and batch_send_deadline to an hour? 😆

Contributor Author

Ugh, yeah, you're right, this still isn't correct. I'll fix it properly this time.

Contributor Author

@kgeckhart kgeckhart Nov 24, 2025

I went with max_samples_per_send plus a batch_send_deadline that matches the test timeout, thanks for the idea!
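
For anyone following along, here is a minimal sketch of the kind of config being discussed, written the way the remotewrite tests usually embed Alloy syntax in Go; the exact values and the surrounding test wiring are illustrative, not the code in this PR:

// Illustrative only: force both samples into a single remote write request by
// requiring two samples per send, with a batch send deadline no shorter than
// the test timeout so the batch isn't flushed early.
const remoteWriteConfig = `
endpoint {
	url = "%s"

	queue_config {
		max_samples_per_send = 2
		batch_send_deadline  = "30s"
	}
}
`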

Contributor

@ptodev ptodev left a comment

I'm not super familiar with the label store, but on a high level this change makes sense. I'm just not sure why the Fanout is a better place for a cache than the Interceptor? In theory, shouldn't the Interceptor be better, since it's the entry point to a component?


- Add `meta_cache_address` to `beyla.ebpf` component. (@skl)

- Remove labelstore interactions from the prometheus interceptor, simplifying prometheus pipelines. (@kgeckhart)
Contributor

Does this change really need a changelog entry? It's more of an internal refinement that doesn't impact users.

Contributor Author

🤷 It's an optimization on a heavily used pipeline.

}

-	interceptor := c.newInterceptor(ls)
+	interceptor := NewInterceptor(livedebugging.ComponentID(o.ID), c.debugDataPublisher, alloyAppendable)
Contributor

It's interesting that components such as scrape and receive_http also have an interceptor 🤔 It's not just for downstream components that have metrics streamed to them by upstream components.

Contributor Author

I'll comment on this below regarding why fanout makes more sense for the labelstore interaction than the interceptor. receive_http doesn't actually use the interceptor because it has no further logic to inject; it only uses fanout.

@ptodev ptodev self-assigned this Nov 21, 2025
Contributor Author

kgeckhart commented Nov 21, 2025

I'm just not sure why the Fanout is a better place for a cache than the Interceptor? In theory, shouldn't the Interceptor be better, since it's the entry point to a component?

It's a good question that I didn't expand on in the PR. Fanout is the appender that must be used to emit metrics from any Alloy component, since it's what ensures we can multicast metrics to downstream consumers. The interceptor is used by most components that emit metrics, but it's not a requirement: enrich and relabel use it to run their functionality, and scrape uses it to support livedebugging, but it's not used by receive_http, otelcol.exporter.prometheus, or the operator, which all use Fanout directly. Since every component that needs to emit metrics uses Fanout, it makes the most sense to put the logic that enforces a valid global seriesRef for emitted metrics there.

I added a comment to Fanout to indicate its role with global series refs.
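
To make that concrete, here is a minimal sketch of the shape of that logic; the type, field, and method names are illustrative rather than the actual Fanout or labelstore API:

package example

import (
	"github.com/prometheus/prometheus/model/labels"
	"github.com/prometheus/prometheus/storage"
)

// globalRefStore stands in for the labelstore: it hands out a pipeline-wide
// series ref for a label set, creating one if it doesn't exist yet.
type globalRefStore interface {
	GetOrAddGlobalRefID(l labels.Labels) uint64
}

// fanoutAppender is the single appender every component emits through, which
// is why it's the natural place to resolve the global ref before fanning out.
type fanoutAppender struct {
	store    globalRefStore
	children []storage.Appender
}

func (a *fanoutAppender) Append(_ storage.SeriesRef, l labels.Labels, t int64, v float64) (storage.SeriesRef, error) {
	// Resolve the global ref once here instead of in every interceptor.
	globalRef := storage.SeriesRef(a.store.GetOrAddGlobalRefID(l))
	for _, child := range a.children {
		if _, err := child.Append(globalRef, l, t, v); err != nil {
			return globalRef, err
		}
	}
	return globalRef, nil
}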
