tools/stress/device-observer: implement agent prom scraper #3808
Draft
nikw9944 wants to merge 2 commits into
Replaces the promscrape Noop stub with a real scraper that, on every --sample-interval tick, fetches the doublezero-agent Prometheus endpoint at --agent-metrics-url and appends one NDJSON row per metric sample to observer.agent_metrics.json in the working directory. It also exposes Scraper.Snapshot(), which returns the latest counter family totals, so the abort decider (PR #3796) can detect mid-sample counter increments without standing up its own scraper.
Refs #3794
- Cap the scrape response body at 16 MiB via io.LimitReader (defense in depth against a misbehaving or compromised agent endpoint).
- Treat an empty-but-2xx response as a soft failure that freezes the snapshot, so the abort decider does not interpret a transient empty body as 'counters reset to zero'.
- Document that disk-write failures intentionally freeze the snapshot so the decider only ever sees fully persisted ticks.
- Tone down the appendRows atomicity comment: the relevant invariant is 'single writer per working directory', not POSIX append atomicity.
- Add TestEmptyBodyFreezesSnapshot.
Refs #3794
Summary of Changes
- Replaces the promscrape Noop stub with a real Prometheus scraper: on every --sample-interval tick the observer fetches --agent-metrics-url, parses the response with expfmt.NewTextParser, and appends one NDJSON row {t_ns, metric_name, value, labels_json} per metric sample to <working-dir>/observer.agent_metrics.json.
- Exposes Scraper.Snapshot(), which returns the latest counter family totals (summed across label series). The abort decider that lands in #3796 ("3747-4: abort decider + sentinel (~200 LOC code)") consumes this; gauges and other metric types are intentionally excluded because the trigger inputs are counter deltas only.
- Updates main.go (a one-line change to the existing collector list).
- Caps the scrape response body at 16 MiB via io.LimitReader, and treats an empty-but-2xx response as a soft failure so a transient empty body cannot clobber the snapshot to {}.
- … (nikw9944/doublezero-3793).
Testing Verification
- go test -race -count=1 ./tools/stress/device-observer/internal/promscrape/...: 9 unit tests covering happy-path NDJSON shape, label serialization round-trip, counter-only snapshot, snapshot stable across HTTP 500, snapshot reflecting the latest values after a successful tick, empty body freezing the snapshot, malformed exposition body logged and skipped, context cancellation, and NDJSON line integrity.
- golangci-lint run ./tools/stress/device-observer/... is clean.
- The compile-time var _ collector.Collector = (*Scraper)(nil) assertion guarantees the scraper still satisfies the collector interface used by main.go's errgroup.