Skip to content

Conversation

@lym953
Copy link
Contributor

@lym953 lym953 commented Oct 3, 2025

This PR

Use the existing Rust-based stats concentrator in libdatadog, instead of using our concentrator built from scratch.

Testing

Test 1

Works well in low-volume and no-concurrency scenario (~3 invocations / sec, using reserved concurrency of 1)

  • Total hits count is correct
  • The timestamp of 1 invocation (out of 1703) is 2 minutes too late, compared with CloudWatch metrics. But I don't think it's a big problem.
image

Test 2

Doesn't work well in high-volume and high-concurrency scenarios. I see up 9% under/over-counting over 5000 invocations. Will debug this later and fix it in separate PRs.

));
let (_, concentrator) = StatsConcentratorService::new(config, tags_provider);
let mut aggregator = StatsAggregator::new(640, concentrator);
let mut aggregator = StatsAggregator::new(720, concentrator);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@duncanista I'm trying to understand this test in order to fix it. Why "The batch should only contain the first 2 payloads" when "Payload below is 115 bytes" and aggregator's max_content_size_bytes is 640 bytes?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably the number was different before and someone updated it (? I don't remember, payload probably grew with new types too

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, now it's 352 bytes. Will update the comment and change max_content_size_bytes to 704.

Comment on lines +195 to +196
process_tags: "process_tags".to_string(),
process_tags_hash: 0,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New fields resulted from upgrading libdatadog

metrics: HashMap::new(),
meta_struct: HashMap::new(),
span_links: Vec::new(),
span_events: Vec::new(),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New field resulted from upgrading libdatadog

@lym953 lym953 marked this pull request as ready for review October 3, 2025 19:22
@lym953 lym953 requested a review from a team as a code owner October 3, 2025 19:22
}

pub struct StatsConcentratorService {
concentrator: StatsConcentrator,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the concentrator we built from scratch.


pub struct StatsConcentratorService {
concentrator: StatsConcentrator,
concentrator: SpanConcentrator,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let handle = StatsConcentratorHandle::new(tx);
let concentrator = StatsConcentrator::new(config, tags_provider);
let service: StatsConcentratorService = Self { concentrator, rx };
// TODO: set span_kinds_stats_computed and peer_tag_keys
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do it in separate diffs.

},
};
if let Err(err) = self.stats_concentrator.add(stats) {
if let Err(err) = self.stats_concentrator.add(span) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just pass the pb::Span, which is what the concentrator takes in. No need to construct StatsEvent.

datadog-trace-utils = { git = "https://github.com/DataDog/libdatadog", rev = "5c16bbbb8ff0f7e20880e63a07bb48f3099b5116" , features = ["mini_agent"] }
datadog-trace-normalization = { git = "https://github.com/DataDog/libdatadog", rev = "5c16bbbb8ff0f7e20880e63a07bb48f3099b5116" }
datadog-trace-obfuscation = { git = "https://github.com/DataDog/libdatadog", rev = "5c16bbbb8ff0f7e20880e63a07bb48f3099b5116" }
datadog-trace-stats = { git = "https://github.com/DataDog/libdatadog", rev = "5c16bbbb8ff0f7e20880e63a07bb48f3099b5116" }
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was just moved to a separate crate.

}
}

pub struct StatsConcentratorService {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would we consider renaming it to SpanConcentratorService?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think StatsConcentratorService is more clear in the context of Bottlecap. Do you prefer SpanConcentratorService?

SetTracerMetadata(TracerMetadata),
Add(StatsEvent),
Flush(bool, oneshot::Sender<Vec<pb::ClientStatsPayload>>),
Add(Box<pb::Span>),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we boxing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's the error without boxing:

error: large size difference between variants
  --> src/traces/stats_concentrator_service.rs:35:1
   |
35 | / pub enum ConcentratorCommand {
36 | |     SetTracerMetadata(TracerMetadata),
   | |     --------------------------------- the second-largest variant contains at least 96 bytes
37 | |     Add(pb::Span),
   | |     ------------- the largest variant contains at least 336 bytes
38 | |     Flush(bool, oneshot::Sender<Option<ClientStatsPayload>>),
39 | | }
   | |_^ the entire enum is at least 0 bytes
   |
   = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#large_enum_variant

Let me add a comment to make this clear.

@lym953 lym953 merged commit 93c7011 into main Oct 16, 2025
12 of 13 checks passed
@lym953 lym953 deleted the yiming.luo/trace-stats-14 branch October 16, 2025 18:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants