-
Couldn't load subscription status.
- Fork 15
feat: [Trace Stats] Use the concentrator in libdatadog #887
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| )); | ||
| let (_, concentrator) = StatsConcentratorService::new(config, tags_provider); | ||
| let mut aggregator = StatsAggregator::new(640, concentrator); | ||
| let mut aggregator = StatsAggregator::new(720, concentrator); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@duncanista I'm trying to understand this test in order to fix it. Why "The batch should only contain the first 2 payloads" when "Payload below is 115 bytes" and aggregator's max_content_size_bytes is 640 bytes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably the number was different before and someone updated it (? I don't remember, payload probably grew with new types too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, now it's 352 bytes. Will update the comment and change max_content_size_bytes to 704.
| process_tags: "process_tags".to_string(), | ||
| process_tags_hash: 0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New fields resulted from upgrading libdatadog
| metrics: HashMap::new(), | ||
| meta_struct: HashMap::new(), | ||
| span_links: Vec::new(), | ||
| span_events: Vec::new(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New field resulted from upgrading libdatadog
| } | ||
|
|
||
| pub struct StatsConcentratorService { | ||
| concentrator: StatsConcentrator, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the concentrator we built from scratch.
|
|
||
| pub struct StatsConcentratorService { | ||
| concentrator: StatsConcentrator, | ||
| concentrator: SpanConcentrator, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the existing concentrator in libdatadog. https://github.com/DataDog/libdatadog/blob/main/datadog-trace-stats/src/span_concentrator/mod.rs#L56
| let handle = StatsConcentratorHandle::new(tx); | ||
| let concentrator = StatsConcentrator::new(config, tags_provider); | ||
| let service: StatsConcentratorService = Self { concentrator, rx }; | ||
| // TODO: set span_kinds_stats_computed and peer_tag_keys |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do it in separate diffs.
| }, | ||
| }; | ||
| if let Err(err) = self.stats_concentrator.add(stats) { | ||
| if let Err(err) = self.stats_concentrator.add(span) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just pass the pb::Span, which is what the concentrator takes in. No need to construct StatsEvent.
bottlecap/Cargo.toml
Outdated
| datadog-trace-utils = { git = "https://github.com/DataDog/libdatadog", rev = "5c16bbbb8ff0f7e20880e63a07bb48f3099b5116" , features = ["mini_agent"] } | ||
| datadog-trace-normalization = { git = "https://github.com/DataDog/libdatadog", rev = "5c16bbbb8ff0f7e20880e63a07bb48f3099b5116" } | ||
| datadog-trace-obfuscation = { git = "https://github.com/DataDog/libdatadog", rev = "5c16bbbb8ff0f7e20880e63a07bb48f3099b5116" } | ||
| datadog-trace-stats = { git = "https://github.com/DataDog/libdatadog", rev = "5c16bbbb8ff0f7e20880e63a07bb48f3099b5116" } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was just moved to a separate crate.
| } | ||
| } | ||
|
|
||
| pub struct StatsConcentratorService { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would we consider renaming it to SpanConcentratorService?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think StatsConcentratorService is more clear in the context of Bottlecap. Do you prefer SpanConcentratorService?
| SetTracerMetadata(TracerMetadata), | ||
| Add(StatsEvent), | ||
| Flush(bool, oneshot::Sender<Vec<pb::ClientStatsPayload>>), | ||
| Add(Box<pb::Span>), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we boxing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's the error without boxing:
error: large size difference between variants
--> src/traces/stats_concentrator_service.rs:35:1
|
35 | / pub enum ConcentratorCommand {
36 | | SetTracerMetadata(TracerMetadata),
| | --------------------------------- the second-largest variant contains at least 96 bytes
37 | | Add(pb::Span),
| | ------------- the largest variant contains at least 336 bytes
38 | | Flush(bool, oneshot::Sender<Option<ClientStatsPayload>>),
39 | | }
| |_^ the entire enum is at least 0 bytes
|
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#large_enum_variant
Let me add a comment to make this clear.
This PR
Use the existing Rust-based stats concentrator in libdatadog, instead of using our concentrator built from scratch.
Testing
Test 1
Works well in low-volume and no-concurrency scenario (~3 invocations / sec, using reserved concurrency of 1)
hitscount is correctTest 2
Doesn't work well in high-volume and high-concurrency scenarios. I see up 9% under/over-counting over 5000 invocations. Will debug this later and fix it in separate PRs.