Commit b8e63ff

Add Performance doc to Java (#4828)
Co-authored-by: Tiffany Hrabusa <[email protected]>
Co-authored-by: opentelemetrybot <[email protected]>
Co-authored-by: Phillip Carter <[email protected]>
Co-authored-by: Jay DeLuca <[email protected]>
Co-authored-by: Trask Stalnaker <[email protected]>
1 parent ce38468 commit b8e63ff

1 file changed: 194 additions, 0 deletions
---
title: Performance
description: Performance reference for the OpenTelemetry Java agent
weight: 75
---
The OpenTelemetry Java agent instruments your application by running inside the
same Java Virtual Machine (JVM). Like any other software agent, the Java agent
requires system resources like CPU, memory, and network bandwidth. The use of
resources by the agent is called agent overhead or performance overhead. The
OpenTelemetry Java agent has minimal impact on system performance when
instrumenting JVM applications, although the final agent overhead depends on
multiple factors.

Some factors that might increase agent overhead are environmental, such as the
physical machine architecture, CPU frequency, amount and speed of memory, system
temperature, and resource contention. Other factors include virtualization and
containerization, the operating system and its libraries, the JVM version and
vendor, JVM settings, the algorithmic design of the software being monitored,
and software dependencies.

Due to the complexity of modern software and the broad diversity in deployment
scenarios, it is impossible to come up with a single agent overhead estimate. To
find the overhead of any instrumentation agent in a given deployment, you have
to conduct experiments and collect measurements directly. Therefore, treat all
statements about performance as general information and guidelines that are
subject to evaluation in a specific system.

The following sections describe the minimum requirements of the OpenTelemetry
Java agent, potential constraints that impact performance, and guidelines for
optimizing and troubleshooting the performance of the agent.

## Guidelines to reduce agent overhead

The following best practices and techniques might help reduce overhead caused by
the Java agent.

### Configure trace sampling

The volume of spans processed by the instrumentation might impact agent
overhead. You can configure trace sampling to adjust the span volume and reduce
resource usage. See [Sampling](/docs/languages/java/sampling).

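For example, head sampling can be configured through system properties. The
following sketch keeps roughly 25% of traces while honoring the parent span's
sampling decision; the ratio and the application jar name are placeholders to
adapt to your deployment:

```shell
# Illustrative: sample about 25% of traces (parent decision honored).
java -javaagent:opentelemetry-javaagent.jar \
  -Dotel.traces.sampler=parentbased_traceidratio \
  -Dotel.traces.sampler.arg=0.25 \
  -jar myapp.jar
```
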
### Turn off specific instrumentations

You can further reduce agent overhead by turning off instrumentations that
aren't needed or are producing too many spans. To turn off an instrumentation,
use `-Dotel.instrumentation.<name>.enabled=false` or the
`OTEL_INSTRUMENTATION_<NAME>_ENABLED` environment variable, where `<name>` is
the name of the instrumentation.

For example, the following option turns off the JDBC instrumentation:
`-Dotel.instrumentation.jdbc.enabled=false`

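Putting it together, a launch command might look like the following sketch. The
application jar name is a placeholder, and which instrumentations you disable
depends on your own span volume measurements:

```shell
# Example launch with JDBC spans suppressed; myapp.jar is a placeholder.
java -javaagent:opentelemetry-javaagent.jar \
  -Dotel.instrumentation.jdbc.enabled=false \
  -jar myapp.jar
```
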
### Allocate more memory for the application

Increasing the maximum heap size of the JVM using the `-Xmx<size>` option might
help in alleviating agent overhead issues, as instrumentations can generate a
large number of short-lived objects in memory.

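For example, the following command raises the maximum heap to 2 GB. The value is
illustrative only; the right size depends on your workload and should be
determined by measurement:

```shell
# Illustrative only: 2 GB maximum heap; tune based on your own measurements.
java -Xmx2g -javaagent:opentelemetry-javaagent.jar -jar myapp.jar
```
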
### Reduce manual instrumentation to what you need

Too much manual instrumentation might introduce inefficiencies that increase
agent overhead. For example, using `@WithSpan` on every method results in a high
span volume, which in turn increases noise in the data and consumes more system
resources.

### Provision adequate resources

Make sure to provision enough resources for your instrumentation and for the
Collector. The amount of resources, such as memory or disk, depends on your
application architecture and needs. For example, a common setup is to run the
instrumented application on the same host as the OpenTelemetry Collector. In
that case, consider rightsizing the resources for the Collector and optimizing
its settings. See [Scaling](/docs/collector/scaling/).

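As one illustration, the Collector's `memory_limiter` processor can cap memory
use on a shared host. The fragment below is a sketch, not a complete
configuration; the limit values and the pipeline components are placeholders to
adapt to your environment:

```yaml
# Sketch of a Collector configuration fragment; limit values are placeholders.
# memory_limiter should run first in the pipeline so it can apply backpressure.
processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 512
    spike_limit_mib: 128
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp]
```
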
## Constraints impacting the performance of the Java agent

In general, the more telemetry you collect from your application, the greater
the impact on agent overhead. For example, tracing methods that aren't
relevant to your application can still produce considerable agent overhead
because tracing such methods is computationally more expensive than running the
method itself. Similarly, high cardinality tags in metrics might increase memory
usage. Debug logging, if turned on, also increases write operations to disk and
memory usage.

Some instrumentations, for example JDBC or Redis, produce high span volumes that
increase agent overhead. For more information on how to turn off unnecessary
instrumentations, see
[Turn off specific instrumentations](#turn-off-specific-instrumentations).

> [!NOTE] Experimental features of the Java agent might increase agent overhead
> due to the experimental focus on functionality over performance. Stable
> features are safer in terms of agent overhead.

## Troubleshooting agent overhead issues

When troubleshooting agent overhead issues, do the following:

- Check minimum requirements. See
  [Prerequisites](/docs/languages/java/getting-started/#prerequisites).
- Use the latest compatible version of the Java agent.
- Use the latest compatible version of your JVM.

Consider taking the following actions to decrease agent overhead:

- If your application is approaching memory limits, consider giving it more
  memory.
- If your application is using all the CPU, you might want to scale it
  horizontally.
- Try turning off or tuning metrics.
- Tune trace sampling settings to reduce span volume.
- Turn off specific instrumentations.
- Review manual instrumentation for unnecessary span generation.

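Several of the actions above can be applied with SDK autoconfiguration
properties. The following sketch disables metrics export and reduces span volume
through sampling; property names should be verified against the agent version
you run, and the jar name is a placeholder:

```shell
# Sketch: no metrics export, ~25% trace sampling; verify against your version.
java -javaagent:opentelemetry-javaagent.jar \
  -Dotel.metrics.exporter=none \
  -Dotel.traces.sampler=parentbased_traceidratio \
  -Dotel.traces.sampler.arg=0.25 \
  -jar myapp.jar
```
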
## Guidelines for measuring agent overhead

Measuring agent overhead in your own environment and deployments provides
accurate data about the impact of instrumentation on the performance of your
application or service. The following guidelines describe the general steps for
collecting and comparing reliable agent overhead measurements.

### Decide what you want to measure

Different users of your application or service might notice different aspects of
agent overhead. For example, while end users might notice degradation in service
latency, power users with heavy workloads pay more attention to CPU overhead. On
the other hand, users who deploy frequently, for example due to elastic
workloads, care more about startup time.

Reduce your measurements to factors that are sure to impact user experience, so
your datasets don't contain irrelevant information. Some examples of
measurements include the following:

- User average, user peak, and machine average CPU usage
- Total memory allocated and maximum heap used
- Garbage collection pause time
- Startup time in milliseconds
- Average and percentile 95 (p95) service latency
- Network read and write average throughput

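As a minimal sketch of summarizing one of these measurements, the following
stand-alone class computes the mean and the p95 (nearest-rank method) of a batch
of latency samples. The class and method names are illustrative, not part of any
OpenTelemetry API:

```java
import java.util.Arrays;

// Hypothetical helper for summarizing latency samples from a test pass.
public class LatencyStats {
    // Arithmetic mean of the samples, in milliseconds.
    static double mean(double[] samplesMs) {
        return Arrays.stream(samplesMs).average().orElse(Double.NaN);
    }

    // p95 latency using the nearest-rank method on a sorted copy.
    static double p95(double[] samplesMs) {
        double[] sorted = samplesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(0.95 * sorted.length); // 1-based rank
        return sorted[rank - 1];
    }

    public static void main(String[] args) {
        double[] samples = {12.0, 15.0, 11.0, 90.0, 14.0, 13.0, 16.0, 12.5, 14.5, 13.5};
        // A single outlier dominates p95 while barely moving the mean,
        // which is why both numbers are worth tracking.
        System.out.printf("mean=%.2f ms, p95=%.2f ms%n", mean(samples), p95(samples));
    }
}
```

Comparing mean and p95 side by side makes tail-latency regressions visible that
an average alone would hide.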
### Prepare a suitable test environment

By measuring agent overhead in a controlled test environment you can better
identify the factors affecting performance. When preparing a test environment,
complete the following:

1. Make sure that the configuration of the test environment resembles
   production.
2. Isolate the application under test from other services that might interfere.
3. Turn off or remove all unnecessary system services on the application host.
4. Ensure that the application has enough system resources to handle the test
   workload.

### Create a battery of realistic tests

Design the tests that you run against the test environment to resemble typical
workloads as much as possible. For example, if some REST API endpoints of your
service are susceptible to high request volumes, create a test that simulates
heavy network traffic.

For Java applications, use a warm-up phase prior to starting measurements. The
JVM is a highly dynamic machine that performs a large number of optimizations
through just-in-time (JIT) compilation. The warm-up phase helps the application
finish most of its class loading and gives the JIT compiler time to run the
majority of its optimizations.

Make sure to run a large number of requests and to repeat the test pass many
times. This repetition helps ensure a representative data sample. Include
error scenarios in your test data. Simulate an error rate similar to that of a
normal workload, typically between 2% and 10%.

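The warm-up idea can be sketched as follows. The workload, class name, and
iteration counts are illustrative stand-ins, not a production benchmark; warm-up
iterations are run and discarded before timing starts:

```java
// Minimal sketch of a measurement harness with a warm-up phase.
public class WarmupBench {
    static volatile long sink; // keeps the JIT from discarding the workload

    // Stand-in for the operation under test.
    static long work(int n) {
        long acc = 0;
        for (int i = 0; i < n; i++) acc += i * 31L;
        return acc;
    }

    // Runs discarded warm-up iterations so JIT compilation can settle,
    // then measures the average time per operation in nanoseconds.
    static double measureAvgNanos(int warmupIters, int measuredIters) {
        for (int i = 0; i < warmupIters; i++) sink = work(10_000); // warm-up
        long start = System.nanoTime();
        for (int i = 0; i < measuredIters; i++) sink = work(10_000);
        return (System.nanoTime() - start) / (double) measuredIters;
    }

    public static void main(String[] args) {
        System.out.printf("avg %.0f ns/op%n", measureAvgNanos(5_000, 10_000));
    }
}
```

For rigorous results, a dedicated harness such as JMH handles warm-up,
dead-code elimination, and run-to-run variance far more carefully than a
hand-rolled loop like this one.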
{{% alert title="Note" color="info" %}}

Tests might increase costs when targeting observability backends and other
commercial services. Plan your tests accordingly or consider using alternative
solutions, such as self-hosted or locally run backends.

{{% /alert %}}

### Collect comparable measurements

To identify which factors might be affecting performance and causing agent
overhead, collect measurements in the same environment after modifying a single
factor or condition.

### Analyze the agent overhead data

After collecting data from multiple passes, you can plot results in a chart or
compare averages using statistical tests to check for significant differences.

Consider that different stacks, applications, and environments might result in
different operational characteristics and different agent overhead measurement
results.
