Skip to content

Tornado instrumentation HTTP_SERVER_DURATION metric is inaccurate and not async-safe #3486

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
devmonkey22 opened this issue May 9, 2025 · 0 comments · May be fixed by #3489
Open

Tornado instrumentation HTTP_SERVER_DURATION metric is inaccurate and not async-safe #3486

devmonkey22 opened this issue May 9, 2025 · 0 comments · May be fixed by #3489
Labels
bug Something isn't working

Comments

@devmonkey22
Copy link

Describe your environment

OS: Ubuntu
Python version: Python 3.10
SDK version: 1.31.1
API version: 0.52b1 (same code in 0.53b0 though)

What happened?

The tornado instrumenter and how it tracks a request's elapsed time is not async-safe and does not appear to be able to handle multiple concurrent requests without overwriting and using an incorrect value. This makes the HTTP_SERVER_DURATION metric inaccurate.

Steps to Reproduce

  1. Start one long-running request to a tornado server.
  2. Wait a moment, and start a second long-running request to a tornado server.
  3. Let the first request finish
  4. Let the second request finish.
  5. Review the calculated elapsed time of that request.

This effect is much more visible with streaming requests/handlers like an SSE handler or WebSocketHandler (if I ignore #2761) that may stay connected for hours/days.

The _START_TIME is stored in server_histograms which is a global dict created by the instrumentor. It stores this in the _prepare hook, then reads this back in _record_on_finish_metrics. You can get interleaved start times.

The start time really needs tracked per handler instance (connection).

Expected Result

Request1 Prepare -> StartTime = 10:00:00am
Request2 Prepare -> StartTime = 10:00:05am
Request1 OnFinish -> Reads original StartTime of 10:00:00am, so at 10:00:10am, elapsed time should be 10 seconds.
Request1 OnFinish -> Reads correct original StartTime as 10:00:05am, so at 10:00:10am, elapsed time should be 5 seconds.

Actual Result

Request1 Prepare -> StartTime = 10:00:00am
Request2 Prepare -> StartTime = 10:00:05am
Request1 OnFinish -> Reads incorrect StartTime of 10:00:05am, so at 10:00:10am, elapsed time reported incorrectly as 5 seconds.
Request1 OnFinish -> Reads correct original StartTime as 10:00:05am, so at 10:00:10am, elapsed reported as 5 seconds.

Additional context

No response

Would you like to implement a fix?

Yes

@devmonkey22 devmonkey22 added the bug Something isn't working label May 9, 2025
devmonkey22 added a commit to devmonkey22/opentelemetry-python-contrib that referenced this issue May 9, 2025
@devmonkey22 devmonkey22 linked a pull request May 9, 2025 that will close this issue
10 tasks
devmonkey22 added a commit to devmonkey22/opentelemetry-python-contrib that referenced this issue May 9, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
1 participant