
Remote read API not performant in v2.13 #9764

Open
rishabhkumar92 opened this issue Oct 29, 2024 · 8 comments

Comments

@rishabhkumar92

Describe the bug

Hello Team,
We have been testing the Thanos querier with the Mimir remote read API and are seeing a significant performance difference between range queries and remote read. One of the first things I noticed was that queries weren't sharded, which might be contributing to the majority of the difference.

To Reproduce

Steps to reproduce the behavior:

  1. Start Mimir 2.13
  2. Perform a remote read API call for a reasonably expensive query, similar to count(services_platform_service_request_count{namespace=~".*-staging$"}) by (namespace) (see the sketch after these steps)
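
For reference, the sketch below is a minimal illustration of what a reader sends for step 2, assuming the standard Prometheus prompb types (it is not code from this issue, and the one-hour time range is a placeholder). Only the label matchers and the time range travel over remote read; the count(...) by (namespace) aggregation is evaluated by the reader (here the Thanos querier) after the series come back.

// Minimal sketch (assumed helper, not from this issue): build the remote read
// request body for the selector in step 2 using the Prometheus prompb types.
package main

import (
	"fmt"
	"time"

	"github.com/prometheus/prometheus/prompb"
)

func buildReadRequest(start, end time.Time) *prompb.ReadRequest {
	return &prompb.ReadRequest{
		Queries: []*prompb.Query{{
			StartTimestampMs: start.UnixMilli(),
			EndTimestampMs:   end.UnixMilli(),
			// Only matchers are sent over the wire; the aggregation runs in the reader.
			Matchers: []*prompb.LabelMatcher{
				{Type: prompb.LabelMatcher_EQ, Name: "__name__", Value: "services_platform_service_request_count"},
				{Type: prompb.LabelMatcher_RE, Name: "namespace", Value: ".*-staging$"},
			},
		}},
	}
}

func main() {
	end := time.Now()
	fmt.Println(buildReadRequest(end.Add(-time.Hour), end).String())
}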

Expected behavior

This query takes ~2 seconds to execute when the query range API is used, and it should take approximately the same time with the remote read API too; however, it took 15+ seconds.

Environment

  • Infrastructure: Kubernetes, Laptop
  • Deployment tool: jsonnet

Additional Context

NA

rishabhkumar92 changed the title from "Remote read API not performant" to "Remote read API not performant in v2.13" on Oct 29, 2024
@rishabhkumar92
Author

[screenshot attached]

Attaching a screenshot showing that the querier spends a lot of time buffering data even though the ingester finishes in a few milliseconds.

@rishabhkumar92
Author

I saw fixes around the remote read API not honoring hints (ref), but I hit these performance issues during an instant query, so this is a different issue from the hinting fixes.

@pracucci
Collaborator

pracucci commented Nov 7, 2024

The first thing that comes to my mind is that remote read supports two response types (specs):

  • Samples
  • Encoded chunks

Fetching samples is much slower than fetching the encoded chunks. Could you make sure that Thanos requests STREAMED_XOR_CHUNKS, please? That's what the Mimir querier internally requests from ingesters when you run a range or instant query.
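
For reference, the accepted response types are declared in the remote read request itself. The sketch below is a minimal illustration (not taken from Thanos or Mimir code; the URL, tenant ID, and time range are placeholders) of a client asking for STREAMED_XOR_CHUNKS, with SAMPLES as the fallback, against Mimir's default /prometheus/api/v1/read path:

// Minimal sketch: prefer STREAMED_XOR_CHUNKS over SAMPLES in a remote read request.
package main

import (
	"bytes"
	"fmt"
	"net/http"
	"time"

	"github.com/gogo/protobuf/proto"
	"github.com/golang/snappy"
	"github.com/prometheus/prometheus/prompb"
)

func main() {
	end := time.Now()
	req := &prompb.ReadRequest{
		// Preference order: encoded, streamed chunks first; plain samples as fallback.
		AcceptedResponseTypes: []prompb.ReadRequest_ResponseType{
			prompb.ReadRequest_STREAMED_XOR_CHUNKS,
			prompb.ReadRequest_SAMPLES,
		},
		Queries: []*prompb.Query{{
			StartTimestampMs: end.Add(-time.Hour).UnixMilli(),
			EndTimestampMs:   end.UnixMilli(),
			Matchers: []*prompb.LabelMatcher{
				{Type: prompb.LabelMatcher_EQ, Name: "__name__", Value: "services_platform_service_request_count"},
			},
		}},
	}

	data, err := proto.Marshal(req)
	if err != nil {
		panic(err)
	}

	// Placeholder URL and tenant ID; adjust to your deployment.
	httpReq, err := http.NewRequest(http.MethodPost,
		"http://mimir-query-frontend:8080/prometheus/api/v1/read",
		bytes.NewReader(snappy.Encode(nil, data)))
	if err != nil {
		panic(err)
	}
	httpReq.Header.Set("Content-Type", "application/x-protobuf")
	httpReq.Header.Set("Content-Encoding", "snappy")
	httpReq.Header.Set("X-Prometheus-Remote-Read-Version", "0.1.0")
	httpReq.Header.Set("X-Scope-OrgID", "tenant-1")

	resp, err := http.DefaultClient.Do(httpReq)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// A streamed-chunks response comes back as Content-Type
	// "application/x-streamed-protobuf; proto=prometheus.ChunkedReadResponse";
	// a samples response is a single snappy-compressed ReadResponse, i.e. the slow path.
	fmt.Println("status:", resp.Status, "content-type:", resp.Header.Get("Content-Type"))
}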

@pracucci
Collaborator

pracucci commented Nov 7, 2024

Could you also share the full trace .json so I can look at it myself as well, please?

@rishabhkumar92
Author

rishabhkumar92 commented Nov 7, 2024

@pracucci Regarding encoded chunks, I confirmed that encoded chunks are being used as the response type; support was introduced in Thanos a few years back (reference).

Regarding the full trace, I am still figuring out how to export it as full JSON. Also, we are issuing a federated query across ~100 tenants, which is far slower over remote read compared to the range APIs.

@rishabhkumar92
Author

rishabhkumar92 commented Nov 7, 2024

Trace-4f3442-2024-11-07 15_12_26.json

Attaching a trace of an instant query which took 20+ seconds

@pracucci
Collaborator

Attaching a trace of an instant query which took 20+ seconds

Thanks. I tried to load it in the Jaeger UI, but it doesn't work (apparently it's an invalid format for Jaeger). What format is the trace? Which application did you use to export it? Sorry for the ping-pong, but it would be great if you could just give me a trace that loads in the Jaeger UI.

To test it in Jaeger you can run it with:

docker run -p 16686:16686 jaegertracing/all-in-one:latest

Then upload the .json and see if it works. Thanks!

@rishabhkumar92
Author

@pracucci I downloaded it from the Grafana UI; can you try visualizing it in Grafana?
