[BUG] Duplicated requests on refreshing the overview #105

Open · ansjcy opened this issue Feb 7, 2025 · 4 comments
Labels: bug (Something isn't working), good first issue (Good for newcomers)

@ansjcy (Member) commented Feb 7, 2025

What is the bug?

On refreshing the overview page, multiple duplicate top N queries requests are sent for every metric. See the screenshot below.

[Screenshot: duplicated top N queries requests]

How can one reproduce the bug?

  • Run the Query Insights Dashboards (QID) with all metrics enabled.
  • Hit the refresh button and check the network requests.
[Screenshot: network requests after hitting refresh]

What is the expected behavior?

Ideally, only one request per metric should be sent to the backend on refresh.
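
One common way to get this behavior is to deduplicate in-flight requests per metric on the client side. The sketch below is illustrative only, not the plugin's actual code; the `fetchTopQueries` helper, the metric names, and the API path are assumptions based on the request URLs visible in the logs further down.

```typescript
// Minimal sketch of per-metric request deduplication (illustrative only).
// `fetchTopQueries` is a hypothetical stand-in for the dashboard's HTTP call.
type Metric = 'latency' | 'cpu' | 'memory';

const inFlight = new Map<Metric, Promise<unknown>>();

async function fetchTopQueries(metric: Metric, from: string, to: string): Promise<unknown> {
  const res = await fetch(`/api/top_queries/${metric}?from=${from}&to=${to}`);
  return res.json();
}

// Returns the already-running request for a metric instead of issuing a new one,
// so a refresh handler that fires multiple times still sends exactly one
// backend request per metric.
function fetchTopQueriesOnce(metric: Metric, from: string, to: string): Promise<unknown> {
  const pending = inFlight.get(metric);
  if (pending) return pending;

  const request = fetchTopQueries(metric, from, to).finally(() => inFlight.delete(metric));
  inFlight.set(metric, request);
  return request;
}
```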

What is your host/environment?

Operating system, version.

Do you have any screenshots?

If applicable, add screenshots to help explain your problem.

Do you have any additional context?

Add any other context about the problem.

@ansjcy added the bug and untriaged labels on Feb 7, 2025
@ansjcy (Member, Author) commented Feb 12, 2025

This could be a good first issue for query insights dashboards.

@ansjcy added the good first issue label and removed the untriaged label on Feb 12, 2025
@brucejxz commented Mar 8, 2025

Seeing this as well, and it's killing a pretty much empty two-node test cluster I set up on a t4g.medium from this Docker Compose file: https://opensearch.org/docs/latest/install-and-configure/install-opensearch/docker/#deploy-an-opensearch-cluster-using-docker-compose.

When I first load the page, the requests look like:

[Screenshot: network requests on first page load]

But if I refresh it again:

[Screenshot: duplicated network requests after refresh]

This is accompanied by these logs:

opensearch-node1       | [2025-03-08T19:38:45,121][INFO ][o.o.c.c.FollowersChecker ] [opensearch-node1] FollowerChecker{discoveryNode={opensearch-node2}{GGrs_d1ARSqSkU6HWg1bPA}{5DCrvZfAR_eiH9wWHD6muA}{172.18.0.4}{172.18.0.4:9300}{dimr}{shard_indexing_pressure_enabled=true}, failureCountSinceLastSuccess=1, [cluster.fault_detection.follower_check.retry_count]=3} failed, retrying
opensearch-node1       | org.opensearch.transport.ReceiveTimeoutTransportException: [opensearch-node2][172.18.0.4:9300][internal:coordination/fault_detection/follower_check] request_id [1056] timed out after [10014ms]
opensearch-node1       |        at org.opensearch.transport.TransportService$TimeoutHandler.run(TransportService.java:1421) [opensearch-2.19.0.jar:2.19.0]
opensearch-node1       |        at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:955) [opensearch-2.19.0.jar:2.19.0]
opensearch-node1       |        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) [?:?]
opensearch-node1       |        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) [?:?]
opensearch-node1       |        at java.base/java.lang.Thread.run(Thread.java:1583) [?:?]
opensearch-node1       | [2025-03-08T19:38:54,064][WARN ][o.o.c.InternalClusterInfoService] [opensearch-node1] Failed to update node information for ClusterInfoUpdateJob within 15s timeout
opensearch-node1       | [2025-03-08T19:38:56,123][INFO ][o.o.c.c.FollowersChecker ] [opensearch-node1] FollowerChecker{discoveryNode={opensearch-node2}{GGrs_d1ARSqSkU6HWg1bPA}{5DCrvZfAR_eiH9wWHD6muA}{172.18.0.4}{172.18.0.4:9300}{dimr}{shard_indexing_pressure_enabled=true}, failureCountSinceLastSuccess=2, [cluster.fault_detection.follower_check.retry_count]=3} failed, retrying
opensearch-node1       | org.opensearch.transport.ReceiveTimeoutTransportException: [opensearch-node2][172.18.0.4:9300][internal:coordination/fault_detection/follower_check] request_id [1093] timed out after [10006ms]
opensearch-node1       |        at org.opensearch.transport.TransportService$TimeoutHandler.run(TransportService.java:1421) [opensearch-2.19.0.jar:2.19.0]
opensearch-node1       |        at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:955) [opensearch-2.19.0.jar:2.19.0]
opensearch-node1       |        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) [?:?]
opensearch-node1       |        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) [?:?]
opensearch-node1       |        at java.base/java.lang.Thread.run(Thread.java:1583) [?:?]
opensearch-dashboards  | Unable to get top queries (cpu):  StatusCodeError: Request Timeout after 30000ms
opensearch-dashboards  |     at /usr/share/opensearch-dashboards/node_modules/elasticsearch/src/lib/transport.js:397:9
opensearch-dashboards  |     at Timeout.<anonymous> (/usr/share/opensearch-dashboards/node_modules/elasticsearch/src/lib/transport.js:429:7)
opensearch-dashboards  |     at listOnTimeout (node:internal/timers:569:17)
opensearch-dashboards  |     at processTimers (node:internal/timers:512:7) {
opensearch-dashboards  |   status: undefined,
opensearch-dashboards  |   displayName: 'RequestTimeout',
opensearch-dashboards  |   body: undefined
opensearch-dashboards  | }
opensearch-dashboards  | {"type":"response","@timestamp":"2025-03-08T19:38:33Z","tags":[],"pid":1,"method":"get","statusCode":200,"req":{"url":"/api/top_queries/cpu?from=2025-03-07T19%3A38%3
A33.633Z&to=2025-03-08T19%3A38%3A33.633Z","method":"get","headers":{"host":"localhost:5601","connection":"keep-alive","osd-version":"2.19.0","sec-ch-ua-platform":"\"macOS\"","user-agent":"Mo
zilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.0.0 Safari/537.36","sec-ch-ua":"\"Chromium\";v=\"134\", \"Not:A-Brand\";v=\"24\", \"Brave\";
v=\"134\"","content-type":"application/json","sec-ch-ua-mobile":"?0","osd-xsrf":"osd-fetch","accept":"*/*","sec-gpc":"1","accept-language":"en-GB,en;q=0.7","sec-fetch-site":"same-origin","se
c-fetch-mode":"cors","sec-fetch-dest":"empty","referer":"http://localhost:5601/app/query-insights-dashboards","accept-encoding":"gzip, deflate, br, zstd","securitytenant":""},"remoteAddress"
:"172.18.0.1","userAgent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.0.0 Safari/537.36","referer":"http://localhost:5601/app/query-i
nsights-dashboards"},"res":{"statusCode":200,"responseTime":30157,"contentLength":9},"message":"GET /api/top_queries/cpu?from=2025-03-07T19%3A38%3A33.633Z&to=2025-03-08T19%3A38%3A33.633Z 200
 30157ms - 9.0B"}
opensearch-dashboards  | Unable to get top queries (cpu):  StatusCodeError: Request Timeout after 30000ms
opensearch-dashboards  |     at /usr/share/opensearch-dashboards/node_modules/elasticsearch/src/lib/transport.js:397:9
opensearch-dashboards  |     at Timeout.<anonymous> (/usr/share/opensearch-dashboards/node_modules/elasticsearch/src/lib/transport.js:429:7)
opensearch-dashboards  |     at listOnTimeout (node:internal/timers:569:17)
opensearch-dashboards  |     at processTimers (node:internal/timers:512:7) {
opensearch-dashboards  |   status: undefined,
opensearch-dashboards  |   displayName: 'RequestTimeout',
opensearch-dashboards  |   body: undefined
opensearch-dashboards  | }

@rishabh6788 commented
I migrated my self-managed OS 2.12 cluster to 2.19.1, and upon clicking the query-insights tab on the dashboards the cluster became unresponsive and data nodes started dropping out. I restarted and clicked the query-insights tab again to confirm, and it happened again. I had to disable the plugin to stabilize the cluster.

@ansjcy (Member, Author) commented Mar 18, 2025

@rishabh6788 there are two main issues: one is the duplicated requests mentioned here; the other is that, by default, we fetch far too many records from the index. We are fixing the latter by setting the default reader size to 50.
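
For the second issue, the fix amounts to capping how many records a single top N queries call may pull back. Below is a rough sketch of that idea on the dashboards side; the `size` query parameter and the `DEFAULT_READER_SIZE` constant are assumptions for illustration, not the plugin's actual setting names.

```typescript
// Illustrative only: cap the number of records requested per metric.
// `size` and DEFAULT_READER_SIZE are hypothetical names, not the
// plugin's real parameter or setting.
const DEFAULT_READER_SIZE = 50;

function topQueriesUrl(metric: string, from: string, to: string, size = DEFAULT_READER_SIZE): string {
  const params = new URLSearchParams({ from, to, size: String(size) });
  return `/api/top_queries/${metric}?${params.toString()}`;
}

// Example: topQueriesUrl('cpu', '2025-03-07T19:38:33.633Z', '2025-03-08T19:38:33.633Z')
// never requests more than 50 records unless a caller explicitly opts in.
```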
