[Feature Request] Paginating _wlm/stats API #17592

Lindsay-00 · 2025-03-15T00:32:28Z

Is your feature request related to a problem? Please describe

The current _wlm/stats API in OpenSearch returns query group statistics for all nodes in a single response, which scales poorly as cluster size increases. Like the _cat APIs (e.g., _cat/indices, _cat/shards), it suffers from large response sizes, high latency, and increased CPU and memory consumption. This makes it difficult for users to efficiently retrieve and process query group statistics, especially in large clusters.

The need for pagination arises to:

Limit response size, reducing memory usage and response latency.
Prevent unnecessary aggregation of statistics for all nodes at once.
Enable efficient navigation of query group statistics, similar to paginated APIs like _list/indices and _list/shards.

The issues and approaches discussed in the following OpenSearch GitHub issues are particularly relevant:

OpenSearch Issue #14257: Discusses pagination for _cat APIs, highlighting the impact of large responses on cluster performance.
OpenSearch Issue #15014: Tracks the introduction of _list APIs to replace _cat APIs, ensuring efficient pagination with next_token.
OpenSearch Issue #14258: Discusses pagination strategies, emphasizing deterministic sorting keys for stable pagination behavior.

Describe the solution you'd like

To address the issues of large response sizes and high resource consumption in _wlm/stats, we propose introducing a new API endpoint (/_list/wlm_stats) with token-based pagination. This follows the approach used in OpenSearch Issue #14257 and OpenSearch Issue #15014, where _list APIs were introduced for paginating large _cat responses.

Key Features

Token-Based Pagination (next_token): Users can fetch query group statistics in smaller chunks, reducing resource consumption.
Sorting Support: Users can sort results by Node ID or Query Group, ensuring a stable and predictable pagination order.
Tabular Output: The response is structured similarly to _cat APIs, making it easy to read and process.
Scalability: Limits the amount of data retrieved per request, preventing excessive load on the cluster.
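
To make the token-based pagination idea concrete, here is a minimal sketch of how a cursor could work. It is not the OpenSearch implementation: the row shape (node_id, query_group), the JSON-in-Base64 token format, and the function names are all assumptions chosen for illustration.

```python
import base64
import json


def encode_token(last_key):
    """Encode the sort key of the last returned row as an opaque Base64 cursor."""
    raw = json.dumps(last_key).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_token(token):
    """Recover the sort key stored in a cursor produced by encode_token."""
    return tuple(json.loads(base64.urlsafe_b64decode(token.encode("ascii"))))


def paginate(rows, size, next_token=None):
    """Return one page of rows plus a token for the following page (or None).

    rows is a list of (node_id, query_group) tuples; this shape is hypothetical.
    """
    ordered = sorted(rows)
    if next_token is not None:
        last_key = decode_token(next_token)
        # Resume strictly after the last key that was already returned.
        ordered = [r for r in ordered if r > last_key]
    page = ordered[:size]
    has_more = len(ordered) > size
    token = encode_token(page[-1]) if has_more else None
    return page, token
```

Because the cursor stores the last sort key rather than an offset, rows added or removed between requests do not shift the position of the next page.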

Sorting Options

Since CPU and memory usage fluctuate frequently, sorting by these values is not supported because it would cause inconsistent pagination results. Instead, sorting will be restricted to stable attributes:

node_id (Default): Sorts results lexicographically by Node ID, then by Query Group. Ensures structured browsing.
query_group: Sorts results by Query Group, so that each group's per-node statistics appear together. Useful for analyzing workload behavior.
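
The two sort options could be expressed as composite sort keys, roughly as sketched below. The dictionary field names are hypothetical, and using node_id as the tie-breaker for the query_group sort is an assumption (the proposal only specifies the tie-breaker for the node_id sort).

```python
def sort_rows(rows, sort="node_id", order="asc"):
    """Sort stats rows by a stable composite key.

    rows is a list of dicts with "node_id" and "query_group" keys (hypothetical shape).
    """
    if sort == "node_id":
        # Default: Node ID first, then Query Group, as described above.
        key = lambda r: (r["node_id"], r["query_group"])
    elif sort == "query_group":
        # Assumed tie-breaker: Node ID, so the order is fully deterministic.
        key = lambda r: (r["query_group"], r["node_id"])
    else:
        # Volatile fields such as CPU or memory usage are rejected.
        raise ValueError(f"unsupported sort field: {sort}")
    return sorted(rows, key=key, reverse=(order == "desc"))
```

A fully deterministic key matters here because any ambiguity in ordering could cause rows to be skipped or repeated across pages.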

Example API Calls

Fetch First Page (Sorted by Query Group)


GET /_list/wlm_stats?size=50&sort=query_group&order=asc

Returns results grouped by Query Group, making it easier to analyze workload performance.

Fetch First Page (Sorted by Node ID)


GET /_list/wlm_stats?size=50&sort=node_id&order=asc

Sorts results by Node ID, providing a stable, structured overview.

Fetch Next Page


GET /_list/wlm_stats?size=50&sort=node_id&order=asc&next_token=Base64EncodedCursor

Uses next_token to fetch the next 50 results in a stable order.
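
From the client side, walking through all pages could look like the following sketch. The transport is left to a caller-supplied fetch_page function, and the assumption that each call yields a (rows, next_token) pair, with next_token absent on the last page, is hypothetical since the response format is not defined in this proposal.

```python
from urllib.parse import urlencode


def build_path(size, sort, order, next_token=None):
    """Build the request path for one page of the proposed endpoint."""
    params = {"size": size, "sort": sort, "order": order}
    if next_token is not None:
        params["next_token"] = next_token
    return "/_list/wlm_stats?" + urlencode(params)


def fetch_all(fetch_page, size=50, sort="node_id", order="asc"):
    """Collect every row by following next_token until it is None.

    fetch_page is a caller-supplied function mapping a request path to a
    (rows, next_token) pair; this response shape is an assumption.
    """
    rows, token = [], None
    while True:
        page, token = fetch_page(build_path(size, sort, order, token))
        rows.extend(page)
        if token is None:
            return rows
```

The sort and order parameters are repeated on every request so that the cursor is always interpreted against the same ordering.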

Related component

Search

Describe alternatives you've considered

An alternative solution is to enhance the existing _wlm/stats API with filtering options, ensuring that only the most relevant statistics are retrieved.

Key Features

Targeted Data Retrieval: Users can filter results by Node ID, Query Group, CPU Usage, and Memory Usage to retrieve only relevant information.
Sorting Support: Supports sorting by CPU Usage, Memory Usage, Node ID, and Query Group for better analysis.
Tabular Output: Maintains structured, easy-to-read output similar to _cat APIs.
Performance Optimization: Eliminates unnecessary data retrieval, improving query response times.

Example API Calls

Fetch Nodes with CPU Usage Above 50%

GET /_wlm/stats?cpu_threshold=50

Returns only nodes consuming more than 50% CPU.

Fetch Nodes with High Memory Usage

GET /_wlm/stats?memory_threshold=70

Retrieves only nodes using more than 70% memory.

Fetch Query Groups for a Specific Node

GET /_wlm/stats?node_id=jPPwGjW-TA2NZB6Gn7RZtg

Returns query group statistics for the given node.
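
The filtering semantics of this alternative can be sketched as below. The row fields cpu_percent and memory_percent are hypothetical names, and the strict "greater than" comparison is taken from the "more than 50%" wording above rather than from any existing implementation.

```python
def filter_stats(rows, cpu_threshold=None, memory_threshold=None, node_id=None):
    """Keep only rows that satisfy every filter that was supplied.

    rows is a list of dicts with "node_id", "cpu_percent", and
    "memory_percent" keys (hypothetical field names).
    """
    result = []
    for row in rows:
        if node_id is not None and row["node_id"] != node_id:
            continue
        # Thresholds are exclusive: usage must be strictly above the value.
        if cpu_threshold is not None and row["cpu_percent"] <= cpu_threshold:
            continue
        if memory_threshold is not None and row["memory_percent"] <= memory_threshold:
            continue
        result.append(row)
    return result
```

Unlike pagination, this approach still has to gather statistics from every node before filtering, so it reduces the response size but not necessarily the collection cost.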

Additional context

No response


Lindsay-00 added the enhancement and untriaged labels Mar 15, 2025

github-actions bot added the Search label Mar 15, 2025

github-project-automation bot added this to Search Project Board Mar 15, 2025

github-project-automation bot moved this to 🆕 New in Search Project Board Mar 15, 2025

sandeshkr419 removed the untriaged label Mar 19, 2025
