[Feature Request] Paginating _wlm/stats API #17592

Open
Lindsay-00 opened this issue Mar 15, 2025 · 0 comments
Labels
enhancement Enhancement or improvement to existing feature or request Search Search query, autocomplete ...etc

Is your feature request related to a problem? Please describe

The current _wlm/stats API in OpenSearch returns query group statistics for all nodes in a single response, which scales poorly as cluster size increases. Like the _cat APIs (e.g., _cat/indices, _cat/shards), it suffers from large response sizes, high latency, and increased CPU and memory consumption. This makes it difficult for users to retrieve and process query group statistics efficiently, especially in large clusters.

Pagination is needed to:

  1. Limit response size, reducing memory usage and response latency.
  2. Prevent unnecessary aggregation of statistics for all nodes at once.
  3. Enable efficient navigation of query group statistics, similar to paginated APIs like _list/indices and _list/shards.

The issues and approaches discussed in the following OpenSearch GitHub issues are particularly relevant:

OpenSearch Issue #14257: Discusses pagination for _cat APIs, highlighting the impact of large responses on cluster performance.
OpenSearch Issue #15014: Tracks the introduction of _list APIs to replace _cat APIs, ensuring efficient pagination with next_token.
OpenSearch Issue #14258: Discusses pagination strategies, emphasizing deterministic sorting keys for stable pagination behavior.

Describe the solution you'd like

To address the issues of large response sizes and high resource consumption in _wlm/stats, we propose introducing a new API endpoint (/_list/wlm_stats) with token-based pagination. This follows the approach used in OpenSearch Issue #14257 and OpenSearch Issue #15014, where _list APIs were introduced for paginating large _cat responses.

Key Features

  1. Token-Based Pagination (next_token): Users can fetch query group statistics in smaller chunks, reducing resource consumption.
  2. Sorting Support: Users can sort results by Node ID or Query Group, ensuring a stable and predictable pagination order.
  3. Tabular Output: The response is structured similarly to _cat APIs, making it easy to read and process.
  4. Scalability: Limits the amount of data retrieved per request, preventing excessive load on the cluster.

Sorting Options

Since CPU and memory usage fluctuate frequently, sorting by these values is not supported because it would cause inconsistent pagination results. Instead, sorting will be restricted to stable attributes:

  1. node_id (Default): Sorts results lexicographically by Node ID, then by Query Group. Ensures structured browsing.
  2. query_group: Groups results by Query Group, useful for analyzing workload behavior.
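To illustrate the sorting rules above, here is a minimal Python sketch, not the proposed implementation. The `StatRow` type, the `SORT_KEYS` table, and the `sort_rows` helper are hypothetical names introduced only for this example, and the tie-break on node_id when sorting by query_group is an assumption. The point it shows is that only stable attributes are accepted as sort fields, so volatile values such as CPU usage are rejected.

```python
from typing import NamedTuple


class StatRow(NamedTuple):
    # Hypothetical row shape for one (node, query group) pair.
    node_id: str
    query_group: str
    cpu_percent: float  # volatile value, deliberately not sortable


# Hypothetical mapping from the `sort` parameter to a stable composite key.
# The secondary key for "query_group" is an assumption made for determinism.
SORT_KEYS = {
    "node_id": lambda r: (r.node_id, r.query_group),
    "query_group": lambda r: (r.query_group, r.node_id),
}


def sort_rows(rows, sort="node_id", order="asc"):
    """Sort rows by a stable attribute; reject volatile sort fields."""
    if sort not in SORT_KEYS:
        raise ValueError(f"unsupported sort field: {sort!r}")
    if order not in ("asc", "desc"):
        raise ValueError(f"unsupported order: {order!r}")
    return sorted(rows, key=SORT_KEYS[sort], reverse=(order == "desc"))
```

Because each composite key covers both node_id and query_group, two rows never compare equal, which keeps the order identical across requests as long as the set of rows is unchanged.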

Example API Calls

Fetch First Page (Sorted by Query Group)
GET /_list/wlm_stats?size=50&sort=query_group&order=asc

Returns results grouped by Query Group, making it easier to analyze workload performance.

Fetch First Page (Sorted by Node ID)
GET /_list/wlm_stats?size=50&sort=node_id&order=asc

Sorts results by Node ID, providing a stable, structured overview.

Fetch Next Page
GET /_list/wlm_stats?size=50&sort=node_id&order=asc&next_token=Base64EncodedCursor

Uses next_token to fetch the next 50 results in a stable order.
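
One way the next_token flow in the calls above could work is sketched below in Python. This is an assumption-laden illustration, not the OpenSearch implementation: the token layout (a Base64-encoded JSON array holding the last returned sort key) and the helper names `encode_token`, `decode_token`, `fetch_page`, and `fetch_all` are all hypothetical. Rows are modeled as plain (node_id, query_group) tuples in ascending order.

```python
import base64
import json


def encode_token(last_key):
    """Encode the last returned sort key as an opaque Base64 cursor."""
    raw = json.dumps(list(last_key)).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_token(token):
    """Recover the sort key stored in a cursor produced by encode_token."""
    return tuple(json.loads(base64.urlsafe_b64decode(token.encode("ascii"))))


def fetch_page(rows, size, next_token=None):
    """Return (page, next_token) for rows of (node_id, query_group) tuples."""
    ordered = sorted(rows)
    if next_token is not None:
        last_key = decode_token(next_token)
        # Resume strictly after the last key the client has already seen.
        ordered = [r for r in ordered if r > last_key]
    page = ordered[:size]
    # Emit a token only when more rows remain beyond this page.
    token = encode_token(page[-1]) if len(ordered) > size else None
    return page, token


def fetch_all(rows, size):
    """Client-side loop: follow next_token until it is absent."""
    token, collected = None, []
    while True:
        page, token = fetch_page(rows, size, token)
        collected.extend(page)
        if token is None:
            return collected
```

Storing the last sort key instead of a numeric offset means a page boundary stays meaningful even if rows are added or removed between requests, which is why a deterministic sort key matters for this design.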

Related component

Search

Describe alternatives you've considered

An alternative solution is to enhance the existing _wlm/stats API with filtering options, ensuring that only the most relevant statistics are retrieved.

Key Features

  1. Targeted Data Retrieval: Users can filter results by Node ID, Query Group, CPU Usage, and Memory Usage to retrieve only relevant information.
  2. Sorting Support: Supports sorting by CPU Usage, Memory Usage, Node ID, and Query Group for better analysis.
  3. Tabular Output: Maintains structured, easy-to-read output similar to _cat APIs.
  4. Performance Optimization: Eliminates unnecessary data retrieval, improving query response times.

Example API Calls

Fetch Nodes with CPU Usage Above 50%

GET /_wlm/stats?cpu_threshold=50

Returns only nodes consuming more than 50% CPU.

Fetch Nodes with High Memory Usage

GET /_wlm/stats?memory_threshold=70

Retrieves only nodes using more than 70% memory.

Fetch Query Groups for a Specific Node

GET /_wlm/stats?node_id=jPPwGjW-TA2NZB6Gn7RZtg

Returns query group statistics for the given node.
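
The filtering alternative could behave roughly as in the following Python sketch. It is only an illustration under stated assumptions: the `filter_stats` helper and the row fields `node_id`, `cpu_percent`, and `memory_percent` are hypothetical, and the thresholds are treated as strict "more than" comparisons to match the descriptions above.

```python
def filter_stats(rows, node_id=None, cpu_threshold=None, memory_threshold=None):
    """Keep only rows that satisfy every filter that was supplied.

    Each row is a dict with the hypothetical fields `node_id`,
    `cpu_percent`, and `memory_percent`.
    """
    matched = []
    for row in rows:
        if node_id is not None and row["node_id"] != node_id:
            continue
        # Thresholds are exclusive: usage must be strictly above the value.
        if cpu_threshold is not None and row["cpu_percent"] <= cpu_threshold:
            continue
        if memory_threshold is not None and row["memory_percent"] <= memory_threshold:
            continue
        matched.append(row)
    return matched
```

Note that filtering alone does not bound the response size: a threshold that most nodes exceed still returns most of the cluster's statistics in one response.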

Additional context

No response

@Lindsay-00 Lindsay-00 added enhancement Enhancement or improvement to existing feature or request untriaged labels Mar 15, 2025
@github-actions github-actions bot added the Search Search query, autocomplete ...etc label Mar 15, 2025
Projects
Status: 🆕 New

2 participants