[FEATURE] Support Patterns Command #1080

Open
penghuo opened this issue Mar 12, 2025 · 2 comments
Labels
enhancement New feature or request untriaged

Comments

@penghuo
Collaborator

penghuo commented Mar 12, 2025

Is your feature request related to a problem?
Currently, the PPL Patterns command attaches detected patterns to each event in the input stream. In contrast, both CloudWatch Logs Insights and Splunk's cluster command provide summarized pattern detection, offering aggregated insights into log data.

What solution would you like?

1. Introduction

This proposal enhances the patterns command in OpenSearch PPL to support summarized pattern detection through two output options: the backward-compatible label-mode and a new aggregation-mode. The goals are to:

  • Retain existing patterns functionality (label-mode).
  • Add aggregation-mode for grouped pattern statistics.

2. Expected behaviour

  • Input dataset:

| _time | message |
|---|---|
| 03-20-2018 09:37:33 | ERROR: Unable to create directory /app/db1 |
| 03-20-2018 09:37:35 | ERROR: Unable to create directory /app/db2 |
| 03-20-2018 09:38:01 | DEBUG: Operation completed successfully |
| 03-20-2018 09:38:05 | DEBUG: Operation completed error |
  • label-mode

... | patterns message

annotates each input event with its pattern and tokens:

| _time | message | pattern | tokens |
|---|---|---|---|
| 03-20-2018 09:37:33 | ERROR: Unable to create directory /app/db1 | ERROR: Unable to create directory <token1> | {"token1": "/app/db1"} |
| 03-20-2018 09:37:35 | ERROR: Unable to create directory /app/db2 | ERROR: Unable to create directory <token1> | {"token1": "/app/db2"} |
| 03-20-2018 09:38:01 | DEBUG: Operation completed successfully | DEBUG: Operation completed <token1> | {"token1": "successfully"} |
| 03-20-2018 09:38:05 | DEBUG: Operation completed error | DEBUG: Operation completed <token1> | {"token1": "error"} |
  • aggregation-mode

... | patterns mode=aggregation-mode message

groups events by pattern and returns one row per pattern with its distinct token values and event count:

| pattern | tokens | pattern_count |
|---|---|---|
| ERROR: Unable to create directory <token1> | [{"token1": ["/app/db1", "/app/db2"]}] | 2 |
| DEBUG: Operation completed <token1> | [{"token1": ["successfully", "error"]}] | 2 |
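To make the two modes concrete, here is a minimal Python sketch that reproduces the example tables above. It does not reflect how the PPL patterns command actually detects patterns; it assumes a deliberately naive extractor that groups messages by word count and first word (here, the log level) and turns every word position where the grouped messages disagree into a <tokenN> placeholder. The function names extract_patterns and aggregate_patterns are hypothetical.

```python
def extract_patterns(messages):
    """Hypothetical naive extractor (not the PPL algorithm).

    Groups messages by (word count, first word) and replaces every word
    position where the group's messages differ with a <tokenN> placeholder.
    Returns one (pattern, tokens) pair per message, in input order.
    """
    groups = {}
    for i, msg in enumerate(messages):
        words = msg.split()
        key = (len(words), words[0] if words else "")
        groups.setdefault(key, []).append((i, words))

    results = [None] * len(messages)
    for members in groups.values():
        width = len(members[0][1])
        # A position is variable if the grouped messages disagree there.
        variable = [len({words[p] for _, words in members}) > 1 for p in range(width)]
        for i, words in members:
            parts, tokens = [], {}
            for p, word in enumerate(words):
                if variable[p]:
                    name = f"token{len(tokens) + 1}"
                    tokens[name] = word
                    parts.append(f"<{name}>")
                else:
                    parts.append(word)
            results[i] = (" ".join(parts), tokens)
    return results


def aggregate_patterns(labelled):
    """Hypothetical aggregation-mode sketch over label-mode output.

    Returns one row per distinct pattern (first-seen order) with the event
    count and, per token name, its distinct values in first-seen order.
    """
    merged = {}  # pattern -> (count, {token name: [distinct values]})
    for pattern, tokens in labelled:
        count, values = merged.get(pattern, (0, {}))
        for name, value in tokens.items():
            seen = values.setdefault(name, [])
            if value not in seen:
                seen.append(value)
        merged[pattern] = (count + 1, values)
    return [
        {"pattern": p, "tokens": [values], "pattern_count": count}
        for p, (count, values) in merged.items()
    ]
```

Running aggregate_patterns(extract_patterns(messages)) on the four sample messages yields the two aggregation-mode rows shown above. In this sketch the aggregation step consumes label-mode output unchanged, so aggregation-mode only adds a group-by on top of whatever pattern detection label-mode already performs.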

3. Proposed Changes

3.1 Command Syntax

patterns
  [mode=(label-mode|aggregation-mode)]
  ...  // existing parameters
  • mode
    • label-mode (default): appends pattern and tokens fields to each event.
    • aggregation-mode: groups events by pattern and returns one row per pattern.
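A minimal Python sketch of how the new mode option could be validated, assuming the command's options arrive as a key=value dictionary (an assumption; the actual PPL parser is not described here). PatternsMode and parse_mode are hypothetical names. Falling back to label-mode when mode is absent is what keeps existing queries backward compatible.

```python
from enum import Enum


class PatternsMode(Enum):
    """Output modes proposed for the patterns command."""
    LABEL = "label-mode"
    AGGREGATION = "aggregation-mode"


def parse_mode(args):
    """Return the mode from hypothetical key=value command arguments.

    Defaults to label-mode when mode= is absent, so existing queries keep
    their current per-event output.
    """
    raw = args.get("mode", PatternsMode.LABEL.value)
    try:
        return PatternsMode(raw)
    except ValueError:
        allowed = ", ".join(m.value for m in PatternsMode)
        raise ValueError(f"invalid mode '{raw}', expected one of: {allowed}") from None
```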

3.2 Output schema

  • Label-mode

| Column | Data Type | Description |
|---|---|---|
| label | int | Unique pattern identifier. |
| count | bigint | Frequency of the pattern. |
| pattern | string | Pattern template (e.g., <token1> server error, status: <token2>). |
| tokens | struct[] | Array of {tokenName: tokenValue} pairs (e.g., [{"token1": "ServiceA"}, ...]). |

  • Aggregation-mode

| Column | Data Type | Description |
|---|---|---|
| pattern | string | Deduplicated pattern template. |
| count | bigint | Total occurrences of the pattern. |
| tokens | struct[] | Merged tokens with distinct values (e.g., [{"token1": ["ServiceA", "ServiceB"]}]). |
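The label-mode label and count columns can be filled in a second pass once every event has a pattern. A minimal Python sketch, assuming labels are assigned in first-seen order starting at 1 (the proposal only requires the identifier to be unique per pattern); add_label_and_count is a hypothetical helper.

```python
from collections import Counter


def add_label_and_count(patterns):
    """Hypothetical label-mode sketch: fill the label and count columns.

    `patterns` holds one pattern string per event, in event order. Labels
    are assigned in first-seen order starting at 1 (an assumption).
    """
    counts = Counter(patterns)  # frequency of each pattern across all events
    labels = {}
    rows = []
    for pattern in patterns:
        # len(labels) is read before insertion, so a new pattern gets the next id.
        label = labels.setdefault(pattern, len(labels) + 1)
        rows.append({"label": label, "count": counts[pattern], "pattern": pattern})
    return rows
```

In this sketch count depends on every event's pattern, so no label-mode row can be emitted until the whole input has been processed.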
penghuo added the enhancement and untriaged labels on Mar 12, 2025.
@ashwin-pc
Member

Nice, I like this approach!

@ashwin-pc
Member

How do we want to address performance issues, though? For example, sample size and limits for the fast and slow pattern calls?
