@paulstn (Collaborator) commented Oct 14, 2025

Description

This PR introduces a new Field Statistics tab to the Explore plugin in OpenSearch Dashboards. The feature provides users with comprehensive statistical analysis of fields in their indices, enabling quick data profiling and exploration without writing complex queries.

The Field Stats tab displays a sortable, expandable table showing field-level statistics including document counts, distinct value counts, and type-specific detailed analytics such as top values, numeric summaries, date ranges, and example values.

Related RFC: #10614

What's New

Field Statistics Tab

  • New tab available in the Explore interface for both Logs and Metrics flavors
  • Positioned after the Visualization tab
  • Supports PPL as the default query language

Features

1. Field Statistics Table

A comprehensive, sortable table displaying all non-meta fields with:

  • Field name and type (with icon)
  • Document count and percentage
  • Distinct value count (approximate)
  • Expandable rows for detailed field-specific statistics

2. Expandable Row Details

Dynamic detail sections that adapt to field type:

String/Keyword Fields:

  • Top 10 most frequent values with counts and percentages

Numeric Fields:

  • Top 10 values
  • Summary statistics (min, median, average, max)

Date Fields:

  • Date range (earliest and latest timestamps)

Boolean Fields:

  • Top values distribution

Complex Fields (geo_point, geo_shape, binary, object):

  • Example values from the first 10 documents

3. Loading States

  • Centered loading spinner with "Searching in progress..." message during initial data fetch
  • Per-row loading indicators when expanding fields for detailed statistics

Architecture & Implementation

Component Structure

src/plugins/explore/public/components/
├── tabs/
│   └── field_stats_tab.tsx              # Main tab component
└── field_stats/
    ├── field_stats_container.tsx        # Data fetching and state management
    ├── field_stats_table.tsx            # Table component with sorting
    ├── field_stats_table_columns.tsx    # Column definitions
    ├── field_stats_row_details.tsx      # Expandable row details renderer
    ├── field_stats_queries.ts           # PPL query generation and execution
    ├── field_stats_detail_sections.ts   # Registry of detail sections
    ├── detail_sections/
    │   ├── top_values_detail.tsx        # Top values component and config
    │   ├── numeric_summary_detail.tsx   # Numeric statistics component and config
    │   ├── date_range_detail.tsx        # Date range component and config
    │   └── examples_detail.tsx          # Examples component and config
    └── utils/
        ├── field_stats_types.ts         # TypeScript interfaces
        ├── field_stats_utils.ts         # Utility functions
        ├── field_stats.stubs.ts         # Test stubs
        └── constants.ts                 # Constants and configurations

Key Design Patterns

1. Plugin Architecture for Detail Sections

The feature uses a registry-based architecture that makes it easy to add new detail sections:

// detail_sections/histogram_detail.tsx
export const histogramDetailConfig: DetailSectionConfig<HistogramData> = {
  id: 'histogram',
  title: 'Distribution',
  applicableToTypes: ['number'],
  fetchData: async (fieldName, dataset, services) => {
    // Fetch histogram data
  },
  component: HistogramSection,
};

// field_stats_detail_sections.ts
export const DETAIL_SECTIONS: DetailSectionConfig[] = [
  topValuesDetailConfig,
  numericSummaryDetailConfig,
  dateRangeDetailConfig,
  examplesDetailConfig,
  // new configs
];

Each detail section is self-contained with:

  • Its own PPL query function
  • React component for rendering
  • Configuration specifying applicable field types
  • Data fetching logic
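As a rough sketch of how such a registry might be consulted when a row is expanded, the snippet below filters the section list by field type. The `DetailSectionConfig` shape is a simplified stand-in (no React component, no `dataset` or `services` arguments), and `getApplicableSections` is a hypothetical helper used for illustration, not a function confirmed to exist in this PR.

```typescript
// Simplified, hypothetical version of the config shape shown above.
interface DetailSectionConfig<T = unknown> {
  id: string;
  title: string;
  applicableToTypes: string[];
  fetchData: (fieldName: string) => Promise<T>;
}

// Placeholder configs; the real ones run PPL queries in fetchData.
const topValues: DetailSectionConfig<string[]> = {
  id: 'topValues',
  title: 'Top values',
  applicableToTypes: ['string', 'number', 'boolean'],
  fetchData: async () => [],
};

const numericSummary: DetailSectionConfig<{ min: number; max: number }> = {
  id: 'numericSummary',
  title: 'Summary statistics',
  applicableToTypes: ['number'],
  fetchData: async () => ({ min: 0, max: 0 }),
};

const DETAIL_SECTIONS: DetailSectionConfig[] = [topValues, numericSummary];

// Hypothetical helper: pick the sections that apply to a given field type.
function getApplicableSections(fieldType: string): DetailSectionConfig[] {
  return DETAIL_SECTIONS.filter(
    (section) => section.applicableToTypes.indexOf(fieldType) !== -1
  );
}
```

With this layout, adding a section only requires appending a config to the array; the lookup itself does not change.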

2. Lazy Loading with Parallel Queries

  • Basic field statistics are fetched in parallel for all fields on tab load
  • Detailed statistics are only fetched when a row is expanded (lazy loading)
  • All applicable detail sections for a field are fetched in parallel when expanded
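The parallel fetch could look roughly like the following. This is a minimal sketch, assuming one query per field and using `Promise.allSettled` so that one failed query does not discard the others; the `FieldStatsItem` shape and the `FetchStats` callback are hypothetical stand-ins for the real types and query executor.

```typescript
// Hypothetical, reduced shape of a per-field result.
interface FieldStatsItem {
  name: string;
  docCount: number;
}

// Hypothetical stand-in for the function that runs one PPL query.
type FetchStats = (fieldName: string) => Promise<FieldStatsItem>;

async function fetchAllFieldStats(
  fieldNames: string[],
  fetchStats: FetchStats
): Promise<Record<string, FieldStatsItem>> {
  // Start every query at once and wait for all of them to settle.
  const results = await Promise.allSettled(
    fieldNames.map((name) => fetchStats(name))
  );

  // Keep only the queries that succeeded; failures are skipped.
  const allFieldStats: Record<string, FieldStatsItem> = {};
  results.forEach((result) => {
    if (result.status === 'fulfilled' && result.value) {
      allFieldStats[result.value.name] = result.value;
    }
  });
  return allFieldStats;
}
```

The same pattern would apply when a row is expanded: call each applicable section's `fetchData` and settle them together.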

3. Field Filtering

The following are filtered out automatically:

  • Meta fields (e.g., _id, _index, _source)
  • Multi-fields (e.g., .keyword subfields)
  • Scripted fields
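A filter along these lines would cover the three cases. It is a sketch under assumptions: the `FieldLike` shape (a `scripted` flag and a `subType.multi` marker for multi-fields) and the `META_FIELDS` list are illustrative, and the actual PR may detect these cases differently.

```typescript
// Assumed minimal field shape; the real field objects carry more properties.
interface FieldLike {
  name: string;
  scripted?: boolean;
  // Assumed to be present on multi-fields such as `message.keyword`.
  subType?: { multi?: { parent: string } };
}

// Illustrative list based on the examples above; not exhaustive.
const META_FIELDS = ['_id', '_index', '_source'];

function filterFields(fields: FieldLike[]): FieldLike[] {
  return fields.filter(
    (field) =>
      META_FIELDS.indexOf(field.name) === -1 && // drop meta fields
      !field.scripted && // drop scripted fields
      !(field.subType && field.subType.multi) // drop multi-fields
  );
}
```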

State Management

The FieldStatsContainer manages multiple state pieces:

  • fieldStats: Basic statistics for all fields (Record<string, FieldStatsItem>)
  • isLoading: Global loading state for initial fetch
  • expandedRows: Set of currently expanded field names
  • fieldDetails: Detailed statistics for expanded fields (FieldDetailsMap)
  • detailsLoading: Set of fields currently loading details
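To illustrate how these pieces interact when a row is toggled, here is a framework-free sketch. The `toggleRow` function is hypothetical, and the rule that details are fetched only once per field (and not again while a fetch is in flight) is an assumption about the intended behavior, not something the PR states.

```typescript
interface FieldStatsState {
  expandedRows: Set<string>;
  detailsLoading: Set<string>;
  fieldDetails: Record<string, unknown>;
}

interface ToggleResult {
  next: FieldStatsState;
  shouldFetch: boolean; // true when the caller should start loading details
}

// Hypothetical pure helper; returns a new state instead of mutating the old one.
function toggleRow(state: FieldStatsState, fieldName: string): ToggleResult {
  const expandedRows = new Set(state.expandedRows);

  // Collapsing never triggers a fetch.
  if (expandedRows.has(fieldName)) {
    expandedRows.delete(fieldName);
    return { next: { ...state, expandedRows }, shouldFetch: false };
  }

  expandedRows.add(fieldName);
  const alreadyKnown =
    Object.prototype.hasOwnProperty.call(state.fieldDetails, fieldName) ||
    state.detailsLoading.has(fieldName);
  if (alreadyKnown) {
    return { next: { ...state, expandedRows }, shouldFetch: false };
  }

  // First expansion: mark the field as loading and ask the caller to fetch.
  const detailsLoading = new Set(state.detailsLoading);
  detailsLoading.add(fieldName);
  return { next: { ...state, expandedRows, detailsLoading }, shouldFetch: true };
}
```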

Usage

Accessing the Feature

  1. Navigate to the Explore plugin in OpenSearch Dashboards
  2. Select a data source and index
  3. Click on the Field Stats tab
  4. View field statistics in the sortable table
  5. Click the expand button to see detailed statistics for any field

Use Cases

Data Profiling:

  • Quickly understand field distributions without writing queries
  • Identify fields with high cardinality
  • Find fields with missing data (low document counts)

Data Quality Analysis:

  • Check for unexpected values in string fields
  • Verify numeric ranges are within expected bounds
  • Identify date ranges and time spans

Query Optimization:

  • Find fields with few distinct values (good candidates for term aggregations)
  • Identify high-cardinality fields that may slow queries
  • Understand data distribution before creating visualizations

Follow ups

  • Pagination for the table: keep it to simple pagination; the table would still need to wait for all top-level queries to finish
  • Action bar: have it show the field count
  • Bug: top values include null for documents that do not contain the field, which pushes percentages above 100%. E.g. phpmemory in sample_logs
  • Hide the tab for other dataset types
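One possible shape for the percentage fix is sketched below. It is not the fix adopted in the PR: it assumes the top-values query returns `{ value, count }` buckets, that a `null` bucket stands for documents lacking the field, and that the field is single-valued so the non-null counts sum to the number of documents containing it. `toTopValues` and its types are hypothetical names.

```typescript
type Scalar = string | number | boolean;

interface ValueCount {
  value: Scalar | null;
  count: number;
}

interface TopValue {
  value: Scalar;
  count: number;
  percentage: number;
}

function toTopValues(buckets: ValueCount[], limit: number = 10): TopValue[] {
  const present: Array<{ value: Scalar; count: number }> = [];
  let docsWithField = 0;

  for (const bucket of buckets) {
    if (bucket.value === null) continue; // documents that lack the field
    docsWithField += bucket.count;
    present.push({ value: bucket.value, count: bucket.count });
  }
  if (docsWithField === 0) return [];

  // Percentages are relative to documents that contain the field,
  // so no single value can exceed 100%.
  return present
    .sort((a, b) => b.count - a.count)
    .slice(0, limit)
    .map((b) => ({ ...b, percentage: (b.count / docsWithField) * 100 }));
}
```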

Screenshot

Testing the changes

Changelog

  • feat: Explore Field Statistics

paulstn added 14 commits October 7, 2025 13:25
Signed-off-by: Paul Sebastian <[email protected]>
…ch, and did other code cleanup such as refactoring, adding typing, adding comments

Signed-off-by: Paul Sebastian <[email protected]>
…ccurate top-level field row counts

Signed-off-by: Paul Sebastian <[email protected]>

codecov bot commented Oct 14, 2025

Codecov Report

❌ Patch coverage is 76.69492% with 55 lines in your changes missing coverage. Please review.
✅ Project coverage is 60.39%. Comparing base (dda6ee6) to head (7059665).
⚠️ Report is 8 commits behind head on main.

Files with missing lines                               Patch %  Lines
...mponents/field_stats/field_stats_table_columns.tsx  38.88%   9 Missing and 2 partials ⚠️
...blic/components/field_stats/field_stats_queries.ts  63.63%   7 Missing and 1 partial ⚠️
...ublic/components/field_stats/field_stats_table.tsx  71.42%   6 Missing and 2 partials ⚠️
...d_stats/detail_sections/numeric_summary_detail.tsx  66.66%   0 Missing and 6 partials ⚠️
.../field_stats/detail_sections/top_values_detail.tsx  72.22%   1 Missing and 4 partials ⚠️
...components/field_stats/field_stats_row_details.tsx  84.61%   2 Missing and 2 partials ⚠️
.../components/field_stats/utils/field_stats.stubs.ts  80.00%   4 Missing ⚠️
.../components/field_stats/utils/field_stats_utils.ts  92.59%   2 Missing and 2 partials ⚠️
...ts/field_stats/detail_sections/examples_detail.tsx  81.25%   0 Missing and 3 partials ⚠️
.../field_stats/detail_sections/date_range_detail.tsx  85.71%   0 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #10723      +/-   ##
==========================================
+ Coverage   60.33%   60.39%   +0.05%     
==========================================
  Files        4462     4474      +12     
  Lines      119475   119825     +350     
  Branches    19745    19841      +96     
==========================================
+ Hits        72090    72366     +276     
- Misses      42382    42424      +42     
- Partials     5003     5035      +32     
Flag Coverage Δ
Linux_1 26.58% <ø> (ø)
Linux_2 38.82% <ø> (ø)
Linux_3 38.89% <ø> (-0.01%) ⬇️
Linux_4 ?
Windows_1 26.60% <ø> (ø)
Windows_2 38.79% <ø> (ø)
Windows_3 38.90% <ø> (+<0.01%) ⬆️
Windows_4 33.19% <76.69%> (+0.19%) ⬆️

Signed-off-by: Paul Sebastian <[email protected]>
…izing details testing format

Signed-off-by: Paul Sebastian <[email protected]>
@paulstn paulstn marked this pull request as ready for review October 14, 2025 23:20
@paulstn paulstn requested a review from ananzh as a code owner October 14, 2025 23:20
@paulstn paulstn added the OSD Changes being merged by the OSD team label Oct 16, 2025
@TackAdam (Collaborator) commented

Investigated against a 3.2 back end. Issues found, to address in a follow-up:

  1. Empty fields in the table due to missing dependencies needed for count/percentage. Might want to show a ? indicator explaining why these fields are empty, noting that they are still sorted correctly.
     [screenshot: 3 2EmptyFields]
  2. The Body field got an error.
     [screenshot: 3 2BodyError]
  3. Testing with a very large dataset produced the following circuit breaker exception:

```
CircuitBreakingException[[script] Too many dynamic script compilations within, max: [75/5m]; please use indexed, or scripts with parameters instead; this limit can be changed by the [script.context.filter.max_compilations_rate] setting];\n\nFor more details, please send request for Json format to see the raw response from OpenSearch engine.",
```

@TackAdam (Collaborator) left a comment

LGTM, reviewed in call together.

Main concern for follow-up: the PPL calls to get every field will take quite a while, as only 5 can run concurrently. Need to test with big data moving forward and consider optimizations to keep the feature usable.
[screenshot: 5Batching]
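For reference, a client-side cap on in-flight queries could be sketched as follows. This is an illustration only: `runWithConcurrency` is a hypothetical helper, the limit of 5 simply mirrors the number mentioned above, and nothing here reflects how the PR or the back end actually schedules requests.

```typescript
type Settled<T> =
  | { status: 'fulfilled'; value: T }
  | { status: 'rejected'; reason: unknown };

// Runs the given tasks with at most `limit` of them in flight at a time.
// Results keep the order of the input tasks.
async function runWithConcurrency<T>(
  tasks: Array<() => Promise<T>>,
  limit: number
): Promise<Array<Settled<T>>> {
  const results: Array<Settled<T>> = new Array(tasks.length);
  let next = 0;

  // Each worker repeatedly claims the next unstarted task until none remain.
  async function worker(): Promise<void> {
    while (next < tasks.length) {
      const index = next++;
      try {
        results[index] = { status: 'fulfilled', value: await tasks[index]() };
      } catch (reason) {
        results[index] = { status: 'rejected', reason };
      }
    }
  }

  const workerCount = Math.max(1, Math.min(limit, tasks.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

Because a worker picks up a new task as soon as its previous one settles, slow fields do not hold up a whole batch the way fixed-size batches would.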

@sejli (Member) left a comment

LGTM. Only main concern I have would be a performance hit if there are a bunch of queries that get lined up from multiple fields.

Others are just nits. I think the file structure can also be improved upon. There's a bunch of components for field stats that can be organized into directories.

Cypress tests should be a fast follow up.

fields.map(async (field) => {
  try {
    const query = getFieldStatsQuery(dataset.title, field.name);
    const result = await executeFieldStatsQuery(

Any way for user to limit the max number or cancel these? Feel like there's going to be a lot of queries that are executed. I'm not familiar with the queries that are run, but is it possible to pass all the fields in a single query and have the engine return them all?
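To make the single-query idea concrete, the sketch below builds one PPL `stats` query covering several fields. It is untested against a live cluster and rests on assumptions: that the engine accepts multiple `count(...)` and `distinct_count(...)` aggregations in one `stats` command, that backtick quoting is sufficient for dotted field names, and that the `f<i>_count` / `f<i>_distinct` aliases are acceptable. `buildCombinedStatsQuery` is a hypothetical helper, not part of this PR.

```typescript
// Builds a single PPL query that aggregates several fields at once.
function buildCombinedStatsQuery(index: string, fieldNames: string[]): string {
  if (fieldNames.length === 0) {
    throw new Error('at least one field is required');
  }
  const aggregations = fieldNames.map((name, i) => {
    // Quote the field name; index-based aliases avoid dots in alias names.
    const quoted = '`' + name + '`';
    return `count(${quoted}) as f${i}_count, distinct_count(${quoted}) as f${i}_distinct`;
  });
  return `source = ${index} | stats ${aggregations.join(', ')}`;
}
```

A caller would map the `f<i>_...` columns back to field names by index. Whether one large aggregation is cheaper than many small queries would need measuring.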


const allFieldStats: Record<string, FieldStatsItem> = {};
results.forEach((result) => {
  if (result.status === 'fulfilled' && result.value) {

Maybe we can leverage the existing query status?

/**
 * Registry of all detail sections that can be displayed in expanded field rows.
 *
 * ## How to Add a New Detail Section

Could add some more context for what a detail section is for

Labels

all-star-contributor OSD Changes being merged by the OSD team

3 participants