[Explore] Field Statistics Tab #10723
Signed-off-by: Paul Sebastian <[email protected]>
…ch, and did other code cleanup such as refactoring, adding typing, adding comments Signed-off-by: Paul Sebastian <[email protected]>
…ccurate top-level field row counts Signed-off-by: Paul Sebastian <[email protected]>
Codecov Report
@@ Coverage Diff @@
## main #10723 +/- ##
==========================================
+ Coverage 60.33% 60.39% +0.05%
==========================================
Files 4462 4474 +12
Lines 119475 119825 +350
Branches 19745 19841 +96
==========================================
+ Hits 72090 72366 +276
- Misses 42382 42424 +42
- Partials 5003 5035 +32
…izing details testing format Signed-off-by: Paul Sebastian <[email protected]>
…c percentage within frontend Signed-off-by: Paul Sebastian <[email protected]>
…time range Signed-off-by: Paul Sebastian <[email protected]>
LGTM. My main concern is a performance hit if a bunch of queries get lined up across multiple fields.
The rest are just nits. I think the file structure could also be improved: there are a bunch of field stats components that could be organized into directories.
Cypress tests should be a fast follow-up.
fields.map(async (field) => {
  try {
    const query = getFieldStatsQuery(dataset.title, field.name);
    const result = await executeFieldStatsQuery(
Is there any way for the user to limit the max number of these queries or cancel them? It feels like a lot of queries are going to be executed. I'm not familiar with the queries that are run, but is it possible to pass all the fields in a single query and have the engine return them all?
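One possible way to bound the number of in-flight queries and allow cancellation is a small worker pool driven by an AbortSignal. This is only a sketch under assumptions: `mapWithLimit` and its parameters are hypothetical names, not part of this PR, and the `task` callback stands in for whatever actually executes a field stats query.

```typescript
// Hypothetical helper (not from the PR): runs `task` over `items` with at most
// `limit` tasks in flight, and skips items that have not started once `signal` aborts.
async function mapWithLimit<T, R>(
  items: T[],
  limit: number,
  task: (item: T, signal: AbortSignal) => Promise<R>,
  signal: AbortSignal
): Promise<Array<PromiseSettledResult<R>>> {
  const results: Array<PromiseSettledResult<R>> = new Array(items.length);
  let next = 0; // index of the next item to claim

  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      if (signal.aborted) {
        // Mark remaining work as rejected instead of issuing more queries.
        results[i] = { status: 'rejected', reason: new Error('cancelled') };
        continue;
      }
      try {
        results[i] = { status: 'fulfilled', value: await task(items[i], signal) };
      } catch (reason) {
        results[i] = { status: 'rejected', reason };
      }
    }
  }

  // Start at most `limit` workers (at least one); each pulls items until none remain.
  const workerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

The return shape mirrors `Promise.allSettled`, so code that already checks `result.status === 'fulfilled'` would not need to change. Calling `abort()` on the controller that owns `signal` stops new queries from starting; aborting a query already in flight depends on the task honoring the signal it is given.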
const allFieldStats: Record<string, FieldStatsItem> = {};
results.forEach((result) => {
  if (result.status === 'fulfilled' && result.value) {
Maybe we can leverage the existing query status?
/**
 * Registry of all detail sections that can be displayed in expanded field rows.
 *
 * ## How to Add a New Detail Section
Could add some more context for what a detail section is for
Description
This PR introduces a new Field Statistics tab to the Explore plugin in OpenSearch Dashboards. The feature provides users with comprehensive statistical analysis of fields in their indices, enabling quick data profiling and exploration without writing complex queries.
The Field Stats tab displays a sortable, expandable table showing field-level statistics including document counts, distinct value counts, and type-specific detailed analytics such as top values, numeric summaries, date ranges, and example values.
Related RFC: #10614
What's New
Field Statistics Tab
Features
1. Field Statistics Table
A comprehensive, sortable table displaying all non-meta fields with:
2. Expandable Row Details
Dynamic detail sections that adapt to field type:
String/Keyword Fields:
Numeric Fields:
Date Fields:
Boolean Fields:
Complex Fields (geo_point, geo_shape, binary, object):
3. Loading States
Architecture & Implementation
Component Structure
Key Design Patterns
1. Plugin Architecture for Detail Sections
The feature uses a registry-based architecture that makes it easy to add new detail sections:
Each detail section is self-contained with:
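As a rough illustration of the registry idea, the sketch below shows one way a detail section registry could be shaped. Every name in it (`DetailSection`, `registerDetailSection`, `getSectionsForType`, the section ids) is a hypothetical stand-in chosen for this example; the PR's actual interfaces may differ.

```typescript
// Hypothetical shapes for illustration only; not taken from the PR's source.
type FieldType = 'string' | 'number' | 'date' | 'boolean';

interface DetailSection<T = unknown> {
  id: string; // key under which the section's data is stored
  title: string; // heading shown in the expanded row
  applicableTo: FieldType[]; // field types this section supports
  fetch: (index: string, field: string) => Promise<T>; // loads the section's data
}

const registry: DetailSection[] = [];

// Adding a new section means registering one self-contained object.
function registerDetailSection(section: DetailSection): void {
  if (registry.some((s) => s.id === section.id)) {
    throw new Error(`Detail section "${section.id}" is already registered`);
  }
  registry.push(section);
}

// The expanded row asks the registry which sections apply to a field's type.
function getSectionsForType(type: FieldType): DetailSection[] {
  return registry.filter((s) => s.applicableTo.includes(type));
}

registerDetailSection({
  id: 'topValues',
  title: 'Top values',
  applicableTo: ['string', 'number', 'boolean'],
  fetch: async () => [], // placeholder; a real section would run a query here
});
registerDetailSection({
  id: 'dateRange',
  title: 'Date range',
  applicableTo: ['date'],
  fetch: async () => ({}), // placeholder
});
```

With this shape, the row component never needs to know about individual sections; it only iterates over whatever `getSectionsForType` returns for the field being expanded.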
2. Lazy Loading with Parallel Queries
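The parallel part of this pattern can be sketched with `Promise.allSettled`, which lets one failing field query be skipped without discarding the others. The sketch assumes a hypothetical `FieldStatsItem` shape and an injected `fetchStats` function in place of the PR's real query helpers.

```typescript
// Assumed shape for illustration; the PR's FieldStatsItem may have other members.
interface FieldStatsItem {
  name: string;
  docCount: number;
  distinctCount: number;
}

// Hypothetical stand-in for the function that queries stats for one field.
type StatsFetcher = (fieldName: string) => Promise<FieldStatsItem>;

async function fetchAllFieldStats(
  fieldNames: string[],
  fetchStats: StatsFetcher
): Promise<Record<string, FieldStatsItem>> {
  // Start every query at once and wait for all of them to settle,
  // whether they succeed or fail.
  const results = await Promise.allSettled(fieldNames.map((name) => fetchStats(name)));

  const allFieldStats: Record<string, FieldStatsItem> = {};
  results.forEach((result) => {
    // Keep only successful results; rejected queries are left out of the map.
    if (result.status === 'fulfilled' && result.value) {
      allFieldStats[result.value.name] = result.value;
    }
  });
  return allFieldStats;
}
```

Detailed statistics would then be fetched separately, only when a row is expanded, so the initial load is limited to one basic query per field.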
3. Field Filtering
Automatically filter out:
_id, _index, _source (meta fields)
.keyword subfields
State Management
The FieldStatsContainer manages multiple state pieces:
fieldStats: Basic statistics for all fields (Record<string, FieldStatsItem>)
isLoading: Global loading state for initial fetch
expandedRows: Set of currently expanded field names
fieldDetails: Detailed statistics for expanded fields (FieldDetailsMap)
detailsLoading: Set of fields currently loading details
Usage
Accessing the Feature
Use Cases
Data Profiling:
Data Quality Analysis:
Query Optimization:
Follow ups
Screenshot
Testing the changes
Changelog