feat: add individual expression metrics tracking in ProjectionExec #18573
Description:
This update introduces a new configuration option, `individual_expr_metrics`, that lets `ProjectionExec` track execution time for each expression separately. When it is enabled, detailed per-expression profiling metrics are generated, which makes `EXPLAIN ANALYZE` output more useful for performance analysis. The implementation modifies `ProjectionStream` to record these metrics conditionally, based on the configuration. Tests have also been added to verify the feature's behavior both when it is enabled and when it is disabled.
Which issue does this PR close?
`ProjectExec` metrics (in `EXPLAIN ANALYZE`) #18456
Rationale for this change
This PR addresses the need for granular expression-level performance profiling in DataFusion's EXPLAIN ANALYZE output. Currently, ProjectionExec only provides aggregate metrics for the entire operation, making it difficult to identify which specific expressions are performance bottlenecks. By adding individual expression metrics, users can gain deeper insights into query performance and optimize their queries more effectively.
The implementation follows DataFusion's existing metrics collection patterns and plugs into the current configuration system. It is backward compatible, and when the option is disabled no per-expression timing is recorded, so the added overhead is minimal.
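To make the enabled/disabled behavior concrete, here is a minimal, self-contained Rust sketch of conditional per-expression timing. It is not the code from this PR: `Expr`, `ProjectionMetrics`, `project_batch`, `plus_one`, and `square` are hypothetical stand-ins that use plain `i64` slices instead of Arrow `RecordBatch`es and `PhysicalExpr`s. The sketch only assumes that the whole projection is always timed, and that each expression gets its own timer when `individual_expr_metrics` is on.

```rust
use std::time::{Duration, Instant};

/// Hypothetical stand-in for a physical expression: a display label plus a
/// function evaluated over one input column. The real operator evaluates
/// `PhysicalExpr`s over Arrow `RecordBatch`es instead.
struct Expr {
    label: String,
    eval: fn(&[i64]) -> Vec<i64>,
}

/// Hypothetical metrics holder: one timer for the whole projection, plus one
/// timer per expression that only exists when the option is enabled.
struct ProjectionMetrics {
    total: Duration,
    per_expr: Option<Vec<(String, Duration)>>,
}

impl ProjectionMetrics {
    fn new(exprs: &[Expr], individual_expr_metrics: bool) -> Self {
        // Labels are cloned and timers allocated only when the option is on.
        let per_expr: Option<Vec<(String, Duration)>> = individual_expr_metrics
            .then(|| exprs.iter().map(|e| (e.label.clone(), Duration::ZERO)).collect());
        Self { total: Duration::ZERO, per_expr }
    }
}

/// Evaluates every expression over one batch. Each expression is timed
/// separately only when per-expression timers exist; the total is always timed.
fn project_batch(exprs: &[Expr], batch: &[i64], metrics: &mut ProjectionMetrics) -> Vec<Vec<i64>> {
    let start = Instant::now();
    let mut columns = Vec::with_capacity(exprs.len());
    for (i, expr) in exprs.iter().enumerate() {
        match metrics.per_expr.as_mut() {
            Some(timers) => {
                let expr_start = Instant::now();
                columns.push((expr.eval)(batch));
                timers[i].1 += expr_start.elapsed();
            }
            // Disabled: no extra clock reads per expression.
            None => columns.push((expr.eval)(batch)),
        }
    }
    metrics.total += start.elapsed();
    columns
}

// Hypothetical example expressions: `a + 1` and `a * a`.
fn plus_one(xs: &[i64]) -> Vec<i64> {
    xs.iter().map(|x| x + 1).collect()
}

fn square(xs: &[i64]) -> Vec<i64> {
    xs.iter().map(|x| x * x).collect()
}
```

With the option off, `per_expr` stays `None` and the loop performs no extra clock reads, which is one way to keep the disabled path equivalent to the existing behavior while producing identical output columns.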
What changes are included in this PR?
- Added the `individual_expr_metrics` configuration option to enable/disable individual expression tracking
- Modified `ProjectionStream` to conditionally track metrics for each expression when enabled
- Updated `EXPLAIN ANALYZE` output to display individual expression metrics when enabled

Are these changes tested?
Yes. Tests have been added that verify the new feature's behavior both when it is enabled and when it is disabled.
All tests pass, and the implementation remains compatible with existing functionality.
Are there any user-facing changes?
Yes, this PR introduces user-facing changes by extending the public API and functionality:
New Configuration:
- `individual_expr_metrics`: boolean configuration option to enable/disable individual expression tracking

New User Impact:
No Breaking Changes:
The changes follow DataFusion's API evolution guidelines and are fully backward compatible.
Note: When `individual_expr_metrics` is enabled, there may be a small performance overhead from the additional string formatting for expression labels and from the per-expression timing measurements. This overhead is incurred only when the feature is explicitly enabled, and in exchange it provides detailed profiling information for query optimization.
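If each label is rendered once when the stream is set up and then reused for every batch, the label-formatting part of this overhead scales with the number of expressions rather than the number of batches. The Rust sketch below illustrates that idea under stated assumptions: the `Expr` enum, `expr_labels`, and the `expr_{i}: ...` label format are hypothetical and do not reflect DataFusion's actual types or metric names, and the PR text does not say exactly when or how labels are formatted.

```rust
use std::fmt;

/// Hypothetical, minimal expression tree used only to illustrate label formatting.
enum Expr {
    Column(String),
    Literal(i64),
    Add(Box<Expr>, Box<Expr>),
}

impl fmt::Display for Expr {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Expr::Column(name) => write!(f, "{name}"),
            Expr::Literal(v) => write!(f, "{v}"),
            Expr::Add(l, r) => write!(f, "{l} + {r}"),
        }
    }
}

/// Renders one label per expression, but only when per-expression metrics are
/// enabled. The result is meant to be computed once and reused for every batch.
fn expr_labels(exprs: &[Expr], individual_expr_metrics: bool) -> Option<Vec<String>> {
    if !individual_expr_metrics {
        return None; // disabled: no strings are formatted at all
    }
    Some(
        exprs
            .iter()
            .enumerate()
            // Hypothetical format: prefix with the output position so that two
            // identical expressions still get distinct labels.
            .map(|(i, e)| format!("expr_{i}: {e}"))
            .collect(),
    )
}
```

For example, the expressions `a + 1` and `b` yield the labels `expr_0: a + 1` and `expr_1: b` when enabled, and `None` when disabled.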