[Sandbox] Drop DatetimeOutputCastRewriter on the analytics-engine route (sql#5420) #21748
mengweieric wants to merge 3 commits into
PR Reviewer Guide 🔍 (Review updated until commit b99a1e4)
Here are some key observations to aid the review process:
Updated from 4c7c0f6 to 1433e61.
Persistent review updated to latest commit 1433e61
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## main #21748 +/- ##
============================================
- Coverage 73.48% 73.39% -0.10%
+ Complexity 75078 75011 -67
============================================
Files 6012 6012
Lines 340940 340940
Branches 49076 49076
============================================
- Hits 250543 250228 -315
- Misses 70409 70803 +394
+ Partials 19988 19909 -79
On the analytics-engine route, the SQL plugin wraps every datetime root column in `CAST(<DATE/TIME/TIMESTAMP> AS VARCHAR)`, and this rewriter translates those casts into DataFusion's `to_char` extension. Whenever the rewriter's format string and the PPL formatter disagree (e.g. a trailing `Z` or a `T` separator), users see wire-format divergence (opensearch-project/sql#5420).

Let the analytics engine return real datetime cells. The companion PR in `opensearch-project/sql` removes the cast rule. The PPL response pipeline already handles datetime → string conversion natively at the formatter layer (`ExprTimestampValue.value()` etc.), so no engine-side formatting is needed.

- Delete `DatetimeOutputCastRewriter` and its tests.
- Remove the two `convertFragment` / `convertStandalone` callsites in `DataFusionFragmentConvertor`.
- Drop the test that asserted the `to_char` extension was emitted from `CAST(... VARCHAR)`.
- Strip stale doc comments referencing the rewriter.
- Keep the `TO_CHAR -> to_char` function mapping in `opensearch_scalar_functions.yaml` for any unrelated paths that may still emit `TO_CHAR` directly.

Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
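To make the divergence in the commit message above concrete, here is a small `java.time` sketch contrasting an ISO-8601 rendering (`T` separator, trailing `Z`) with a space-separated, zone-less rendering produced at a formatter layer from a typed cell. It is an illustration under stated assumptions, not the plugin's code: `DatetimeWireFormatSketch`, `formatCell`, and `PPL_TIMESTAMP` are hypothetical names, UTC is assumed, and the pattern only approximates the space-separated PPL output the PR describes; it is not the exact formatter used by `ExprTimestampValue.value()`.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;

public class DatetimeWireFormatSketch {

    // Assumed PPL-style layout: space separator, no zone suffix, and
    // fractional seconds (up to 9 digits) printed only when non-zero.
    static final DateTimeFormatter PPL_TIMESTAMP = new DateTimeFormatterBuilder()
            .appendPattern("yyyy-MM-dd HH:mm:ss")
            .appendFraction(ChronoField.NANO_OF_SECOND, 0, 9, true)
            .toFormatter();

    static final DateTimeFormatter PPL_TIME = DateTimeFormatter.ofPattern("HH:mm:ss");

    // Hypothetical formatter-layer helper: the engine hands back a typed cell,
    // and only the response layer decides its string form (UTC assumed).
    static String formatCell(Object cell) {
        if (cell instanceof Instant ts) {
            return PPL_TIMESTAMP.format(LocalDateTime.ofInstant(ts, ZoneOffset.UTC));
        }
        if (cell instanceof LocalDate date) {
            return date.toString(); // yyyy-MM-dd
        }
        if (cell instanceof LocalTime time) {
            return PPL_TIME.format(time); // whole seconds only in this sketch
        }
        return String.valueOf(cell);
    }

    public static void main(String[] args) {
        Instant ts = Instant.parse("2024-01-02T03:04:05.123Z");
        // ISO-8601 instant: 'T' separator plus trailing 'Z', the style of divergence cited in sql#5420.
        System.out.println(DateTimeFormatter.ISO_INSTANT.format(ts)); // 2024-01-02T03:04:05.123Z
        // The same typed cell rendered space-separated and zone-less.
        System.out.println(formatCell(ts)); // 2024-01-02 03:04:05.123
    }
}
```

The design point the sketch mirrors is that once the engine returns real `DATE`/`TIME`/`TIMESTAMP` values, there is a single place where the string form is chosen, so two format strings can no longer drift apart.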
Updated from 1433e61 to c4c1995.
Persistent review updated to latest commit c4c1995
Persistent review updated to latest commit 859a5db
❌ Gradle check result for 859a5db: null
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
Persistent review updated to latest commit b99a1e4
❌ Gradle check result for b99a1e4: TIMEOUT
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
Problem
On the analytics-engine route, the SQL plugin wraps every datetime root column in `CAST(<DATE/TIME/TIMESTAMP> AS VARCHAR)`, and `DatetimeOutputCastRewriter` translates those casts into DataFusion's `to_char` extension. Whenever the rewriter's format string and the PPL formatter disagree (e.g. a trailing `Z` or a `T` separator), users see wire-format divergence (opensearch-project/sql#5420).

Solution
Let the analytics engine return real datetime cells. The companion PR opensearch-project/sql#5454 removes the cast rule. The PPL response pipeline already handles datetime → string conversion natively at the formatter layer (`ExprTimestampValue.value()` etc.), so no engine-side formatting is needed.

Changes
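- Delete `DatetimeOutputCastRewriter` and its tests.
- Remove the two `convertFragment` / `convertStandalone` callsites in `DataFusionFragmentConvertor`.
- Drop the test that asserted the `to_char` extension was emitted from `CAST(... VARCHAR)`.
- Strip stale doc comments referencing the rewriter.
- Keep the `TO_CHAR -> to_char` function mapping in `opensearch_scalar_functions.yaml` for any unrelated paths that may still emit `TO_CHAR` directly.

Verification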
1. Unit tests
- `:sandbox:plugins:analytics-backend-datafusion:test`: surviving rewriter tests still pass; the deleted rewriter's test class is removed cleanly (no orphan references).
- `:sandbox:plugins:analytics-backend-datafusion:compileJava :compileTestJava`: confirms no callsite still references `DatetimeOutputCastRewriter`.
- `:sandbox:plugins:analytics-backend-datafusion:spotlessJavaCheck`.

2. Companion-PR regression net
End-to-end behavior is asserted on the SQL plugin side (opensearch-project/sql#5454), since that is where the user-visible wire format is observable:
- `DatetimeExtensionTest` (PPL V3) + `DatetimeExtensionSqlTest` (SQL V2): RelNode shape and return-type assertions confirm that no `CAST(... VARCHAR)` wrapper survives post-analysis, and `testDatetimeFieldsPreserveStandardTypes` asserts that JDBC `ResultSetMetaData` reports `DATE`/`TIME`/`TIMESTAMP` (never `VARCHAR`) end-to-end.
- `CalciteAnalyticsDatetimeWireFormatIT`: parquet-backed, force-routed AE integration test asserting schema labels `timestamp`/`date`/`time` (never `string`) and PPL space-separated values (never a `T` separator, never a trailing `Z`).

3. End-to-end curl wire-format sweep
- Driver: `/tmp/ae-curl-verify/run-verify.py` against a force-routed runTask cluster with this PR's plugin built into the sandbox bundle.
- `_explain` confirms parquet index → AE route (`LogicalTableScan` + lowercase `opensearch`); legacy probe index → legacy route (`CalciteLogicalIndexScan` + capital `OpenSearch`). Routing is binary, with no Calcite fallback.
- No `to_char` / `CAST(... VARCHAR)` appears in the `_explain` output of any datetime projection, confirming the rewriter is gone end-to-end (not just the SQL-plugin cast rule).
- `min`/`max`/`stats`, `>` filter, `sort`, nanosecond precision, `dc()`.
- `where` + `fields`, `head` + `fields`, `stats min/max`, `group by date`, `dc(d)`, multi-column projection sorted by `ts`, nanos `>` filter.

All 18 probes return `timestamp`/`date`/`time` schema labels with space-separated values; no ISO `T`, no trailing `Z`, no `string` labels.

Test plan
- `:sandbox:plugins:analytics-backend-datafusion:compileJava :compileTestJava`
- `:sandbox:plugins:analytics-backend-datafusion:spotlessJavaCheck`
- `:sandbox:plugins:analytics-backend-datafusion:test`

Companion PR: opensearch-project/sql#5454 (Remove DatetimeOutputCastRule)
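The pass criteria used by the curl sweep under Verification (space-separated values under `timestamp`/`date`/`time` labels, no `T`, no trailing `Z`, no `string` labels, no leftover `to_char` / `CAST(... VARCHAR)`, and the `_explain` routing markers) can be restated as a small Java check. This is a sketch, not the contents of `run-verify.py`, whose source isn't shown here: `WireFormatProbeCheck`, its methods, the regexes, and the sample plan text are all hypothetical, and real `_explain` output may render casts differently from the assumed `CAST(...VARCHAR` shape.

```java
import java.util.regex.Pattern;

public class WireFormatProbeCheck {

    // Assumed value shapes per schema label. A 'T' separator or a
    // trailing 'Z' cannot match any of these patterns.
    static final Pattern TIMESTAMP =
            Pattern.compile("\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}(\\.\\d{1,9})?");
    static final Pattern DATE = Pattern.compile("\\d{4}-\\d{2}-\\d{2}");
    static final Pattern TIME = Pattern.compile("\\d{2}:\\d{2}:\\d{2}(\\.\\d{1,9})?");

    // Assumed plan-text shape for a leftover datetime-to-VARCHAR cast.
    static final Pattern VARCHAR_CAST = Pattern.compile("CAST\\(.*VARCHAR");

    enum Route { ANALYTICS_ENGINE, LEGACY, UNKNOWN }

    /** True if a cell has an expected datetime label and a matching space-separated value. */
    static boolean cellMatches(String label, String value) {
        return switch (label) {
            case "timestamp" -> TIMESTAMP.matcher(value).matches();
            case "date" -> DATE.matcher(value).matches();
            case "time" -> TIME.matcher(value).matches();
            default -> false; // e.g. "string": the column came back as text
        };
    }

    /** Classifies an _explain plan using the markers listed in the PR description. */
    static Route classify(String plan) {
        // contains() is case-sensitive, so "OpenSearch" and "opensearch" stay distinct.
        if (plan.contains("CalciteLogicalIndexScan") && plan.contains("OpenSearch")) {
            return Route.LEGACY;
        }
        if (plan.contains("LogicalTableScan") && plan.contains("opensearch")) {
            return Route.ANALYTICS_ENGINE;
        }
        return Route.UNKNOWN;
    }

    /** True if the plan still shows an engine-side datetime-to-string conversion. */
    static boolean hasOutputCast(String plan) {
        return plan.contains("to_char") || VARCHAR_CAST.matcher(plan).find();
    }

    public static void main(String[] args) {
        String plan = "LogicalTableScan(table=[[opensearch, logs]])"; // made-up plan text
        System.out.println(classify(plan)); // ANALYTICS_ENGINE
        System.out.println(hasOutputCast(plan)); // false
        System.out.println(cellMatches("timestamp", "2024-01-02 03:04:05.123")); // true
        System.out.println(cellMatches("timestamp", "2024-01-02T03:04:05Z")); // false
    }
}
```

Checking the legacy marker first keeps the classification binary for plans that happen to mention both spellings, matching the PR's statement that routing has no Calcite fallback.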