fix: fix various unit test failures in native_datafusion and native_iceberg_compat readers #1415
Conversation
Codecov Report
Attention: Patch coverage is
Additional details and impacted files:

```diff
@@             Coverage Diff              @@
##               main    #1415      +/-   ##
============================================
+ Coverage     56.12%   57.84%    +1.72%
- Complexity      976      985        +9
============================================
  Files           119      122        +3
  Lines         11743    12130      +387
  Branches       2251     2285       +34
============================================
+ Hits           6591     7017      +426
+ Misses         4012     3938       -74
- Partials       1140     1175       +35
============================================
```
```scala
super.test(testName, testTags: _*)(withSQLConf(SQLConf.USE_V1_SOURCE_LIST.key -> "") {
  testFun
})(pos)
// Datasource V2 is not supported by complex readers so force the scan impl back
```
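For context, a hedged sketch of what `spark.sql.sources.useV1SourceList` controls (standard Spark semantics; the path here is hypothetical, and `withSQLConf`/`spark` come from the usual Spark test-suite mixins): sources named in the list stay on the DataSource V1 code path, so the empty list in the override above routes everything through DataSource V2.

```scala
import org.apache.spark.sql.internal.SQLConf

// Naming "parquet" in useV1SourceList keeps Parquet on the V1 (file-scan)
// path; an empty list, as in the test override above, disables V1 for all
// sources. The path below is hypothetical.
withSQLConf(SQLConf.USE_V1_SOURCE_LIST.key -> "parquet") {
  spark.read.parquet("/tmp/example").count()
}
```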
Let's be consistent and call them native readers. We will likely use these for more than just complex types at some point.
changed the comment
I've updated the PR and removed the schema adapter changes. The number of failures has gone up, but they should be taken care of by #1413.
@kazuyukitanimura @andygrove @huaxingao @comphead review requested.
LGTM. Thanks @parthchandra
```java
@@ -260,7 +260,8 @@ public void init() throws URISyntaxException, IOException {
    } ////// End get requested schema

    String timeZoneId = conf.get("spark.sql.session.timeZone");
    Schema arrowSchema = Utils$.MODULE$.toArrowSchema(sparkSchema, timeZoneId);
    // Native code uses "UTC" always as the timeZoneId when converting from spark to arrow schema.
```
👍
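To illustrate why the timeZoneId has to match, a minimal sketch using the plain Arrow Java API (not Comet code): the timezone is part of the Arrow Timestamp type itself, so a schema built with the Spark session timezone will not equal one the native side builds assuming "UTC".

```scala
import org.apache.arrow.vector.types.TimeUnit
import org.apache.arrow.vector.types.pojo.ArrowType

object TimeZoneIdMismatch extends App {
  // Two Timestamp types that differ only in timeZoneId are unequal, so the
  // JVM-side and native-side schemas must agree on the id they use.
  val utcSide     = new ArrowType.Timestamp(TimeUnit.MICROSECOND, "UTC")
  val sessionSide = new ArrowType.Timestamp(TimeUnit.MICROSECOND, "America/Los_Angeles")
  println(utcSide == sessionSide) // false
}
```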
```java
String timeZoneId = conf.get("spark.sql.session.timeZone");
```
Wondering: should we always use UTC then, and remove reading the tz from the Spark parameter?
We need to be using the session timezone parameter for at least some conversions (especially those involving timestamp_ntz). The conversions of timestamp/timestamp_ntz between Arrow and Spark are somewhat convoluted and sometimes require the timezone offset to be applied to the values to make them consistent.
It is safer to make sure the timezone parameter is passed to the native side so it can be applied when necessary.
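To make the offset point concrete, an illustration using only java.time (not Comet internals): a timestamp_ntz value is a wall-clock reading, so the epoch value it maps to depends on which timezone is applied, which is why the session timezone has to reach the native side.

```scala
import java.time.{LocalDateTime, ZoneId, ZoneOffset}

object NtzOffsetSketch extends App {
  // The same wall-clock value, interpreted in two different zones.
  val wallClock = LocalDateTime.of(2024, 1, 1, 12, 0)
  val asUtc     = wallClock.toInstant(ZoneOffset.UTC).toEpochMilli
  val asSession = wallClock.atZone(ZoneId.of("America/Los_Angeles")).toInstant.toEpochMilli
  // 28800000 ms (8 hours): the offset that must be applied consistently on
  // both the JVM and native sides.
  println(asSession - asUtc)
}
```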
lgtm thanks @parthchandra
Resolved review threads:
spark/src/main/scala/org/apache/comet/parquet/CometParquetPartitionReaderFactory.scala
spark/src/test/scala/org/apache/comet/exec/CometColumnarShuffleSuite.scala
```diff
@@ -1001,7 +1012,7 @@ abstract class ParquetReadSuite extends CometTestBase {
       Seq(StructField("_1", LongType, false), StructField("_2", DoubleType, false)))

     withParquetDataFrame(data, schema = Some(readSchema)) { df =>
-      if (enableSchemaEvolution) {
+      if (enableSchemaEvolution || usingDataFusionParquetExec(conf)) {
```
Hmm, does this mean that when `usingDataFusionParquetExec == true` we always use schema evolution, regardless of `COMET_SCHEMA_EVOLUTION_ENABLED`? This means it will be incompatible with Spark 3.x?
I suspect it may be incompatible. We will address this as we get to the point where we are able to run Spark tests with different versions of Spark. At the moment we are still clearing up the Comet test failures.
I feel we need a TODO comment or an issue ticket to track this.
done
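For context, a self-contained sketch of the scenario the test above exercises (the scratch path is hypothetical): the file is written with narrower types and read back with a wider schema, which only succeeds when the active reader supports this kind of evolution, hence the extra `usingDataFusionParquetExec` condition.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{DoubleType, LongType, StructField, StructType}

object SchemaEvolutionSketch extends App {
  val spark = SparkSession.builder().master("local[1]").getOrCreate()
  import spark.implicits._

  val path = "/tmp/schema_evolution_sketch" // hypothetical scratch path
  Seq((1, 1.0f), (2, 2.0f)).toDF("_1", "_2").write.mode("overwrite").parquet(path)

  // Read back with widened types (Int -> Long, Float -> Double). Readers
  // without schema-evolution support throw a conversion error here; per the
  // discussion above, the native DataFusion reader widens unconditionally.
  val readSchema = StructType(Seq(
    StructField("_1", LongType, nullable = false),
    StructField("_2", DoubleType, nullable = false)))
  spark.read.schema(readSchema).parquet(path).show()
}
```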
fix: fix various unit test failures in native_datafusion and native_iceberg_compat readers

Major changes:
- allow Uint64 to decimal and FixedWidthBinary to Binary conversions in complex readers
- do not enable prefetch reads in tests if complex reader is enabled
- fix more incompatible checks in uint_8/uint_16 tests
- skip datetime rebase tests for complex readers (not supported)
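The first bullet above concerns type conversions; here is a hypothetical Scala rendering of a cast_supported-style check for just those two pairs (the real logic lives in Comet's native Rust code, so this is illustrative only, expressed with Arrow's Java type model):

```scala
import org.apache.arrow.vector.types.pojo.ArrowType

// Illustrative only: mirrors the two conversions newly allowed by this PR.
def castSupported(from: ArrowType, to: ArrowType): Boolean = (from, to) match {
  // UInt64 (unsigned 64-bit int) -> Decimal
  case (i: ArrowType.Int, _: ArrowType.Decimal) if !i.getIsSigned && i.getBitWidth == 64 => true
  // FixedSizeBinary -> variable-length Binary
  case (_: ArrowType.FixedSizeBinary, _: ArrowType.Binary) => true
  case _ => from == to
}
```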
Force-pushed from c6ccb59 to 4d2dd2d.
@kazuyukitanimura addressed your last comment and also rebased (there were merge conflicts).
Merged. Thank you @parthchandra @comphead @andygrove @mbutrovich
Major changes:

There may be conflicts between this and #1413 (@mbutrovich), which removes the `cast_supported` method, but these can be reconciled afterwards.

Without #1413 the failure counts are: