chore: extract comparison into separate tool #2632
base: main
Conversation
```diff
  case (a: Array[_], b: Array[_]) =>
    a.length == b.length && a.zip(b).forall(x => same(x._1, x._2))
- case (a: WrappedArray[_], b: WrappedArray[_]) =>
+ case (a: mutable.WrappedArray[_], b: mutable.WrappedArray[_]) =>
```
moved it from #2614
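For context, here is a self-contained sketch of the kind of recursive structural comparison the diff above touches. The `same` helper name and the `Array`/`WrappedArray` cases come from the snippet; the `CompareSketch` object wrapper and the fallback `==` case are assumptions for illustration, not the actual implementation:

```scala
import scala.collection.mutable

// Sketch of a recursive structural equality check, assuming the `same`
// helper shown in the diff above. Matching on mutable.WrappedArray
// (fully qualified) avoids ambiguity with the deprecated top-level alias.
object CompareSketch {
  def same(l: Any, r: Any): Boolean = (l, r) match {
    case (a: Array[_], b: Array[_]) =>
      a.length == b.length && a.zip(b).forall(x => same(x._1, x._2))
    case (a: mutable.WrappedArray[_], b: mutable.WrappedArray[_]) =>
      a.length == b.length && a.zip(b).forall(x => same(x._1, x._2))
    // Fallback for scalar values (an assumption; the real tool may add
    // tolerance-based comparison for floating-point columns).
    case (a, b) => a == b
  }
}
```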
Codecov Report ✅ All modified and coverable lines are covered by tests.

```
@@            Coverage Diff             @@
##             main    #2632      +/-   ##
============================================
+ Coverage   56.12%   59.16%    +3.03%
- Complexity    976     1436     +460
============================================
  Files         119      147      +28
  Lines       11743    13735    +1992
  Branches     2251     2356     +105
============================================
+ Hits         6591     8126    +1535
- Misses       4012     4386     +374
- Partials     1140     1223      +83
```
I don't think we should have a combined fuzz-testing-and-TPC-benchmark tool; they serve quite different purposes. It would be better to move the DataFrame comparison logic into a shared class somewhere and then update our benchmarking tool to use it. This probably means we need to convert our benchmark script from Python to Scala.
Another option would be to update the existing Python benchmark script to save query results to Parquet, and then implement a command-line tool for comparing the Parquet files produced by the Spark and Comet runs.
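As a rough illustration of that option, a minimal Spark-based comparison of two Parquet result sets might look like the following. This is a sketch under stated assumptions: the `ParquetCompare` object name, the argument layout, and the use of a symmetric `except` are all hypothetical, not the actual tool:

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical CLI sketch: compare query results that the Spark run and
// the Comet run each wrote to Parquet. A symmetric `except` reports rows
// present in one result set but not the other. Note this is exact
// row equality only; floating-point tolerance would need the shared
// comparison logic discussed above.
object ParquetCompare {
  def main(args: Array[String]): Unit = {
    val Array(sparkPath, cometPath) = args // paths to the two Parquet outputs
    val spark = SparkSession.builder().appName("parquet-compare").getOrCreate()

    val sparkResult = spark.read.parquet(sparkPath)
    val cometResult = spark.read.parquet(cometPath)

    val onlyInSpark = sparkResult.except(cometResult).count()
    val onlyInComet = cometResult.except(sparkResult).count()

    println(s"rows only in Spark result: $onlyInSpark")
    println(s"rows only in Comet result: $onlyInComet")
    spark.stop()
  }
}
```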
I created #2640 to add a new option to the benchmark script, to write query results to Parquet. |
Right, this option looks better IMO: we can have a command-line utility similar to the fuzzer and reuse the comparison logic. We still need this PR in some form, since it contains refactoring that makes the comparison logic reusable.
Which issue does this PR close?
Related: #2614, #2611.
Rationale for this change
Extract the comparison logic into a separate tool so it can be run against already-generated Comet and Spark results.
What changes are included in this PR?
How are these changes tested?