[CARBONDATA-1373] Enhance update performance by increasing parallelism
+ Increase parallelism while processing one segment in update
+ Use partitionBy instead of groupBy
+ Return directly when no rows are updated
+ Add a property to configure the parallelism
+ Clean up local files after update (fixes an earlier bug)
This closes apache#1261
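The first two bullets are the core of the change: instead of grouping all of a segment's rows into one task, the rows are hash-partitioned so several tasks can process one segment concurrently. A minimal sketch of that idea, without Spark, is below; the names `UpdateParallelismSketch` and `bucketFor` are hypothetical illustrations, not identifiers from this patch.

```scala
// Hypothetical sketch: fan one segment's rows out across `parallelism`
// buckets, mimicking what partitionBy with a hash partitioner does.
object UpdateParallelismSketch {
  // Assign a row to one of `parallelism` buckets using a
  // non-negative modulo of the key's hash, as HashPartitioner does.
  def bucketFor(rowKey: String, parallelism: Int): Int = {
    val h = rowKey.hashCode % parallelism
    if (h < 0) h + parallelism else h
  }

  def main(args: Array[String]): Unit = {
    val parallelism = 4
    val rows = (1 to 100).map(i => s"segment0-row$i")
    // With parallelism > 1 the segment's rows spread over several buckets,
    // so more tasks can work on the same segment's update concurrently.
    val buckets = rows.groupBy(r => bucketFor(r, parallelism))
    buckets.toSeq.sortBy(_._1).foreach { case (b, rs) =>
      println(s"bucket $b -> ${rs.size} rows")
    }
  }
}
```

With `parallelism = 1` every row lands in the same bucket, which is the pre-patch behavior this change avoids.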
docs/configuration-parameters.md (+1 −1)
@@ -74,7 +74,7 @@ This section provides the details of all the configurations required for CarbonD
| carbon.horizontal.compaction.enable | true | This property is used to turn ON/OFF horizontal compaction. After every DELETE and UPDATE statement, horizontal compaction may occur if the number of delta (DELETE/UPDATE) files exceeds the specified threshold. ||
| carbon.horizontal.UPDATE.compaction.threshold | 1 | This property specifies the threshold on the number of UPDATE delta files within a segment. When the number of delta files exceeds the threshold, the UPDATE delta files within the segment become eligible for horizontal compaction and are compacted into a single UPDATE delta file. | Values between 1 and 10000. |
| carbon.horizontal.DELETE.compaction.threshold | 1 | This property specifies the threshold on the number of DELETE delta files within a block of a segment. When the number of delta files exceeds the threshold, the DELETE delta files for that block of the segment become eligible for horizontal compaction and are compacted into a single DELETE delta file. | Values between 1 and 10000. |
| carbon.update.segment.parallelism | 1 | This property specifies the parallelism for each segment during UPDATE. If some segments contain too many records to update and the Spark job encounters data-spill errors, increase this value. For balanced load, it is recommended to set it to a multiple of the number of executors. | Values between 1 and 1000. |
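For example, assuming a standard `carbon.properties` file (its location is deployment-specific, and the value 4 is only illustrative), the new property from this patch could be set like this:

```
# carbon.properties: raise per-segment parallelism for UPDATE
# (default is 1; recommended to be a multiple of the executor count)
carbon.update.segment.parallelism=4
```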
integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/iud/UpdateCarbonTableTestCase.scala (+24)
@@ -114,6 +114,30 @@ class UpdateCarbonTableTestCase extends QueryTest with BeforeAndAfterAll {
sql("""drop table if exists iud.dest33""")
}
  test("update carbon table with optimized parallelism for segment") {
sql("""drop table if exists iud.dest_opt_segment_parallelism""")