Fix SortBuffer ensureOutputFits estimateOutputSize inaccurate #11534

Open

Contributor

@jinchengchenghh jinchengchenghh commented Nov 14, 2024

The size reserved for the output batch should be rowSize * numRows; the previous code omitted the numRows factor.
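
To make the intended arithmetic concrete, here is a minimal sketch of the estimate described above. It is not the Velox implementation: the helper name `estimateOutputReservation` and its parameters are hypothetical, and the 1.2 headroom default only mirrors the constant quoted later in this thread.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical helper (not part of Velox): estimates how many bytes to
// reserve for one output batch.
//   estimatedRowSize: estimated bytes per output row, if known.
//   numRows:          number of rows in the batch about to be produced.
//   headroom:         safety multiplier; 1.2 is an assumption taken from
//                     the constant quoted in this conversation.
inline uint64_t estimateOutputReservation(
    std::optional<uint64_t> estimatedRowSize,
    uint64_t numRows,
    double headroom = 1.2) {
  if (!estimatedRowSize.has_value()) {
    return 0; // No per-row estimate available, so reserve nothing up front.
  }
  // Scale the per-row estimate by the row count before applying headroom.
  // Leaving out numRows would size the reservation for a single row.
  return static_cast<uint64_t>(
      static_cast<double>(estimatedRowSize.value() * numRows) * headroom);
}
```

With a 100-byte row estimate and 1000 rows, this reserves roughly 120,000 bytes, whereas omitting the row count would reserve only about 120 bytes.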

@facebook-github-bot facebook-github-bot added the CLA Signed label Nov 14, 2024

netlify bot commented Nov 14, 2024

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit 6232d5a
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/6735736e140d8400082a8fb9

@Yuhta Yuhta requested a review from xiaoxmeng November 14, 2024 15:43
@FelixYBW
Contributor

The issue is not fixed:

Error Source: RUNTIME
Error Code: INVALID_STATE
Reason: Operator::getOutput failed for [operator: OrderBy, plan node ID: 1]: Error during calling Java code from native code: org.apache.gluten.memory.memtarget.ThrowOnOomMemoryTarget$OutOfMemoryException: Not enough spark off-heap execution memory. Acquired: 8.0 MiB, granted: 4.0 MiB. Try tweaking config option spark.memory.offHeap.size to get larger space to run this application (if spark.gluten.memory.dynamic.offHeap.sizing.enabled is not enabled). 
Current config settings: 
	spark.gluten.memory.offHeap.size.in.bytes=8.3 GiB
	spark.gluten.memory.task.offHeap.size.in.bytes=8.3 GiB
	spark.gluten.memory.conservative.task.offHeap.size.in.bytes=4.2 GiB
	spark.memory.offHeap.enabled=true
	spark.gluten.memory.dynamic.offHeap.sizing.enabled=false
Memory consumer stats: 
	Task.139909:                                              Current used bytes:  8.3 GiB, peak bytes:        N/A
	\- Gluten.Tree.61:                                        Current used bytes:  8.3 GiB, peak bytes:    8.3 GiB
	   \- root.61:                                            Current used bytes:  8.3 GiB, peak bytes:    8.3 GiB
	      +- NativePlanEvaluator-61.0:                        Current used bytes:  8.3 GiB, peak bytes:    8.3 GiB
	      |  \- single:                                       Current used bytes:  8.3 GiB, peak bytes:    8.3 GiB
	      |     +- root:                                      Current used bytes:  8.3 GiB, peak bytes:    8.3 GiB
	      |     |  +- task.Gluten_Stage_2_TID_139909_VTID_61: Current used bytes:  8.3 GiB, peak bytes:    8.3 GiB
	      |     |  |  +- node.1:                              Current used bytes:  8.3 GiB, peak bytes:    8.3 GiB
	      |     |  |  |  \- op.1.0.0.OrderBy:                 Current used bytes:  8.3 GiB, peak bytes:    8.3 GiB
	      |     |  |  +- node.2:                              Current used bytes: 96.0 KiB, peak bytes: 1024.0 KiB
	      |     |  |  |  \- op.2.0.0.Window:                  Current used bytes: 96.0 KiB, peak bytes:   96.0 KiB
	      |     |  |  +- node.3:                              Current used bytes:    0.0 B, peak bytes:      0.0 B
	      |     |  |  |  \- op.3.0.0.FilterProject:           Current used bytes:    0.0 B, peak bytes:      0.0 B
	      |     |  |  \- node.0:                              Current used bytes:    0.0 B, peak bytes:      0.0 B
	      |     |  |     \- op.0.0.0.ValueStream:             Current used bytes:    0.0 B, peak bytes:      0.0 B
	      |     |  \- default_leaf:                           Current used bytes:    0.0 B, peak bytes:      0.0 B
	      |     \- gluten::MemoryAllocator:                   Current used bytes:    0.0 B, peak bytes:      0.0 B
	      +- ArrowContextInstance.6:                          Current used bytes:  8.0 MiB, peak bytes:    8.0 MiB
	      +- IteratorMetrics.61:                              Current used bytes:    0.0 B, peak bytes:      0.0 B
	      |  \- single:                                       Current used bytes:    0.0 B, peak bytes:      0.0 B
	      |     +- root:                                      Current used bytes:    0.0 B, peak bytes:      0.0 B
	      |     |  \- default_leaf:                           Current used bytes:    0.0 B, peak bytes:      0.0 B
	      |     \- gluten::MemoryAllocator:                   Current used bytes:    0.0 B, peak bytes:      0.0 B
	      +- IndicatorVectorBase#init.61.OverAcquire.0:       Current used bytes:    0.0 B, peak bytes:      0.0 B
	      +- ShuffleReader.3.OverAcquire.0:                   Current used bytes:    0.0 B, peak bytes:      0.0 B
	      +- IndicatorVectorBase#init.61:                     Current used bytes:    0.0 B, peak bytes:      0.0 B
	      |  \- single:                                       Current used bytes:    0.0 B, peak bytes:      0.0 B
	      |     +- root:                                      Current used bytes:    0.0 B, peak bytes:      0.0 B
	      |     |  \- default_leaf:                           Current used bytes:    0.0 B, peak bytes:      0.0 B
	      |     \- gluten::MemoryAllocator:                   Current used bytes:    0.0 B, peak bytes:      0.0 B
	      +- IteratorMetrics.61.OverAcquire.0:                Current used bytes:    0.0 B, peak bytes:      0.0 B
	      +- NativePlanEvaluator-61.0.OverAcquire.0:          Current used bytes:    0.0 B, peak bytes:      0.0 B
	      \- ShuffleReader.3:                                 Current used bytes:    0.0 B, peak bytes:   16.0 MiB
	         \- single:                                       Current used bytes:    0.0 B, peak bytes:   16.0 MiB
	            +- root:                                      Current used bytes:    0.0 B, peak bytes: 1024.0 KiB
	            |  \- default_leaf:                           Current used bytes:    0.0 B, peak bytes:  579.8 KiB
	            \- gluten::MemoryAllocator:                   Current used bytes:    0.0 B, peak bytes:  401.3 KiB

	at org.apache.gluten.memory.memtarget.ThrowOnOomMemoryTarget.borrow(ThrowOnOomMemoryTarget.java:105)
	at org.apache.gluten.memory.listener.ManagedReservationListener.reserve(ManagedReservationListener.java:49)
	at org.apache.gluten.vectorized.ColumnarBatchOutIterator.nativeHasNext(Native Method)
	at org.apache.gluten.vectorized.ColumnarBatchOutIterator.hasNext0(ColumnarBatchOutIterator.java:57)
	at org.apache.gluten.iterator.ClosableIterator.hasNext(ClosableIterator.java:39)
	at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:43)
	at org.apache.gluten.iterator.IteratorsV1$InvocationFlowProtection.hasNext(IteratorsV1.scala:159)
	at org.apache.gluten.iterator.IteratorsV1$IteratorCompleter.hasNext(IteratorsV1.scala:71)
	at org.apache.gluten.iterator.IteratorsV1$PayloadCloser.hasNext(IteratorsV1.scala:37)
	at org.apache.gluten.iterator.IteratorsV1$LifeTimeAccumulator.hasNext(IteratorsV1.scala:100)
	at scala.collection.Iterator.isEmpty(Iterator.scala:385)
	at scala.collection.Iterator.isEmpty$(Iterator.scala:385)
	at org.apache.gluten.iterator.IteratorsV1$LifeTimeAccumulator.isEmpty(IteratorsV1.scala:90)
	at org.apache.gluten.execution.VeloxColumnarToRowExec$.toRowIterator(VeloxColumnarToRowExec.scala:121)
	at org.apache.gluten.execution.VeloxColumnarToRowExec.$anonfun$doExecuteInternal$1(VeloxColumnarToRowExec.scala:77)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2(RDD.scala:949)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2$adapted(RDD.scala:949)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:374)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:338)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:374)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:338)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:374)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:338)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:374)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:338)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:131)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1471)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)

@FelixYBW
Contributor

@JkSelf @zhztheplayer

Does the fix make sense? Where does the 1.2 come from?

const uint64_t outputBufferSizeToReserve = estimatedOutputRowSize_.value() * 1.2;
