
native engine panics Execution overflow #1893

@howardli9175

Description

  1. Environment
    6-worker-node YARN cluster, x86 architecture; each node has 64 cores and 500 GB of memory.
    Hadoop 3.2.2
    Spark 3.5.4
    Blaze 5.0.0

  2. How to reproduce
    Run the TPC-DS benchmark on the 10TB dataset (Parquet with ZSTD compression) with the following settings:

spark.executor.cores=1
spark.executor.memory=16g
spark.executor.memoryOverhead=16g
spark.driver.cores=1
spark.driver.memory=20g
spark.blaze.enable true
spark.sql.extensions org.apache.spark.sql.blaze.BlazeSparkSessionExtension
spark.shuffle.manager org.apache.spark.sql.execution.blaze.shuffle.BlazeShuffleManager
spark.memory.offHeap.enabled false
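
For anyone trying to reproduce this, here is a minimal sketch of turning the settings above into `spark-submit --conf` arguments. The helper names (`parse_settings`, `to_submit_args`) are hypothetical and are not part of Spark or Blaze. The parser accepts both `key=value` and whitespace-separated lines. How the reporter actually passed these settings is not stated in the issue, so this is only one possible approach.

```python
import re

# Settings copied verbatim from the report above.
RAW_SETTINGS = """\
spark.executor.cores=1
spark.executor.memory=16g
spark.executor.memoryOverhead=16g
spark.driver.cores=1
spark.driver.memory=20g
spark.blaze.enable true
spark.sql.extensions org.apache.spark.sql.blaze.BlazeSparkSessionExtension
spark.shuffle.manager org.apache.spark.sql.execution.blaze.shuffle.BlazeShuffleManager
spark.memory.offHeap.enabled false
"""


def parse_settings(text):
    """Parse 'key=value' or 'key value' lines into a dict (hypothetical helper)."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Split once on the first run of '=' and/or whitespace.
        key, value = re.split(r"[=\s]+", line, maxsplit=1)
        settings[key] = value
    return settings


def to_submit_args(settings):
    """Build a flat list of '--conf key=value' arguments for spark-submit."""
    args = []
    for key, value in settings.items():
        args.extend(["--conf", f"{key}={value}"])
    return args


if __name__ == "__main__":
    # Print the arguments on one line, ready to paste after 'spark-submit'.
    print(" ".join(to_submit_args(parse_settings(RAW_SETTINGS))))
```

The output can be appended to a `spark-submit` command line. The application JAR and the TPC-DS driver are not specified in the report and are left out here.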

Queries q24a and q24b failed.
The error message is shown in the images attached below.
The failure reproduces on every run. The failed stage has 200 tasks, of which 164 succeeded and 36 failed.

  3. Other scenarios where the queries succeed
    On the 10TB dataset, without Blaze enabled, the queries succeed.
    On the 1TB dataset, with Blaze enabled, the queries also succeed.

[3 attached images]
