
Task Stuck in WAITING State During LanceDB Write #115

@Zoxiang

Description

We encountered a task hang in our Spark 3.5.6 job when writing data to LanceDB using lance-spark 0.0.14 on Linux. The task remains stuck in a WAITING (parking) state with no progress for an extended period (~7639 seconds elapsed). The hang occurs during a critical data export step in the LanceDB writer flow.

Relevant Log Snippet:
Additional Context: The hang occurs during the execution of Data.exportArrayStream(allocator, reader, arrowStream) within the LanceDB writer flow (likely invoked from LanceArrowWriter.write() at line 68). This suggests the Arrow data stream export is blocked.
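To illustrate the suspected mechanism: if the writer hands data off to the Arrow stream consumer through a semaphore and the consumer side never releases a permit, the writer thread parks indefinitely inside Semaphore.acquire() (which shows up as Unsafe.park() / WAITING in a thread dump). This is a minimal, self-contained sketch of that pattern, not lance-spark's actual code; the class name and the bounded wait are illustrative assumptions so the demo terminates:

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical model of a producer/consumer handoff: the consumer is
// expected to call release(), but never does. The producer then parks.
public class ParkingDemo {
    public static void main(String[] args) throws Exception {
        Semaphore permits = new Semaphore(0); // consumer would release()

        // Bounded wait only so this demo terminates; an unbounded
        // acquire() here would hang forever, exactly like the stuck task.
        boolean acquired = permits.tryAcquire(500, TimeUnit.MILLISECONDS);
        System.out.println("acquired=" + acquired); // prints acquired=false
    }
}
```

A thread blocked in the unbounded acquire() variant would appear in jstack output as WAITING (parking) with Unsafe.park() at the top of the stack, matching the state described above.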

Questions:

  1. Could glibc-compat on Alpine introduce inconsistencies in semaphore operations (e.g., Semaphore.acquire()) or thread parking (Unsafe.park()) that lead to the observed blockage in LanceArrowWriter?

  2. Are there known issues with lance-spark 0.0.14 or Arrow (used by LanceDB) when running on Alpine with glibc-compat, particularly around Data.exportArrayStream?

  3. Does Spark 3.5.6’s data source V2 execution logic interact differently with semaphores or native libraries on glibc-compat-enabled Alpine systems?

  4. What debugging steps would help isolate the root cause, e.g. testing on a pure glibc distribution such as Ubuntu to rule out Alpine-specific issues, enabling JVM native trace logs, or checking glibc-compat version compatibility?

Any insights or guidance on resolving this hang would be greatly appreciated!
