Description:
We encountered a task hang in a Spark 3.5.6 job writing data to LanceDB with lance-spark 0.0.14 on Linux (Alpine with glibc-compat). The task remains stuck in a WAITING (parking) state with no progress for an extended period (elapsed time ~7639 seconds). The hang occurs during a critical data-export step in the LanceDB writer flow.
Relevant Log Snippet:
Additional Context: The hang occurs during the execution of Data.exportArrayStream(allocator, reader, arrowStream) within the LanceDB writer flow (likely invoked by LanceArrowWriter.write() at line 68), which suggests a blockage while exporting the Arrow data stream.
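As a hedged illustration of the suspected pattern (this is a minimal sketch, not the actual lance-spark code, and all names in it are hypothetical): a thread parked in Semaphore.acquire() waits forever if the other side never releases a permit. While debugging, swapping an indefinite acquire() for a bounded tryAcquire() turns the silent park into a detectable timeout:

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class SemaphoreProbe {
    public static void main(String[] args) throws InterruptedException {
        // Hypothetical stand-in for the permit the writer waits on;
        // zero permits models a consumer that never signals back.
        Semaphore permits = new Semaphore(0);

        // acquire() would park indefinitely via Unsafe.park();
        // a bounded tryAcquire surfaces the stall instead of hiding it.
        boolean acquired = permits.tryAcquire(200, TimeUnit.MILLISECONDS);
        if (!acquired) {
            System.out.println("writer stalled: no permit released by consumer");
        }
    }
}
```

If a timeout like this fires consistently at the same call site, the blockage is a missing release on the consumer path rather than a slow export.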
Questions:
- Could glibc-compat on Alpine introduce inconsistencies in semaphore operations (e.g., Semaphore.acquire()) or thread parking (Unsafe.park()) that lead to the observed blockage in LanceArrowWriter?
- Are there known issues with lance-spark 0.0.14 or the Arrow libraries used by LanceDB when running on Alpine with glibc-compat, particularly around Data.exportArrayStream?
- Does Spark 3.5.6's Data Source V2 execution logic interact differently with semaphores or native libraries on glibc-compat-enabled Alpine systems?
- What debugging steps would help isolate the root cause (e.g., testing on a pure glibc OS such as Ubuntu to rule out Alpine-specific issues, enabling JVM native trace logs, or checking glibc-compat version compatibility)?
Any insights or guidance on resolving this hang would be greatly appreciated!