Skip to content

Commit 8cc07d1

Browse files
authored
fix: set finish before notifying load in LanceArrowWriter setFinished method (#92)
Recently I was testing the integration between Lance and Spark. I found that the Lance writer has a certain probability of hanging. After some troubleshooting, I discovered this is related to the LanceArrowWriter.setFinished method. The original code appears to have a bug where it sets the finished status after notifying loadNextBatch, which could cause loadNextBatch to hang. **Root Cause** The ideal flow should be: 1. (thread 1) loadToken.release 2. (thread 1) finished = true 3. (thread 2) loadNextBatch 4. (thread 2) finished is true and count is 0 so return false However, there's a chance it becomes: 1. (thread 1) loadToken.release 2. (thread 2) loadNextBatch 3. (thread 2) finished is false so return true and waiting 4. (thread 1) finished = false If the second scenario occurs, thread 2 will hang indefinitely and cannot receive new notifications. jstack will show stacks hanging in LanceDataWriter.commit. **Reproduction** This issue is hard to reproduce. I encountered it in a very low-resource environment (Spark executor with only 1 core 4g) when creating a new table and writing 600 rows of data at once, where one column is a 1024-dimensional vector column. It also occurs intermittently. **Further Confirmation** Although the current fix seems reasonable, I hope to get confirmation from maintainers to avoid introducing new unknown issues. Any comments about this. @jackye1995
1 parent 9c57adc commit 8cc07d1

File tree

2 files changed

+2
-5
lines changed

2 files changed

+2
-5
lines changed

lance-spark-base_2.12/src/main/java/com/lancedb/lance/spark/write/LanceArrowWriter.java

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -34,7 +34,7 @@ public class LanceArrowWriter extends ArrowReader {
3434
private final int batchSize;
3535

3636
@GuardedBy("monitor")
37-
private volatile boolean finished;
37+
private volatile boolean finished = false;
3838

3939
private final AtomicLong totalBytesRead = new AtomicLong();
4040
private com.lancedb.lance.spark.arrow.LanceArrowWriter arrowWriter = null;
@@ -70,8 +70,8 @@ void write(InternalRow row) {
7070
}
7171

7272
void setFinished() {
73-
loadToken.release();
7473
finished = true;
74+
loadToken.release();
7575
}
7676

7777
@Override

lance-spark-base_2.12/src/test/java/com/lancedb/lance/spark/write/LanceArrowWriterTest.java

Lines changed: 0 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -26,7 +26,6 @@
2626
import org.apache.spark.sql.types.DataTypes;
2727
import org.apache.spark.sql.types.StructField;
2828
import org.apache.spark.sql.types.StructType;
29-
import org.junit.jupiter.api.Disabled;
3029
import org.junit.jupiter.api.Test;
3130

3231
import java.util.Collections;
@@ -35,8 +34,6 @@
3534

3635
import static org.junit.jupiter.api.Assertions.assertEquals;
3736

38-
// TODO: Flaky in CI, needs to fix
39-
@Disabled
4037
public class LanceArrowWriterTest {
4138
@Test
4239
public void test() throws Exception {

0 commit comments

Comments
 (0)