OAK-11154: Read partial segments from SegmentWriter #1746

Nicolapps · 2024-09-27T09:40:36Z

This pull request extends the SegmentWriter interface in oak-segment-tar so that callers can read the state of a segment that is still being written, as described in OAK-11154.

Closes OAK-11154

Why?

oak-segment-tar writes new segments using an implementation of SegmentWriter.

Since segments are immutable, the state of a segment that hasn’t been flushed yet isn’t visible outside of the SegmentWriter instance. However, in some cases, code using SegmentWriter might want to access the partial segment data.

Currently, the only way to do this is to call flush, which forces the segment to be flushed right away, and then to get the full segment from the underlying segment store. This is bad for performance, because we need to do more flushes than necessary, and because there’s a risk of creating a lot of segments that are much smaller than MAX_SEGMENT_SIZE.

To avoid this, I suggest that we add a readPartialSegmentState method to SegmentWriter, which takes the segment ID of an unflushed segment and returns it if possible.

Backwards-compatibility

This change is backwards-compatible with existing users of SegmentWriter (because they’re not using the new method). The new method comes with a default implementation which throws an UnsupportedOperationException.
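To illustrate why a default implementation keeps existing implementations working, here is a minimal sketch. SketchSegmentWriter and LegacyWriter are hypothetical, simplified stand-ins, and the parameter and return types (UUID, ByteBuffer) are assumptions made for the example; they are not the actual oak-segment-tar signatures.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

// Hypothetical, simplified stand-in for the SegmentWriter interface.
interface SketchSegmentWriter {
    void flush();

    // New method with a default body: implementations written before the
    // change still compile, and callers get a clear error if they use it.
    default ByteBuffer readPartialSegmentState(UUID segmentId) {
        throw new UnsupportedOperationException(
                "This writer does not support reading partial segments");
    }
}

// An implementation written before the change: it only implements flush().
class LegacyWriter implements SketchSegmentWriter {
    int flushCount = 0;

    @Override
    public void flush() {
        flushCount++;
    }
}
```

With this shape, code that never calls the new method behaves exactly as before, and only callers that opt in need an implementation that overrides it.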

Concurrency

Previously, the class was marked as not thread-safe, which made sense since only a single writer thread was expected to use it at any given time (concurrent calls wouldn’t have made sense, because the order in which the prepare and writeXYZ methods are called matters).

One major change in SegmentBufferWriter is that its readPartialSegmentState method can now be called concurrently with the other methods of the class. To support this, the publicly accessible methods are now synchronized. This shouldn’t cause a drop in performance, because most calls to the class come from the writer thread (so they never contend with each other), and readPartialSegmentState is expected to be called rarely compared to the other methods.
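The locking approach described above can be sketched roughly as follows. SketchBufferWriter is a hypothetical, heavily simplified stand-in for SegmentBufferWriter; the fixed-size buffer, the UUID segment IDs, and returning null for an unknown segment are assumptions made for the example, not the behavior of the real class.

```java
import java.util.Arrays;
import java.util.UUID;

// Hypothetical, simplified stand-in for SegmentBufferWriter.
class SketchBufferWriter {
    private final byte[] buffer = new byte[256];
    private int position = 0;
    private UUID currentSegmentId = UUID.randomUUID();

    public synchronized UUID currentSegmentId() {
        return currentSegmentId;
    }

    // Called by the writer thread; uncontended in the common case.
    public synchronized void writeByte(byte value) {
        if (position == buffer.length) {
            flush(); // intrinsic locks are reentrant, so this nested call is safe
        }
        buffer[position++] = value;
    }

    // Starts a new segment (the sketch does not persist the old one).
    public synchronized void flush() {
        position = 0;
        currentSegmentId = UUID.randomUUID();
    }

    // May be called from another thread; returns a copy so the caller never
    // observes later writes, or null if the segment is not the current one.
    public synchronized byte[] readPartialSegmentState(UUID segmentId) {
        if (!currentSegmentId.equals(segmentId)) {
            return null;
        }
        return Arrays.copyOf(buffer, position);
    }
}
```

Because every public method takes the same intrinsic lock, a reader sees either the state before a write or the state after it, never a half-updated buffer.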

I confirmed this by running the write benchmarks with and without the change; there is no noticeable difference in performance:

Without synchronized

# ConcurrentWriteReadTest          C     min     10%     50%     90%     max     N       mean
Oak-Segment-Tar                    1       1       5      11      61    258    2535      24
# ConcurrentWriteTest              C     min     10%     50%     90%     max     N       mean
Oak-Segment-Tar                    1      29      31      36      58    622    1373      44
# BasicWriteTest                   C     min     10%     50%     90%     max     N       mean
Oak-Segment-Tar                    1      14      15      16      19    320    3448      17

With synchronized

# ConcurrentWriteReadTest          C     min     10%     50%     90%     max     N       mean
Oak-Segment-Tar                    1       1       4      12      58    553    2531      24
# ConcurrentWriteTest              C     min     10%     50%     90%     max     N       mean
Oak-Segment-Tar                    1      29      31      35      65    461    1319      46
# BasicWriteTest                   C     min     10%     50%     90%     max     N       mean
Oak-Segment-Tar                    1      14      15      16      19     96    3444      17

Testing

The PR adds a new test, readPartialSegmentState, which covers the implementation of the method in SegmentBufferWriter.

OAK-11154: Read partial segments from SegmentWriter

a090aab

Nicolapps force-pushed the oak-11154-partial-segments branch from 67c2790 to a090aab Compare September 27, 2024 09:42

Nicolapps marked this pull request as ready for review September 27, 2024 09:44
