OAK-11154: Read partial segments from SegmentWriter #1746
This pull request modifies the `SegmentWriter` interface in oak-segment-tar to add the possibility of reading the state of a segment currently being written to, as described in OAK-11154.

Closes OAK-11154
Why?
oak-segment-tar writes new segments using an implementation of `SegmentWriter`. Since segments are immutable, the state of a segment that hasn’t been flushed yet isn’t visible outside of the `SegmentWriter` instance. However, in some cases, code using `SegmentWriter` might want to access the partial segment data.

Currently, the only way to do this is to call `flush`, which forces the segment to be flushed right away, and then get the full segment from the underlying segment store. This is bad for performance, because we need to do more flushes than necessary, and because there’s a risk of creating a lot of segments that are much smaller than `MAX_SEGMENT_SIZE`.

To avoid this, I suggest that we add a `readPartialSegmentState` method to `SegmentWriter`, which takes the segment ID of an unflushed segment and returns its state if possible.

Backwards-compatibility
This change is backwards-compatible with existing users of `SegmentWriter` (because they’re not using the new method). The new method comes with a default implementation which throws an `UnsupportedOperationException`.

Concurrency
Previously, the class was marked as not thread-safe, which made sense since only a single writer thread was expected to use it at a time (concurrent calls wouldn’t have made sense, since the order in which the `prepare` and `writeXYZ` methods are called matters).

One major change with `SegmentBufferWriter` is that its `readPartialSegmentState` method can now be called concurrently with the other methods in the same class. To support this, we now use `synchronized` on the publicly accessible methods. This shouldn’t cause a drop in performance, because most calls to the class are made on the writer thread (so they are not concurrent with each other), and `readPartialSegmentState` is expected to be called rarely compared to the other methods.

I confirmed this by running the write benchmarks without and with the change, and observed no noticeable difference:
Without `synchronized`: [benchmark results]
With `synchronized`: [benchmark results]
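
To make the locking approach described above more concrete, here is a minimal, self-contained sketch. It is not the code from this PR: `PartialSegmentReader`, `SketchBufferWriter`, `currentSegmentId`, the `Optional<byte[]>` return type, the string segment IDs and the 64-byte capacity are all hypothetical stand-ins chosen for illustration. It only shows the two ideas discussed here, namely a default method that throws `UnsupportedOperationException` and `synchronized` public methods that let a reader thread see a consistent snapshot of an unflushed buffer.

```java
import java.util.Arrays;
import java.util.Optional;

// Demo entry point: one writer thread plus a concurrent read from the main thread.
final class PartialReadSketch {
    public static void main(String[] args) throws InterruptedException {
        SketchBufferWriter writer = new SketchBufferWriter();
        String firstId = writer.currentSegmentId();

        Thread writerThread = new Thread(() -> {
            for (int i = 0; i < 100; i++) {
                writer.writeByte((byte) i);
            }
        });
        writerThread.start();

        // Depending on timing, this sees between 0 and 64 bytes,
        // or nothing at all if the first segment was already flushed.
        Optional<byte[]> snapshot = writer.readPartialSegmentState(firstId);
        writerThread.join();

        System.out.println("bytes seen in " + firstId + ": "
                + snapshot.map(bytes -> String.valueOf(bytes.length)).orElse("already flushed"));
    }
}

// Hypothetical stand-in for the writer interface (NOT the real Oak SegmentWriter).
interface PartialSegmentReader {
    // The default implementation keeps existing implementers compiling unchanged.
    default Optional<byte[]> readPartialSegmentState(String segmentId) {
        throw new UnsupportedOperationException("Partial segment reads are not supported");
    }
}

// Hypothetical, heavily simplified buffer writer (assumption: one byte buffer per segment).
final class SketchBufferWriter implements PartialSegmentReader {
    private static final int CAPACITY = 64; // illustrative only, not MAX_SEGMENT_SIZE

    private final byte[] buffer = new byte[CAPACITY];
    private int length = 0;
    private int segmentCounter = 0;
    private String segmentId = "segment-0";

    synchronized String currentSegmentId() {
        return segmentId;
    }

    // Called from the writer thread; flushes automatically when the buffer is full.
    synchronized void writeByte(byte value) {
        if (length == CAPACITY) {
            flush();
        }
        buffer[length++] = value;
    }

    // A real writer would persist the segment here; the sketch just starts a new one.
    synchronized void flush() {
        segmentCounter++;
        segmentId = "segment-" + segmentCounter;
        length = 0;
    }

    // Safe to call from any thread: the lock guarantees a consistent copy.
    @Override
    public synchronized Optional<byte[]> readPartialSegmentState(String requestedId) {
        if (!segmentId.equals(requestedId)) {
            return Optional.empty(); // unknown or already flushed segment
        }
        return Optional.of(Arrays.copyOf(buffer, length));
    }
}
```

Because the writer thread is the only frequent caller, the monitor is almost always uncontended in this sketch, which matches the reasoning for why `synchronized` is not expected to hurt write throughput. Returning a copy rather than the live buffer keeps the snapshot stable after the lock is released.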
Testing
The PR adds a new test, `readPartialSegmentState`, which covers the implementation of the method in `SegmentBufferWriter`.