
KAFKA-20491: Add uncommitted bytes limit #22597

Open
nicktelford wants to merge 2 commits into apache:trunk from nicktelford:KIP-892/txn-uncommitted-bytes


@nicktelford nicktelford commented Jun 17, 2026

Contributor

Introduces the statestore.uncommitted.max.bytes config, which triggers
an early commit when the total uncommitted bytes across all
transactional state stores on a thread exceeds the configured limit,
regardless of commit.interval.ms. This bounds the memory consumed by
pending write buffers under high write throughput.
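As an illustration, here is a minimal sketch of how an application might set this limit alongside commit.interval.ms. The config name is the one introduced by this PR; the 64 MiB value, the class name, and the helper method are hypothetical. Plain java.util.Properties is used so the example does not depend on a Kafka build that contains this change.

```java
import java.util.Properties;

// Hypothetical sketch: builds the properties an application might hand to Kafka Streams.
final class UncommittedBytesConfigSketch {
    // Config name as introduced by this PR.
    static final String MAX_UNCOMMITTED_BYTES = "statestore.uncommitted.max.bytes";

    static Properties build(final long maxUncommittedBytes, final long commitIntervalMs) {
        final Properties props = new Properties();
        // Total budget for uncommitted writes; the PR divides this across threads.
        props.setProperty(MAX_UNCOMMITTED_BYTES, Long.toString(maxUncommittedBytes));
        // The regular commit interval still applies when the byte limit is not reached.
        props.setProperty("commit.interval.ms", Long.toString(commitIntervalMs));
        return props;
    }

    public static void main(final String[] args) {
        // Illustrative values only: 64 MiB budget, 30 second commit interval.
        final Properties props = build(64L * 1024 * 1024, 30_000L);
        System.out.println(props);
    }
}
```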

The limit is divided equally across StreamThreads and the
GlobalStreamThread. Each thread independently enforces its share:
StreamThreads trigger an early task commit via maybeCommit(); the
GlobalStreamThread flushes at the end of each pollAndUpdate() cycle.
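The per-thread check described above can be sketched roughly as follows. This is not the PR's code: the class, the method name, and the use of LongSupplier to stand in for a store's uncommitted-bytes estimate are all hypothetical, and treating a non-positive limit as "disabled" is an assumption of the sketch.

```java
import java.util.List;
import java.util.function.LongSupplier;

// Hypothetical sketch of a per-thread "commit early?" decision.
final class EarlyCommitCheckSketch {
    // Sums the estimates of every store owned by the thread and compares the
    // total against that thread's share of the limit.
    static boolean shouldCommitEarly(final List<LongSupplier> storeEstimates, final long perThreadLimit) {
        if (perThreadLimit <= 0) {
            return false; // assumption: a non-positive limit disables the check
        }
        long total = 0L;
        for (final LongSupplier estimate : storeEstimates) {
            total += estimate.getAsLong();
        }
        return total > perThreadLimit; // "exceeds" is read as strictly greater
    }

    public static void main(final String[] args) {
        final List<LongSupplier> stores = List.of(() -> 40L, () -> 70L);
        System.out.println(shouldCommitEarly(stores, 100L));
    }
}
```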

StateStore gains a default approximateNumUncommittedBytes() method
(returning 0); adapter and wrapper classes delegate to the inner store.
Metered key-value stores register an uncommitted-bytes gauge. Segmented
and versioned stores aggregate across their underlying segments.
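The delegation and aggregation pattern can be pictured with a small stand-in. Only the method name approximateNumUncommittedBytes() and its default of 0 come from this PR; SketchStore and the four store classes below are hypothetical and are not Kafka's actual types.

```java
import java.util.List;

// Hypothetical demo; none of these types are Kafka's real classes.
final class UncommittedBytesDemo {
    public static void main(final String[] args) {
        final BufferingStore buffered = new BufferingStore();
        buffered.put(new byte[3], new byte[5]);
        final SketchStore segmented =
            new SegmentedStore(List.of(new WrappingStore(buffered), new PlainStore()));
        System.out.println(segmented.approximateNumUncommittedBytes());
    }
}

// Stand-in for StateStore: stores that buffer nothing inherit the default of 0.
interface SketchStore {
    default long approximateNumUncommittedBytes() {
        return 0L;
    }
}

// A store without a write buffer simply keeps the default.
final class PlainStore implements SketchStore { }

// A store that tracks the size of writes made since the last commit.
final class BufferingStore implements SketchStore {
    private long pendingBytes;

    void put(final byte[] key, final byte[] value) {
        pendingBytes += key.length + value.length;
    }

    void commit() {
        pendingBytes = 0L;
    }

    @Override
    public long approximateNumUncommittedBytes() {
        return pendingBytes;
    }
}

// Wrappers and adapters delegate to the store they wrap.
final class WrappingStore implements SketchStore {
    private final SketchStore inner;

    WrappingStore(final SketchStore inner) {
        this.inner = inner;
    }

    @Override
    public long approximateNumUncommittedBytes() {
        return inner.approximateNumUncommittedBytes();
    }
}

// Segmented stores report the sum over their segments.
final class SegmentedStore implements SketchStore {
    private final List<SketchStore> segments;

    SegmentedStore(final List<SketchStore> segments) {
        this.segments = segments;
    }

    @Override
    public long approximateNumUncommittedBytes() {
        long total = 0L;
        for (final SketchStore segment : segments) {
            total += segment.approximateNumUncommittedBytes();
        }
        return total;
    }
}
```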

The per-thread limit is recomputed whenever stream threads are
dynamically added or removed via
addStreamThread()/removeStreamThread(), mirroring the existing
thread-cache resize logic. The per-thread limit field is volatile so
running threads pick up the new value on their next commit check without
synchronisation.
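To make the visibility argument concrete, here is a small self-contained sketch with one writer and one reader. The field name mirrors the one mentioned in this PR, but the class, its methods, and the demo are hypothetical. Because the field is volatile, the worker's loop is guaranteed to observe the resize eventually, and reads and writes of a volatile long are atomic in Java.

```java
// Hypothetical sketch: a volatile per-thread limit published from one thread to another.
final class VolatileLimitSketch {
    // volatile: a write by the resizing thread is visible to later reads by the worker.
    private volatile long maxUncommittedBytesPerThread;

    VolatileLimitSketch(final long initial) {
        this.maxUncommittedBytesPerThread = initial;
    }

    void resize(final long newLimit) {
        maxUncommittedBytesPerThread = newLimit;
    }

    long current() {
        return maxUncommittedBytesPerThread;
    }

    // Starts a worker that polls the limit until it differs from `initial`,
    // resizes from the calling thread, and returns the value the worker saw.
    static long observeResize(final long initial, final long resized) {
        final VolatileLimitSketch holder = new VolatileLimitSketch(initial);
        final long[] seen = new long[1];
        final Thread worker = new Thread(() -> {
            long limit;
            do {
                limit = holder.current(); // stands in for "the next commit check"
            } while (limit == initial);
            seen[0] = limit;
        });
        worker.start();
        holder.resize(resized);
        try {
            worker.join(); // join makes the worker's write to seen[0] visible here
        } catch (final InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(final String[] args) {
        System.out.println(observeResize(100L, 50L));
    }
}
```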

This is part of KIP-892.

Reviewers: Bill Bejeck <bbejeck@apache.org>

@github-actions github-actions Bot added triage PRs from the community streams labels Jun 17, 2026
@nicktelford

Contributor Author

@bbejeck

}
stateConsumer.pollAndUpdate();

final long uncommittedLimit = maxUncommittedBytes;

Contributor Author

I'm going to preempt some feedback and clarify why we copy maxUncommittedBytes into a local uncommittedLimit variable here.

The reason is that maxUncommittedBytes can be modified by another thread (the main thread) via resizeMaxUncommittedBytes below, when adding/removing StreamThreads to the running application. So we copy its value to a local temporary variable to ensure the value remains consistent across the 3 usages below.
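A rough sketch of the pattern, for readers without the diff open: the field and local variable names match the snippet above, while the class, the method, and the three particular uses of the local are hypothetical stand-ins for the real ones.

```java
// Hypothetical sketch: read the volatile field once, then use only the local copy.
final class SnapshotReadSketch {
    private volatile long maxUncommittedBytes;

    SnapshotReadSketch(final long initial) {
        this.maxUncommittedBytes = initial;
    }

    // Called from another thread when stream threads are added or removed.
    void resizeMaxUncommittedBytes(final long newLimit) {
        maxUncommittedBytes = newLimit;
    }

    String checkAfterPoll(final long uncommittedBytes) {
        // Single volatile read; a concurrent resize cannot change the value
        // between the three uses of uncommittedLimit below.
        final long uncommittedLimit = maxUncommittedBytes;
        if (uncommittedLimit > 0 && uncommittedBytes > uncommittedLimit) {
            return "flush: " + uncommittedBytes + " bytes exceeds limit of " + uncommittedLimit;
        }
        return "no flush";
    }

    public static void main(final String[] args) {
        final SnapshotReadSketch sketch = new SnapshotReadSketch(100L);
        System.out.println(sketch.checkAfterPoll(150L));
    }
}
```

Reading the field directly in each of the three places would allow a resize to land in between, so the comparison and the reported limit could disagree.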

if (totalMax <= 0) {
return -1;
}
final int divisor = Math.max(numStreamThreads, 0) + (topologyMetadata.hasGlobalTopology() ? 1 : 0);

Contributor Author

It may seem strange to floor numStreamThreads at 0, since StreamsConfig.NUM_STREAM_THREADS has a minimum of 1, but it's actually possible for there to be 0 stream threads. This happens if you removeStreamThread the last remaining stream thread, or if you replaceStreamThread, where there's a narrow window in which the old thread has been removed and the new one has not registered yet.
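For illustration, the division might look like the sketch below. The first two statements follow the snippet above, with the topology lookup replaced by a boolean parameter. What the PR does when the divisor is 0 is not shown in this excerpt, so returning the whole limit in that case is purely an assumption of the sketch, as are the class and method names.

```java
// Hypothetical sketch of splitting the total limit across threads.
final class PerThreadLimitSketch {
    static long perThreadLimit(final long totalMax, final int numStreamThreads, final boolean hasGlobalTopology) {
        if (totalMax <= 0) {
            return -1;
        }
        // Floor at 0: there may briefly be no stream threads at all.
        final int divisor = Math.max(numStreamThreads, 0) + (hasGlobalTopology ? 1 : 0);
        if (divisor == 0) {
            return totalMax; // assumption: avoid dividing by zero; not taken from the PR
        }
        return totalMax / divisor; // integer division, so any remainder is dropped
    }

    public static void main(final String[] args) {
        // Three stream threads plus a global thread share the limit four ways.
        System.out.println(perThreadLimit(1000L, 3, true));
    }
}
```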

@bbejeck

bbejeck commented Jun 17, 2026

Member

@nicktelford there's a compile error

@nicktelford

Contributor Author

> @nicktelford there's a compile error

@bbejeck Sorry about that. Should be good now!

@bbejeck

bbejeck commented Jun 17, 2026

Member

> > @nicktelford there's a compile error
>
> @bbejeck Sorry about that. Should be good now!

No worries, thanks!

@github-actions github-actions Bot removed the triage PRs from the community label Jun 18, 2026
Introduces the `statestore.uncommitted.max.bytes` config, which
triggers an early commit when the total uncommitted bytes across all
transactional state stores on a thread exceeds the limit, regardless
of `commit.interval.ms`. This bounds the memory consumed by pending
write buffers under high write throughput.

The limit is divided equally across StreamThreads and the
GlobalStreamThread. Each thread independently enforces its share:
StreamThreads trigger an early task commit via maybeCommit(); the
GlobalStreamThread flushes at the end of each pollAndUpdate() cycle.

StateStore gains a default approximateNumUncommittedBytes() method
(returning 0); adapter and wrapper classes delegate to the inner
store. Metered key-value stores register an uncommitted-bytes gauge.
Segmented and versioned stores aggregate across their underlying
segments.

The per-thread limit is recomputed whenever stream threads are
dynamically added or removed via addStreamThread()/removeStreamThread(),
mirroring the existing thread-cache resize logic. The per-thread limit
field is volatile so running threads pick up the new value on their
next commit check without synchronisation.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@nicktelford nicktelford force-pushed the KIP-892/txn-uncommitted-bytes branch from 9af58e4 to 8a02381 Compare June 18, 2026 10:16
@nicktelford

Contributor Author

@bbejeck There was a test failure; I have fixed it and rebased on latest trunk.

@bbejeck bbejeck left a comment

Member

Thanks @nicktelford - LGTM. I just have a couple of minor nits; otherwise this is ready for merge.

}
stateConsumer.pollAndUpdate();

final long uncommittedLimit = maxUncommittedBytes;

Member

I get that, but I'm wondering whether using AtomicLong would offer any benefit over the current approach?

Contributor Author

I don't think so. The crucial thing here is that the following few lines use a consistent value, so we would still need to copy it out of the AtomicLong into a temporary variable. maxUncommittedBytes is already marked volatile, so updates to it from other threads should be immediately published to the GlobalStreamThread (same goes for maxUncommittedBytesPerThread in StreamThread).

Comment thread streams/src/main/java/org/apache/kafka/streams/StreamsConfig.java Outdated

2 participants