KAFKA-12759: Rebalance when a static member's subscription changes (classic protocol) #22593

Open
danoSF wants to merge 1 commit into apache:trunk from danoSF:KAFKA-12759-static-membership-subscription-rebalance

@danoSF danoSF commented Jun 16, 2026

Under the classic rebalance protocol with static membership, a static member that rejoins during the Stable state with the same selected protocol but a different set of subscribed topics did not trigger a rebalance. The newly subscribed topics (for example those discovered by a topics.regex member) were therefore never assigned to the group until the whole group was bounced for longer than session.timeout.ms.

This is a consistency gap rather than a new capability. Every other classic-group join path that can change a member's subscription already rebalances. Dynamic and known-member rejoins go through classicGroupJoinExistingMember, which rebalances whenever !member.matches(request.protocols()); this is a byte-equality check on the embedded protocol metadata, so any subscription change trips it. The classic-to-consumer upgrade bridge already bumps the group epoch via hasMemberSubscriptionChanged. The static-member replace path (updateStaticMemberThenRebalanceOrCompleteJoin) was the sole exception: to avoid needless rebalances when a static instance simply reconnects with a new member id, it short-circuited on the selected protocol name alone and never performed the metadata comparison the other paths rely on.
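To make the gap concrete, here is a minimal, self-contained sketch of the two kinds of check described above. It is not the coordinator's code: the class, the map-based protocol representation, and the method names metadataMatches and nameOnlyRebalance are hypothetical stand-ins, assuming only that one path compares metadata bytes and the other compares the selected protocol name.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in Kafka.
public class JoinPathCheckSketch {

    // Stand-in for a byte-equality check: same protocol names, in the same
    // order, each with byte-identical metadata.
    static boolean metadataMatches(Map<String, byte[]> stored, Map<String, byte[]> requested) {
        if (stored.size() != requested.size()) {
            return false;
        }
        Iterator<Map.Entry<String, byte[]>> a = stored.entrySet().iterator();
        Iterator<Map.Entry<String, byte[]>> b = requested.entrySet().iterator();
        while (a.hasNext()) {
            Map.Entry<String, byte[]> x = a.next();
            Map.Entry<String, byte[]> y = b.next();
            if (!x.getKey().equals(y.getKey()) || !Arrays.equals(x.getValue(), y.getValue())) {
                return false;
            }
        }
        return true;
    }

    // Stand-in for a check that looks only at the selected protocol name.
    static boolean nameOnlyRebalance(String selectedBefore, String selectedAfter) {
        return !selectedBefore.equals(selectedAfter);
    }

    // Small helper to build a single-protocol map for the demo.
    static Map<String, byte[]> protocols(String name, byte... metadata) {
        Map<String, byte[]> m = new LinkedHashMap<>();
        m.put(name, metadata);
        return m;
    }

    public static void main(String[] args) {
        // Same protocol name, different metadata bytes (e.g. one more topic).
        Map<String, byte[]> before = protocols("range", (byte) 1, (byte) 2);
        Map<String, byte[]> after = protocols("range", (byte) 1, (byte) 2, (byte) 3);

        System.out.println("byte-equality path rebalances: " + !metadataMatches(before, after));
        System.out.println("name-only path rebalances: " + nameOnlyRebalance("range", "range"));
    }
}
```

With the same protocol name but different metadata, the byte-equality check reports a mismatch while the name-only check does not, which is the inconsistency this PR addresses.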

This patch closes the gap by also comparing the rejoining member's own previous and new subscribed topics. The subscription is parsed with the version-safe V0 prefix already used by ClassicGroup#computeSubscribedTopics (the coordinator has parsed the embedded classic subscription this way for years, e.g. for offset expiration), so no new parsing capability is introduced. Only the subscribed topics are compared; changes to owned partitions or user data do not require a new assignment and must not trigger a rebalance. When the consumer protocol metadata cannot be parsed, the coordinator falls back to its protocol-based behaviour, mirroring computeSubscribedTopics. The previous subscription is captured before the member's protocols are replaced, so the comparison sees the old value rather than the new one.
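The comparison described above can be sketched roughly as follows. This is an illustrative stand-alone approximation, not the code in this patch: the class and the methods parseTopics, subscriptionChanged, and encode are hypothetical. It assumes the classic consumer subscription begins with an int16 version followed by an int32-counted array of int16-length-prefixed UTF-8 topic names, and that everything after that prefix (user data, owned partitions, and later fields) can be ignored.

```java
import java.nio.BufferUnderflowException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;

// Hypothetical illustration only; none of these names exist in Kafka.
public class StaticMemberSubscriptionSketch {

    // Reads only the assumed version-independent prefix: version, then topics.
    // Returns empty when the bytes cannot be parsed.
    static Optional<Set<String>> parseTopics(byte[] metadata) {
        if (metadata == null) {
            return Optional.empty();
        }
        try {
            ByteBuffer buf = ByteBuffer.wrap(metadata); // big-endian by default
            buf.getShort();                             // version, not needed here
            int count = buf.getInt();
            if (count < 0) {
                return Optional.empty();
            }
            Set<String> topics = new HashSet<>();
            for (int i = 0; i < count; i++) {
                short len = buf.getShort();
                if (len < 0) {
                    return Optional.empty();
                }
                byte[] raw = new byte[len];
                buf.get(raw);
                topics.add(new String(raw, StandardCharsets.UTF_8));
            }
            return Optional.of(topics);
        } catch (BufferUnderflowException e) {
            return Optional.empty();
        }
    }

    // True only when both subscriptions parse and their topic sets differ.
    // Unparseable metadata yields false, i.e. fall back to the existing decision.
    static boolean subscriptionChanged(byte[] previous, byte[] updated) {
        Optional<Set<String>> before = parseTopics(previous);
        Optional<Set<String>> after = parseTopics(updated);
        if (!before.isPresent() || !after.isPresent()) {
            return false;
        }
        return !before.get().equals(after.get());
    }

    // Demo helper: builds metadata with the assumed prefix plus opaque trailing
    // bytes standing in for user data, owned partitions, and so on.
    static byte[] encode(short version, byte[] trailing, String... topics) {
        int size = 2 + 4 + trailing.length;
        for (String t : topics) {
            size += 2 + t.getBytes(StandardCharsets.UTF_8).length;
        }
        ByteBuffer buf = ByteBuffer.allocate(size);
        buf.putShort(version);
        buf.putInt(topics.length);
        for (String t : topics) {
            byte[] raw = t.getBytes(StandardCharsets.UTF_8);
            buf.putShort((short) raw.length);
            buf.put(raw);
        }
        buf.put(trailing);
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] previous = encode((short) 0, new byte[0], "foo");
        byte[] trailingOnly = encode((short) 1, new byte[] {9, 9, 9}, "foo");
        byte[] grown = encode((short) 1, new byte[0], "foo", "bar");

        System.out.println("trailing-only change: " + subscriptionChanged(previous, trailingOnly));
        System.out.println("added topic: " + subscriptionChanged(previous, grown));
    }
}
```

Comparing topic sets rather than raw bytes is what lets a change in version or trailing fields pass without a rebalance, while a grown or shrunk topic list is still detected.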

Tests: GroupMetadataManagerTest covers a subscription that grows and one that shrinks (rebalance), an owned-partitions-only change and an unchanged subscription across protocol versions (no rebalance), and a multi-member static group asserting the decision is based on the rejoining member's own subscription rather than the group-wide union. JoinGroupRequestTest adds an end-to-end test reproducing the reported scenario against a broker: a static member that rejoins with an additional topic triggers a rebalance.

@github-actions github-actions Bot added triage PRs from the community core Kafka Broker group-coordinator labels Jun 16, 2026

Labels

ci-approved core Kafka Broker group-coordinator triage PRs from the community
