Skip to content

Reconnection bug#92

Open
appliedfunctor wants to merge 5 commits into
masterfrom
reconnection-bug
Open

Reconnection bug#92
appliedfunctor wants to merge 5 commits into
masterfrom
reconnection-bug

Conversation

@appliedfunctor

@appliedfunctor appliedfunctor commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

Root Cause
The bug was in Fs2RabbitAmqpClient.registerConsumer in the v4 fs2-rabbit backend.

The old code ran the consumer stream in .background and silently discarded the outcome:

consumer.compile.drain
  .background
  .flatMap { _ =>   // ← outcome ignored — errors swallowed
    Resource.onFinalize(...)
  }

When a network blip caused the AMQP channel to close, the fs2-rabbit stream terminated with an error. That error was never observed, and no restart was attempted.

v3 didn't have this problem because it used DefaultConsumer on a Java AutorecoveringChannel, which the AMQP Java client automatically re-registers after recovery — no application-level retry needed. v4 switched to an fs2 stream-based consumer which has no such built-in recovery.

A secondary bug: handler exceptions propagated via Async[F].fromEither(res) and also killed the stream instead of using exceptionalAction.

Fix

  1. Connection recovery — wrapped the consumer loop in a recursive handleErrorWith retry (consumerWithRecovery) that sleeps for config.networkRecoveryInterval then creates a fresh channel and consumer on each attempt. A fresh channel is critical because the auto-recovered channel's dead internal fs2-rabbit queue would otherwise accumulate un-acked messages.

  2. Handler exceptions — now caught and resolved via exceptionalAction (logged + the action applied) rather than crashing the stream.

  3. config: AmqpClientConfig added to the class so the recovery delay tracks networkRecoveryInterval.

Tests
Three regression tests added in ConsumerConnectionRecoverySpec:

  • Demonstrates the old bug (consumer silently stops after failure)
  • Demonstrates the fix (consumer recovers and resumes processing)
  • Verifies handler exceptions don't kill the stream

appliedfunctor and others added 3 commits June 26, 2026 09:52
The v4 fs2-rabbit backend ran the consumer stream in a .background
fiber and silently ignored the fiber outcome (.flatMap { _ => ... }).
When a network blip caused the underlying channel to close, the stream
terminated with an error that was swallowed and the consumer was never
restarted.

The v3 Java backend used DefaultConsumer on an AutorecoveringChannel;
the Java AMQP client transparently re-registers those consumers after
connection recovery.  The fs2-rabbit backend had no equivalent
mechanism.

Two bugs are fixed:

1. Connection recovery: registerConsumer now wraps runConsumer in a
   recursive handleErrorWith retry loop (consumerWithRecovery) that
   sleeps for config.networkRecoveryInterval then creates a fresh
   channel and re-registers the consumer.  A fresh channel is used on
   each retry to avoid the dead internal fs2-rabbit queue that the
   auto-recovered channel would otherwise accumulate messages into.

2. Handler exceptions: previously Async[F].fromEither(res) propagated
   handler errors through the stream and killed it.  Now handler
   exceptions are logged and resolved via exceptionalAction, keeping
   the stream alive.

Also adds config: AmqpClientConfig to Fs2RabbitAmqpClient so the
recovery delay can be driven by networkRecoveryInterval, and
StrictLogging for recovery/error logging.

Regression tests added in ConsumerConnectionRecoverySpec.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adds a '4.0.3 and above' section explaining the bug (consumer stream
silently dying on connection drop with no restart), which backend is
affected (Fs2RabbitAmqpClient only, not JavaBackendAmqpClient), how
it was fixed, and directs users to upgrade to v4.0.3.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Restores sonatypeCentralRelease (reverts accidental change to
sonatypeCentralUpload that was included in the connection recovery
commit). sonatypeCentralRelease uploads and automatically promotes
the artifact; sonatypeCentralUpload only uploads and requires a
manual release step.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@snyk-io-eu

snyk-io-eu Bot commented Jun 26, 2026

Copy link
Copy Markdown

Snyk checks have passed. No issues have been found so far.

Status Scan Engine Critical High Medium Low Total (0)
Open Source Security 0 0 0 0 0 issues
Licenses 0 0 0 0 0 issues
Code Security 0 0 0 0 0 issues

💻 Catch issues earlier using the plugins for VS Code, JetBrains IDEs, Visual Studio, and Eclipse.

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes a reconnection failure mode in the fs2-rabbit backend (Fs2RabbitAmqpClient.registerConsumer) where consumer-stream failures could be swallowed and never restarted after an AMQP network blip, plus prevents handler exceptions from terminating the consumer stream.

Changes:

  • Add an error-recovering consumer loop that recreates the channel/consumer and retries after networkRecoveryInterval.
  • Convert handler exceptions into exceptionalAction rather than propagating them through the stream.
  • Add README migration note and regression tests describing the bug/fix.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File Description
README.md Documents the v4.0.0–4.0.2 fs2-rabbit reconnection bug and upgrade guidance for 4.0.3.
backendFs2Rabbit/src/main/scala/com/itv/bucky/backend/fs2rabbit/Fs2RabbitAmqpClient.scala Implements retry-based recovery with fresh channel creation and handler-exception handling.
backendFs2Rabbit/src/test/scala/fs2rabbit/ConsumerConnectionRecoverySpec.scala Adds regression tests for consumer recovery and handler-exception behavior.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +74 to +90
"not kill the consumer stream when a handler throws an exception" in {
for {
processedCount <- IO.ref(0)
attempt <- IO.ref(0)

// First call simulates a handler exception; second call succeeds.
// With the fix the stream restarts and processes messages.
runConsumer: IO[Unit] = attempt.updateAndGet(_ + 1).flatMap {
case 1 => IO.raiseError(new RuntimeException("Handler blew up"))
case _ => processedCount.update(_ + 1)
}

_ <- consumerWithRecovery(runConsumer, delay = 50.millis)

count <- processedCount.get
} yield count shouldBe 1
}
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Extract acquireConsumerStream as a protected override point in
Fs2RabbitAmqpClient so tests can inject controlled in-memory streams
without a real RabbitMQ connection.

Tests now subclass Fs2RabbitAmqpClient and override
acquireConsumerStream to return a sequence of (acker, stream) pairs
from a Ref, exercising the real registerConsumer implementation:

- retry loop (consumerWithRecovery) recovers after a stream failure
- handler exceptions are caught and routed to exceptionalAction
  rather than killing the stream
- Ack/DeadLetter/RequeueImmediately map to the correct AckResult

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants