Releases: threefoldtech/zos_rmb

v1.5.1

29 Sep 12:10
8d9a0ee

What's Changed

  • Fix: gracefully close websocket writer on cancellation to prevent half-closed sockets by @sameh-farouk in #234
  • ci: add override flag to rust toolchain installation in release workflow by @0oM4R in #236

Full Changelog: v1.5.0...v1.5.1

v1.5.0

21 Sep 16:01
4a44655

Release Notes

Version 1.5.0 represents a major architectural overhaul of the RMB Relay, focused on delivering production-grade reliability, significant performance enhancements, and deep observability. The entire federation system and Substrate client have been re-engineered to be self-healing and highly resilient, while the core networking stack has been modernized for future performance gains.


Architectural & Reliability Overhaul

  • Federation on Redis Streams: The inter-relay federation mechanism has been completely rebuilt on Redis Streams and consumer groups. This provides at-least-once delivery semantics and automatic recovery of failed messages: unacknowledged messages are reprocessed until they expire instead of being lost in transit.
  • Resilient Substrate Client: The client interacting with the TFChain is now highly resilient to network failures. It uses arc-swap for lock-free reconnects and a "single-flight" pattern to prevent "thundering herd" scenarios on cache misses or reconnect attempts.
  • Self-Healing Event Listener: The EventListener is now a robust, self-healing service. It includes a watchdog for stall detection, an intelligent catch-up mechanism to recover from disconnects without full resets, and an indefinite retry loop for maximum uptime.
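
The stall detection mentioned above can be pictured as a small watchdog that remembers when the last chain event arrived. The following is only a minimal sketch, not the relay's actual code: the `Watchdog` type, its method names, and the 30-second threshold used in the usage note are all hypothetical.

```rust
use std::time::{Duration, Instant};

/// Hypothetical watchdog: flags a listener as stalled when no event
/// has been recorded for longer than `stall_after`.
pub struct Watchdog {
    last_event: Instant,
    stall_after: Duration,
}

impl Watchdog {
    pub fn new(now: Instant, stall_after: Duration) -> Self {
        Self { last_event: now, stall_after }
    }

    /// Call whenever the listener receives an event.
    pub fn record_event(&mut self, now: Instant) {
        self.last_event = now;
    }

    /// True when the silence since the last event exceeds the threshold.
    pub fn is_stalled(&self, now: Instant) -> bool {
        now.saturating_duration_since(self.last_event) > self.stall_after
    }
}
```

A supervising loop could poll `is_stalled(Instant::now())` on a timer and restart the subscription when it returns true; passing `now` explicitly keeps the check easy to test.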

Performance & Efficiency

  • Prioritized Local Routing: Messages are now routed to locally connected peers before attempting federation. This significantly reduces latency for local traffic and lowers the load on external services.
  • Batched Message ACKs: WebSocket message acknowledgments are now batched, drastically reducing Redis round-trips under heavy load and improving overall throughput.
  • Fair Worker Load Balancing: New connections and registrations are now distributed fairly across all available workers, preventing a single worker from being overloaded during connection bursts.
  • Intelligent Redis Pool Sizing: The Redis connection pool is now sized dynamically based on consumer count plus a dedicated headroom, preventing pool exhaustion and ensuring stability under load.
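
As a rough illustration of the pool sizing rule above (consumers plus headroom), here is a minimal sketch. The function name and the floor of one connection are assumptions for illustration; the notes do not state the actual headroom value.

```rust
/// Hypothetical helper: one connection per stream consumer, plus a
/// fixed headroom reserved for other commands.
pub fn redis_pool_size(consumers: usize, headroom: usize) -> usize {
    // saturating_add avoids overflow on absurd inputs; max(1) keeps
    // the pool usable even if both inputs are zero (an assumption).
    consumers.saturating_add(headroom).max(1)
}
```

The idea is that consumers can each hold a connection for long periods, so without headroom other commands could find the pool empty.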

New Features & API Changes

  • Fast-Fail Mode: A new fail-fast option lets clients receive an immediate error response if a message's destination is known to be offline. This is useful for time-sensitive operations that should not wait for a timeout.
  • Deprecation of relays Field: The client-supplied relays field in the message envelope is deprecated and ignored; routing is determined by authoritative chain data. Parsing and serialization remain backward compatible, so older clients that still set the field do not break.
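
To make the fast-fail behavior concrete, the sketch below models the decision a relay might take for an offline destination. It is an assumption-laden illustration: `Relay`, `Outcome`, and the `fail_fast` parameter are hypothetical names, and the real envelope flag and error shape are not described in these notes.

```rust
use std::collections::HashSet;

/// Hypothetical result of submitting a message.
#[derive(Debug, PartialEq)]
pub enum Outcome {
    Delivered,
    Queued,
    DestinationOffline,
}

/// Hypothetical relay view: the set of twin IDs known to be online.
pub struct Relay {
    online: HashSet<u32>,
}

impl Relay {
    pub fn new(online: impl IntoIterator<Item = u32>) -> Self {
        Self { online: online.into_iter().collect() }
    }

    /// With `fail_fast` set, an offline destination yields an immediate
    /// error instead of leaving the sender to wait for a timeout.
    pub fn send(&self, dest: u32, fail_fast: bool) -> Outcome {
        if self.online.contains(&dest) {
            Outcome::Delivered
        } else if fail_fast {
            Outcome::DestinationOffline
        } else {
            Outcome::Queued
        }
    }
}
```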

Networking & Modernization

  • Hyper 1.0 Upgrade: The entire HTTP stack has been upgraded from Hyper 0.14 to the modern Hyper 1.0 API and its ecosystem of crates (reqwest, hyper-util). This brings significant performance improvements and aligns the project with the latest industry standards.

Observability

  • Expanded Prometheus Metrics: Comprehensive Prometheus metrics have been added to provide deep insight into the relay's health, including detailed metrics for the twin cache (hits, misses, flushes) and the Event Listener's operational status.

Full Changelog: v1.3.4...v1.5.0

v1.5-pre4

28 Aug 15:21

v1.5-pre4 Pre-release

Full Changelog: v1.5-pre2...v1.5-pre4

  • Registrations are now distributed fairly across workers

v1.5-pre2

25 Aug 15:31

v1.5-pre2 Pre-release

This release focuses on hardening the relay's core components, introducing a highly resilient Substrate client, improving federation performance, and modernizing the entire HTTP stack by upgrading to Hyper 1.0.

Highlights

  • Resilient Substrate Client: The client for interacting with the TFChain has been completely rewritten to be highly resilient against network failures.
    • It now uses arc-swap for lock-free, atomic client replacement, allowing for seamless reconnects without blocking other tasks.
    • A "singleflight" pattern has been implemented to prevent multiple concurrent reconnect attempts during a network outage.
    • Another singleflight pattern de-duplicates concurrent cache misses for the same twin, reducing load on the chain.
  • Robust Event Listener: The EventListener is now a fully self-healing service. It features a watchdog to detect stalled connections, a robust catch-up mechanism, and an indefinite retry loop to ensure it can recover from prolonged disconnections.
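
The single-flight idea for cache misses can be sketched with the standard library alone. This is a simplified, blocking illustration, not the relay's implementation (which is async and not shown in these notes); `SingleFlight`, `get_or_fetch`, and `in_flight` are hypothetical names. Concurrent callers asking for the same key share one `OnceLock`, so only one of them runs the fetch.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, OnceLock};

/// Hypothetical single-flight guard keyed by twin ID.
pub struct SingleFlight<V> {
    inflight: Mutex<HashMap<u32, Arc<OnceLock<V>>>>,
}

impl<V: Clone> SingleFlight<V> {
    pub fn new() -> Self {
        Self { inflight: Mutex::new(HashMap::new()) }
    }

    /// Runs `fetch` at most once per group of concurrent callers for `key`.
    pub fn get_or_fetch<F: FnOnce() -> V>(&self, key: u32, fetch: F) -> V {
        // Join the in-flight cell for this key, or create one.
        let cell = {
            let mut map = self.inflight.lock().unwrap();
            map.entry(key).or_insert_with(|| Arc::new(OnceLock::new())).clone()
        };
        // Only one closure executes; other callers block until it is done.
        let value = cell.get_or_init(fetch).clone();
        // Drop the entry so a later miss fetches again, but only if the
        // map still holds this exact cell.
        let mut map = self.inflight.lock().unwrap();
        if map.get(&key).is_some_and(|cur| Arc::ptr_eq(cur, &cell)) {
            map.remove(&key);
        }
        value
    }

    /// Number of keys currently being fetched.
    pub fn in_flight(&self) -> usize {
        self.inflight.lock().unwrap().len()
    }
}
```

The guard only de-duplicates overlapping requests; it is not a cache, so the result would still be stored in the twin cache by the caller.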

Performance & Efficiency

  • Federation on Redis Streams: The inter-relay federation mechanism has been rebuilt on top of Redis Streams and consumer groups. This replaces the old list-based queue, providing a more reliable and persistent message queue with automatic recovery of failed messages.
  • Smarter Redis Pool Sizing: The Redis connection pool size is now calculated more intelligently based on the number of consumers plus a dedicated headroom for operational commands, preventing pool exhaustion under heavy load.
  • Batched ACKs: WebSocket message acknowledgments are now batched, significantly reducing the number of round-trips to Redis under load.
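
Batching acknowledgments amounts to buffering message IDs and sending them in one round-trip once enough have accumulated. The sketch below shows only the buffering side; `AckBatcher` and its threshold are hypothetical, and the actual Redis call that would consume a batch is deliberately left out.

```rust
/// Hypothetical ACK buffer: collects message IDs and hands them back
/// as one batch once `max_batch` IDs have accumulated.
pub struct AckBatcher {
    pending: Vec<String>,
    max_batch: usize,
}

impl AckBatcher {
    pub fn new(max_batch: usize) -> Self {
        Self { pending: Vec::new(), max_batch: max_batch.max(1) }
    }

    /// Queue one ID; returns a full batch when the threshold is reached.
    pub fn push(&mut self, id: impl Into<String>) -> Option<Vec<String>> {
        self.pending.push(id.into());
        if self.pending.len() >= self.max_batch {
            Some(self.flush())
        } else {
            None
        }
    }

    /// Take whatever is buffered, for example when a flush timer fires.
    pub fn flush(&mut self) -> Vec<String> {
        std::mem::take(&mut self.pending)
    }
}
```

A real implementation would presumably also flush on a short timer so that a quiet connection does not hold acknowledgments back indefinitely.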

Dependencies

  • Hyper 1.0 Upgrade: The entire HTTP stack has been migrated from hyper 0.14 to the modern Hyper 1.0 API, along with key ecosystem crates like reqwest and hyper-tungstenite. This brings significant performance improvements and aligns the project with the latest standards.
  • General Updates: Dozens of other dependencies have been updated to their latest versions, incorporating numerous bug fixes, security patches, and performance enhancements from across the Rust ecosystem.

Code Quality & Refinements

  • The Protobuf schema has been updated to mark legacy fields as deprecated.
  • Comprehensive Prometheus metrics have been added for the twin cache (hits, misses, flushes, entries) to improve observability.

Full Changelog: v1.5-pre1...v1.5-pre2

v1.5-pre1

20 Aug 09:25

v1.5-pre1 Pre-release

rmb-rs v1.5-pre1 Release Notes

Compare: v1.4-pre3...v1.5-pre1

Overview

This release delivers faster local message delivery, a more reliable and parallel federation pipeline, and major improvements to event listener latency, reconnection robustness, and observability. The client-supplied relays field is deprecated but remains backward compatible.

Highlights

  • Feature: Prefer local message routing before federation.
  • Feature: Fast-fail mode for time-sensitive requests.
  • Feature: Federation rewrite with at-least-once delivery and retries.
  • Fix: More reliable relay cache with reduced RPC dependency.
  • Observability: New metrics, stall detection, clearer separation of concerns.
  • Deprecation: Client-supplied relays field no longer influences routing.

Fast Message Routing

  • Local sessions are now prioritized for delivery before falling back to federation.
  • Improves performance and resilience (especially when the relay is reachable across multiple domains).
  • Reduces reliance on external services for routine message delivery.
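
The local-first rule can be summarized as a lookup against the sessions connected to this relay before any federation step. The sketch below is illustrative only; `Router`, `Route`, and the use of a plain twin ID set are assumptions, not the relay's data structures.

```rust
use std::collections::HashSet;

/// Hypothetical routing decision.
#[derive(Debug, PartialEq)]
pub enum Route {
    Local,
    Federation,
}

/// Hypothetical router tracking twins with a live local session.
#[derive(Default)]
pub struct Router {
    local: HashSet<u32>,
}

impl Router {
    pub fn connect(&mut self, twin: u32) {
        self.local.insert(twin);
    }

    pub fn disconnect(&mut self, twin: u32) {
        self.local.remove(&twin);
    }

    /// Prefer a local session; fall back to federation otherwise.
    pub fn route(&self, dest: u32) -> Route {
        if self.local.contains(&dest) {
            Route::Local
        } else {
            Route::Federation
        }
    }
}
```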

Event Listener Improvements and Fixes

  • Twin update latency reduced from ~12 seconds to near real time (a few milliseconds plus network latency).
  • Faster recovery after outages with reduced dependence on external RPC.
  • Avoids silent hangs through stall detection.
  • Cleaner separation of responsibilities for easier maintenance and troubleshooting.
  • New metrics expand operational visibility.

Federation Rewrite

  • At-least-once delivery: messages are acknowledged explicitly to prevent loss.
  • Automatic retries: unacknowledged messages remain pending and are reprocessed until they expire.
  • High parallelism: workers fetch messages directly and in batches, eliminating bottlenecks.
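
The acknowledgment and retry behavior described above can be modeled in memory without Redis. This is a toy model under stated assumptions: the `Stream` and `Entry` types are hypothetical, expiry is represented by a caller-supplied logical clock, and none of it reflects the relay's real stream layout.

```rust
use std::collections::{BTreeMap, VecDeque};

/// Hypothetical queued message with an expiry on a logical clock.
#[derive(Clone, Debug, PartialEq)]
pub struct Entry {
    pub id: u64,
    pub payload: String,
    pub expires_at: u64,
}

/// Toy stream: delivered entries stay pending until acknowledged.
#[derive(Default)]
pub struct Stream {
    next_id: u64,
    fresh: VecDeque<Entry>,
    pending: BTreeMap<u64, Entry>,
}

impl Stream {
    pub fn add(&mut self, payload: &str, expires_at: u64) -> u64 {
        self.next_id += 1;
        let id = self.next_id;
        self.fresh.push_back(Entry { id, payload: payload.to_string(), expires_at });
        id
    }

    /// Deliver the next new entry and remember it as pending.
    pub fn read(&mut self) -> Option<Entry> {
        let entry = self.fresh.pop_front()?;
        self.pending.insert(entry.id, entry.clone());
        Some(entry)
    }

    /// Explicit acknowledgment; returns false if the ID was not pending.
    pub fn ack(&mut self, id: u64) -> bool {
        self.pending.remove(&id).is_some()
    }

    /// Drop expired entries, then return the rest for reprocessing.
    pub fn reclaim(&mut self, now: u64) -> Vec<Entry> {
        self.pending.retain(|_, e| e.expires_at > now);
        self.pending.values().cloned().collect()
    }
}
```

Because an entry can be handed out again before its first delivery is acknowledged, at-least-once delivery implies that receivers may see duplicates and should handle them idempotently.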

Fast-Fail Mode

  • Clients can opt into immediate failure for time-critical operations when the destination is offline.

Deprecations

  • The “relays” field is deprecated:
    • Discovery is determined by authoritative sources rather than client hints.
    • Reduces the risk of clients steering caching behavior.
    • Parsing/serialization remains backward compatible; SDKs warn on use.

Operational Notes

  • Expect improved local delivery performance and lower load on external services.
  • Federation is more resilient with explicit acknowledgments and periodic reprocessing.
  • Additional metrics and stall detection simplify troubleshooting and performance tuning.

v1.4-pre3

22 Apr 12:46
020c891

Better internal message switching (#213)

* Run cargo update + fix compiler warnings

* Minor improvements:

- Add todos that need to be processed
- Rename some structs
- Handle callback errors (drop the connection from worker connections subset)

* Add TwinID type

* Remove some unnecessary async_trait usage

* Worker rework

- Separate worker into its own structure
- Use a cancellation token to track whether either the reader or the writer routine has exited,
  and make sure the connection is removed from the global sessions set

* Increase connection channel size

* Use semaphore to track capacity

Also run cargo clippy to clean up all warnings

* WIP: Better worker loop

* Implement back pressure for slow clients

* Add some tests

* Fix formatting
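
For readers unfamiliar with back pressure, the idea behind the slow-client handling listed above can be sketched with a bounded channel. This is not the commit's code: it swaps in the standard library's `sync_channel` for whatever async channel and semaphore the worker actually uses, and `Push`, `connection_channel`, and `try_push` are hypothetical names.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

/// Hypothetical outcome of offering a message to a connection.
#[derive(Debug, PartialEq)]
pub enum Push {
    Accepted,
    Busy,
    Closed,
}

/// Bounded per-connection queue; `capacity` caps buffered messages.
pub fn connection_channel(capacity: usize) -> (SyncSender<Vec<u8>>, Receiver<Vec<u8>>) {
    sync_channel(capacity)
}

/// Non-blocking send: a full queue signals a slow client instead of
/// letting the buffer grow without bound.
pub fn try_push(tx: &SyncSender<Vec<u8>>, msg: Vec<u8>) -> Push {
    match tx.try_send(msg) {
        Ok(()) => Push::Accepted,
        Err(TrySendError::Full(_)) => Push::Busy,
        Err(TrySendError::Disconnected(_)) => Push::Closed,
    }
}
```

On `Busy` the caller can delay, retry, or drop the connection; on `Closed` the session can be removed from the worker's set.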

v1.3.4

16 Apr 09:12
4fc8b9d

update ubuntu image for the release workflow (#218)

v1.4-pre1

15 Apr 11:15
834b82b

Increase connection channel size

v1.3.3

11 Dec 14:13
0ef17e7

Full Changelog: v1.3.2...v1.3.3

v1.3.2

29 Oct 16:09
v1.3.2
