Harden HTTP relay failover with sequential routing and per-node timeouts #556

PeterFarber · 2025-11-17T15:42:01Z

Problem

- Multi-node routes could hang indefinitely because every peer was invoked with the same HTTP options; slow or dead endpoints (e.g., http://google.com/) blocked the chain and the relay never advanced.
- dev_router:preprocess/3 flattened multi-node peer metadata, so downstream processors and relays couldn’t replay the original list of candidate nodes or rewrite their URIs consistently.
- hb_http ignored {error, Reason} tuples returned by the HTTP client, making it impossible for callers to detect failures and attempt the next peer.

Solution

- Capture the recomposed method (TargetMod5) and pass it, along with node-specific HTTP options and timeouts, to a new relay_nodes_in_order/6. Each peer is tried with its own http-timeout, enforced via relay_request_with_timeout/2, so slow peers are abandoned and the relay advances deterministically.
- Preserve the <<"nodes">> list (and normalized URIs) inside dev_router:preprocess/3, ensuring multi-node peers remain intact when the request is re-dispatched through the relay device.
- Update hb_http to propagate errors from hb_http_client, allowing callers to treat client failures just like HTTP-level failures.
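
The sequential failover described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the module name, `try_in_order/3`, `with_timeout/2`, and the `no_viable_peer` atom are hypothetical, and the sketch assumes each node is a map whose `<<"http-timeout">>` value is in milliseconds.

```erlang
-module(relay_failover_sketch).
-export([try_in_order/3, with_timeout/2]).

%% Try each node in list order; the first {ok, _} result wins.
%% Any other result (error tuple, timeout, crash) moves on to the next node.
%% ASSUMPTION: nodes are maps and <<"http-timeout">> is in milliseconds.
try_in_order([], _RequestFun, _DefaultTimeout) ->
    {error, no_viable_peer};
try_in_order([Node | Rest], RequestFun, DefaultTimeout) ->
    Timeout = maps:get(<<"http-timeout">>, Node, DefaultTimeout),
    case with_timeout(fun() -> RequestFun(Node) end, Timeout) of
        {ok, _} = Ok -> Ok;
        _Failure -> try_in_order(Rest, RequestFun, DefaultTimeout)
    end.

%% Run Fun in a monitored worker process and abandon it after TimeoutMs.
with_timeout(Fun, TimeoutMs) ->
    Parent = self(),
    Ref = make_ref(),
    {Pid, MRef} = spawn_monitor(fun() -> Parent ! {Ref, Fun()} end),
    receive
        {Ref, Result} ->
            %% Worker finished in time; discard its pending 'DOWN' message.
            erlang:demonitor(MRef, [flush]),
            Result;
        {'DOWN', MRef, process, Pid, Reason} ->
            %% Worker died before replying.
            {error, {worker_crashed, Reason}}
    after TimeoutMs ->
        %% Kill the worker, wait until it is really gone, then drop any
        %% reply it managed to send just before dying.
        exit(Pid, kill),
        receive {'DOWN', MRef, process, Pid, _} -> ok end,
        receive {Ref, _Late} -> ok after 0 -> ok end,
        {error, timeout}
    end.
```

Running each request in its own process keeps a stalled peer from blocking the caller, at the cost of one short-lived process per attempt.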

Changes

src/dev_relay.erl
- Use hb_maps:get(<<"method">>, TargetMod5, ...) for outbound requests and add helpers (relay_nodes_in_order/6, relay_request_with_timeout/2, peer_http_opts/3, peer_timeout/3) to merge per-node options, enforce timeouts, and document the behavior.
- Extend relay_failover_test/0 with explicit per-node <<"http-timeout">> settings (10 s for Google, 2 s for the invalid host, 5 s for the local peer) and explanatory comments.
src/dev_router.erl
- Preserve the multi-node <<"nodes">> structure when preprocessing requests, normalizing each node’s URI to point at user-path instead of collapsing to a single peer.
src/hb_http.erl
- Capture the HTTP client result in Res = hb_http_client:request(...) and add a process_response/7 clause that handles {error, Reason} by returning {error, {http_request_failed, Reason}}.
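
To illustrate why surfacing client errors matters for failover, here is a small sketch of result normalization. Only the `{error, {http_request_failed, Reason}}` shape comes from this PR; the module name, `normalize/1`, the `{ok, Status, Headers, Body}` success tuple, and the `http_status` tag are assumptions made for the example and do not reflect the real process_response/7 signature.

```erlang
-module(http_result_sketch).
-export([normalize/1]).

%% Map a raw client result onto a value that a failover loop can branch on.
%% ASSUMPTION: the client returns {ok, Status, Headers, Body} on success.
normalize({ok, Status, Headers, Body})
  when is_integer(Status), Status >= 200, Status < 400 ->
    {ok, #{status => Status, headers => Headers, body => Body}};
normalize({ok, Status, _Headers, Body}) when is_integer(Status) ->
    %% HTTP-level failure: the peer answered, but not successfully.
    {error, {http_status, Status, Body}};
normalize({error, Reason}) ->
    %% Client-level failure (connection refused, DNS error, and so on),
    %% wrapped in the shape described in this PR.
    {error, {http_request_failed, Reason}}.
```

With both failure kinds expressed as `{error, _}`, a caller needs only one branch to decide that the next peer should be tried.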

This ensures multi-node relays respect per-peer timeouts, keep the full peer list intact, and bubble up HTTP client failures so the next candidate is tried automatically.

samcamwilliams · 2025-11-19T00:33:08Z

On first read this looks somewhat viable, but it is repeating all of the logic from hb_http_multi. Any particular reason for that?

At worst, we should make an explicit call to hb_http_multi (rather than hb_http) if needed, but ideally hb_http would figure this out on its own and route appropriately. IIRC, the hb_gateway_client already uses this functionality, so dev_relay should be able to, too.


samcamwilliams and others added 4 commits October 31, 2025 14:49

impr: catch error returns from HTTP client

0298777

chore: add HTTP routing failover test

c7cf1d6

wip: router processor support for multi-node peer results

19da361

fix: relay multi-node failover and add per-node HTTP timeouts

ac74189

chore: route dev_relay through hb_http multi_dispatch and preserve explicit methods

44a4d08
