@PeterFarber
Collaborator

Problem

  • Multi-node routes could hang indefinitely because every peer was invoked with the same HTTP options; slow or dead endpoints (e.g., http://google.com/) blocked the chain and the relay never advanced.
  • dev_router:preprocess/3 flattened multi-node peer metadata, so downstream processors and relays couldn’t replay the original list of candidate nodes or rewrite their URIs consistently.
  • hb_http ignored {error, Reason} tuples returned by the HTTP client, making it impossible for callers to detect failures and attempt the next peer.

Solution

  • Read the outbound method from the recomposed request (TargetMod5) and pass it, along with node-specific HTTP options and timeouts, to a new relay_nodes_in_order/6. Each peer is tried with its own http-timeout, enforced via relay_request_with_timeout/2, so slow peers are abandoned and the relay advances deterministically.
  • Preserve the <<"nodes">> list (and normalized URIs) inside dev_router:preprocess/3, ensuring multi-node peers remain intact when the request is re-dispatched through the relay device (dev_relay).
  • Update hb_http to propagate errors from hb_http_client, allowing callers to treat client failures just like HTTP-level failures.
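
The per-peer timeout and failover behavior described above can be sketched roughly as follows. This is not the code in this PR: the module name relay_sketch, the <<"request">> fun stored on each node (a stand-in for the real HTTP call), the 5000 ms default, and the all_nodes_failed error shape are all assumptions made for illustration. Only the <<"http-timeout">> key and the try-each-peer-in-order idea come from the description.

```erlang
-module(relay_sketch).
-export([try_in_order/2, request_with_timeout/2]).

%% Hypothetical default used when neither the node nor Opts sets a timeout.
-define(DEFAULT_TIMEOUT, 5000).

%% Try each node in order. Returns the first {ok, Res}, or an error that
%% lists the failure reason of every node, in the order they were tried.
%% Each node is a map holding a zero-arity <<"request">> fun (assumed to
%% return {ok, _} or {error, _}) and an optional <<"http-timeout">> in ms.
try_in_order(Nodes, Opts) ->
    try_in_order(Nodes, Opts, []).

try_in_order([], _Opts, Failures) ->
    {error, {all_nodes_failed, lists:reverse(Failures)}};
try_in_order([Node | Rest], Opts, Failures) ->
    Timeout = peer_timeout(Node, Opts),
    Request = maps:get(<<"request">>, Node),
    case request_with_timeout(Request, Timeout) of
        {ok, Res} -> {ok, Res};
        {error, Reason} -> try_in_order(Rest, Opts, [Reason | Failures])
    end.

%% A node-level timeout wins over the one in Opts, which wins over the default.
peer_timeout(Node, Opts) ->
    Default = maps:get(<<"http-timeout">>, Opts, ?DEFAULT_TIMEOUT),
    maps:get(<<"http-timeout">>, Node, Default).

%% Run Fun in a monitored worker process and give up after Timeout ms.
request_with_timeout(Fun, Timeout) ->
    Parent = self(),
    Ref = make_ref(),
    {Pid, MonRef} = spawn_monitor(fun() -> Parent ! {Ref, Fun()} end),
    receive
        {Ref, Result} ->
            erlang:demonitor(MonRef, [flush]),
            Result;
        {'DOWN', MonRef, process, Pid, Reason} ->
            {error, {request_crashed, Reason}}
    after Timeout ->
        %% Kill the worker, wait until it is gone, then drop any late reply
        %% so it does not linger in the caller's mailbox.
        exit(Pid, kill),
        receive {'DOWN', MonRef, process, Pid, _} -> ok end,
        receive {Ref, _Late} -> ok after 0 -> ok end,
        {error, timeout}
    end.
```

With a slow first node (50 ms timeout, request sleeps for a second), a refusing second node, and a healthy third node, try_in_order/2 returns the third node's result after roughly 50 ms instead of blocking on the first.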

Changes

  • src/dev_relay.erl
    • Use hb_maps:get(<<"method">>, TargetMod5, ...) for outbound requests and add helpers (relay_nodes_in_order/6, relay_request_with_timeout/2, peer_http_opts/3, peer_timeout/3) to merge per-node options, enforce timeouts, and document the behavior.
    • Extend relay_failover_test/0 with explicit per-node <<"http-timeout">> settings (10 s for Google, 2 s for the invalid host, 5 s for the local peer) and explanatory comments.
  • src/dev_router.erl
    • Preserve the multi-node <<"nodes">> structure when preprocessing requests, normalizing each node’s URI to point at user-path instead of collapsing to a single peer.
  • src/hb_http.erl
    • Capture the HTTP client result in Res = hb_http_client:request(...) and add a process_response/7 clause that handles {error, Reason} by returning {error, {http_request_failed, Reason}}.
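
The hb_http change amounts to wrapping the client's error tuple rather than dropping it. A reduced sketch of that idea is below; the real process_response/7 takes more arguments, so the arity here, the module name http_error_sketch, the ClientFun parameter (a stand-in for hb_http_client:request), and the shape of the success tuple are assumptions for illustration only.

```erlang
-module(http_error_sketch).
-export([request/2]).

%% ClientFun is a hypothetical stand-in for the HTTP client call.
%% Its result is captured first and then handed to process_response/1.
request(ClientFun, Args) ->
    Res = ClientFun(Args),
    process_response(Res).

%% Client-level failures are wrapped so callers can match on them and
%% move on to the next peer.
process_response({error, Reason}) ->
    {error, {http_request_failed, Reason}};
%% Assumed success shape; the real clause does considerably more work.
process_response({ok, Status, Headers, Body}) ->
    {ok, #{ <<"status">> => Status,
            <<"headers">> => Headers,
            <<"body">> => Body }}.
```

A caller can then treat {error, {http_request_failed, _}} the same way it treats an HTTP-level failure when deciding whether to try the next candidate.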

This ensures multi-node relays respect per-peer timeouts, keep the full peer list intact, and bubble up HTTP client failures so the next candidate is tried automatically.

@samcamwilliams
Collaborator

On first read this looks somewhat viable, but it is repeating all of the logic from hb_http_multi. Any particular reason for that?

At worst, we should make an explicit call to hb_http_multi (rather than hb_http) if needed, but ideally hb_http would figure this out on its own and route appropriately. IIRC, the hb_gateway_client already uses this functionality, so dev_relay should be able to, too.
