Conversation

@JohnnyWyles (Contributor) commented Nov 14, 2025

What is the purpose of the change:

The frontend used only the first RPC/REST endpoint from each chain's asset list. When that primary endpoint failed or became slow, the application experienced:

  • ❌ Complete service disruption during endpoint failures
  • ❌ Failed deposit/withdraw operations due to balance/fee check failures
  • ❌ Lost WebSocket connections with no automatic recovery
  • ❌ Poor user experience during network issues or RPC outages
  • ❌ No automatic failover to alternative endpoints

Impact: Users were left unable to perform transactions when primary endpoints experienced issues, despite multiple backup endpoints being available in the chain configuration.

Linear Task

https://linear.app/osmosis/issue/FE-1550/endpoint-failover-for-ibc-transfers

Brief Changelog

Implemented comprehensive multi-endpoint support with automatic retry and failover across all RPC/REST operations:

  1. New MultiEndpointClient utility - Reusable HTTP client with automatic endpoint rotation
  2. Enhanced createNodeQuery - All server queries now try all available endpoints
  3. Updated queryRPCStatus - RPC status queries with multi-endpoint support
  4. IBC bridge improvements - Transfer time estimation uses all RPC endpoints
  5. Block polling resilience - Continuous polling even during RPC failures
  6. WebSocket auto-reconnect - Maintains IBC status tracking with endpoint failover

Testing and Verifying

This change has been tested locally by rebuilding the website and verifying that content and links behave as expected.

JohnnyWyles and others added 6 commits November 12, 2025 14:22
Implements a reusable HTTP client that supports:
- Multiple endpoints with automatic failover
- Per-endpoint retry with exponential backoff (100ms, 200ms, 400ms...)
- Priority-based endpoint selection
- Memory of the last successful endpoint
- Configurable timeout and max retries

This utility will be used to enhance RPC/REST calls across the codebase
to handle endpoint failures gracefully.

Key features:
- Gracefully handles AbortSignal.timeout availability (Node 17.3+)
- Sorts endpoints by priority (higher priority tried first)
- Provides getCurrentEndpoint() to check which endpoint is active
- Comprehensive error messages when all endpoints fail

Includes comprehensive test coverage (11 tests):
- Single/multiple endpoint support
- Retry and fallback behavior
- Priority sorting
- Error handling
- Endpoint memory (remembers last successful)
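
The retry-and-failover loop these features describe can be sketched roughly as follows. This is a hypothetical illustration under stated assumptions: `Endpoint`, `fetchWithFailover`, and the parameter shapes are illustrative names, not the PR's actual `MultiEndpointClient` API.

```typescript
// Illustrative sketch of priority-sorted failover with exponential backoff.
interface Endpoint {
  url: string;
  priority?: number; // higher priority is tried first (assumption)
}

async function fetchWithFailover(
  endpoints: Endpoint[],
  path: string,
  fetchFn: (url: string) => Promise<string>,
  maxRetries = 3,
  baseDelayMs = 100
): Promise<string> {
  // Sort endpoints by priority, highest first.
  const sorted = [...endpoints].sort(
    (a, b) => (b.priority ?? 0) - (a.priority ?? 0)
  );
  const errors: string[] = [];
  for (const ep of sorted) {
    for (let attempt = 0; attempt < maxRetries; attempt++) {
      try {
        return await fetchFn(ep.url + path);
      } catch (e) {
        errors.push(`${ep.url}: ${(e as Error).message}`);
        // Exponential backoff: 100ms, 200ms, 400ms, ...
        await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** attempt));
      }
    }
  }
  // Comprehensive error message once every endpoint is exhausted.
  throw new Error(`All endpoints failed:\n${errors.join("\n")}`);
}
```

A real client would also remember the last successful endpoint and start there on the next call; that bookkeeping is omitted here for brevity.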

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Enhances createNodeQuery to use all available REST endpoints from chain
asset lists instead of only the first one.

Changes:
- Iterates through all chain.apis.rest endpoints
- Retries each endpoint with exponential backoff before moving to next
- Adds optional maxRetries (default: 3) and timeout (default: 5000ms) params
- Maintains backward compatibility - existing code works unchanged
- Gracefully handles AbortSignal.timeout availability
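
The AbortSignal.timeout fallback mentioned above could look roughly like this. It is an illustrative helper, not the PR's actual code; `timeoutSignal` is an assumed name.

```typescript
// Return a signal that aborts after `ms`, falling back to a manual
// AbortController on runtimes without AbortSignal.timeout (pre Node 17.3).
function timeoutSignal(ms: number): AbortSignal {
  const AS = AbortSignal as unknown as {
    timeout?: (ms: number) => AbortSignal;
  };
  if (typeof AS.timeout === "function") {
    return AS.timeout(ms);
  }
  const controller = new AbortController();
  setTimeout(() => controller.abort(), ms);
  return controller.signal;
}
```

The resulting signal can be passed to `fetch(url, { signal })` so a slow endpoint fails fast and the loop can move on to the next one.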

Endpoint source:
The REST endpoints come from the chain's asset list (osmosis-labs/assetlists).
Each chain can have multiple REST endpoints for redundancy. This function will:
1. Try each endpoint in order from the chain.apis.rest array
2. Retry each endpoint up to maxRetries times with exponential backoff
3. Move to the next endpoint if all retries fail
4. Throw an error only if all endpoints have been exhausted

Benefits ALL queries using createNodeQuery:
- Balance queries (cosmos/bank/balances.ts)
- Fee estimation (osmosis/txfees/*.ts)
- Transaction simulation (cosmos/tx/simulate.ts)
- Staking queries (cosmos/staking/validators.ts)
- Governance queries (cosmos/governance/proposals.ts)

Test coverage:
- Updated 5 existing tests for backward compatibility
- Added 5 new tests for retry/fallback behavior
- All 10 tests passing

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Updates queryRPCStatus to accept either single endpoint (legacy) or
multiple endpoints with automatic retry/fallback.

Changes:
- New API: queryRPCStatus({ rpcUrls: string[] })
- Legacy API still works: queryRPCStatus({ restUrl: string })
- Uses MultiEndpointClient for automatic failover
- Maintains backward compatibility

This improves resilience for:
- IBC transfer time estimation
- Block height polling
- Chain status checks

Example usage:

  // Old (still works)
  await queryRPCStatus({ restUrl: "https://rpc.osmosis.zone" })

  // New (automatic failover)
  await queryRPCStatus({
    rpcUrls: [
      "https://rpc.osmosis.zone",
      "https://osmosis-rpc.polkachu.com",
      "https://rpc-osmosis.blockapsis.com"
    ]
  })

Implementation details:
- Detects which API is being used via "rpcUrls" in params
- Creates MultiEndpointClient with 3 retries and 5s timeout
- Handles both standard and non-standard RPC response formats
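
The `rpcUrls`-based detection could be sketched as below. The type names are hypothetical; the real parameter shapes in the PR may differ.

```typescript
// Illustrative dual-signature dispatch: the new API is detected by the
// presence of the `rpcUrls` key in the params object.
type LegacyParams = { restUrl: string };
type MultiParams = { rpcUrls: string[] };

function resolveUrls(params: LegacyParams | MultiParams): string[] {
  if ("rpcUrls" in params) return params.rpcUrls;
  // Legacy single-endpoint call: wrap in a one-element array.
  return [params.restUrl];
}
```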

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
…tion

Updates IBC bridge provider to pass all RPC endpoints from chain config
to queryRPCStatus instead of only the first one.

Changes:
- Maps all chain.apis.rpc endpoints to array
- Passes rpcUrls array to queryRPCStatus for automatic failover
- Updates error messages to be more descriptive

Before:
  const fromRpc = fromChain?.apis.rpc[0]?.address;
  const toRpc = toChain?.apis.rpc[0]?.address;
  await queryRPCStatus({ restUrl: fromRpc })
  await queryRPCStatus({ restUrl: toRpc })

After:
  const fromRpcUrls = fromChain?.apis.rpc.map(rpc => rpc.address);
  const toRpcUrls = toChain?.apis.rpc.map(rpc => rpc.address);
  await queryRPCStatus({ rpcUrls: fromRpcUrls })
  await queryRPCStatus({ rpcUrls: toRpcUrls })

Impact:
- IBC transfer time estimates no longer fail if primary RPC is down
- Automatically tries all available RPC endpoints with retry logic
- Better user experience during network issues
- More accurate transfer time estimates with increased reliability

Location: estimateTransferTime() method at packages/bridge/src/ibc/index.ts:315

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Enhances PollingStatusSubscription to accept single or multiple RPC URLs
with automatic failover during block polling.

Changes:
- Constructor now accepts string | string[] for rpc parameter
- Converts single string to array internally for consistent handling
- Uses new queryRPCStatus multi-endpoint API when multiple URLs provided
- Maintains backward compatibility with single URL
- Added validation to ensure at least one URL is provided
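
A minimal sketch of that constructor normalization, assuming an illustrative class name and an added `endpointCount` getter for inspection (the real class lives in packages/tx/src/poll-status.ts):

```typescript
class PollingStatusSubscriptionSketch {
  protected readonly rpcUrls: string[];

  constructor(rpc: string | string[]) {
    // Normalize: a single URL becomes a one-element array.
    this.rpcUrls = typeof rpc === "string" ? [rpc] : [...rpc];
    if (this.rpcUrls.length === 0) {
      throw new Error("PollingStatusSubscription requires at least one RPC URL");
    }
  }

  /** Number of endpoints available for failover (added for illustration). */
  get endpointCount(): number {
    return this.rpcUrls.length;
  }
}
```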

Benefits:
- Block polling continues even if primary RPC fails
- Automatic failover to alternative endpoints
- More resilient IBC timeout tracking
- Better user experience during network issues

Example usage:

  // Old (still works)
  new PollingStatusSubscription("https://rpc.osmosis.zone")

  // New (automatic failover)
  new PollingStatusSubscription([
    "https://rpc.osmosis.zone",
    "https://osmosis-rpc.polkachu.com"
  ])

Implementation details:
- Stores URLs in protected readonly rpcUrls array
- Detects single vs multiple URLs and calls appropriate queryRPCStatus API
- Enhances error logging to show number of endpoints being used

Location: packages/tx/src/poll-status.ts

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
…to TxTracer

Enhances TxTracer with comprehensive WebSocket failover logic to maintain
IBC transfer status tracking during network issues.

Changes:
- Constructor now accepts string | string[] for url parameter
- Automatic reconnection with exponential backoff (1s, 2s, 4s, 8s max)
- Tries each endpoint multiple times before moving to next
- After exhausting all endpoints, waits 10s before cycling back
- Preserves all subscriptions across reconnections
- Prevents reconnection on manual close
- Adds comprehensive logging for debugging

Reconnection flow:
1. Try endpoint 0 (maxReconnectAttempts times with backoff)
2. If all fail, try endpoint 1 (maxReconnectAttempts times)
3. If all fail, try endpoint 2 (maxReconnectAttempts times)
4. After all endpoints exhausted, wait 10s and cycle back to endpoint 0
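
This scheduling can be modeled as a pure function. The sketch below is illustrative; `nextReconnect` and its signature are assumptions, not the PR's actual code in packages/tx/src/tracer.ts.

```typescript
// Given the current endpoint index and attempt count, decide what to try
// next and how long to wait: retry with backoff, advance to the next
// endpoint, or cycle back to endpoint 0 after a 10s pause.
function nextReconnect(
  currentUrlIndex: number,
  reconnectAttempts: number,
  urlCount: number,
  maxReconnectAttempts = 3
): { urlIndex: number; attempts: number; delayMs: number } {
  if (reconnectAttempts < maxReconnectAttempts) {
    // Retry the same endpoint with exponential backoff: 1s, 2s, 4s, 8s cap.
    const delayMs = Math.min(1000 * 2 ** reconnectAttempts, 8000);
    return { urlIndex: currentUrlIndex, attempts: reconnectAttempts + 1, delayMs };
  }
  const nextIndex = currentUrlIndex + 1;
  if (nextIndex < urlCount) {
    // Move to the next endpoint with a fresh attempt counter.
    return { urlIndex: nextIndex, attempts: 0, delayMs: 0 };
  }
  // All endpoints exhausted: wait 10s, then cycle back to endpoint 0.
  return { urlIndex: 0, attempts: 0, delayMs: 10_000 };
}
```

Keeping the decision pure like this makes the flow easy to unit-test separately from the WebSocket plumbing.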

New state management:
- urls: readonly string[] - Array of WebSocket URLs
- currentUrlIndex: number - Tracks which endpoint is active
- reconnectAttempts: number - Counts retry attempts for current endpoint
- maxReconnectAttempts: number - Configurable (default: 3)
- isManualClose: boolean - Prevents auto-reconnect on user close

Event handlers:
- onOpen: Resets reconnect counter, re-subscribes all handlers
- onClose: Triggers reconnect logic unless manual close
- onError: Logs error and lets onClose handle reconnection

Benefits:
- IBC transfer status tracking continues during RPC issues
- Automatic recovery without user intervention
- Prevents lost WebSocket subscriptions
- Better visibility with console logging

Example usage:

  // Old (still works)
  new TxTracer("https://rpc.osmosis.zone")

  // New (automatic failover)
  new TxTracer([
    "https://rpc.osmosis.zone",
    "https://osmosis-rpc.polkachu.com"
  ], "/websocket", { maxReconnectAttempts: 5 })

Location: packages/tx/src/tracer.ts

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
@vercel
vercel bot commented Nov 14, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

  • osmosis-frontend — Deployment: Error, Preview: Error — updated Nov 14, 2025 11:32am

4 skipped deployments (all Ignored, updated Nov 14, 2025 11:32am):
  • osmosis-frontend-datadog
  • osmosis-frontend-dev
  • osmosis-frontend-edgenet
  • osmosis-testnet
@JohnnyWyles
Contributor Author

The asset list currently doesn't contain multiple endpoints; these are still to be added.
