feat: Add Multi-Endpoint Support with Automatic Retry and Failover #4225
Draft
JohnnyWyles wants to merge 6 commits into stage from jw/fallbackrpcs
Conversation
Implements a reusable HTTP client that supports:
- Multiple endpoints with automatic failover
- Per-endpoint retry with exponential backoff (100ms, 200ms, 400ms...)
- Priority-based endpoint selection
- Remembers last successful endpoint
- Configurable timeout and max retries
This utility will be used to enhance RPC/REST calls across the codebase to
handle endpoint failures gracefully.
Key features:
- Gracefully handles AbortSignal.timeout availability (Node 17.3+)
- Sorts endpoints by priority (higher priority tried first)
- Provides getCurrentEndpoint() to check which endpoint is active
- Comprehensive error messages when all endpoints fail
Includes comprehensive test coverage (11 tests):
- Single/multiple endpoint support
- Retry and fallback behavior
- Priority sorting
- Error handling
- Endpoint memory (remembers last successful)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
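The retry and failover behavior described in this commit can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the names Endpoint, backoffDelayMs, sortByPriority, and fetchWithFailover are hypothetical, maxRetries is assumed to mean attempts per endpoint, and the "remembers last successful endpoint" feature is left out for brevity.

```typescript
// Hypothetical names throughout; this is not the PR's actual API.
interface Endpoint {
  address: string;
  priority?: number; // higher priority is tried first
}

// Delay before retry number `attempt` (0-based): 100ms, 200ms, 400ms, ...
function backoffDelayMs(attempt: number, baseMs = 100): number {
  return baseMs * 2 ** attempt;
}

// Returns a new array ordered by descending priority (missing priority = 0).
function sortByPriority(endpoints: readonly Endpoint[]): Endpoint[] {
  return [...endpoints].sort((a, b) => (b.priority ?? 0) - (a.priority ?? 0));
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Tries each endpoint in priority order, retrying with backoff before
// failing over to the next one. `request` is injected so the sketch does
// not depend on a particular HTTP library.
async function fetchWithFailover<T>(
  endpoints: readonly Endpoint[],
  path: string,
  request: (url: string) => Promise<T>,
  maxRetries = 3 // assumption: attempts per endpoint
): Promise<T> {
  const errors: string[] = [];
  for (const endpoint of sortByPriority(endpoints)) {
    for (let attempt = 0; attempt < maxRetries; attempt++) {
      try {
        return await request(endpoint.address + path);
      } catch (e) {
        errors.push(`${endpoint.address} (attempt ${attempt + 1}): ${String(e)}`);
        // Wait only if another attempt on this endpoint remains.
        if (attempt < maxRetries - 1) await sleep(backoffDelayMs(attempt));
      }
    }
  }
  throw new Error(`All endpoints failed:\n${errors.join("\n")}`);
}
```

Collecting one message per failed attempt is one way to produce the detailed error the commit mentions when every endpoint has been exhausted.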
Enhances createNodeQuery to use all available REST endpoints from chain
asset lists instead of only the first one.
Changes:
- Iterates through all chain.apis.rest endpoints
- Retries each endpoint with exponential backoff before moving to next
- Adds optional maxRetries (default: 3) and timeout (default: 5000ms) params
- Maintains backward compatibility - existing code works unchanged
- Gracefully handles AbortSignal.timeout availability
Endpoint source:
The REST endpoints come from the chain's asset list (osmosis-labs/assetlists).
Each chain can have multiple REST endpoints for redundancy. This function will:
1. Try each endpoint in order from the chain.apis.rest array
2. Retry each endpoint up to maxRetries times with exponential backoff
3. Move to the next endpoint if all retries fail
4. Throw an error only if all endpoints have been exhausted
Benefits ALL queries using createNodeQuery:
- Balance queries (cosmos/bank/balances.ts)
- Fee estimation (osmosis/txfees/*.ts)
- Transaction simulation (cosmos/tx/simulate.ts)
- Staking queries (cosmos/staking/validators.ts)
- Governance queries (cosmos/governance/proposals.ts)
Test coverage:
- Updated 5 existing tests for backward compatibility
- Added 5 new tests for retry/fallback behavior
- All 10 tests passing
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
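The commit mentions handling AbortSignal.timeout availability. One common way to do that is to feature-detect the static method and fall back to an AbortController with a timer. The helper name createTimeoutSignal below is hypothetical, and the sketch may differ from what the PR actually does.

```typescript
// Hypothetical helper: returns a signal that aborts after `ms` milliseconds.
function createTimeoutSignal(ms: number): AbortSignal {
  // AbortSignal.timeout is missing on older runtimes, so feature-detect it.
  const maybeTimeout = (
    AbortSignal as unknown as { timeout?: (ms: number) => AbortSignal }
  ).timeout;
  if (typeof maybeTimeout === "function") {
    return maybeTimeout.call(AbortSignal, ms);
  }
  // Fallback: abort manually with a timer. The timer is not cleared when
  // the request finishes early; a fuller version would clear it.
  const controller = new AbortController();
  setTimeout(() => controller.abort(), ms);
  return controller.signal;
}
```

The returned signal can then be passed to a request, for example fetch(url, { signal: createTimeoutSignal(5000) }), which matches the 5000ms default timeout named above.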
Updates queryRPCStatus to accept either single endpoint (legacy) or
multiple endpoints with automatic retry/fallback.
Changes:
- New API: queryRPCStatus({ rpcUrls: string[] })
- Legacy API still works: queryRPCStatus({ restUrl: string })
- Uses MultiEndpointClient for automatic failover
- Maintains backward compatibility
This improves resilience for:
- IBC transfer time estimation
- Block height polling
- Chain status checks
Example usage:
// Old (still works)
await queryRPCStatus({ restUrl: "https://rpc.osmosis.zone" })
// New (automatic failover)
await queryRPCStatus({
rpcUrls: [
"https://rpc.osmosis.zone",
"https://osmosis-rpc.polkachu.com",
"https://rpc-osmosis.blockapsis.com"
]
})
Implementation details:
- Detects which API is being used via "rpcUrls" in params
- Creates MultiEndpointClient with 3 retries and 5s timeout
- Handles both standard and non-standard RPC response formats
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
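Detecting which API is in use via "rpcUrls" in params could look like the sketch below. The type names and the resolveRpcUrls helper are hypothetical, and the empty-array check is an added assumption. Only the two call shapes come from the commit message.

```typescript
// Hypothetical parameter types mirroring the two call shapes described above.
type LegacyParams = { restUrl: string };
type MultiParams = { rpcUrls: string[] };
type StatusParams = LegacyParams | MultiParams;

// Normalizes either call shape into a non-empty list of URLs.
function resolveRpcUrls(params: StatusParams): string[] {
  // The `in` check narrows the union to the multi-endpoint shape.
  const urls = "rpcUrls" in params ? params.rpcUrls : [params.restUrl];
  if (urls.length === 0) {
    // Assumption: an empty list is treated as a caller error.
    throw new Error("At least one RPC URL is required");
  }
  return urls;
}
```

Normalizing to an array up front lets the rest of the function treat the legacy single-URL call as a one-element failover list.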
…tion
Updates IBC bridge provider to pass all RPC endpoints from chain config
to queryRPCStatus instead of only the first one.
Changes:
- Maps all chain.apis.rpc endpoints to array
- Passes rpcUrls array to queryRPCStatus for automatic failover
- Updates error messages to be more descriptive
Before:
const fromRpc = fromChain?.apis.rpc[0]?.address;
const toRpc = toChain?.apis.rpc[0]?.address;
await queryRPCStatus({ restUrl: fromRpc })
await queryRPCStatus({ restUrl: toRpc })
After:
const fromRpcUrls = fromChain?.apis.rpc.map(rpc => rpc.address);
const toRpcUrls = toChain?.apis.rpc.map(rpc => rpc.address);
await queryRPCStatus({ rpcUrls: fromRpcUrls })
await queryRPCStatus({ rpcUrls: toRpcUrls })
Impact:
- IBC transfer time estimates no longer fail if primary RPC is down
- Automatically tries all available RPC endpoints with retry logic
- Better user experience during network issues
- More accurate transfer time estimates with increased reliability
Location: estimateTransferTime() method at packages/bridge/src/ibc/index.ts:315
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
Enhances PollingStatusSubscription to accept single or multiple RPC URLs
with automatic failover during block polling.
Changes:
- Constructor now accepts string | string[] for rpc parameter
- Converts single string to array internally for consistent handling
- Uses new queryRPCStatus multi-endpoint API when multiple URLs provided
- Maintains backward compatibility with single URL
- Added validation to ensure at least one URL is provided
Benefits:
- Block polling continues even if primary RPC fails
- Automatic failover to alternative endpoints
- More resilient IBC timeout tracking
- Better user experience during network issues
Example usage:
// Old (still works)
new PollingStatusSubscription("https://rpc.osmosis.zone")
// New (automatic failover)
new PollingStatusSubscription([
"https://rpc.osmosis.zone",
"https://osmosis-rpc.polkachu.com"
])
Implementation details:
- Stores URLs in protected readonly rpcUrls array
- Detects single vs multiple URLs and calls appropriate queryRPCStatus API
- Enhances error logging to show number of endpoints being used
Location: packages/tx/src/poll-status.ts
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
…to TxTracer
Enhances TxTracer with comprehensive WebSocket failover logic to maintain
IBC transfer status tracking during network issues.
Changes:
- Constructor now accepts string | string[] for url parameter
- Automatic reconnection with exponential backoff (1s, 2s, 4s, 8s max)
- Tries each endpoint multiple times before moving to next
- After exhausting all endpoints, waits 10s before cycling back
- Preserves all subscriptions across reconnections
- Prevents reconnection on manual close
- Adds comprehensive logging for debugging
Reconnection flow:
1. Try endpoint 0 (maxReconnectAttempts times with backoff)
2. If all fail, try endpoint 1 (maxReconnectAttempts times)
3. If all fail, try endpoint 2 (maxReconnectAttempts times)
4. After all endpoints exhausted, wait 10s and cycle back to endpoint 0
New state management:
- urls: readonly string[] - Array of WebSocket URLs
- currentUrlIndex: number - Tracks which endpoint is active
- reconnectAttempts: number - Counts retry attempts for current endpoint
- maxReconnectAttempts: number - Configurable (default: 3)
- isManualClose: boolean - Prevents auto-reconnect on user close
Event handlers:
- onOpen: Resets reconnect counter, re-subscribes all handlers
- onClose: Triggers reconnect logic unless manual close
- onError: Logs error and lets onClose handle reconnection
Benefits:
- IBC transfer status tracking continues during RPC issues
- Automatic recovery without user intervention
- Prevents lost WebSocket subscriptions
- Better visibility with console logging
Example usage:
// Old (still works)
new TxTracer("https://rpc.osmosis.zone")
// New (automatic failover)
new TxTracer([
"https://rpc.osmosis.zone",
"https://osmosis-rpc.polkachu.com"
], "/websocket", { maxReconnectAttempts: 5 })
Location: packages/tx/src/tracer.ts
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
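The reconnection flow above can be modeled as a small pure function that decides, after each failed connection, which endpoint to try next and how long to wait. This is a sketch under assumptions: nextReconnectStep and the two interfaces are hypothetical names, and the 1s delay when switching endpoints is a guess, since the commit message does not state it.

```typescript
// Hypothetical state shape; field names follow the commit message above.
interface ReconnectState {
  currentUrlIndex: number;
  reconnectAttempts: number; // failed attempts so far on the current endpoint
}

interface ReconnectStep extends ReconnectState {
  delayMs: number; // how long to wait before the next connection attempt
}

// Computes the next step after a failed connection attempt.
function nextReconnectStep(
  state: ReconnectState,
  urlCount: number,
  maxReconnectAttempts = 3
): ReconnectStep {
  const attempts = state.reconnectAttempts + 1;
  if (attempts < maxReconnectAttempts) {
    // Stay on the same endpoint; back off 1s, 2s, 4s, capped at 8s.
    const delayMs = Math.min(1000 * 2 ** (attempts - 1), 8000);
    return {
      currentUrlIndex: state.currentUrlIndex,
      reconnectAttempts: attempts,
      delayMs,
    };
  }
  const nextIndex = state.currentUrlIndex + 1;
  if (nextIndex < urlCount) {
    // Endpoint exhausted: move to the next one (assumed 1s delay).
    return { currentUrlIndex: nextIndex, reconnectAttempts: 0, delayMs: 1000 };
  }
  // All endpoints exhausted: wait 10s, then cycle back to endpoint 0.
  return { currentUrlIndex: 0, reconnectAttempts: 0, delayMs: 10000 };
}
```

In a real tracer the onClose handler would call something like this unless isManualClose is set, and onOpen would reset reconnectAttempts to 0.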
The assetlist does not currently contain multiple endpoints; these are still to be added.
What is the purpose of the change:
The frontend only used the first RPC/REST endpoint from chain asset lists. When this primary endpoint failed or became slow, the application experienced:
Impact: Users were left unable to perform transactions when primary endpoints experienced issues, despite multiple backup endpoints being available in the chain configuration.
Linear Task
https://linear.app/osmosis/issue/FE-1550/endpoint-failover-for-ibc-transfers
Brief Changelog
Implemented comprehensive multi-endpoint support with automatic retry and failover across all RPC/REST operations:
Testing and Verifying
This change has been tested locally by rebuilding the website and verifying that content and links are as expected.