Skip to content

Conversation

@parth-soni07
Copy link
Contributor

What was wrong?

Issue #735

How was it fixed?

Prioritizes first available relay with reservations instead of returning obvious first one.

@parth-soni07 parth-soni07 force-pushed the feat/improve-relay-selection branch from ca5b919 to 36f73ae Compare October 3, 2025 07:11
@parth-soni07 parth-soni07 force-pushed the feat/improve-relay-selection branch from 56414bf to 91b3d3b Compare October 12, 2025 10:59
@paschal533
Copy link
Contributor

Hi @parth-soni07 Thanks for tackling this issue. I really appreciate you diving into the relay selection logic. Overall this is a solid first pass and I like the direction you're taking. The test coverage is especially impressive 👏

I do have a few concerns I'd like you to address

The relay_counter isn't protected against concurrent access. Since _select_relay() can be called from multiple places simultaneously, we could have race conditions. Let's wrap it with a lock:

self._relay_counter = 0
self._relay_counter_lock = trio.Lock()

Then use it like:

async with self._relay_counter_lock:
    self._relay_counter += 1
    index = (self._relay_counter - 1) % len(relays_with_reservations)

Right now the counter increments every time we call _select_relay(), even if we don't find any relays. This means the retry loop will keep bumping the counter, leading to uneven distribution. Could you move the increment to only happen when we successfully return a relay?

Calling get_relay_info() in a loop for every selection might get expensive if we have lots of relays. Not a blocker for now, but worth thinking about, maybe we could maintain a cached list of relays with reservations? Just food for thought.

Could you mention the round-robin aspect more explicitly in the newsfragment? Something like:

Implemented round-robin load balancing for CircuitV2 relay selection, prioritizing relays with active reservations for more reliable and evenly distributed relay usage.

Again, really solid work here, the test suite especially shows you're thinking through the edge cases carefully. Once we address the thread safety and counter issues, this should be good to go. Well done @parth-soni07 keep it up with the good work

@paschal533
Copy link
Contributor

Hi @parth-soni07, Thanks for working on this, I can see you are putting good effort into it.
I do have some feedback though. While I appreciate the round-robin approach as an improvement over "first available", I think we should actually aim higher here since we're already making changes.

Round-robin distributes load evenly, but it doesn't actually help users get the best relay for their connection. A relay could be geographically far away, overloaded, or having network issues, and round-robin would still send traffic to it equally.
Instead, I'd like you to consider implementing a performance-based selection strategy using either:

  1. Least response time - Track how long it takes to establish connections through each relay
  2. Least connections - Track active circuit counts per relay
  3. Hybrid approach - Combine both metrics with reservation priority

This would give us better user experience, natural load balancing, self-healing, and still prioritizes reserved relays when performance is similar

I'm happy to pair with you on this if you need any help

CCing @seetadev, @pacrob and @acul71

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants