CASSGO-72 Connection trouble with Amazon Keyspaces #1873
Comments
I've seen this issue on other projects. Basically, if you use AWS Keyspaces with peering, system.local returns 127.0.0.1 while system.peers returns all the correct IPs (including, I think, the local one). I'm not sure there's any driver option you can use to work around this; supporting AWS Keyspaces with peering may require some code changes in gocql.
You can play around with
Thank you for the information. I will look into
I checked the code and found that I will try
I'm trying
On startup, it had three connections, including two to 172.16.1.28, for some reason. After the error I'm not sure whether this is normal behavior or not, but it's working. By the way, I came up with another workaround. It might be better to add a HostFilter to the cluster config that ignores 127.0.0.1:
So far it's working fine after refreshRing; I will keep it running for a while longer.
I think that could occur if the same hostID appears twice in the peers table, or if the local host is in the peers table. Can you paste the output for
Thank you for pointing that out. You are right. I read the code again, and now I understand what happened.
Both 127.0.0.1 and 172.16.1.28 have the same hostID.
Yeah, when peering is enabled, AWS Keyspaces basically returns the full list of hosts in the system.peers table (including the local host, but with the correct IP address). Unfortunately, this behavior is not consistent with "regular" C*, so some drivers have trouble with it.
Do we need to handle this? We could separate the local host from the peers and ignore duplicate host_ids when refreshing the ring.
I would really appreciate it if GoCQL handled this issue. If there is anything I can help with, please let me know.
Yeah, we could do this as long as there's no impact on users who aren't using AWS Keyspaces. It should target gocql 2.1.0, though, since we're trying to wrap up 2.0.
Thank you. I'm looking forward to the release. If a release schedule is planned, could you tell me approximately when 2.1.0 will be released?
No ETA on 2.1.0 for now. There's also no guarantee this will go into 2.1.0; someone needs to volunteer to open a PR for it. Let's start by creating a JIRA ticket; I'll do that.
OK, so I will try to fix this issue. Do I need to do something in JIRA, or is it OK to just send a PR on GitHub? First, I'd like to confirm how to handle this. If I understand correctly, duplicate hostIDs and the localhost address are handled properly when connections are first initialized (maybe NewSession() in session.go), but not during reconnection (maybe refreshRing() in host_source.go). So I need to fix the reconnection process. Is that right?
I think inside
@joao-r-reis, you might have more experience with how other drivers handle this. Thoughts?
Most drivers use maps keyed by IP address, but Java driver 4.x moved to using host IDs as keys. Usually the behavior is to go through the new list of
GoCQL has both maps, but it uses the map keyed by host ID for these checks.
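The comparison a ring refresh performs when hosts are keyed by host ID can be sketched generically. This is a hypothetical illustration, not how gocql or the Java driver actually implements it: diffRing and the plain string maps (host ID to address) are assumptions made for the example.

```go
package main

import (
	"fmt"
	"sort"
)

// diffRing compares the current ring (host ID -> address) with a freshly
// queried host list and reports which host IDs were added, removed, or
// changed address. Keying by host ID means a node whose address changes is
// reported as moved rather than as a removal plus an addition.
func diffRing(current, fresh map[string]string) (added, removed, moved []string) {
	for id, addr := range fresh {
		old, ok := current[id]
		switch {
		case !ok:
			added = append(added, id)
		case old != addr:
			moved = append(moved, id)
		}
	}
	for id := range current {
		if _, ok := fresh[id]; !ok {
			removed = append(removed, id)
		}
	}
	// Map iteration order is random; sort for deterministic output.
	sort.Strings(added)
	sort.Strings(removed)
	sort.Strings(moved)
	return added, removed, moved
}

func main() {
	current := map[string]string{"id-14": "172.16.1.14", "id-28": "172.16.1.28"}
	fresh := map[string]string{"id-28": "172.16.1.28", "id-99": "172.16.1.99"}
	added, removed, moved := diffRing(current, fresh)
	fmt.Println("added:", added, "removed:", removed, "moved:", moved)
}
```

With host-ID keys, a Keyspaces host that is reported once as 127.0.0.1 and once with its real IP collapses into a single entry, so whichever address is written last decides where the driver connects.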
Reading through
Oh, I think the issue is that the control connection queries system.local and updates that host's entry in the map when it connects or reconnects. Other drivers (except Java driver 4.x, I think) just trigger a full ring refresh when the control connection connects or reconnects, and I think we should do the same in gocql. Basically we can move the
The issue is that
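The difference between the two approaches can be illustrated with a toy model, assuming the control-connection behavior described above. This is a hypothetical sketch: the map, updateFromLocal, and fullRefresh are illustrative and do not reproduce gocql's control connection code. It shows how overwriting a host's entry from system.local alone can replace its real address with 127.0.0.1, matching the reconnect-to-127.0.0.1 symptom in the original report, while a full refresh that also reads system.peers restores the real address.

```go
package main

import "fmt"

// updateFromLocal models a reconnect that only applies the system.local row:
// the entry for that host ID is overwritten with whatever address
// system.local reports (127.0.0.1 on Amazon Keyspaces).
func updateFromLocal(hosts map[string]string, hostID, addr string) {
	hosts[hostID] = addr
}

// fullRefresh models the proposed fix: rebuild the whole map from
// system.local plus system.peers. Peers rows are applied last, so if the
// local host also appears in system.peers (as with Amazon Keyspaces), its
// peers address replaces 127.0.0.1.
func fullRefresh(localID, localAddr string, peers map[string]string) map[string]string {
	hosts := map[string]string{localID: localAddr}
	for id, addr := range peers {
		hosts[id] = addr
	}
	return hosts
}

func main() {
	// State after the initial connection: both hosts have their real addresses.
	hosts := map[string]string{"id-14": "172.16.1.14", "id-28": "172.16.1.28"}

	// The control connection reconnects to id-28 and applies its system.local row.
	updateFromLocal(hosts, "id-28", "127.0.0.1")
	fmt.Println("after local-only update:", hosts["id-28"])

	// A full refresh using the peers rows Keyspaces returns (local host included).
	peers := map[string]string{"id-14": "172.16.1.14", "id-28": "172.16.1.28"}
	hosts = fullRefresh("id-28", "127.0.0.1", peers)
	fmt.Println("after full refresh:", hosts["id-28"])
}
```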
I might have found the cause. In the for-loop of hosts in
I think it would fix the issue to move
What do you think about this?
I'm having trouble connecting to Amazon Keyspaces (for Apache Cassandra). My program on EC2 connects to Amazon Keyspaces without any issues for a while, but after a few days or weeks it loses the connection, and every query fails with the error below.
Go version: 1.23.4
GoCQL version: 1.7.0
I built the program with gocql_debug enabled and got the following logs.
On startup, it had two hosts, 172.16.1.14 and 172.16.1.28. After a while, the connection to 172.16.1.14 was lost with the error "cannot find host", and the driver tried to reconnect to 127.0.0.1 instead of 172.16.1.14. Later, the other connection was also lost with the same error, and the driver again tried to reconnect to 127.0.0.1 instead of 172.16.1.28. As a result, all connections were lost.
So here are my questions:
First, in what situations does the error "cannot find host" occur? Is this an expected error? I read the source code, but I couldn't understand it well.
Second, what makes it reconnect to 127.0.0.1 instead of the original address? Is this expected behavior?
If anyone has any idea, please let me know.