Provide performance benchmarks with and without Proxy #574
Comments
Hey @phoenix2x, thanks for the issue. Some follow-up questions:
Slightly unrelated, but are you using the built-in traces?
Thanks for the quick follow-up:)
We do use the built-in traces, but since we can't use them for the no-cloudsqlconn test, we added our own custom span in the dialer for both tests, just to compare apples to apples. Here they are:
Thanks, @phoenix2x, this is really helpful. How many instances does each dialer connect to? For some background info, the Dialer does this:
It's a single Postgres instance in this test.
These numbers are surprising to me. Let me talk with the backend folks to see if we can shed some light on what's going on. Are you doing manual load testing for this? How are you doing it?
Also, if you have a support account, you might consider opening a case so the backend team can look at your instance.
Yes, we use nightly load testing to prevent regressions. As soon as we switched from RDS to Cloud SQL we noticed this issue. The load test is just a script that hits our services. Thank you.
Those latency numbers are much higher than I'd expect. We've been thinking about publishing some baseline numbers as part of a benchmark, and this helps increase the priority of that work. Otherwise, there might be some insight the backend team can add.
Meanwhile, I'm going to make this an issue for publishing benchmark numbers with and without the Dialer.
Related to GoogleCloudPlatform/cloud-sql-proxy#1871.
Just to circle back here and provide some information for others who run into this issue: one thing to keep in mind is that Auto IAM AuthN is limited to 3,000 login requests per instance per minute. When traffic spikes hit that threshold, latency can jump way up (as we see above). Generally, though, we expect p99 latency to be much, much lower, while still accounting for the network hops (app with connector -> proxy server -> instance, plus sometimes a call to verify the IAM user).
@phoenix2x FYI it's possible to use auto IAM authn without the Go Connector. You'll need to ensure a few things:
We're working on making this path easier for folks, but for now I'll share the mechanics for visibility. Assuming you're using pgx, you can do this:

```go
package main

import (
	"context"
	"fmt"
	"time"

	"github.com/jackc/pgx/v5"
	"github.com/jackc/pgx/v5/pgxpool"
)

func main() {
	// Use the instance IP + the native port (5432).
	// For best security, use client certificates + the server cert in the DSN.
	config, err := pgxpool.ParseConfig("host=INSTANCE_IP user=postgres password=empty sslmode=require")
	if err != nil {
		panic(err)
	}
	config.BeforeConnect = func(ctx context.Context, cfg *pgx.ConnConfig) error {
		// This gets called before a connection is created and lets you
		// refresh the OAuth2 token as needed. A fancier implementation would
		// cache the token and refresh it only when it's about to expire.
		cfg.Password = "mycooltoken"
		return nil
	}
	pool, err := pgxpool.NewWithConfig(context.Background(), config)
	if err != nil {
		panic(err)
	}
	conn, err := pool.Acquire(context.Background())
	if err != nil {
		panic(err)
	}
	defer conn.Release()

	row := conn.QueryRow(context.Background(), "SELECT NOW()")
	var t time.Time
	if err := row.Scan(&t); err != nil {
		panic(err)
	}
	fmt.Println(t)
}
```
Hi @enocom, this is very interesting, thank you:) Is this supposed to target the server-side proxy on port 3307, or the native 5432?
Native port. We're working on making this more obvious and possibly even providing some helper functions.
Nice, we should definitely give it a try. |
Question
Hi there,
We're trying to migrate from RDS to Cloud SQL. The application is running on GKE. We're using cloud-sql-go-connector v1.3.0 to connect to a Postgres instance like this:
We noticed that it takes significantly more time for the Dialer to finish: ~300ms, as opposed to ~30ms for RDS. We assumed this was caused by the additional work the Cloud SQL proxy server does, and we confirmed it by connecting directly to the Cloud SQL Postgres instance via IP:5432, where latency stayed at ~70ms.
Since we use connection pooling, this works with no errors most of the time. But during traffic spikes, or after many connections have died under heavy load, the added latency causes a lot of errors. What happens is that when we try to open a significant number of connections at the same time, the dialer latency spikes with every new connection (up to ~4s), causing the application to open even more connections because it can't get enough to fulfill incoming requests. The end result is that with cloudsqlconn the application opens connections up to the pool limit (500 in this specific test) and produces a lot of errors, while it opens only ~100 connections with no errors when connecting to IP:5432 directly. Both tests use the same traffic numbers.
Is there anything we can do to mitigate the issue?
Sorry for the long write-up, just want to make sure I give enough info:)
Code
No response
Additional Details
No response