Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Does solid queue have issues working with PGBouncer ? #537

Open
jwoodrow opened this issue Mar 20, 2025 · 10 comments
Open

Does solid queue have issues working with PGBouncer ? #537

jwoodrow opened this issue Mar 20, 2025 · 10 comments

Comments

@jwoodrow
Copy link

jwoodrow commented Mar 20, 2025

Hi !

I was trying to migrate from Resque to SolidQueue on our staging environment before attempting production, but I'm running into issues.

Our database.yml looks something like this

default_primary: &default_primary
  <<: *default
  url: <%= primary_database_url %>
  migrations_paths: 'db/migrate/primary'
  schema_search_path: 'public,salesforce,solid_queue,heroku_ext'
  schema_cache_path: db/schema_cache.dump

default_primary_replica: &default_primary_replica
  <<: *default
  url: <%= replica_database_url %>
  replica: true
  schema_search_path: 'public,salesforce,solid_queue,heroku_ext'
  schema_cache_path: db/schema_cache.dump

default_queue: &default_queue
  <<: *default
  url: <%= primary_database_url %>
  database_tasks: false
  schema_search_path: 'public,salesforce,solid_queue,heroku_ext'
  schema_cache_path: db/schema_cache.dump

default_queue_replica: &default_queue_replica
  <<: *default
  url: <%= replica_database_url %>
  replica: true
  schema_search_path: 'public,salesforce,solid_queue,heroku_ext'
  schema_cache_path: db/schema_cache.dump

with these applied to all environments equally and where the _database_url variables prioritize a connection pool URL from PGBouncer (we are using heroku) since we use up a lot more than 500 open connections otherwise.

The issue seems to be that when we are using the connection pool when running either bin/jobs and bundle exec rails solid_queue:start then we get random errors like these:

/app/vendor/bundle/ruby/3.3.0/gems/activerecord-8.0.1/lib/active_record/connection_adapters/postgresql/database_statements.rb:160:in `exec': PG::UndefinedTable: ERROR:  relation "solid_queue_pauses" does not exist (ActiveRecord::StatementInvalid)
LINE 1: SELECT "solid_queue_pauses"."queue_name" FROM "solid_queue_p...
                                                      ^

	from /app/vendor/bundle/ruby/3.3.0/gems/activerecord-8.0.1/lib/active_record/connection_adapters/postgresql/database_statements.rb:160:in `perform_query'
	from /app/vendor/bundle/ruby/3.3.0/gems/activerecord-8.0.1/lib/active_record/connection_adapters/abstract/database_statements.rb:556:in `block (2 levels) in raw_execute'
	from /app/vendor/bundle/ruby/3.3.0/gems/activerecord-8.0.1/lib/active_record/connection_adapters/abstract_adapter.rb:1011:in `block in with_raw_connection'
	from /app/vendor/bundle/ruby/3.3.0/gems/activesupport-8.0.1/lib/active_support/concurrency/null_lock.rb:9:in `synchronize'
	from /app/vendor/bundle/ruby/3.3.0/gems/activerecord-8.0.1/lib/active_record/connection_adapters/abstract_adapter.rb:983:in `with_raw_connection'
	from /app/vendor/bundle/ruby/3.3.0/gems/activerecord-8.0.1/lib/active_record/connection_adapters/abstract/database_statements.rb:555:in `block in raw_execute'
	from /app/vendor/bundle/ruby/3.3.0/gems/activesupport-8.0.1/lib/active_support/notifications/instrumenter.rb:58:in `instrument'
	from /app/vendor/bundle/ruby/3.3.0/gems/activerecord-8.0.1/lib/active_record/connection_adapters/abstract_adapter.rb:1129:in `log'
	from /app/vendor/bundle/ruby/3.3.0/gems/activerecord-8.0.1/lib/active_record/connection_adapters/abstract/database_statements.rb:554:in `raw_execute'
	from /app/vendor/bundle/ruby/3.3.0/gems/activerecord-8.0.1/lib/active_record/connection_adapters/abstract/database_statements.rb:591:in `internal_execute'
	from /app/vendor/bundle/ruby/3.3.0/gems/activerecord-8.0.1/lib/active_record/connection_adapters/abstract/database_statements.rb:547:in `internal_exec_query'
	from /app/vendor/bundle/ruby/3.3.0/gems/activerecord-8.0.1/lib/active_record/connection_adapters/abstract/database_statements.rb:693:in `select'
	from /app/vendor/bundle/ruby/3.3.0/gems/activerecord-8.0.1/lib/active_record/connection_adapters/abstract/database_statements.rb:73:in `select_all'
	from /app/vendor/bundle/ruby/3.3.0/gems/activerecord-8.0.1/lib/active_record/connection_adapters/abstract/query_cache.rb:248:in `block in select_all'
	from /app/vendor/bundle/ruby/3.3.0/gems/activerecord-8.0.1/lib/active_record/connection_adapters/abstract/query_cache.rb:286:in `block (2 levels) in cache_sql'
	from /app/vendor/bundle/ruby/3.3.0/gems/activerecord-8.0.1/lib/active_record/connection_adapters/abstract/query_cache.rb:80:in `compute_if_absent'
	from /app/vendor/bundle/ruby/3.3.0/gems/activerecord-8.0.1/lib/active_record/connection_adapters/abstract/query_cache.rb:284:in `block in cache_sql'
	from /app/vendor/bundle/ruby/3.3.0/gems/activesupport-8.0.1/lib/active_support/concurrency/null_lock.rb:9:in `synchronize'
	from /app/vendor/bundle/ruby/3.3.0/gems/activerecord-8.0.1/lib/active_record/connection_adapters/abstract/query_cache.rb:283:in `cache_sql'
	from /app/vendor/bundle/ruby/3.3.0/gems/activerecord-8.0.1/lib/active_record/connection_adapters/abstract/query_cache.rb:248:in `select_all'
	from /app/vendor/bundle/ruby/3.3.0/gems/activerecord-8.0.1/lib/active_record/relation/calculations.rb:322:in `block (2 levels) in pluck'
	from /app/vendor/bundle/ruby/3.3.0/gems/activerecord-8.0.1/lib/active_record/connection_adapters/abstract/connection_pool.rb:418:in `with_connection'
	from /app/vendor/bundle/ruby/3.3.0/gems/activerecord-8.0.1/lib/active_record/connection_handling.rb:310:in `with_connection'
	from /app/vendor/bundle/ruby/3.3.0/gems/activerecord-8.0.1/lib/active_record/relation/calculations.rb:321:in `block in pluck'
	from /app/vendor/bundle/ruby/3.3.0/gems/activerecord-8.0.1/lib/active_record/relation.rb:1470:in `skip_query_cache_if_necessary'
	from /app/vendor/bundle/ruby/3.3.0/gems/activerecord-8.0.1/lib/active_record/relation/calculations.rb:317:in `pluck'
	from /app/vendor/bundle/ruby/3.3.0/gems/solid_queue-1.1.4/app/models/solid_queue/queue_selector.rb:73:in `paused_queues'
	from /app/vendor/bundle/ruby/3.3.0/gems/solid_queue-1.1.4/app/models/solid_queue/queue_selector.rb:31:in `queue_names'
	from /app/vendor/bundle/ruby/3.3.0/gems/solid_queue-1.1.4/app/models/solid_queue/queue_selector.rb:27:in `none?'
	from /app/vendor/bundle/ruby/3.3.0/gems/solid_queue-1.1.4/app/models/solid_queue/queue_selector.rb:15:in `scoped_relations'
	from /app/vendor/bundle/ruby/3.3.0/gems/solid_queue-1.1.4/app/models/solid_queue/ready_execution.rb:11:in `claim'
	from /app/vendor/bundle/ruby/3.3.0/gems/solid_queue-1.1.4/lib/solid_queue/worker.rb:41:in `block in claim_executions'
	from /app/vendor/bundle/ruby/3.3.0/gems/solid_queue-1.1.4/lib/solid_queue/processes/poller.rb:47:in `block (2 levels) in with_polling_volume'
	from /app/vendor/bundle/ruby/3.3.0/gems/activesupport-8.0.1/lib/active_support/logger_silence.rb:18:in `block in silence'
	from /app/vendor/bundle/ruby/3.3.0/gems/activesupport-8.0.1/lib/active_support/logger_thread_safe_level.rb:37:in `log_at'
	from /app/vendor/bundle/ruby/3.3.0/gems/activesupport-8.0.1/lib/active_support/logger_silence.rb:18:in `silence'
	from /app/vendor/bundle/ruby/3.3.0/gems/solid_queue-1.1.4/lib/solid_queue/processes/poller.rb:47:in `block in with_polling_volume'
	from /app/vendor/bundle/ruby/3.3.0/gems/activesupport-8.0.1/lib/active_support/notifications.rb:212:in `instrument'
	from /app/vendor/bundle/ruby/3.3.0/gems/solid_queue-1.1.4/lib/solid_queue.rb:73:in `instrument'
	from /app/vendor/bundle/ruby/3.3.0/gems/solid_queue-1.1.4/lib/solid_queue/processes/poller.rb:45:in `with_polling_volume'
	from /app/vendor/bundle/ruby/3.3.0/gems/solid_queue-1.1.4/lib/solid_queue/worker.rb:40:in `claim_executions'
	from /app/vendor/bundle/ruby/3.3.0/gems/solid_queue-1.1.4/lib/solid_queue/worker.rb:30:in `poll'
	from /app/vendor/bundle/ruby/3.3.0/gems/solid_queue-1.1.4/lib/solid_queue/processes/poller.rb:29:in `block (2 levels) in start_loop'
	from /app/vendor/bundle/ruby/3.3.0/gems/activesupport-8.0.1/lib/active_support/execution_wrapper.rb:91:in `wrap'
	from /app/vendor/bundle/ruby/3.3.0/gems/solid_queue-1.1.4/lib/solid_queue/app_executor.rb:7:in `wrap_in_app_executor'
	from /app/vendor/bundle/ruby/3.3.0/gems/solid_queue-1.1.4/lib/solid_queue/processes/poller.rb:28:in `block in start_loop'
	from <internal:kernel>:187:in `loop'
	from /app/vendor/bundle/ruby/3.3.0/gems/solid_queue-1.1.4/lib/solid_queue/processes/poller.rb:25:in `start_loop'
	from /app/vendor/bundle/ruby/3.3.0/gems/solid_queue-1.1.4/lib/solid_queue/processes/poller.rb:21:in `run'
	from /app/vendor/bundle/ruby/3.3.0/gems/solid_queue-1.1.4/lib/solid_queue/processes/runnable.rb:15:in `start'
	from /app/vendor/bundle/ruby/3.3.0/gems/solid_queue-1.1.4/lib/solid_queue/supervisor.rb:84:in `block in start_process'
	from /app/vendor/bundle/ruby/3.3.0/gems/solid_queue-1.1.4/lib/solid_queue/supervisor.rb:83:in `fork'
	from /app/vendor/bundle/ruby/3.3.0/gems/solid_queue-1.1.4/lib/solid_queue/supervisor.rb:83:in `start_process'
	from /app/vendor/bundle/ruby/3.3.0/gems/solid_queue-1.1.4/lib/solid_queue/supervisor.rb:58:in `block in start_processes'
	from /app/vendor/bundle/ruby/3.3.0/gems/solid_queue-1.1.4/lib/solid_queue/supervisor.rb:58:in `each'
	from /app/vendor/bundle/ruby/3.3.0/gems/solid_queue-1.1.4/lib/solid_queue/supervisor.rb:58:in `start_processes'
	from /app/vendor/bundle/ruby/3.3.0/gems/solid_queue-1.1.4/lib/solid_queue/supervisor.rb:35:in `start'
	from <internal:kernel>:90:in `tap'
	from /app/vendor/bundle/ruby/3.3.0/gems/solid_queue-1.1.4/lib/solid_queue/supervisor.rb:16:in `start'
	from /app/vendor/bundle/ruby/3.3.0/gems/solid_queue-1.1.4/lib/solid_queue/cli.rb:26:in `start'
	from /app/vendor/bundle/ruby/3.3.0/gems/thor-1.3.2/lib/thor/command.rb:28:in `run'
	from /app/vendor/bundle/ruby/3.3.0/gems/thor-1.3.2/lib/thor/invocation.rb:127:in `invoke_command'
	from /app/vendor/bundle/ruby/3.3.0/gems/thor-1.3.2/lib/thor.rb:538:in `dispatch'
	from /app/vendor/bundle/ruby/3.3.0/gems/thor-1.3.2/lib/thor/base.rb:584:in `start'
	from bin/jobs:6:in `<main>'

but if I open a bash terminal on that application (disabling the workers from the procfile manually first) then proceed to unset the connection pool URL (so it falls back to direct connection to the database) and run bin/jobs or rails solid_queue:start then the error goes away

I'm thinking maybe solid queue has issues not being directly connected to the database for some queries, maybe from the supervisor ?

I don't mind supervisors and the such having direct access, they don't need many connections. But I want my jobs' perform to run using the connection pool to avoid saturating them, if there was a way to configure that so that different configs in database.yml can be used for SolidQueue actions and others for the execution of the worker itself, maybe pooling would no longer be an issue

@rosa
Copy link
Member

rosa commented Mar 20, 2025

Hey @jwoodrow! Solid Queue shouldn't have any problems with PGBouncer. The error you're getting seems to be related to Solid Queue's database not being correctly created. It's about a table missing:

relation "solid_queue_pauses" does not exist

Have you run db:prepare after solid_queue:install in your production environment?

@jwoodrow
Copy link
Author

Hi @rosa

I saw the other issue mentioning this (using db:prepare) and I've already switched to db:prepare for the release phase on heroku and run it manually too.

In case I had borked my database (since it wasn't a production environment) I even replaced my database with a completely different one and redeployed and regenerated my database on it.

When using the connection pool url (PGBouncer) I get this error almost immediately and enter a crash loop.
But once I use the direct connection to the database instead, these errors go away.

I've also connected directly to the database to observe the DDL and the tables and indices are all there.

@rosa
Copy link
Member

rosa commented Mar 21, 2025

Thanks for the extra information 🙏 That's very strange! Does your main application's database work fine with PGBouncer? Solid Queue doesn't access the DB in a different way from the main app; it uses Active Record and Rails's connection handling under the hood 🤔

@jwoodrow
Copy link
Author

Yes, it's currently the way it is setup actually. I'm using direct connection only for solid_queue in that env and all other dynos are configured to use the connection pool 🤔

@rosa
Copy link
Member

rosa commented Mar 21, 2025

Very strange indeed 🤔 And what happens when you try to enqueue a job from your app? Does it also fail with a PG::UndefinedTable error?

@jwoodrow
Copy link
Author

I just tried it out, both from controller actions and the rails console, no errors pop up as long as I have direct connection only on solid_queue workers

@jwoodrow
Copy link
Author

Ah, I'm sorry it seems there is indeed an issue with PGBouncer (always relating to SolidQueue) also with other dynos, just much fewer because of less activity, I'm also using schema caching could this be related somehow ?

@jwoodrow
Copy link
Author

jwoodrow commented Mar 22, 2025

I'm beginning to think this is really more of a Rails issue than anything.

I've noticed that the queries being generated by SolidQueue did not explicitly provide the schema prefix even though I4ve set the table_name_prefix class attribute. This is because since SolidQueue is isolated the module SolidQueue ends up with table_name_prefix being force defined to solid_queue_. This is all fine all the tables are in the public schema but it seems that something with pg_bouncer prevents the schema_path from being set or used from time to time which causes the issue (because it's searching for the tables in the public schema, when I manually set them via a migration to the solid_queue schema).

My monkey patch is two-fold just as an FYI

  1. I wanted my supervisor to automatically recover from a database truncation (we reset our staging environment database each day with truncation) and also fix an error that was caused by this
  2. I wanted to force SolidQueue queries to be in the form solid_queue.table_name
  3. (I also changed the migrations so basically what used to be solid_queue_jobs for example is now solid_queue.jobs since the solid_queue schema is how I'm isolating things and solid_queue.solid_queue_ is a mouth full 😅 )
Monkey Patch (loaded in the initializers)
module SolidQueueSupervisorPatch
  private

    def supervise
      loop do
        break if stopped?

        set_procline
        process_signal_queue

        next if stopped?

        recover_supervisor
        reap_and_replace_terminated_forks
        interruptible_sleep(1.second)
      end
    ensure
      shutdown
    end

    def handle_claimed_jobs_by(terminated_fork, status)
      return unless (registered_process = process&.supervisees&.find_by(name: terminated_fork.name))

      error = SolidQueue::Processes::ProcessExitError.new(status)
      registered_process.fail_all_claimed_executions_with(error)
    end

    def recover_supervisor
      return if process.present?

      @pid = nil
      register
    end
end

ActiveSupport.on_load(:solid_queue) do
  # We're opting for a different schema approack so we need schema prefixing instead of table prefixing
  SolidQueue.singleton_class.define_method(:table_name_prefix) { 'solid_queue.' }
  SolidQueue::Supervisor.prepend SolidQueueSupervisorPatch
end

I don't believe SolidQueue could fix the issue that comes with the fact that isolate_namespace has no "isolate by schema" when it comes to the table_name_prefix being force defined 🤔

I'll let it run a bit more and keep you posted on if this was actually a Rails issue all along

Update: so far have not seen any issues even loading solid_queue with 3k jobs in a short burst with pg_bouncer with my changes

@rosa
Copy link
Member

rosa commented Mar 25, 2025

Oh huh! Great digging and very interesting find! However, I haven't fully understood it. Do you mean you've set a different table_name_prefix from the one that's automatically set by isolate_namespace from Rails::Engine?

@jwoodrow
Copy link
Author

That's exactly it, Rails::Engine seems to assume single database and single schema isolation, hence the #{engine_name}_ format.

unless mod.respond_to?(:table_name_prefix)
  define_method(:table_name_prefix) { "#{name}_" }

  ActiveSupport.on_load(:active_record) do
    mod.singleton_class.redefine_method(:table_name_prefix) do
      "#{ActiveRecord::Base.table_name_prefix}#{name}_"
    end
  end
end

(There's also something about prefixing the ActiveRecord::Base.table_name_prefix but changing this would impact all queries not just SolidQueue ones)

Usually this isn't an issue, but there appears to be some moments when Rails or PGBouncer or both, lose the notion of the schema_search_path (which I had correctly set to public,solid_queue and worked fine in direct DB connection) and when that happens, since there's no explicit schema_name. in the queries being built it ends up failing saying the relation does not exist when it in fact does.

As the code from Rails::Engine seems to suggest, potentially this could be resolved via a configuration option like solid_queue.config.isolation_strategy = :schema # or :table and defining/redefining this method when this config gets set or before the isolate_namespace SolidQueue gets called ? or I could just keep the PGBouncer monkey patch which is just redefining the method myself.

Either way I do think the base issue is in how Rails and PGBouncer interact within isolated engines, and schemas disappearing out of nowhere 😅

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants