@mdearos mdearos commented Dec 3, 2025

Overview

This commit adds a fix for the slow performance of full counts in PostgreSQL (Issue #1969).

To achieve this, two new PostgreSQL provider settings have been added (postgresql_pseudo_count_enabled and postgresql_pseudo_count_start), allowing the use of pseudo counts to be configured individually for each use of the PostgreSQL provider.
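
For illustration, a provider block using the two new settings might look like the sketch below, written as the Python dict that the YAML config would load into. The connection details, table name and the values chosen for both settings are placeholders, and the value types (a boolean and an integer row threshold) are an assumption rather than something confirmed by this PR.

```python
# Hypothetical provider block, shown as the dict the YAML config loads into.
# All values are placeholders; the types of the two new settings are assumed.
provider = {
    "type": "feature",
    "name": "PostgreSQL",
    "data": {
        "host": "localhost",
        "port": 5432,
        "dbname": "test",
        "user": "postgres",
        "password": "postgres",
    },
    "id_field": "osm_id",
    "table": "hotosm_bdi_waterways",
    "geom_field": "foo_geom",
    # New settings added by this PR (names from the PR description).
    "postgresql_pseudo_count_enabled": True,   # assumed to be a boolean
    "postgresql_pseudo_count_start": 5000000,  # assumed to be a row threshold
}
```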

When pseudo counts are enabled, the fix uses the PostgreSQL EXPLAIN command to estimate the number of rows that a given request will return, instead of running a full count. This does not apply to all queries, because pseudo counts cannot be run on the following types of request:

  • Requests with a Result Type of Hits.
  • Requests with a CQL filter.
  • Requests with a BBOX filter.
  • Requests with a Temporal filter.
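
The eligibility rules above, together with reading a row estimate out of EXPLAIN, can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function names and parameters are hypothetical, and it assumes the query plan was requested with EXPLAIN (FORMAT JSON), whose output is a JSON array containing a "Plan" object with a "Plan Rows" field.

```python
import json
from typing import Any, Optional


def pseudo_count_allowed(
    resulttype: str = "results",
    cql_filter: Optional[Any] = None,
    bbox: Optional[list] = None,
    datetime_: Optional[str] = None,
) -> bool:
    """Return True only when none of the excluded request types apply.

    Hypothetical helper: parameter names are assumptions for illustration.
    """
    if resulttype == "hits":
        return False
    # Any CQL, BBOX or temporal filter rules out a pseudo count.
    return cql_filter is None and not bbox and datetime_ is None


def estimated_rows(explain_json: str) -> int:
    """Read the planner's row estimate from EXPLAIN (FORMAT JSON) output."""
    plan = json.loads(explain_json)
    return int(plan[0]["Plan"]["Plan Rows"])
```

Note that the planner estimate depends on table statistics, so it can drift from the true count until the table is next analyzed.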

Also, you can use the postgresql_pseudo_count_start setting to tell the system to run a full count if the row estimate is too small, meaning there is enough time for a full count to be run.
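
As a rough sketch of that threshold behaviour: the function and argument names below are hypothetical, and whether the comparison in the PR is strict or inclusive is an assumption.

```python
from typing import Callable


def choose_count(
    estimate: int,
    pseudo_count_start: int,
    full_count: Callable[[], int],
) -> int:
    """Return the estimate for large results, an exact count for small ones.

    full_count is only called when the estimate falls below the threshold,
    so the expensive query is skipped for large tables.
    """
    if estimate < pseudo_count_start:  # assumed strict comparison
        return full_count()
    return estimate
```

With a threshold of 1000, an estimate of 950 would trigger the exact count, while an estimate of 2000000 would be returned as is.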

This commit also adds the required documentation and PostgreSQL provider test changes, including adding building_type and datetime columns to the dummy_data.sql file.

Additional information

Dependency policy (RFC2)

  • I have ensured that this PR meets RFC2 requirements

Updates to public demo

Contributions and licensing

(as per https://github.com/geopython/pygeoapi/blob/master/CONTRIBUTING.md#contributions-and-licensing)

  • I'd like to contribute the bugfix "Slow Query Performance in Postgres Provider Due to Full count on Large Tables" to pygeoapi. I confirm that my contributions to pygeoapi will be compatible with the pygeoapi license guidelines at the time of contribution.
  • I have already previously agreed to the pygeoapi Contributions and Licensing

@webb-ben webb-ben (Member) commented Dec 6, 2025

I wonder if a Postgres-specific addition to the provider block is the correct approach if we want to maintain a rigid pygeoapi config schema. This appears to port some of the logic implemented in a pseudocount-specific pygeoapi Postgres provider.

I wonder if there is a configuration option / solution that could be used across all pygeoapi providers, given that numberMatched is not required by the specification. Is it better to have no count, or an incorrect one?
