perf: use materialized CTE for orgUnitMode=SELECTED enrollment queries DHIS2-20921 #22982
Draft
…s DHIS2-20921

When orgUnitMode=SELECTED with a known program, replace the trackedentityprogramowner + organisationunit joins with a materialized CTE that finds tracked entity IDs via the (programid, organisationunitid) composite index. Without materialization, PostgreSQL flattens the subquery into a semi-join and scans all enrollments instead.

8,005 ms -> 72 ms for the district-level org unit (0 matches). 273 ms for a facility with 8,597 tracked entities.



Follow-up to #22979. Fixes the orgUnitMode=SELECTED performance problem for enrollment queries.

The current query starts from the enrollment table (10.9M rows), joins to trackedentityprogramowner and organisationunit, and filters by org unit ID in the WHERE clause. PostgreSQL therefore scans all enrollments for the program and checks each one against the program owner, even when the selected org unit has zero matches.
This PR replaces the trackedentityprogramowner + organisationunit joins with a materialized CTE that finds tracked entity IDs via the (programid, organisationunitid) composite index on trackedentityprogramowner. The CTE result drives the join to enrollment, so PostgreSQL starts from the small set of tracked entities at the selected org units instead of scanning all enrollments.
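The MATERIALIZED keyword is required. Since PostgreSQL 12, a non-recursive CTE without side effects that is referenced only once is inlined into the outer query by default; PostgreSQL then flattens the subquery into a semi-join and still scans all enrollments. MATERIALIZED forces the CTE to be evaluated on its own first.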
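Because MATERIALIZED only changes how the planner executes the query, the rewrite must return the same enrollments as the original join. The sketch below checks that on a toy dataset using Python's built-in sqlite3 module. It is an assumption-heavy stand-in: the schema is a hypothetical, heavily simplified version of the DHIS2 tables, both query texts are reconstructions for illustration (not the PR's SQL), and SQLite's planner is not PostgreSQL's, so this verifies only result equivalence, not the performance behavior. SQLite accepts the "AS MATERIALIZED" hint from version 3.35.0, so the hint is dropped on older builds.

```python
import sqlite3

# Hypothetical, simplified schema: only the columns the join needs.
SCHEMA = """
create table organisationunit (organisationunitid integer primary key, path text);
create table trackedentityprogramowner (
    trackedentityid integer, programid integer, organisationunitid integer,
    unique (trackedentityid, programid)
);
create index tpo_program_ou on trackedentityprogramowner (programid, organisationunitid);
create table enrollment (
    enrollmentid integer primary key, trackedentityid integer, programid integer
);
"""

# SQLite understands "AS MATERIALIZED" from 3.35.0; omit the hint on older builds.
HINT = "materialized " if sqlite3.sqlite_version_info >= (3, 35, 0) else ""

# Original shape: start from enrollment, then join owner and org unit.
JOIN_SHAPE = """
select en.enrollmentid
from enrollment en
join trackedentityprogramowner tpo
  on tpo.trackedentityid = en.trackedentityid and tpo.programid = en.programid
join organisationunit ou on ou.organisationunitid = tpo.organisationunitid
where en.programid = :program and ou.organisationunitid = :ou
order by en.enrollmentid
"""

# Rewritten shape: find the owned tracked entities first, then join to enrollment.
CTE_SHAPE = f"""
with selected_te as {HINT}(
    select tpo.trackedentityid
    from trackedentityprogramowner tpo
    where tpo.programid = :program and tpo.organisationunitid = :ou
)
select en.enrollmentid
from selected_te ste
join enrollment en on en.trackedentityid = ste.trackedentityid
where en.programid = :program
order by en.enrollmentid
"""


def build_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("insert into organisationunit values (?, ?)",
                     [(1, "/1"), (2, "/1/2"), (3, "/1/3")])
    # (trackedentity, program, owning org unit)
    conn.executemany("insert into trackedentityprogramowner values (?, ?, ?)",
                     [(10, 100, 2), (11, 100, 2), (12, 100, 3), (10, 200, 3)])
    # (enrollment, trackedentity, program); TE 11 has two enrollments in program 100
    conn.executemany("insert into enrollment values (?, ?, ?)",
                     [(1, 10, 100), (2, 11, 100), (3, 12, 100),
                      (4, 10, 200), (5, 11, 100)])
    return conn


def run(conn: sqlite3.Connection, sql: str, program: int, ou: int) -> list:
    rows = conn.execute(sql, {"program": program, "ou": ou}).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    db = build_db()
    for program, ou in [(100, 1), (100, 2), (100, 3), (200, 3)]:
        print(program, ou, run(db, JOIN_SHAPE, program, ou), run(db, CTE_SHAPE, program, ou))
```

Org unit 1 plays the role of the district with zero owned tracked entities: both shapes return an empty result, but only the CTE shape can reach that answer without visiting enrollment rows.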
The ownership access control clause (ou.path like ...) is not needed because the mapper validates that the user has appropriate access (search or capture scope, depending on the program access level) to the requested org units before they reach the store.
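The reasoning for dropping the clause is that the scope check happens once, up front, on the requested org units rather than per row in SQL. A minimal sketch of such a check, under stated assumptions: the names AccessLevel, effective_scope, in_scope and validate_requested_org_units are hypothetical, org units are identified by DHIS2-style paths ("/A/B"), and the mapping of OPEN/AUDITED programs to the search scope and PROTECTED/CLOSED programs to the capture scope is an assumption made for illustration, not taken from the PR's mapper code.

```python
from enum import Enum
from typing import List, Sequence


class AccessLevel(Enum):
    OPEN = "OPEN"
    AUDITED = "AUDITED"
    PROTECTED = "PROTECTED"
    CLOSED = "CLOSED"


def effective_scope(level: AccessLevel, search_paths: Sequence[str],
                    capture_paths: Sequence[str]) -> Sequence[str]:
    # Assumption: open/audited programs are checked against the search scope,
    # protected/closed programs against the capture scope.
    if level in (AccessLevel.OPEN, AccessLevel.AUDITED):
        return search_paths
    return capture_paths


def in_scope(ou_path: str, scope_paths: Sequence[str]) -> bool:
    # An org unit is in scope if it is a scope root or a descendant of one
    # (same intent as a path-prefix "like" check in SQL).
    return any(ou_path == root or ou_path.startswith(root + "/")
               for root in scope_paths)


def validate_requested_org_units(requested_paths: Sequence[str],
                                 level: AccessLevel,
                                 search_paths: Sequence[str],
                                 capture_paths: Sequence[str]) -> List[str]:
    scope = effective_scope(level, search_paths, capture_paths)
    denied = [p for p in requested_paths if not in_scope(p, scope)]
    if denied:
        raise PermissionError(f"no access to org units: {denied}")
    # Only validated org units are passed on to the store.
    return list(requested_paths)


if __name__ == "__main__":
    print(validate_requested_org_units(["/A/C"], AccessLevel.OPEN, ["/A"], ["/A/B"]))
```

Once this check has passed, every selected org unit is known to be inside the user's scope, so repeating a path filter for each enrollment row in the query would add cost without changing the result.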
SQL
Database Performance
Sierra Leone database with 10M tracked entities (10.9M enrollments), measured with EXPLAIN ANALYZE after 4 warmup runs.
Queries measured:
- orgUnits=O6uvpzGd5pu&orgUnitMode=SELECTED (district, 0 matches)
- orgUnits=DiszpKrYNg8&orgUnitMode=SELECTED (facility, 8,597 TEs)
- fields=enrollment (no orgUnitMode)
- orgUnitMode=DESCENDANTS

Other org unit modes and queries without org unit filters are unaffected.