Add missing index on task_instance.dag_version_id #62660
Open
Vamsi-klu wants to merge 1 commit into apache:main
cc @ephraimbuddy @potiuk @jason810496, would appreciate your review. This adds a missing index on
Pull request overview
Adds a missing database index on task_instance.dag_version_id to avoid full-table scans that significantly slow down the DAG processor on large task_instance tables.
Changes:
- Add ORM-level index definition ti_dag_version_id on TaskInstance.dag_version_id.
- Add Alembic migration 0061_3_0_0_add_dag_version_id_index_to_ti.py to create/drop the index in existing DBs.
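To illustrate the full-table-scan claim, here is a minimal sketch using Python's built-in sqlite3 module rather than Airflow's real schema or PostgreSQL. The two-column table, the TEXT column type, and the sample values are assumptions made for the demo; only the table, column, and index names come from this PR. It compares the query plan for a lookup by dag_version_id before and after the index exists.

```python
import sqlite3

# Toy stand-in for the real table: only the columns needed for the demo.
# (Assumption: the real column type and schema differ; this is illustrative.)
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE task_instance (id INTEGER PRIMARY KEY, dag_version_id TEXT)"
)
conn.executemany(
    "INSERT INTO task_instance (dag_version_id) VALUES (?)",
    [(f"version-{i % 50}",) for i in range(5000)],
)


def query_plan(connection):
    """Return SQLite's plan text for a lookup by dag_version_id."""
    rows = connection.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT id FROM task_instance WHERE dag_version_id = ?",
        ("version-7",),
    ).fetchall()
    # The last column of each row holds the human-readable plan step.
    return " | ".join(row[-1] for row in rows)


plan_before = query_plan(conn)  # no index yet: the planner scans the table
conn.execute("CREATE INDEX ti_dag_version_id ON task_instance (dag_version_id)")
plan_after = query_plan(conn)  # the planner can now search via the index

print("before:", plan_before)
print("after: ", plan_after)
```

The exact plan wording varies between SQLite versions, and PostgreSQL's planner makes its own cost-based choices, so treat this as a demonstration of the mechanism rather than a benchmark of the Airflow change.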
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| airflow/models/taskinstance.py | Declares the new ti_dag_version_id index in the TaskInstance table metadata. |
| airflow/migrations/versions/0061_3_0_0_add_dag_version_id_index_to_ti.py | Creates the ti_dag_version_id index on upgrade and drops it on downgrade. |
Comments suppressed due to low confidence (1)
airflow/migrations/versions/0061_3_0_0_add_dag_version_id_index_to_ti.py:42
- On PostgreSQL, a plain CREATE INDEX holds stronger locks than CREATE INDEX CONCURRENTLY and can block writes to task_instance for the duration of the build (which can take minutes or longer on very large tables). Consider using a Postgres-specific concurrent index build (via an autocommit block + postgresql_concurrently=True) to reduce downtime risk, or otherwise document/handle the expected locking behavior explicitly in the migration code path.
def upgrade():
    """Add index on dag_version_id to task_instance table."""
    op.create_index("ti_dag_version_id", "task_instance", ["dag_version_id"], unique=False)
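One way the suggested concurrent build could look is sketched below. This is not the code in this PR. The helper name create_dag_version_id_index and its dialect_name parameter are hypothetical, and op is passed in as an argument so the sketch can be read and exercised without a database. In a real migration it would be Alembic's op module, whose get_context().autocommit_block() and the postgresql_concurrently option of create_index are the calls the review comment refers to.

```python
def create_dag_version_id_index(op, dialect_name):
    """Create ti_dag_version_id, building it concurrently on PostgreSQL.

    `op` is expected to behave like `alembic.op`; `dialect_name` is the
    SQLAlchemy dialect name of the target database (hypothetical parameter).
    """
    if dialect_name == "postgresql":
        # CREATE INDEX CONCURRENTLY cannot run inside a transaction block,
        # so step outside the migration's transaction for this statement.
        with op.get_context().autocommit_block():
            op.create_index(
                "ti_dag_version_id",
                "task_instance",
                ["dag_version_id"],
                unique=False,
                postgresql_concurrently=True,
            )
    else:
        # Other backends: same plain index creation as in the PR.
        op.create_index(
            "ti_dag_version_id",
            "task_instance",
            ["dag_version_id"],
            unique=False,
        )


# Inside an Alembic migration this might be called as (assumption):
#
#     from alembic import op
#
#     def upgrade():
#         create_dag_version_id_index(op, op.get_bind().dialect.name)
```

The trade-off is that a concurrent build takes longer overall, and if it fails partway PostgreSQL leaves behind an invalid index that has to be dropped before retrying, so the migration would need to account for that case.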
The dag_version_id column is queried by the DAG processor but had no index, causing full table scans on large task_instance tables. Add ti_dag_version_id index and corresponding Alembic migration.

Closes: apache#61894

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@codex please review this PR.
Summary
- ti_dag_version_id index on task_instance.dag_version_id column
- 0107_3_2_0_add_dag_version_id_index_to_ti.py

Co-contributors: @codingrealitylabs @girlcoder-gaming
Test plan
- airflow db migrate
- airflow db downgrade
- SELECT * FROM pg_indexes WHERE indexname = 'ti_dag_version_id'

Closes: #61894
Note: On large task_instance tables, the index creation may take several minutes during migration.

🤖 Generated with Claude Code