Add missing index on task_instance.dag_version_id #62660

Open
Vamsi-klu wants to merge 1 commit into apache:main from Vamsi-klu:fix/m6-add-dag-version-id-index

@Vamsi-klu Vamsi-klu commented Mar 1, 2026

Summary

Co-contributors: @codingrealitylabs @girlcoder-gaming

Test plan

  • Verify Alembic migration applies cleanly: airflow db migrate
  • Verify Alembic migration downgrades cleanly: airflow db downgrade (requires a --to-revision or --to-version target)
  • Verify index exists after migration: SELECT * FROM pg_indexes WHERE indexname = 'ti_dag_version_id'
  • Monitor scheduler performance improvement on large deployments

Closes: #61894

Note: On large task_instance tables, the index creation may take several minutes during migration.

🤖 Generated with Claude Code

@Vamsi-klu Vamsi-klu marked this pull request as ready for review March 1, 2026 07:36
@Vamsi-klu
Contributor Author

cc @ephraimbuddy @potiuk @jason810496 — Would appreciate your review. This adds a missing index on task_instance.dag_version_id to prevent full table scans during DAG processing on large tables.

Contributor

Copilot AI left a comment

Pull request overview

Adds a missing database index on task_instance.dag_version_id to avoid full-table scans that significantly slow down the DAG processor on large task_instance tables.
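
As a rough illustration of the full-table-scan claim above, the sketch below compares query plans before and after creating the index. It uses an in-memory SQLite database from the Python standard library as a stand-in. The three-column task_instance table is a simplified assumption and not Airflow's real schema, and PostgreSQL or MySQL planners will report their plans differently.

```python
import sqlite3

# Simplified stand-in for the task_instance table (assumption: the real
# Airflow schema has many more columns and different types).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE task_instance ("
    "id INTEGER PRIMARY KEY, task_id TEXT, dag_version_id TEXT)"
)
conn.executemany(
    "INSERT INTO task_instance (task_id, dag_version_id) VALUES (?, ?)",
    [(f"task_{i}", f"version_{i % 50}") for i in range(5000)],
)

QUERY = "SELECT * FROM task_instance WHERE dag_version_id = ?"


def plan(connection, sql, params):
    """Return the planner's description of how it would run the query."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the fourth column of each plan row.
    return " | ".join(row[3] for row in rows)


# Without an index the planner has to scan the whole table.
plan_before = plan(conn, QUERY, ("version_7",))

# Same index name and column as in this pull request.
conn.execute("CREATE INDEX ti_dag_version_id ON task_instance (dag_version_id)")

# With the index in place the planner can search it instead.
plan_after = plan(conn, QUERY, ("version_7",))

print("before:", plan_before)
print("after: ", plan_after)
```

The exact plan wording depends on the SQLite version, but the first plan should describe a scan of task_instance and the second a search that names ti_dag_version_id.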

Changes:

  • Add ORM-level index definition ti_dag_version_id on TaskInstance.dag_version_id.
  • Add Alembic migration 0061_3_0_0_add_dag_version_id_index_to_ti.py to create/drop the index in existing DBs.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| airflow/models/taskinstance.py | Declares the new ti_dag_version_id index in the TaskInstance table metadata. |
| airflow/migrations/versions/0061_3_0_0_add_dag_version_id_index_to_ti.py | Creates the ti_dag_version_id index on upgrade and drops it on downgrade. |
Comments suppressed due to low confidence (1)

airflow/migrations/versions/0061_3_0_0_add_dag_version_id_index_to_ti.py:42

  • On PostgreSQL, a plain CREATE INDEX will hold stronger locks than CONCURRENTLY and can block writes to task_instance for the duration of the build (which can be minutes+ on very large tables). Consider using a Postgres-specific concurrent index build (via an autocommit block + postgresql_concurrently=True) to reduce downtime risk, or otherwise document/handle the expected locking behavior explicitly in the migration code path.
def upgrade():
    """Add index on dag_version_id to task_instance table."""
    op.create_index("ti_dag_version_id", "task_instance", ["dag_version_id"], unique=False)
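
The concurrent build suggested in the suppressed comment above might look roughly like the following. This is a sketch and not the code in this pull request: the INDEX_NAME and TABLE_NAME constants and the dialect branch are illustrative assumptions, and the revision identifiers a real migration file needs are left out. It relies on Alembic's autocommit_block() and the postgresql_concurrently option, because PostgreSQL refuses CREATE INDEX CONCURRENTLY inside a transaction block.

```python
from alembic import op

# Hypothetical constants for readability; the PR passes these as literals.
INDEX_NAME = "ti_dag_version_id"
TABLE_NAME = "task_instance"


def upgrade():
    """Add index on dag_version_id to task_instance table."""
    if op.get_context().dialect.name == "postgresql":
        # autocommit_block() commits the surrounding migration transaction
        # first, then runs the enclosed statement in autocommit mode.
        with op.get_context().autocommit_block():
            op.create_index(
                INDEX_NAME,
                TABLE_NAME,
                ["dag_version_id"],
                unique=False,
                postgresql_concurrently=True,
            )
    else:
        # Other backends keep the plain index build used in the PR.
        op.create_index(INDEX_NAME, TABLE_NAME, ["dag_version_id"], unique=False)


def downgrade():
    """Drop index on dag_version_id from task_instance table."""
    if op.get_context().dialect.name == "postgresql":
        with op.get_context().autocommit_block():
            op.drop_index(
                INDEX_NAME, table_name=TABLE_NAME, postgresql_concurrently=True
            )
    else:
        op.drop_index(INDEX_NAME, table_name=TABLE_NAME)
```

One trade-off to weigh: because autocommit_block() commits whatever ran before it, a failure partway through leaves earlier changes applied, and an interrupted concurrent build can leave an invalid index behind that has to be dropped manually.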

The dag_version_id column is queried by the DAG processor but had no
index, causing full table scans on large task_instance tables. Add
ti_dag_version_id index and corresponding Alembic migration.

Closes: apache#61894

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@Vamsi-klu Vamsi-klu force-pushed the fix/m6-add-dag-version-id-index branch from a86edf1 to 5a3f293 Compare March 1, 2026 17:51
@girlcoder-gaming

@codex please review this PR.

Development

Successfully merging this pull request may close these issues.

Dag Processor performance issue querying task_instance table by dag_version_id

3 participants