Skip to content

Conversation

@ti-chi-bot
Copy link
Member

This is an automated cherry-pick of #22037

First-time contributors' checklist

What is changed, added or deleted? (Required)

Which TiDB version(s) do your changes apply to? (Required)

Tips for choosing the affected version(s):

By default, CHOOSE MASTER ONLY so your changes will be applied to the next TiDB major or minor releases. If your PR involves a product feature behavior change or a compatibility change, CHOOSE THE AFFECTED RELEASE BRANCH(ES) AND MASTER.

For details, see tips for choosing the affected versions.

  • master (the latest development version)
  • v9.0 (TiDB 9.0 versions)
  • v8.5 (TiDB 8.5 versions)
  • v8.1 (TiDB 8.1 versions)
  • v7.5 (TiDB 7.5 versions)
  • v7.1 (TiDB 7.1 versions)
  • v6.5 (TiDB 6.5 versions)
  • v6.1 (TiDB 6.1 versions)
  • v5.4 (TiDB 5.4 versions)

What is the related PR or file link(s)?

  • This PR is translated from:
  • Other reference link(s):

Do your changes match any of the following descriptions?

  • Delete files
  • Change aliases
  • Need modification after applied to another branch
  • Might cause conflicts after applied to another branch

@ti-chi-bot ti-chi-bot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. lgtm size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. type/cherry-pick-for-release-8.5 This PR is cherry-picked to release-8.5 from a source PR. labels Nov 10, 2025
@ti-chi-bot
Copy link

ti-chi-bot bot commented Nov 10, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign hfxsd for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@gemini-code-assist
Copy link

Summary of Changes

Hello @ti-chi-bot, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a new documentation page explaining the ANALYZE feature embedded within DDL statements in TiDB. This functionality, available from v8.5.4 and v9.0.0, automates the collection of statistics for newly created or reorganized indexes, such as those resulting from ADD INDEX or MODIFY COLUMN operations. The primary goal is to ensure that the query optimizer always has up-to-date statistics, thereby maintaining optimal query performance and preventing plan regressions that could occur if statistics were missing or outdated after schema modifications.

Highlights

  • New Documentation Page: A new documentation page, ddl_embedded_analyze.md, has been added to describe the ANALYZE feature embedded in DDL statements.
  • Embedded ANALYZE Feature: The document details how TiDB automatically runs an ANALYZE operation during DDL statements that create new indexes (ADD INDEX) or reorganize existing ones (MODIFY COLUMN, CHANGE COLUMN).
  • Improved Optimizer Performance: This feature ensures that statistics for new or reorganized indexes are promptly updated, preventing inaccurate optimizer estimates and suboptimal query plans immediately after schema changes.
  • Configurable Behavior: Users can control this behavior using the new system variable tidb_stats_update_during_ddl, which is OFF by default.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@ti-chi-bot ti-chi-bot bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Nov 10, 2025
Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request adds a new documentation page for the ANALYZE feature embedded in DDL statements. The document is well-structured and provides good examples. I've made a few suggestions to improve clarity, readability, and correct minor grammatical issues, following the repository's style guide. These include rephrasing some sentences for better flow, splitting long sentences, and fixing a minor grammatical error.


## Usage scenarios

In scenarios where DDL operations alternately add or modify indexes, existing stable queries might suffer from estimation bias because the new index lacks statistics, causing the optimizer to choose suboptimal plans. For more information, see [Issue #57948](https://github.com/pingcap/tidb/issues/57948).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

low

This sentence is a bit complex. For better clarity and to align with the style guide's preference for addressing the user directly, consider rephrasing it.

Suggested change
In scenarios where DDL operations alternately add or modify indexes, existing stable queries might suffer from estimation bias because the new index lacks statistics, causing the optimizer to choose suboptimal plans. For more information, see [Issue #57948](https://github.com/pingcap/tidb/issues/57948).
When you add or modify indexes, existing stable queries might suffer from estimation bias because the new index lacks statistics, causing the optimizer to choose suboptimal plans. For more information, see [Issue #57948](https://github.com/pingcap/tidb/issues/57948).


When `tidb_stats_update_during_ddl` is `ON`, executing [`ADD INDEX`](/sql-statements/sql-statement-add-index.md) automatically runs an embedded `ANALYZE` operation after the Reorg phase finishes. This `ANALYZE` operation collects statistics for the newly created index before the index becomes visible to users, and then `ADD INDEX` proceeds with its remaining phases.

Considering that `ANALYZE` can take time, TiDB sets a timeout threshold based on the execution time of the first Reorg. If `ANALYZE` times out, `ADD INDEX` stops waiting synchronously for `ANALYZE` to finish and continues the subsequent process, making the index visible earlier to users. This means the index statistics will be updated after `ANALYZE` completes asynchronously.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

low

The term 'first Reorg' might be confusing as it implies there could be more than one. Using 'the Reorg phase' is more precise and clearer for the user.

Suggested change
Considering that `ANALYZE` can take time, TiDB sets a timeout threshold based on the execution time of the first Reorg. If `ANALYZE` times out, `ADD INDEX` stops waiting synchronously for `ANALYZE` to finish and continues the subsequent process, making the index visible earlier to users. This means the index statistics will be updated after `ANALYZE` completes asynchronously.
Considering that `ANALYZE` can take time, TiDB sets a timeout threshold based on the execution time of the Reorg phase. If `ANALYZE` times out, `ADD INDEX` stops waiting synchronously for `ANALYZE` to finish and continues the subsequent process, making the index visible earlier to users. This means the index statistics will be updated after `ANALYZE` completes asynchronously.

1 rows in set (0.001 sec)
```

From the `ADD INDEX` example, when `tidb_stats_update_during_ddl` is `ON`, you can see that after the execution of the `ADD INDEX` DDL statement, the subsequent `EXPLAIN` output shows that statistics for the index `idx` have been automatically collected and loaded into memory (you can verify it by executing `SHOW STATS_HISTOGRAMS`). As a result, the optimizer can immediately use these statistics for range scans. If index creation or reorganization and `ANALYZE` take a long time, you can check the DDL job status by executing `ADMIN SHOW DDL JOBS`. When the `COMMENTS` column in the output contains `analyzing`, it means that the DDL job is collecting statistics.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

low

This paragraph contains a long sentence with a parenthetical. For better readability, it's better to split it into separate sentences.

Suggested change
From the `ADD INDEX` example, when `tidb_stats_update_during_ddl` is `ON`, you can see that after the execution of the `ADD INDEX` DDL statement, the subsequent `EXPLAIN` output shows that statistics for the index `idx` have been automatically collected and loaded into memory (you can verify it by executing `SHOW STATS_HISTOGRAMS`). As a result, the optimizer can immediately use these statistics for range scans. If index creation or reorganization and `ANALYZE` take a long time, you can check the DDL job status by executing `ADMIN SHOW DDL JOBS`. When the `COMMENTS` column in the output contains `analyzing`, it means that the DDL job is collecting statistics.
From the `ADD INDEX` example, when `tidb_stats_update_during_ddl` is `ON`, you can see that after the `ADD INDEX` DDL statement is executed, the subsequent `EXPLAIN` output shows that statistics for the index `idx` have been automatically collected and loaded into memory. You can verify this by executing `SHOW STATS_HISTOGRAMS`. As a result, the optimizer can immediately use these statistics for range scans. If index creation or reorganization and `ANALYZE` take a long time, you can check the DDL job status by executing `ADMIN SHOW DDL JOBS`. When the `COMMENTS` column in the output contains `analyzing`, it means that the DDL job is collecting statistics.

When `tidb_stats_update_during_ddl` is `ON`, executing [`MODIFY COLUMN`](/sql-statements/sql-statement-modify-column.md) or [`CHANGE COLUMN`](/sql-statements/sql-statement-change-column.md) that reorganizes an index will also run an embedded `ANALYZE` operation after the Reorg phase completes. The mechanism is the same as for `ADD INDEX`:

- Start collecting statistics before the index becomes visible.
- If `ANALYZE` times out, [`MODIFY COLUMN`](/sql-statements/sql-statement-modify-column.md) and [`CHANGE COLUMN`](/sql-statements/sql-statement-change-column.md) stops waiting synchronously for `ANALYZE` to finish and continues the subsequent process, making the index visible earlier to users. This means that the index statistics will be updated when `ANALYZE` finishes asynchronously.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

low

There's a minor grammatical error here. Since the subject is plural (MODIFY COLUMN and CHANGE COLUMN), the verb should be 'stop' instead of 'stops'. I've also slightly rephrased the sentence for better flow.

Suggested change
- If `ANALYZE` times out, [`MODIFY COLUMN`](/sql-statements/sql-statement-modify-column.md) and [`CHANGE COLUMN`](/sql-statements/sql-statement-change-column.md) stops waiting synchronously for `ANALYZE` to finish and continues the subsequent process, making the index visible earlier to users. This means that the index statistics will be updated when `ANALYZE` finishes asynchronously.
- If `ANALYZE` times out, [`MODIFY COLUMN`](/sql-statements/sql-statement-modify-column.md) and [`CHANGE COLUMN`](/sql-statements/sql-statement-change-column.md) stop waiting synchronously for `ANALYZE` to finish and continues the subsequent process, making the index visible earlier to users. This means that the index statistics will be updated when `ANALYZE` finishes asynchronously.

+------------------------+---------+-----------+-----------------------+----------------------------------+
2 rows in set (0.005 sec)
```

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

low

This line contains extra whitespace and should be removed for correct formatting.

1 rows in set (0.001 sec)
```

From the `MODIFY COLUMN` example, when `tidb_stats_update_during_ddl` is `ON`, you can see that after the execution of the `MODIFY COLUMN` DDL statement, the subsequent `EXPLAIN` output shows that statistics for the index `idx` have been automatically collected and loaded into memory (you can verify it by executing `SHOW STATS_HISTOGRAMS`). As a result, the optimizer can immediately use these statistics for range scans. If index creation or reorganization and `ANALYZE` take a long time, you can check the DDL job status by executing `ADMIN SHOW DDL JOBS`. When the `COMMENTS` column in the output contains `analyzing`, it means that the DDL job is collecting statistics.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

low

This paragraph contains a long sentence with a parenthetical. For better readability, it's better to split it into separate sentences.

Suggested change
From the `MODIFY COLUMN` example, when `tidb_stats_update_during_ddl` is `ON`, you can see that after the execution of the `MODIFY COLUMN` DDL statement, the subsequent `EXPLAIN` output shows that statistics for the index `idx` have been automatically collected and loaded into memory (you can verify it by executing `SHOW STATS_HISTOGRAMS`). As a result, the optimizer can immediately use these statistics for range scans. If index creation or reorganization and `ANALYZE` take a long time, you can check the DDL job status by executing `ADMIN SHOW DDL JOBS`. When the `COMMENTS` column in the output contains `analyzing`, it means that the DDL job is collecting statistics.
From the `MODIFY COLUMN` example, when `tidb_stats_update_during_ddl` is `ON`, you can see that after the `MODIFY COLUMN` DDL statement is executed, the subsequent `EXPLAIN` output shows that statistics for the index `idx` have been automatically collected and loaded into memory. You can verify this by executing `SHOW STATS_HISTOGRAMS`. As a result, the optimizer can immediately use these statistics for range scans. If index creation or reorganization and `ANALYZE` take a long time, you can check the DDL job status by executing `ADMIN SHOW DDL JOBS`. When the `COMMENTS` column in the output contains `analyzing`, it means that the DDL job is collecting statistics.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. lgtm size/L Denotes a PR that changes 100-499 lines, ignoring generated files. type/cherry-pick-for-release-8.5 This PR is cherry-picked to release-8.5 from a source PR.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants