diff --git a/docs/content/preview/architecture/query-layer/planner-optimizer.md b/docs/content/preview/architecture/query-layer/planner-optimizer.md index dc04d93d0e01..0b70648c9696 100644 --- a/docs/content/preview/architecture/query-layer/planner-optimizer.md +++ b/docs/content/preview/architecture/query-layer/planner-optimizer.md @@ -31,7 +31,7 @@ To account for the distributed nature of the data, YugabyteDB has implemented a {{}} -The YugabyteDB CBO is {{}} and disabled by default. To enable it, turn ON the [yb_enable_base_scans_cost_model](../../../reference/configuration/yb-tserver/#yb-enable-base-scans-cost-model) configuration parameter as follows: +The YugabyteDB CBO is {{}} and disabled by default. To enable it, turn ON the [yb_enable_base_scans_cost_model](../../../reference/configuration/yb-tserver/#yb-enable-base-scans-cost-model) configuration parameter as follows: ```sql -- Enable for current session @@ -82,12 +82,22 @@ Some of the factors that the CBO considers in the cost estimation are as follows The CBO estimates the size and number of tuples that will be transferred, with data sent in pages. The page size is determined by the configuration parameters [yb_fetch_row_limit](../../../reference/configuration/yb-tserver/#yb-fetch-row-limit) and [yb_fetch_size_limit](../../../reference/configuration/yb-tserver/#yb-fetch-size-limit). Because each page requires a network round trip for the request and response, the CBO also estimates the total number of pages that will be transferred. Note that the time spent transferring the data also depends on the network bandwidth. -## Plan selection +### Plan selection The CBO evaluates each candidate plan's estimated costs to determine the plan with the lowest cost, which is then selected for execution. This ensures the optimal use of system resources and improved query performance. 
After the optimal plan is determined, YugabyteDB generates a detailed execution plan with all the necessary steps, such as scanning tables, joining data, filtering rows, sorting, and computing expressions. This execution plan is then passed to the query executor component, which carries out the plan and returns the final query results. +### Best practices + +- If your table already has rows, and you create an additional index (for example, `create index i on t (k);`), you must re-run analyze to populate the index `pg_class.reltuples` with the correct row count. [Issue](https://github.com/yugabyte/yugabyte-db/issues/25394) + + If you need to create a new index to replace an old one while your application is running, create the new one first, run analyze, then drop the old one. + +- After you restore a database in YugabyteDB Anywhere or Aeon, you need to run analyze because the statistics that were in the database when it was backed up do not get restored. + + For consistent RTO from restore in a cost model-enabled environment, run analyze manually after restore has finished and before you allow end users into the application. If you allow end users into the application before analyze is finished, initial execution plans won't be correctly optimized. 
+ ## Learn more - [Exploring the Cost Based Optimizer](https://www.yugabyte.com/blog/yugabytedb-cost-based-optimizer/) diff --git a/docs/content/preview/explore/query-1-performance/_index.md b/docs/content/preview/explore/query-1-performance/_index.md index 318291037d0b..fcd937bdf460 100644 --- a/docs/content/preview/explore/query-1-performance/_index.md +++ b/docs/content/preview/explore/query-1-performance/_index.md @@ -1,6 +1,6 @@ --- title: Query Tuning -headerTitle: Query Tuning +headerTitle: Query tuning linkTitle: Query tuning description: Tuning and optimizing query performance headcontent: Optimize query performance @@ -17,7 +17,7 @@ showRightNav: true Query tuning is the art and science of improving the performance of SQL queries. It involves understanding the database's architecture, query execution plans, and performance metrics. By identifying and addressing performance bottlenecks, you can significantly enhance the responsiveness of your applications and reduce the load on your database infrastructure. -This guide provides a comprehensive overview of query tuning techniques for distributed SQL databases. The following sections explore various strategies, best practices, and tools to help you optimize your queries and achieve optimal performance. +This guide provides an overview of query tuning techniques for distributed SQL databases, including strategies, best practices, and tools to help you optimize queries, and achieve optimal performance. ## Identify slow queries @@ -29,7 +29,7 @@ Learn how to fetch query statistics and improve performance using [pg_stat_state ## Column statistics -The pg_stats view provides a user-friendly display of the column-level data distribution of tables. This view includes information about table columns, such as the fraction of null entries, average width, number of distinct values, and most common values. 
These statistics are crucial for the query planner to make informed decisions about the most efficient way to execute queries. By regularly analyzing the statistics in pg_stats, you can identify opportunities for optimization, such as creating or dropping indexes, and fine-tune your database configuration for optimal performance. +The pg_stats view provides a user-friendly display of the column-level data distribution of tables. This view includes information about table columns, such as the fraction of null entries, average width, number of distinct values, and most common values. These statistics are crucial for the query planner to make informed decisions about the most efficient way to execute queries. By regularly analyzing the statistics in pg_stats, you can identify opportunities for optimization (such as creating or dropping indexes), and fine-tune your database configuration for optimal performance. {{}} Learn how to understand column level statistics and improve query performance using [pg_stats](./pg-stats/). @@ -61,13 +61,7 @@ $ ./bin/yb-tserver --ysql_log_min_duration_statement 1000 Results are written to the current `postgres*log` file. -{{< note title="Note" >}} - -Depending on the database and the work being performed, long-running queries don't necessarily need to be optimized. - -Ensure that the threshold is high enough so that you don't flood the `postgres*log` log files. - -{{< /note >}} +(Depending on the database and the work being performed, long-running queries don't necessarily need to be optimized. Ensure that the threshold is high enough so that you don't flood the `postgres*log` log files.) {{}} Learn more about [YB-TServer logs](/preview/explore/observability/logging/). 
diff --git a/docs/content/stable/architecture/query-layer/planner-optimizer.md b/docs/content/stable/architecture/query-layer/planner-optimizer.md index 0e4b9b28cbe8..619a3fb1e22c 100644 --- a/docs/content/stable/architecture/query-layer/planner-optimizer.md +++ b/docs/content/stable/architecture/query-layer/planner-optimizer.md @@ -82,12 +82,22 @@ Some of the factors that the CBO considers in the cost estimation are as follows The CBO estimates the size and number of tuples that will be transferred, with data sent in pages. The page size is determined by the configuration parameters [yb_fetch_row_limit](../../../reference/configuration/yb-tserver/#yb-fetch-row-limit) and [yb_fetch_size_limit](../../../reference/configuration/yb-tserver/#yb-fetch-size-limit). Because each page requires a network round trip for the request and response, the CBO also estimates the total number of pages that will be transferred. Note that the time spent transferring the data also depends on the network bandwidth. -## Plan selection +### Plan selection The CBO evaluates each candidate plan's estimated costs to determine the plan with the lowest cost, which is then selected for execution. This ensures the optimal use of system resources and improved query performance. After the optimal plan is determined, YugabyteDB generates a detailed execution plan with all the necessary steps, such as scanning tables, joining data, filtering rows, sorting, and computing expressions. This execution plan is then passed to the query executor component, which carries out the plan and returns the final query results. +### Best practices + +- If your table already has rows, and you create an additional index (for example, `create index i on t (k);`), you must re-run analyze to populate the index `pg_class.reltuples` with the correct row count. 
[Issue](https://github.com/yugabyte/yugabyte-db/issues/25394) + + If you need to create a new index to replace an old one while your application is running, create the new one first, run analyze, then drop the old one. + +- After you restore a database in YugabyteDB Anywhere or Aeon, you need to run analyze because the statistics that were in the database when it was backed up do not get restored. + + For consistent RTO from restore in a cost model-enabled environment, run analyze manually after restore has finished and before you allow end users into the application. If you allow end users into the application before analyze is finished, initial execution plans won't be correctly optimized. + ## Learn more - [Exploring the Cost Based Optimizer](https://www.yugabyte.com/blog/yugabytedb-cost-based-optimizer/) diff --git a/docs/content/stable/explore/query-1-performance/_index.md b/docs/content/stable/explore/query-1-performance/_index.md index a28e2b8de69b..81374f6a9c15 100644 --- a/docs/content/stable/explore/query-1-performance/_index.md +++ b/docs/content/stable/explore/query-1-performance/_index.md @@ -1,6 +1,6 @@ --- title: Query Tuning -headerTitle: Query Tuning +headerTitle: Query tuning linkTitle: Query tuning description: Tuning and optimizing query performance headcontent: Optimize query performance @@ -15,7 +15,7 @@ showRightNav: true Query tuning is the art and science of improving the performance of SQL queries. It involves understanding the database's architecture, query execution plans, and performance metrics. By identifying and addressing performance bottlenecks, you can significantly enhance the responsiveness of your applications and reduce the load on your database infrastructure. -This guide provides a comprehensive overview of query tuning techniques for distributed SQL databases. We will explore various strategies, best practices, and tools to help you optimize your queries and achieve optimal performance. 
+This guide provides an overview of query tuning techniques for distributed SQL databases, including strategies, best practices, and tools to help you optimize queries, and achieve optimal performance. ## Identify slow queries @@ -27,7 +27,7 @@ Learn how to fetch query statistics and improve performance using [pg_stat_state ## Column statistics -The pg_stats view provides a user-friendly display of the column-level data distribution of tables. This view includes information about table columns, such as the fraction of null entries, average width, number of distinct values, and most common values. These statistics are crucial for the query planner to make informed decisions about the most efficient way to execute queries. By regularly analyzing the statistics in pg_stats, you can identify opportunities for optimization, such as creating or dropping indexes, and fine-tune your database configuration for optimal performance. +The pg_stats view provides a user-friendly display of the column-level data distribution of tables. This view includes information about table columns, such as the fraction of null entries, average width, number of distinct values, and most common values. These statistics are crucial for the query planner to make informed decisions about the most efficient way to execute queries. By regularly analyzing the statistics in pg_stats, you can identify opportunities for optimization (such as creating or dropping indexes), and fine-tune your database configuration for optimal performance. {{}} Learn how to understand column level statistics and improve query performance using [pg_stats](./pg-stats/). 
diff --git a/docs/content/v2024.1/architecture/query-layer/planner-optimizer.md b/docs/content/v2024.1/architecture/query-layer/planner-optimizer.md index bfd1e96e921d..d6c1974e5d24 100644 --- a/docs/content/v2024.1/architecture/query-layer/planner-optimizer.md +++ b/docs/content/v2024.1/architecture/query-layer/planner-optimizer.md @@ -31,7 +31,7 @@ To account for the distributed nature of the data, YugabyteDB has implemented a {{}} -The YugabyteDB CBO is {{}} and disabled by default. To enable it, turn ON the [yb_enable_base_scans_cost_model](../../../reference/configuration/yb-tserver/#yb-enable-base-scans-cost-model) configuration parameter as follows: +The YugabyteDB CBO is {{}} and disabled by default. To enable it, turn ON the [yb_enable_base_scans_cost_model](../../../reference/configuration/yb-tserver/#yb-enable-base-scans-cost-model) configuration parameter as follows: ```sql -- Enable for current session @@ -82,12 +82,22 @@ Some of the factors that the CBO considers in the cost estimation are as follows The CBO estimates the size and number of tuples that will be transferred, with data sent in pages. The page size is determined by the configuration parameters [yb_fetch_row_limit](../../../reference/configuration/yb-tserver/#yb-fetch-row-limit) and [yb_fetch_size_limit](../../../reference/configuration/yb-tserver/#yb-fetch-size-limit). Because each page requires a network round trip for the request and response, the CBO also estimates the total number of pages that will be transferred. Note that the time spent transferring the data also depends on the network bandwidth. -## Plan selection +### Plan selection The CBO evaluates each candidate plan's estimated costs to determine the plan with the lowest cost, which is then selected for execution. This ensures the optimal use of system resources and improved query performance. 
After the optimal plan is determined, YugabyteDB generates a detailed execution plan with all the necessary steps, such as scanning tables, joining data, filtering rows, sorting, and computing expressions. This execution plan is then passed to the query executor component, which carries out the plan and returns the final query results. +### Best practices + +- If your table already has rows, and you create an additional index (for example, `create index i on t (k);`), you must re-run analyze to populate the index `pg_class.reltuples` with the correct row count. [Issue](https://github.com/yugabyte/yugabyte-db/issues/25394) + + If you need to create a new index to replace an old one while your application is running, create the new one first, run analyze, then drop the old one. + +- After you restore a database in YugabyteDB Anywhere or Aeon, you need to run analyze because the statistics that were in the database when it was backed up do not get restored. + + For consistent RTO from restore in a cost model-enabled environment, run analyze manually after restore has finished and before you allow end users into the application. If you allow end users into the application before analyze is finished, initial execution plans won't be correctly optimized. 
+ ## Learn more - [Exploring the Cost Based Optimizer](https://www.yugabyte.com/blog/yugabytedb-cost-based-optimizer/) diff --git a/docs/content/v2025.1/api/ysql/the-sql-language/statements/cmd_analyze.md b/docs/content/v2025.1/api/ysql/the-sql-language/statements/cmd_analyze.md index 69b512bf714b..55d542ff72f1 100644 --- a/docs/content/v2025.1/api/ysql/the-sql-language/statements/cmd_analyze.md +++ b/docs/content/v2025.1/api/ysql/the-sql-language/statements/cmd_analyze.md @@ -18,10 +18,10 @@ ANALYZE collects statistics about the contents of tables in the database, and st The statistics are also used by the YugabyteDB [cost based optimizer](../../../../../architecture/query-layer/planner-optimizer) (CBO) to create optimal execution plans for queries. When run on up-to-date statistics, CBO provides performance improvements and can reduce or eliminate the need to use hints or modify queries to optimize query execution. -{{< warning title="Run ANALYZE manually" >}} -Currently, YugabyteDB doesn't run a background job like PostgreSQL autovacuum to analyze the tables. To collect or update statistics, run the ANALYZE command manually. - +{{< warning title="Run ANALYZE regularly" >}} If you have enabled CBO, you must run ANALYZE on user tables after data load for the CBO to create optimal execution plans. + +You can automate running ANALYZE using the [Auto Analyze service](../../../../../explore/query-1-performance/auto-analyze/). {{< /warning >}} The YugabyteDB implementation is based on the framework provided by PostgreSQL, which requires the storage layer to provide a random sample of rows of a predefined size. The size is calculated based on a number of factors, such as the included columns' data types. 
diff --git a/docs/content/v2025.1/architecture/query-layer/planner-optimizer.md b/docs/content/v2025.1/architecture/query-layer/planner-optimizer.md index 4866d62ab4ae..f8d3d4dfe463 100644 --- a/docs/content/v2025.1/architecture/query-layer/planner-optimizer.md +++ b/docs/content/v2025.1/architecture/query-layer/planner-optimizer.md @@ -31,7 +31,7 @@ To account for the distributed nature of the data, YugabyteDB has implemented a {{}} -The YugabyteDB CBO is {{}} and disabled by default. To enable it, turn ON the [yb_enable_base_scans_cost_model](../../../reference/configuration/yb-tserver/#yb-enable-base-scans-cost-model) configuration parameter as follows: +The YugabyteDB CBO is {{}} and disabled by default. To enable it, turn ON the [yb_enable_base_scans_cost_model](../../../reference/configuration/yb-tserver/#yb-enable-base-scans-cost-model) configuration parameter as follows: ```sql -- Enable for current session @@ -48,9 +48,9 @@ To optimize the search for the best plan, the CBO uses a dynamic programming-bas The optimizer relies on accurate statistics about the tables, including the number of rows, the distribution of data in columns, and the cardinality of results from operations. These statistics are essential for estimating the selectivity of filters and costs of various query plans accurately. These statistics are gathered by the [ANALYZE](../../../api/ysql/the-sql-language/statements/cmd_analyze/) command and are provided in a display-friendly format by the [pg_stats](../../../architecture/system-catalog/#data-statistics) view. -{{}} -Currently, YugabyteDB doesn't run a background job like PostgreSQL autovacuum to analyze the tables. To collect or update statistics, run the ANALYZE command manually. If you have enabled CBO, you must run ANALYZE on user tables after data load for the CBO to create optimal execution plans. Multiple projects are in progress to trigger this automatically. 
-{{}} +Similar to [PostgreSQL autovacuum](https://www.postgresql.org/docs/current/routine-vacuuming.html#AUTOVACUUM), the YugabyteDB [Auto Analyze](../../../explore/query-1-performance/auto-analyze/) service automates the execution of ANALYZE commands for any table where rows have changed more than a configurable threshold for the table. This ensures table statistics are always up-to-date. + +Even with the Auto Analyze service, for the CBO to create optimal execution plans, you should still run ANALYZE manually on user tables after data load, as well as in other circumstances. Refer to [Best practices](#best-practices). ### Cost estimation @@ -82,12 +82,22 @@ Some of the factors that the CBO considers in the cost estimation are as follows The CBO estimates the size and number of tuples that will be transferred, with data sent in pages. The page size is determined by the configuration parameters [yb_fetch_row_limit](../../../reference/configuration/yb-tserver/#yb-fetch-row-limit) and [yb_fetch_size_limit](../../../reference/configuration/yb-tserver/#yb-fetch-size-limit). Because each page requires a network round trip for the request and response, the CBO also estimates the total number of pages that will be transferred. Note that the time spent transferring the data also depends on the network bandwidth. -## Plan selection +### Plan selection The CBO evaluates each candidate plan's estimated costs to determine the plan with the lowest cost, which is then selected for execution. This ensures the optimal use of system resources and improved query performance. After the optimal plan is determined, YugabyteDB generates a detailed execution plan with all the necessary steps, such as scanning tables, joining data, filtering rows, sorting, and computing expressions. This execution plan is then passed to the query executor component, which carries out the plan and returns the final query results. 
+### Best practices + +- If your table already has rows, and you create an additional index (for example, `create index i on t (k);`), you must re-run analyze to populate the index `pg_class.reltuples` with the correct row count. [Issue](https://github.com/yugabyte/yugabyte-db/issues/25394) + + If you need to create a new index to replace an old one while your application is running, create the new one first, run analyze, then drop the old one. + +- After you restore a database in YugabyteDB Anywhere or Aeon, you need to run analyze because the statistics that were in the database when it was backed up do not get restored. + + For consistent RTO from restore in a cost model-enabled environment, run analyze manually after restore has finished and before you allow end users into the application. If you allow end users into the application before analyze is finished, initial execution plans won't be correctly optimized. + ## Learn more - [Exploring the Cost Based Optimizer](https://www.yugabyte.com/blog/yugabytedb-cost-based-optimizer/) diff --git a/docs/content/v2025.1/explore/query-1-performance/_index.md b/docs/content/v2025.1/explore/query-1-performance/_index.md index 24b7d27e7f5c..2a238733d210 100644 --- a/docs/content/v2025.1/explore/query-1-performance/_index.md +++ b/docs/content/v2025.1/explore/query-1-performance/_index.md @@ -1,6 +1,6 @@ --- title: Query Tuning -headerTitle: Query Tuning +headerTitle: Query tuning linkTitle: Query tuning description: Tuning and optimizing query performance headcontent: Optimize query performance @@ -15,7 +15,7 @@ showRightNav: true Query tuning is the art and science of improving the performance of SQL queries. It involves understanding the database's architecture, query execution plans, and performance metrics. By identifying and addressing performance bottlenecks, you can significantly enhance the responsiveness of your applications and reduce the load on your database infrastructure. 
-This guide provides a comprehensive overview of query tuning techniques for distributed SQL databases. The following sections explore various strategies, best practices, and tools to help you optimize your queries and achieve optimal performance. +This guide provides an overview of query tuning techniques for distributed SQL databases, including strategies, best practices, and tools to help you optimize queries, and achieve optimal performance. ## Identify slow queries @@ -27,7 +27,7 @@ Learn how to fetch query statistics and improve performance using [pg_stat_state ## Column statistics -The pg_stats view provides a user-friendly display of the column-level data distribution of tables. This view includes information about table columns, such as the fraction of null entries, average width, number of distinct values, and most common values. These statistics are crucial for the query planner to make informed decisions about the most efficient way to execute queries. By regularly analyzing the statistics in pg_stats, you can identify opportunities for optimization, such as creating or dropping indexes, and fine-tune your database configuration for optimal performance. +The pg_stats view provides a user-friendly display of the column-level data distribution of tables. This view includes information about table columns, such as the fraction of null entries, average width, number of distinct values, and most common values. These statistics are crucial for the query planner to make informed decisions about the most efficient way to execute queries. By regularly analyzing the statistics in pg_stats, you can identify opportunities for optimization (such as creating or dropping indexes), and fine-tune your database configuration for optimal performance. {{}} Learn how to understand column level statistics and improve query performance using [pg_stats](./pg-stats/). 
@@ -59,16 +59,18 @@ $ ./bin/yb-tserver --ysql_log_min_duration_statement 1000 Results are written to the current `postgres*log` file. -{{< note title="Note" >}} +(Depending on the database and the work being performed, long-running queries don't necessarily need to be optimized. Ensure that the threshold is high enough so that you don't flood the `postgres*log` log files.) -Depending on the database and the work being performed, long-running queries don't necessarily need to be optimized. +{{}} +Learn more about [YB-TServer logs](/preview/explore/observability/logging/). +{{}} -Ensure that the threshold is high enough so that you don't flood the `postgres*log` log files. +## Auto Analyze -{{< /note >}} +To create optimal plans for queries, the query planner needs accurate and up-to-date statistics related to tables and their columns. ANALYZE collects statistics about the contents of tables in the database, and stores the results in the `pg_statistic` system catalog. Similar to [PostgreSQL autovacuum](https://www.postgresql.org/docs/current/routine-vacuuming.html#AUTOVACUUM), the YugabyteDB Auto Analyze service automates the execution of ANALYZE commands for any table where rows have changed more than a configurable threshold for the table. This ensures table statistics are always up-to-date. -{{}} -Learn more about [YB-TServer logs](/preview/explore/observability/logging/). +{{}} +To learn more, see [Auto Analyze service](./auto-analyze/). 
{{}} ## Export query diagnostics diff --git a/docs/content/v2025.1/explore/query-1-performance/auto-analyze.md b/docs/content/v2025.1/explore/query-1-performance/auto-analyze.md new file mode 100644 index 000000000000..8a0b23a421a5 --- /dev/null +++ b/docs/content/v2025.1/explore/query-1-performance/auto-analyze.md @@ -0,0 +1,89 @@ +--- +title: Auto Analyze service +headerTitle: Auto Analyze service +linkTitle: Auto Analyze +description: Use the Auto Analyze service to keep table statistics up to date +headcontent: Keep table statistics up to date automatically +tags: + feature: early-access +menu: + v2025.1: + identifier: auto-analyze + parent: query-tuning + weight: 700 +type: docs +--- + +To create optimal plans for queries, the query planner needs accurate and up-to-date statistics related to tables and their columns. These statistics are also used by the YugabyteDB [cost-based optimizer](../../../reference/configuration/yb-tserver/#yb-enable-base-scans-cost-model) (CBO) to create optimal execution plans for queries. To generate the statistics, you run the [ANALYZE](../../../api/ysql/the-sql-language/statements/cmd_analyze/) command. ANALYZE collects statistics about the contents of tables in the database, and stores the results in the `pg_statistic` system catalog. + +Similar to [PostgreSQL autovacuum](https://www.postgresql.org/docs/current/routine-vacuuming.html#AUTOVACUUM), the YugabyteDB Auto Analyze service automates the execution of ANALYZE commands for any table where rows have changed more than a configurable threshold for the table. This ensures table statistics are always up-to-date. + +## Enable Auto Analyze + +Before you can use the feature, you must enable it by setting `ysql_enable_auto_analyze_service` to true on all YB-Masters, and both `ysql_enable_auto_analyze_service` and `ysql_enable_table_mutation_counter` to true on all YB-Tservers. 
+ +For example, to create a single-node [yugabyted](../../../reference/configuration/yugabyted/) cluster with Auto Analyze enabled, use the following command: + +```sh +./bin/yugabyted start --master_flags "ysql_enable_auto_analyze_service=true" --tserver_flags "ysql_enable_auto_analyze_service=true,ysql_enable_table_mutation_counter=true" +``` + +To enable Auto Analyze on an existing cluster, a rolling restart is required to set `ysql_enable_auto_analyze_service` and `ysql_enable_table_mutation_counter` to true. + +## Configure Auto Analyze + +You can control how frequently the service updates table statistics using the following YB-TServer flags: + +- `ysql_auto_analyze_threshold` - the minimum number of mutations (INSERT, UPDATE, and DELETE) needed to run ANALYZE on a table. Default is 50. +- `ysql_auto_analyze_scale_factor` - a fraction that determines when enough mutations have been accumulated to run ANALYZE for a table. Default is 0.1. + +Increasing either of these flags reduces the frequency of statistics updates. + +If the total number of mutations for a table is greater than its analyze threshold, then the service runs ANALYZE on the table. The analyze threshold of a table is calculated as follows: + +```sh +analyze_threshold = ysql_auto_analyze_threshold + (ysql_auto_analyze_scale_factor * <table_size>) +``` + +where `<table_size>` is the current `reltuples` column value stored in the `pg_class` catalog. + +`ysql_auto_analyze_threshold` is important for small tables. With default settings, if a table has 100 rows and 20 are mutated, ANALYZE won't run as the threshold is not met, even though 20% of the rows are mutated. + +On the other hand, `ysql_auto_analyze_scale_factor` is especially important for big tables. If a table has 1,000,000,000 rows, 10% (100,000,000 rows) would have to be mutated before ANALYZE runs. Set the scale factor to a lower value to allow for more frequent statistics collection for such large tables. 
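As a rough worked example of the threshold formula above, the following shell sketch evaluates the analyze threshold for a given row count. The `analyze_threshold` function is a hypothetical helper, not a YugabyteDB command; the values 50 and 0.1 are the documented flag defaults, and the row counts passed in stand in for `reltuples`.

```sh
# Hypothetical helper (not part of YugabyteDB): evaluates
#   ysql_auto_analyze_threshold + ysql_auto_analyze_scale_factor * reltuples
# Arguments: $1 = reltuples, $2 = threshold (default 50), $3 = scale factor (default 0.1)
analyze_threshold() {
  awk -v rows="$1" -v base="${2:-50}" -v scale="${3:-0.1}" \
    'BEGIN { printf "%.0f\n", base + scale * rows }'
}

analyze_threshold 100                  # 100-row table, default flags
analyze_threshold 1000000000           # 1,000,000,000-row table, default flags
analyze_threshold 1000000000 50 0.01   # same table with a lower scale factor
```

With the defaults, the 100-row table needs more than 60 mutations, so 20 mutated rows do not trigger ANALYZE. Lowering the scale factor to 0.01 on the billion-row table reduces the threshold from roughly 100 million mutations to roughly 10 million.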
+ +In addition, `ysql_auto_analyze_batch_size` controls the maximum number of tables the Auto Analyze service tries to analyze in a single ANALYZE statement. The default is 10. Setting this flag to a larger value can potentially reduce the number of YSQL catalog cache refreshes if Auto Analyze decides to ANALYZE many tables in the same database at the same time. + +For more information on flags used to configure the Auto Analyze service, refer to [Auto Analyze service flags](../../../reference/configuration/yb-tserver#auto-analyze-service-flags). + +## Example + +With Auto Analyze enabled, try the following SQL statements. + +```sql +CREATE TABLE test (k INT PRIMARY KEY, v INT); +SELECT reltuples FROM pg_class WHERE relname = 'test'; +``` + +```output + reltuples +----------- + -1 +(1 row) +``` + +```sql +INSERT INTO test SELECT i, i FROM generate_series(1, 100) i; +-- Wait for a few seconds +SELECT reltuples FROM pg_class WHERE relname = 'test'; +``` + +```output + reltuples +----------- + 100 +(1 row) +``` + +## Limitations + +Because ANALYZE is a DDL statement, it can cause DDL conflicts when run concurrently with other DDL statements. As Auto Analyze runs ANALYZE in the background, you should turn off Auto Analyze if you want to execute DDL statements. You can do this by setting `ysql_enable_auto_analyze_service` to false on all YB-TServers at runtime. diff --git a/docs/content/v2025.1/reference/configuration/yb-master.md b/docs/content/v2025.1/reference/configuration/yb-master.md index 0df24a0363a9..feb9d049a3a3 100644 --- a/docs/content/v2025.1/reference/configuration/yb-master.md +++ b/docs/content/v2025.1/reference/configuration/yb-master.md @@ -1026,6 +1026,18 @@ expensive when the number of YB-TServers, or the number of databases goes up. {{< /note >}} +## Auto Analyze service flags + +Auto analyze is {{}}. + +See also [Auto Analyze Service TServer flags](../yb-tserver/#auto-analyze-service-flags). 
+ +##### ysql_enable_auto_analyze_service + +{{}}Enable the Auto Analyze service, which automatically runs ANALYZE to update table statistics for tables that have changed more than a configurable threshold. + +Default: false + ## Advisory lock flags Support for advisory locks is {{}}. diff --git a/docs/content/v2025.1/reference/configuration/yb-tserver.md b/docs/content/v2025.1/reference/configuration/yb-tserver.md index a0d7a8b5f520..30f3e164fb02 100644 --- a/docs/content/v2025.1/reference/configuration/yb-tserver.md +++ b/docs/content/v2025.1/reference/configuration/yb-tserver.md @@ -2048,6 +2048,109 @@ expensive when the number of YB-TServers, or the number of databases goes up. {{< /note >}} +### Auto Analyze service flags + +Auto analyze is {{}}. + +{{< note title="Note" >}} + +To fully enable the Auto Analyze service, you need to enable `ysql_enable_auto_analyze_service` on all YB-Masters and YB-TServers, and `ysql_enable_table_mutation_counter` on all YB-TServers. + +{{< /note >}} + +See also [Auto Analyze Service Master flags](../yb-master#auto-analyze-service-flags). + +##### ysql_enable_auto_analyze_service + +{{% tags/wrap %}} +{{}} +Default: `false` +{{% /tags/wrap %}} + +Enable the Auto Analyze service, which automatically runs ANALYZE to update table statistics for tables that have changed more than a configurable threshold. + +##### ysql_enable_table_mutation_counter + +{{% tags/wrap %}} + + +Default: `false` +{{% /tags/wrap %}} + +Enable per table mutation (INSERT, UPDATE, DELETE) counting. The Auto Analyze service runs ANALYZE when the number of mutations of a table exceeds the threshold determined by the [ysql_auto_analyze_threshold](#ysql-auto-analyze-threshold) and [ysql_auto_analyze_scale_factor](#ysql-auto-analyze-scale-factor) settings. + +##### ysql_auto_analyze_threshold + +{{% tags/wrap %}} + + +Default: `50` +{{% /tags/wrap %}} + +The minimum number of mutations needed to run ANALYZE on a table. 
+ +##### ysql_auto_analyze_scale_factor + +{{% tags/wrap %}} + + +Default: `0.1` +{{% /tags/wrap %}} + +The fraction defining when sufficient mutations have been accumulated to run ANALYZE for a table. + +ANALYZE runs when the mutation count exceeds `ysql_auto_analyze_scale_factor * <table_size> + ysql_auto_analyze_threshold`, where table_size is the value of the `reltuples` column in the `pg_class` catalog. + +##### ysql_auto_analyze_batch_size + +{{% tags/wrap %}} + + +Default: `10` +{{% /tags/wrap %}} + +The maximum number of tables the Auto Analyze service tries to analyze in a single ANALYZE statement. + +##### ysql_cluster_level_mutation_persist_interval_ms + +{{% tags/wrap %}} + + +Default: `10000` +{{% /tags/wrap %}} + +Interval at which the reported node level table mutation counts are persisted to the underlying auto-analyze mutations table. + +##### ysql_cluster_level_mutation_persist_rpc_timeout_ms + +{{% tags/wrap %}} + + +Default: `10000` +{{% /tags/wrap %}} + +Timeout for the RPCs used to persist mutation counts in the auto-analyze mutations table. + +##### ysql_node_level_mutation_reporting_interval_ms + +{{% tags/wrap %}} + + +Default: `5000` +{{% /tags/wrap %}} + +Interval, in milliseconds, at which the node-level table mutation counts are sent to the Auto Analyze service, which tracks table mutation counts at the cluster level. + +##### ysql_node_level_mutation_reporting_timeout_ms + +{{% tags/wrap %}} + + +Default: `5000` +{{% /tags/wrap %}} + +Timeout, in milliseconds, for the node-level mutation reporting RPC to the Auto Analyze service. + ### Advisory lock flags Support for advisory locks is {{}}.