From ac46d8eaba215e9f7b16a2c674e0988f7abb0a15 Mon Sep 17 00:00:00 2001
From: Ryan Kuo
Date: Thu, 11 Sep 2025 17:44:03 -0400
Subject: [PATCH 1/7] document partial stats

---
 .../_includes/v23.2/misc/session-vars.md | 1 +
 .../v23.2/misc/table-storage-parameters.md | 2 +-
 .../_includes/v24.1/misc/session-vars.md | 1 +
 .../v24.1/misc/table-storage-parameters.md | 2 +-
 .../_includes/v24.2/misc/session-vars.md | 1 +
 .../v24.2/misc/table-storage-parameters.md | 2 +-
 .../_includes/v24.3/misc/session-vars.md | 2 +
 .../v24.3/misc/table-storage-parameters.md | 2 +-
 .../_includes/v25.1/misc/session-vars.md | 2 +
 .../v25.1/misc/table-storage-parameters.md | 22 +++--
 .../_includes/v25.2/misc/session-vars.md | 2 +
 .../v25.2/misc/table-storage-parameters.md | 23 +++--
 .../_includes/v25.3/misc/session-vars.md | 2 +
 .../v25.3/misc/table-storage-parameters.md | 12 ++-
 .../_includes/v25.4/misc/session-vars.md | 2 +
 .../v25.4/misc/table-storage-parameters.md | 12 ++-
 .../_includes/v26.1/misc/session-vars.md | 2 +
 .../v26.1/misc/table-storage-parameters.md | 12 ++-
 src/current/v23.2/cost-based-optimizer.md | 33 ++++++-
 src/current/v23.2/create-statistics.md | 25 ++++-
 src/current/v23.2/show-statistics.md | 9 +-
 src/current/v24.1/cost-based-optimizer.md | 33 ++++++-
 src/current/v24.1/create-statistics.md | 25 ++++-
 src/current/v24.1/show-statistics.md | 9 +-
 src/current/v24.2/cost-based-optimizer.md | 33 ++++++-
 src/current/v24.2/create-statistics.md | 25 ++++-
 src/current/v24.2/show-statistics.md | 9 +-
 src/current/v24.3/cost-based-optimizer.md | 35 ++++++-
 src/current/v24.3/create-statistics.md | 25 ++++-
 src/current/v24.3/show-statistics.md | 9 +-
 src/current/v25.1/cost-based-optimizer.md | 92 ++++++++++++++-----
 src/current/v25.1/create-statistics.md | 22 ++++-
 src/current/v25.1/show-statistics.md | 9 +-
 src/current/v25.2/cost-based-optimizer.md | 86 +++++++++++++----
 src/current/v25.2/create-statistics.md | 22 ++++-
 src/current/v25.2/show-statistics.md | 9 +-
 src/current/v25.3/cost-based-optimizer.md | 86 +++++++++++++----
 src/current/v25.3/create-statistics.md | 22 ++++-
 src/current/v25.3/show-statistics.md | 9 +-
 src/current/v25.4/cost-based-optimizer.md | 89 ++++++++++++++----
 src/current/v25.4/create-statistics.md | 40 +++++++-
 src/current/v25.4/show-statistics.md | 9 +-
 src/current/v26.1/cost-based-optimizer.md | 89 ++++++++++++++----
 src/current/v26.1/create-statistics.md | 40 +++++++-
 src/current/v26.1/show-statistics.md | 9 +-
 45 files changed, 797 insertions(+), 210 deletions(-)

diff --git a/src/current/_includes/v23.2/misc/session-vars.md b/src/current/_includes/v23.2/misc/session-vars.md
index b56c253ea9d..29b1fcb2bda 100644
--- a/src/current/_includes/v23.2/misc/session-vars.md
+++ b/src/current/_includes/v23.2/misc/session-vars.md
@@ -20,6 +20,7 @@
 | `disallow_full_table_scans` | If set to `on`, queries on "large" tables with a row count greater than [`large_full_scan_rows`](#large-full-scan-rows) will not use full table or index scans. If no other query plan is possible, queries will return an error message. This setting does not apply to internal queries, which may plan full table or index scans without checking the session variable. | `off` | Yes | Yes |
 | `distsql` | The query distribution mode for the session. By default, CockroachDB determines which queries are faster to execute if distributed across multiple nodes, and all other queries are run through the gateway node.
| `auto` | Yes | Yes | | `enable_auto_rehoming` | When enabled, the [home regions]({% link {{ page.version.version }}/alter-table.md %}#crdb_region) of rows in [`REGIONAL BY ROW`]({% link {{ page.version.version }}/alter-table.md %}#set-the-table-locality-to-regional-by-row) tables are automatically set to the region of the [gateway node]({% link {{ page.version.version }}/ui-sessions-page.md %}#session-details-gateway-node) from which any [`UPDATE`]({% link {{ page.version.version }}/update.md %}) or [`UPSERT`]({% link {{ page.version.version }}/upsert.md %}) statements that operate on those rows originate. | `off` | Yes | Yes | +| `enable_create_stats_using_extremes` | If `on`, allows manual creation of [partial statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#partial-statistics) using the [`CREATE STATISTICS ... USING EXTREMES`]({% link {{ page.version.version }}/create-statistics.md %}#create-partial-statistics-using-extremes) syntax. | `off` | Yes | Yes | | `enable_durable_locking_for_serializable` | Indicates whether CockroachDB replicates [`FOR UPDATE` and `FOR SHARE`]({% link {{ page.version.version }}/select-for-update.md %}#lock-strengths) locks via [Raft]({% link {{ page.version.version }}/architecture/replication-layer.md %}#raft), allowing locks to be preserved when leases are transferred. Note that replicating `FOR UPDATE` and `FOR SHARE` locks will add latency to those statements. This setting only affects `SERIALIZABLE` transactions and matches the default `READ COMMITTED` behavior when enabled. | `off` | Yes | Yes | | `enable_experimental_alter_column_type_general` | If `on`, it is possible to [alter column data types]({% link {{ page.version.version }}/alter-table.md %}#alter-column-data-types). 
| `off` | Yes | Yes | | `enable_implicit_fk_locking_for_serializable` | Indicates whether CockroachDB uses [shared locks]({% link {{ page.version.version }}/select-for-update.md %}#lock-strengths) to perform [foreign key]({% link {{ page.version.version }}/foreign-key.md %}) checks. To take effect, the [`enable_shared_locking_for_serializable`](#enable-shared-locking-for-serializable) setting must also be enabled. This setting only affects `SERIALIZABLE` transactions and matches the default `READ COMMITTED` behavior when enabled. | `off` | Yes | Yes | diff --git a/src/current/_includes/v23.2/misc/table-storage-parameters.md b/src/current/_includes/v23.2/misc/table-storage-parameters.md index 32dee5a9aaa..0e60bdb6f05 100644 --- a/src/current/_includes/v23.2/misc/table-storage-parameters.md +++ b/src/current/_includes/v23.2/misc/table-storage-parameters.md @@ -2,7 +2,7 @@ |------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------|-----------|---------------| | `exclude_data_from_backup` | Exclude the data in this table from any future backups. | Boolean | `false` | | New in v23.2.1: `schema_locked` | Disallow [schema changes]({% link {{ page.version.version }}/online-schema-changes.md %}) on this table. Enabling `schema_locked` can help [improve performance of changefeeds]({% link {{ page.version.version }}/create-changefeed.md %}#disallow-schema-changes-on-tables-to-improve-changefeed-performance) running on this table. | Boolean | `false` | -| `sql_stats_automatic_collection_enabled` | Enable [automatic statistics collection]({% link {{ page.version.version }}/cost-based-optimizer.md %}#enable-and-disable-automatic-statistics-collection-for-tables) for this table. 
| Boolean | `true` | +| `sql_stats_automatic_collection_enabled` | Enable automatic collection of [full statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#full-statistics) for this table. | Boolean | `true` | | `sql_stats_automatic_collection_min_stale_rows` | Minimum number of stale rows in this table that will trigger a statistics refresh. | Integer | 500 | | `sql_stats_automatic_collection_fraction_stale_rows` | Fraction of stale rows in this table that will trigger a statistics refresh. | Float | 0.2 | | `sql_stats_forecasts_enabled` | Enable [forecasted statistics]({% link {{ page.version.version }}/show-statistics.md %}#display-forecasted-statistics) collection for this table. | Boolean | `true` | diff --git a/src/current/_includes/v24.1/misc/session-vars.md b/src/current/_includes/v24.1/misc/session-vars.md index 72ed58d4178..b89d13e0976 100644 --- a/src/current/_includes/v24.1/misc/session-vars.md +++ b/src/current/_includes/v24.1/misc/session-vars.md @@ -20,6 +20,7 @@ | `disable_changefeed_replication` | When `true`, [changefeeds]({% link {{ page.version.version }}/changefeed-messages.md %}#filtering-changefeed-messages) will not emit messages for any changes (e.g., `INSERT`, `UPDATE`) issued to watched tables during that session. | `false` | Yes | Yes | | `disallow_full_table_scans` | If set to `on`, queries on "large" tables with a row count greater than [`large_full_scan_rows`](#large-full-scan-rows) will not use full table or index scans. If no other query plan is possible, queries will return an error message. This setting does not apply to internal queries, which may plan full table or index scans without checking the session variable. | `off` | Yes | Yes || `distsql` | The query distribution mode for the session. By default, CockroachDB determines which queries are faster to execute if distributed across multiple nodes, and all other queries are run through the gateway node. 
| `auto` | Yes | Yes | | `enable_auto_rehoming` | When enabled, the [home regions]({% link {{ page.version.version }}/alter-table.md %}#crdb_region) of rows in [`REGIONAL BY ROW`]({% link {{ page.version.version }}/alter-table.md %}#set-the-table-locality-to-regional-by-row) tables are automatically set to the region of the [gateway node]({% link {{ page.version.version }}/ui-sessions-page.md %}#session-details-gateway-node) from which any [`UPDATE`]({% link {{ page.version.version }}/update.md %}) or [`UPSERT`]({% link {{ page.version.version }}/upsert.md %}) statements that operate on those rows originate. | `off` | Yes | Yes | +| `enable_create_stats_using_extremes` | If `on`, allows manual creation of [partial statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#partial-statistics) using the [`CREATE STATISTICS ... USING EXTREMES`]({% link {{ page.version.version }}/create-statistics.md %}#create-partial-statistics-using-extremes) syntax. | `off` | Yes | Yes | | `enable_durable_locking_for_serializable` | Indicates whether CockroachDB replicates [`FOR UPDATE` and `FOR SHARE`]({% link {{ page.version.version }}/select-for-update.md %}#lock-strengths) locks via [Raft]({% link {{ page.version.version }}/architecture/replication-layer.md %}#raft), allowing locks to be preserved when leases are transferred. Note that replicating `FOR UPDATE` and `FOR SHARE` locks will add latency to those statements. This setting only affects `SERIALIZABLE` transactions and matches the default `READ COMMITTED` behavior when enabled. | `off` | Yes | Yes | | `enable_experimental_alter_column_type_general` | If `on`, it is possible to [alter column data types]({% link {{ page.version.version }}/alter-table.md %}#alter-column-data-types). 
| `off` | Yes | Yes | | `enable_implicit_fk_locking_for_serializable` | Indicates whether CockroachDB uses [shared locks]({% link {{ page.version.version }}/select-for-update.md %}#lock-strengths) to perform [foreign key]({% link {{ page.version.version }}/foreign-key.md %}) checks. To take effect, the [`enable_shared_locking_for_serializable`](#enable-shared-locking-for-serializable) setting must also be enabled. This setting only affects `SERIALIZABLE` transactions and matches the default `READ COMMITTED` behavior when enabled. | `off` | Yes | Yes | diff --git a/src/current/_includes/v24.1/misc/table-storage-parameters.md b/src/current/_includes/v24.1/misc/table-storage-parameters.md index 3ca7f601648..51c4fd36db2 100644 --- a/src/current/_includes/v24.1/misc/table-storage-parameters.md +++ b/src/current/_includes/v24.1/misc/table-storage-parameters.md @@ -2,7 +2,7 @@ |------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------|-----------|---------------| | `exclude_data_from_backup` | Exclude the data in this table from any future backups. | Boolean | `false` | | `schema_locked` | Disallow [schema changes]({% link {{ page.version.version }}/online-schema-changes.md %}) on this table. Enabling `schema_locked` can help [improve performance of changefeeds]({% link {{ page.version.version }}/create-changefeed.md %}#disallow-schema-changes-on-tables-to-improve-changefeed-performance) running on this table. | Boolean | `false` | -| `sql_stats_automatic_collection_enabled` | Enable [automatic statistics collection]({% link {{ page.version.version }}/cost-based-optimizer.md %}#enable-and-disable-automatic-statistics-collection-for-tables) for this table. 
| Boolean | `true` | +| `sql_stats_automatic_collection_enabled` | Enable automatic collection of [full statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#full-statistics) for this table. | Boolean | `true` | | `sql_stats_automatic_collection_min_stale_rows` | Minimum number of stale rows in this table that will trigger a statistics refresh. | Integer | 500 | | `sql_stats_automatic_collection_fraction_stale_rows` | Fraction of stale rows in this table that will trigger a statistics refresh. | Float | 0.2 | | `sql_stats_forecasts_enabled` | Enable [forecasted statistics]({% link {{ page.version.version }}/show-statistics.md %}#display-forecasted-statistics) collection for this table. | Boolean | `true` | diff --git a/src/current/_includes/v24.2/misc/session-vars.md b/src/current/_includes/v24.2/misc/session-vars.md index 5a028f4cf5e..55927e4a74a 100644 --- a/src/current/_includes/v24.2/misc/session-vars.md +++ b/src/current/_includes/v24.2/misc/session-vars.md @@ -20,6 +20,7 @@ | `disable_changefeed_replication` | When `true`, [changefeeds]({% link {{ page.version.version }}/changefeed-messages.md %}#filtering-changefeed-messages) will not emit messages for any changes (e.g., `INSERT`, `UPDATE`) issued to watched tables during that session. | `false` | Yes | Yes | | `disallow_full_table_scans` | If set to `on`, queries on "large" tables with a row count greater than [`large_full_scan_rows`](#large-full-scan-rows) will not use full table or index scans. If no other query plan is possible, queries will return an error message. This setting does not apply to internal queries, which may plan full table or index scans without checking the session variable. | `off` | Yes | Yes || `distsql` | The query distribution mode for the session. By default, CockroachDB determines which queries are faster to execute if distributed across multiple nodes, and all other queries are run through the gateway node. 
| `auto` | Yes | Yes | | `enable_auto_rehoming` | When enabled, the [home regions]({% link {{ page.version.version }}/alter-table.md %}#crdb_region) of rows in [`REGIONAL BY ROW`]({% link {{ page.version.version }}/alter-table.md %}#set-the-table-locality-to-regional-by-row) tables are automatically set to the region of the [gateway node]({% link {{ page.version.version }}/ui-sessions-page.md %}#session-details-gateway-node) from which any [`UPDATE`]({% link {{ page.version.version }}/update.md %}) or [`UPSERT`]({% link {{ page.version.version }}/upsert.md %}) statements that operate on those rows originate. | `off` | Yes | Yes | +| `enable_create_stats_using_extremes` | If `on`, allows manual creation of [partial statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#partial-statistics) using the [`CREATE STATISTICS ... USING EXTREMES`]({% link {{ page.version.version }}/create-statistics.md %}#create-partial-statistics-using-extremes) syntax. | `off` | Yes | Yes | | `enable_durable_locking_for_serializable` | Indicates whether CockroachDB replicates [`FOR UPDATE` and `FOR SHARE`]({% link {{ page.version.version }}/select-for-update.md %}#lock-strengths) locks via [Raft]({% link {{ page.version.version }}/architecture/replication-layer.md %}#raft), allowing locks to be preserved when leases are transferred. Note that replicating `FOR UPDATE` and `FOR SHARE` locks will add latency to those statements. This setting only affects `SERIALIZABLE` transactions and matches the default `READ COMMITTED` behavior when enabled. | `off` | Yes | Yes | | `enable_experimental_alter_column_type_general` | If `on`, it is possible to [alter column data types]({% link {{ page.version.version }}/alter-table.md %}#alter-column-data-types). 
| `off` | Yes | Yes | | `enable_implicit_fk_locking_for_serializable` | Indicates whether CockroachDB uses [shared locks]({% link {{ page.version.version }}/select-for-update.md %}#lock-strengths) to perform [foreign key]({% link {{ page.version.version }}/foreign-key.md %}) checks. To take effect, the [`enable_shared_locking_for_serializable`](#enable-shared-locking-for-serializable) setting must also be enabled. This setting only affects `SERIALIZABLE` transactions and matches the default `READ COMMITTED` behavior when enabled. | `off` | Yes | Yes | diff --git a/src/current/_includes/v24.2/misc/table-storage-parameters.md b/src/current/_includes/v24.2/misc/table-storage-parameters.md index 3ca7f601648..51c4fd36db2 100644 --- a/src/current/_includes/v24.2/misc/table-storage-parameters.md +++ b/src/current/_includes/v24.2/misc/table-storage-parameters.md @@ -2,7 +2,7 @@ |------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------|-----------|---------------| | `exclude_data_from_backup` | Exclude the data in this table from any future backups. | Boolean | `false` | | `schema_locked` | Disallow [schema changes]({% link {{ page.version.version }}/online-schema-changes.md %}) on this table. Enabling `schema_locked` can help [improve performance of changefeeds]({% link {{ page.version.version }}/create-changefeed.md %}#disallow-schema-changes-on-tables-to-improve-changefeed-performance) running on this table. | Boolean | `false` | -| `sql_stats_automatic_collection_enabled` | Enable [automatic statistics collection]({% link {{ page.version.version }}/cost-based-optimizer.md %}#enable-and-disable-automatic-statistics-collection-for-tables) for this table. 
| Boolean | `true` | +| `sql_stats_automatic_collection_enabled` | Enable automatic collection of [full statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#full-statistics) for this table. | Boolean | `true` | | `sql_stats_automatic_collection_min_stale_rows` | Minimum number of stale rows in this table that will trigger a statistics refresh. | Integer | 500 | | `sql_stats_automatic_collection_fraction_stale_rows` | Fraction of stale rows in this table that will trigger a statistics refresh. | Float | 0.2 | | `sql_stats_forecasts_enabled` | Enable [forecasted statistics]({% link {{ page.version.version }}/show-statistics.md %}#display-forecasted-statistics) collection for this table. | Boolean | `true` | diff --git a/src/current/_includes/v24.3/misc/session-vars.md b/src/current/_includes/v24.3/misc/session-vars.md index 15c5994c010..793e353cb36 100644 --- a/src/current/_includes/v24.3/misc/session-vars.md +++ b/src/current/_includes/v24.3/misc/session-vars.md @@ -20,6 +20,7 @@ | `disable_changefeed_replication` | When `true`, [changefeeds]({% link {{ page.version.version }}/changefeed-messages.md %}#filtering-changefeed-messages) will not emit messages for any changes (e.g., `INSERT`, `UPDATE`) issued to watched tables during that session. | `false` | Yes | Yes | | `disallow_full_table_scans` | If set to `on`, queries on "large" tables with a row count greater than [`large_full_scan_rows`](#large-full-scan-rows) will not use full table or index scans. If no other query plan is possible, queries will return an error message. This setting does not apply to internal queries, which may plan full table or index scans without checking the session variable. | `off` | Yes | Yes || `distsql` | The query distribution mode for the session. By default, CockroachDB determines which queries are faster to execute if distributed across multiple nodes, and all other queries are run through the gateway node. 
| `auto` | Yes | Yes | | `enable_auto_rehoming` | When enabled, the [home regions]({% link {{ page.version.version }}/alter-table.md %}#crdb_region) of rows in [`REGIONAL BY ROW`]({% link {{ page.version.version }}/alter-table.md %}#set-the-table-locality-to-regional-by-row) tables are automatically set to the region of the [gateway node]({% link {{ page.version.version }}/ui-sessions-page.md %}#session-details-gateway-node) from which any [`UPDATE`]({% link {{ page.version.version }}/update.md %}) or [`UPSERT`]({% link {{ page.version.version }}/upsert.md %}) statements that operate on those rows originate. | `off` | Yes | Yes | +| `enable_create_stats_using_extremes` | If `on`, allows manual creation of [partial statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#partial-statistics) using the [`CREATE STATISTICS ... USING EXTREMES`]({% link {{ page.version.version }}/create-statistics.md %}#create-partial-statistics-using-extremes) syntax. | `on` | Yes | Yes | | `enable_durable_locking_for_serializable` | Indicates whether CockroachDB replicates [`FOR UPDATE` and `FOR SHARE`]({% link {{ page.version.version }}/select-for-update.md %}#lock-strengths) locks via [Raft]({% link {{ page.version.version }}/architecture/replication-layer.md %}#raft), allowing locks to be preserved when leases are transferred. Note that replicating `FOR UPDATE` and `FOR SHARE` locks will add latency to those statements. This setting only affects `SERIALIZABLE` transactions and matches the default `READ COMMITTED` behavior when enabled. | `off` | Yes | Yes | | `enable_experimental_alter_column_type_general` | If `on`, it is possible to [alter column data types]({% link {{ page.version.version }}/alter-table.md %}#alter-column-data-types). 
| `off` | Yes | Yes | | `enable_implicit_fk_locking_for_serializable` | Indicates whether CockroachDB uses [shared locks]({% link {{ page.version.version }}/select-for-update.md %}#lock-strengths) to perform [foreign key]({% link {{ page.version.version }}/foreign-key.md %}) checks. To take effect, the [`enable_shared_locking_for_serializable`](#enable-shared-locking-for-serializable) setting must also be enabled. This setting only affects `SERIALIZABLE` transactions and matches the default `READ COMMITTED` behavior when enabled. | `off` | Yes | Yes | @@ -55,6 +56,7 @@ | `optimizer_use_improved_multi_column_selectivity_estimate` | If `on`, the optimizer uses an improved selectivity estimate for multi-column predicates. | `on` | Yes | Yes | | `optimizer_use_improved_zigzag_join_costing` | If `on`, the cost of [zigzag joins]({% link {{ page.version.version }}/cost-based-optimizer.md %}#zigzag-joins) is updated so they will be never be chosen over scans unless they produce fewer rows. To take effect, the [`enable_zigzag_join`](#enable-zigzag-join) setting must also be enabled. | `on` | Yes | Yes | | `optimizer_use_lock_op_for_serializable` | If `on`, the optimizer uses a `Lock` operator to construct query plans for `SELECT` statements using the [`FOR UPDATE` and `FOR SHARE`]({% link {{ page.version.version }}/select-for-update.md %}) clauses. This setting only affects `SERIALIZABLE` transactions. `READ COMMITTED` transactions are evaluated with the `Lock` operator regardless of the setting. | `off` | Yes | Yes | +| `optimizer_use_merged_partial_statistics` | If `on`, the optimizer uses [partial statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#partial-statistics) merged with existing full [table statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#table-statistics) for cardinality estimation. 
| `off` | Yes | Yes | | `optimizer_use_multicol_stats` | If `on`, the optimizer uses collected multi-column statistics for cardinality estimation. | `on` | No | Yes | | `optimizer_use_not_visible_indexes` | If `on`, the optimizer uses not visible indexes for planning. | `off` | No | Yes | | `optimizer_use_virtual_computed_column_stats` | If `on`, the optimizer uses table statistics on [virtual computed columns]({% link {{ page.version.version }}/computed-columns.md %}#virtual-computed-columns). | `on` | Yes | Yes | diff --git a/src/current/_includes/v24.3/misc/table-storage-parameters.md b/src/current/_includes/v24.3/misc/table-storage-parameters.md index 3ca7f601648..51c4fd36db2 100644 --- a/src/current/_includes/v24.3/misc/table-storage-parameters.md +++ b/src/current/_includes/v24.3/misc/table-storage-parameters.md @@ -2,7 +2,7 @@ |------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------|-----------|---------------| | `exclude_data_from_backup` | Exclude the data in this table from any future backups. | Boolean | `false` | | `schema_locked` | Disallow [schema changes]({% link {{ page.version.version }}/online-schema-changes.md %}) on this table. Enabling `schema_locked` can help [improve performance of changefeeds]({% link {{ page.version.version }}/create-changefeed.md %}#disallow-schema-changes-on-tables-to-improve-changefeed-performance) running on this table. | Boolean | `false` | -| `sql_stats_automatic_collection_enabled` | Enable [automatic statistics collection]({% link {{ page.version.version }}/cost-based-optimizer.md %}#enable-and-disable-automatic-statistics-collection-for-tables) for this table. | Boolean | `true` | +| `sql_stats_automatic_collection_enabled` | Enable automatic collection of [full statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#full-statistics) for this table. 
| Boolean | `true` | | `sql_stats_automatic_collection_min_stale_rows` | Minimum number of stale rows in this table that will trigger a statistics refresh. | Integer | 500 | | `sql_stats_automatic_collection_fraction_stale_rows` | Fraction of stale rows in this table that will trigger a statistics refresh. | Float | 0.2 | | `sql_stats_forecasts_enabled` | Enable [forecasted statistics]({% link {{ page.version.version }}/show-statistics.md %}#display-forecasted-statistics) collection for this table. | Boolean | `true` | diff --git a/src/current/_includes/v25.1/misc/session-vars.md b/src/current/_includes/v25.1/misc/session-vars.md index 5c4e5892373..6e700e7e2c6 100644 --- a/src/current/_includes/v25.1/misc/session-vars.md +++ b/src/current/_includes/v25.1/misc/session-vars.md @@ -25,6 +25,7 @@ | `distribute_sort_row_count_threshold` | **New in v25.1:** Minimum number of rows that a sort operation must process in order to be [distributed]({% link {{ page.version.version }}/architecture/sql-layer.md %}#distsql). | `1000` | Yes | Yes | | `distsql` | The query distribution mode for the session. By default, CockroachDB determines which queries are faster to execute if distributed across multiple nodes. Distribution preferences for `GROUP BY`, scan, and sort operations are set with [`distribute_group_by_row_count_threshold`](#distribute-group-by-row-count-threshold), [`distribute_scan_row_count_threshold.`](#distribute-scan-row-count-threshold) and [`distribute_sort_row_count_threshold.`](#distribute-sort-row-count-threshold), respectively. All other queries are run through the gateway node. 
| `auto` | Yes | Yes | | `enable_auto_rehoming` | When enabled, the [home regions]({% link {{ page.version.version }}/alter-table.md %}#crdb_region) of rows in [`REGIONAL BY ROW`]({% link {{ page.version.version }}/alter-table.md %}#set-the-table-locality-to-regional-by-row) tables are automatically set to the region of the [gateway node]({% link {{ page.version.version }}/ui-sessions-page.md %}#session-details-gateway-node) from which any [`UPDATE`]({% link {{ page.version.version }}/update.md %}) or [`UPSERT`]({% link {{ page.version.version }}/upsert.md %}) statements that operate on those rows originate. | `off` | Yes | Yes | +| `enable_create_stats_using_extremes` | If `on`, allows manual creation of [partial statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#partial-statistics) using the [`CREATE STATISTICS ... USING EXTREMES`]({% link {{ page.version.version }}/create-statistics.md %}#create-partial-statistics-using-extremes) syntax. | `on` | Yes | Yes | | `enable_durable_locking_for_serializable` | Indicates whether CockroachDB replicates [`FOR UPDATE` and `FOR SHARE`]({% link {{ page.version.version }}/select-for-update.md %}#lock-strengths) locks via [Raft]({% link {{ page.version.version }}/architecture/replication-layer.md %}#raft), allowing locks to be preserved when leases are transferred. Note that replicating `FOR UPDATE` and `FOR SHARE` locks will add latency to those statements. This setting only affects `SERIALIZABLE` transactions and matches the default `READ COMMITTED` behavior when enabled. | `off` | Yes | Yes | | `enable_implicit_fk_locking_for_serializable` | Indicates whether CockroachDB uses [shared locks]({% link {{ page.version.version }}/select-for-update.md %}#lock-strengths) to perform [foreign key]({% link {{ page.version.version }}/foreign-key.md %}) checks. To take effect, the [`enable_shared_locking_for_serializable`](#enable-shared-locking-for-serializable) setting must also be enabled. 
This setting only affects `SERIALIZABLE` transactions and matches the default `READ COMMITTED` behavior when enabled. | `off` | Yes | Yes | | `enable_implicit_select_for_update` | Indicates whether [`UPDATE`]({% link {{ page.version.version }}/update.md %}), [`UPSERT`]({% link {{ page.version.version }}/upsert.md %}), and [`DELETE`]({% link {{ page.version.version }}/delete.md %}) statements acquire locks using the `FOR UPDATE` locking mode during their initial row scan, which improves performance for contended workloads.

For more information about how `FOR UPDATE` locking works, see the documentation for [`SELECT FOR UPDATE`]({% link {{ page.version.version }}/select-for-update.md %}). | `on` | Yes | Yes | @@ -59,6 +60,7 @@ | `optimizer_use_improved_multi_column_selectivity_estimate` | If `on`, the optimizer uses an improved selectivity estimate for multi-column predicates. | `on` | Yes | Yes | | `optimizer_use_improved_zigzag_join_costing` | If `on`, the cost of [zigzag joins]({% link {{ page.version.version }}/cost-based-optimizer.md %}#zigzag-joins) is updated so they will be never be chosen over scans unless they produce fewer rows. To take effect, the [`enable_zigzag_join`](#enable-zigzag-join) setting must also be enabled. | `on` | Yes | Yes | | `optimizer_use_lock_op_for_serializable` | If `on`, the optimizer uses a `Lock` operator to construct query plans for `SELECT` statements using the [`FOR UPDATE` and `FOR SHARE`]({% link {{ page.version.version }}/select-for-update.md %}) clauses. This setting only affects `SERIALIZABLE` transactions. `READ COMMITTED` transactions are evaluated with the `Lock` operator regardless of the setting. | `off` | Yes | Yes | +| `optimizer_use_merged_partial_statistics` | If `on`, the optimizer uses [partial statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#partial-statistics) merged with existing full [table statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#table-statistics) for cardinality estimation. | `off` | Yes | Yes | | `optimizer_use_multicol_stats` | If `on`, the optimizer uses collected multi-column statistics for cardinality estimation. | `on` | No | Yes | | `optimizer_use_not_visible_indexes` | If `on`, the optimizer uses not visible indexes for planning. | `off` | No | Yes | | `optimizer_use_virtual_computed_column_stats` | If `on`, the optimizer uses table statistics on [virtual computed columns]({% link {{ page.version.version }}/computed-columns.md %}#virtual-computed-columns). 
| `on` | Yes | Yes | diff --git a/src/current/_includes/v25.1/misc/table-storage-parameters.md b/src/current/_includes/v25.1/misc/table-storage-parameters.md index 3ca7f601648..f5b6ccf9fe4 100644 --- a/src/current/_includes/v25.1/misc/table-storage-parameters.md +++ b/src/current/_includes/v25.1/misc/table-storage-parameters.md @@ -1,15 +1,19 @@ -| Parameter name | Description | Data type | Default value | -|------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------|-----------|---------------| -| `exclude_data_from_backup` | Exclude the data in this table from any future backups. | Boolean | `false` | -| `schema_locked` | Disallow [schema changes]({% link {{ page.version.version }}/online-schema-changes.md %}) on this table. Enabling `schema_locked` can help [improve performance of changefeeds]({% link {{ page.version.version }}/create-changefeed.md %}#disallow-schema-changes-on-tables-to-improve-changefeed-performance) running on this table. | Boolean | `false` | -| `sql_stats_automatic_collection_enabled` | Enable [automatic statistics collection]({% link {{ page.version.version }}/cost-based-optimizer.md %}#enable-and-disable-automatic-statistics-collection-for-tables) for this table. | Boolean | `true` | -| `sql_stats_automatic_collection_min_stale_rows` | Minimum number of stale rows in this table that will trigger a statistics refresh. | Integer | 500 | -| `sql_stats_automatic_collection_fraction_stale_rows` | Fraction of stale rows in this table that will trigger a statistics refresh. | Float | 0.2 | -| `sql_stats_forecasts_enabled` | Enable [forecasted statistics]({% link {{ page.version.version }}/show-statistics.md %}#display-forecasted-statistics) collection for this table. 
| Boolean | `true` | +| Parameter name | Description | Data type | Default value | +|----------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------|---------------| +| `exclude_data_from_backup` | Exclude the data in this table from any future backups. | Boolean | `false` | +| `schema_locked` | Indicates that a [schema change]({% link {{ page.version.version }}/online-schema-changes.md %}) is not currently ongoing on this table. CockroachDB automatically unsets this parameter before performing a schema change and reapplies it when done. Enabling `schema_locked` can help [improve performance of changefeeds]({% link {{ page.version.version }}/create-changefeed.md %}#disallow-schema-changes-on-tables-to-improve-changefeed-performance) running on this table. | Boolean | `false` | +| `sql_stats_automatic_collection_enabled` | Enable automatic collection of [full]({% link {{ page.version.version }}/cost-based-optimizer.md %}#full-statistics) and [partial]({% link {{ page.version.version }}/cost-based-optimizer.md %}#partial-statistics) statistics for this table. | Boolean | `true` | +| `sql_stats_automatic_collection_min_stale_rows` | Minimum number of stale rows in this table that will trigger a full statistics refresh. | Integer | 500 | +| `sql_stats_automatic_collection_fraction_stale_rows` | Fraction of stale rows in this table that will trigger a full statistics refresh. 
| Float | 0.2 | +| `sql_stats_forecasts_enabled` | Enable [forecasted statistics]({% link {{ page.version.version }}/show-statistics.md %}#display-forecasted-statistics) collection for this table. | Boolean | `true` | +| `sql_stats_automatic_partial_collection_enabled` | {% include_cached new-in.html version="v25.1" %}Enable automatic collection of [partial statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#automatically-collect-partial-statistics) for this table. | Boolean | `true` | +| `sql_stats_automatic_partial_collection_min_stale_rows` | {% include_cached new-in.html version="v25.1" %}Minimum number of stale rows that triggers [partial statistics collection]({% link {{ page.version.version }}/cost-based-optimizer.md %}#automatically-collect-partial-statistics) for this table. | Integer | 100 | +| `sql_stats_automatic_partial_collection_fraction_stale_rows` | {% include_cached new-in.html version="v25.1" %}Target fraction of stale rows that triggers [partial statistics collection]({% link {{ page.version.version }}/cost-based-optimizer.md %}#automatically-collect-partial-statistics) for this table. | Float | 0.05 | +| `infer_rbr_region_col_using_constraint` | For [`REGIONAL BY ROW`]({% link {{ page.version.version }}/table-localities.md %}#regional-by-row-tables) tables, automatically populate the hidden `crdb_region` column on `INSERT`, `UPDATE`, and `UPSERT` by looking up the region of the referenced parent row. Set this parameter to the name of a [foreign key]({% link {{ page.version.version }}/foreign-key.md %}) constraint on the table that includes the `crdb_region` column. The foreign key cannot be dropped while the parameter is set. 
| String | `NULL` | The following parameters are included for PostgreSQL compatibility and do not affect how CockroachDB runs: - `autovacuum_enabled` - `fillfactor` -For the list of storage parameters that affect how [Row-Level TTL]({% link {{ page.version.version }}/row-level-ttl.md %}) works, see the list of [TTL storage parameters]({% link {{ page.version.version }}/row-level-ttl.md %}#ttl-storage-parameters). \ No newline at end of file +For the list of storage parameters that affect how [Row-Level TTL]({% link {{ page.version.version }}/row-level-ttl.md %}) works, see the list of [TTL storage parameters]({% link {{ page.version.version }}/row-level-ttl.md %}#ttl-storage-parameters). diff --git a/src/current/_includes/v25.2/misc/session-vars.md b/src/current/_includes/v25.2/misc/session-vars.md index b700a4ed1f4..df75e86cfea 100644 --- a/src/current/_includes/v25.2/misc/session-vars.md +++ b/src/current/_includes/v25.2/misc/session-vars.md @@ -25,6 +25,7 @@ | `distribute_sort_row_count_threshold` | Minimum number of rows that a sort operation must process in order to be [distributed]({% link {{ page.version.version }}/architecture/sql-layer.md %}#distsql). | `1000` | Yes | Yes | | `distsql` | The query distribution mode for the session. By default, CockroachDB determines which queries are faster to execute if distributed across multiple nodes. Distribution preferences for `GROUP BY`, scan, and sort operations are set with [`distribute_group_by_row_count_threshold`](#distribute-group-by-row-count-threshold), [`distribute_scan_row_count_threshold.`](#distribute-scan-row-count-threshold) and [`distribute_sort_row_count_threshold.`](#distribute-sort-row-count-threshold), respectively. All other queries are run through the gateway node. 
| `auto` | Yes | Yes | | `enable_auto_rehoming` | When enabled, the [home regions]({% link {{ page.version.version }}/alter-table.md %}#crdb_region) of rows in [`REGIONAL BY ROW`]({% link {{ page.version.version }}/alter-table.md %}#set-the-table-locality-to-regional-by-row) tables are automatically set to the region of the [gateway node]({% link {{ page.version.version }}/ui-sessions-page.md %}#session-details-gateway-node) from which any [`UPDATE`]({% link {{ page.version.version }}/update.md %}) or [`UPSERT`]({% link {{ page.version.version }}/upsert.md %}) statements that operate on those rows originate. | `off` | Yes | Yes | +| `enable_create_stats_using_extremes` | If `on`, allows manual creation of [partial statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#partial-statistics) using the [`CREATE STATISTICS ... USING EXTREMES`]({% link {{ page.version.version }}/create-statistics.md %}#create-partial-statistics-using-extremes) syntax. | `on` | Yes | Yes | | `enable_durable_locking_for_serializable` | Indicates whether CockroachDB replicates [`FOR UPDATE` and `FOR SHARE`]({% link {{ page.version.version }}/select-for-update.md %}#lock-strengths) locks via [Raft]({% link {{ page.version.version }}/architecture/replication-layer.md %}#raft), allowing locks to be preserved when leases are transferred. Note that replicating `FOR UPDATE` and `FOR SHARE` locks will add latency to those statements. This setting only affects `SERIALIZABLE` transactions and matches the default `READ COMMITTED` behavior when enabled. | `off` | Yes | Yes | | `enable_implicit_fk_locking_for_serializable` | Indicates whether CockroachDB uses [shared locks]({% link {{ page.version.version }}/select-for-update.md %}#lock-strengths) to perform [foreign key]({% link {{ page.version.version }}/foreign-key.md %}) checks. To take effect, the [`enable_shared_locking_for_serializable`](#enable-shared-locking-for-serializable) setting must also be enabled. 
This setting only affects `SERIALIZABLE` transactions and matches the default `READ COMMITTED` behavior when enabled. | `off` | Yes | Yes | | `enable_implicit_select_for_update` | Indicates whether [`UPDATE`]({% link {{ page.version.version }}/update.md %}), [`UPSERT`]({% link {{ page.version.version }}/upsert.md %}), and [`DELETE`]({% link {{ page.version.version }}/delete.md %}) statements acquire locks using the `FOR UPDATE` locking mode during their initial row scan, which improves performance for contended workloads.

For more information about how `FOR UPDATE` locking works, see the documentation for [`SELECT FOR UPDATE`]({% link {{ page.version.version }}/select-for-update.md %}). | `on` | Yes | Yes | @@ -59,6 +60,7 @@ | `optimizer_use_improved_multi_column_selectivity_estimate` | If `on`, the optimizer uses an improved selectivity estimate for multi-column predicates. | `on` | Yes | Yes | | `optimizer_use_improved_zigzag_join_costing` | If `on`, the cost of [zigzag joins]({% link {{ page.version.version }}/cost-based-optimizer.md %}#zigzag-joins) is updated so they will be never be chosen over scans unless they produce fewer rows. To take effect, the [`enable_zigzag_join`](#enable-zigzag-join) setting must also be enabled. | `on` | Yes | Yes | | `optimizer_use_lock_op_for_serializable` | If `on`, the optimizer uses a `Lock` operator to construct query plans for `SELECT` statements using the [`FOR UPDATE` and `FOR SHARE`]({% link {{ page.version.version }}/select-for-update.md %}) clauses. This setting only affects `SERIALIZABLE` transactions. `READ COMMITTED` transactions are evaluated with the `Lock` operator regardless of the setting. | `off` | Yes | Yes | +| `optimizer_use_merged_partial_statistics` | If `on`, the optimizer uses [partial statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#partial-statistics) merged with existing full [table statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#table-statistics) to produce more accurate cardinality estimates. | `on` | Yes | Yes | | `optimizer_use_multicol_stats` | If `on`, the optimizer uses collected multi-column statistics for cardinality estimation. | `on` | No | Yes | | `optimizer_use_not_visible_indexes` | If `on`, the optimizer uses not visible indexes for planning. 
| `off` | No | Yes | | `optimizer_use_virtual_computed_column_stats` | If `on`, the optimizer uses table statistics on [virtual computed columns]({% link {{ page.version.version }}/computed-columns.md %}#virtual-computed-columns). | `on` | Yes | Yes | diff --git a/src/current/_includes/v25.2/misc/table-storage-parameters.md b/src/current/_includes/v25.2/misc/table-storage-parameters.md index 3ca7f601648..9e97fe83785 100644 --- a/src/current/_includes/v25.2/misc/table-storage-parameters.md +++ b/src/current/_includes/v25.2/misc/table-storage-parameters.md @@ -1,15 +1,20 @@ -| Parameter name | Description | Data type | Default value | -|------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------|-----------|---------------| -| `exclude_data_from_backup` | Exclude the data in this table from any future backups. | Boolean | `false` | -| `schema_locked` | Disallow [schema changes]({% link {{ page.version.version }}/online-schema-changes.md %}) on this table. Enabling `schema_locked` can help [improve performance of changefeeds]({% link {{ page.version.version }}/create-changefeed.md %}#disallow-schema-changes-on-tables-to-improve-changefeed-performance) running on this table. | Boolean | `false` | -| `sql_stats_automatic_collection_enabled` | Enable [automatic statistics collection]({% link {{ page.version.version }}/cost-based-optimizer.md %}#enable-and-disable-automatic-statistics-collection-for-tables) for this table. | Boolean | `true` | -| `sql_stats_automatic_collection_min_stale_rows` | Minimum number of stale rows in this table that will trigger a statistics refresh. | Integer | 500 | -| `sql_stats_automatic_collection_fraction_stale_rows` | Fraction of stale rows in this table that will trigger a statistics refresh. 
| Float | 0.2 | -| `sql_stats_forecasts_enabled` | Enable [forecasted statistics]({% link {{ page.version.version }}/show-statistics.md %}#display-forecasted-statistics) collection for this table. | Boolean | `true` | +| Parameter name | Description | Data type | Default value | +|----------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------|---------------| +| `exclude_data_from_backup` | Exclude the data in this table from any future backups. | Boolean | `false` | +| `schema_locked` | Indicates that a [schema change]({% link {{ page.version.version }}/online-schema-changes.md %}) is not currently ongoing on this table. CockroachDB automatically unsets this parameter before performing a schema change and reapplies it when done. Enabling `schema_locked` can help [improve performance of changefeeds]({% link {{ page.version.version }}/create-changefeed.md %}#disallow-schema-changes-on-tables-to-improve-changefeed-performance) running on this table. | Boolean | `false` | +| `sql_stats_automatic_collection_enabled` | Enable automatic collection of [full]({% link {{ page.version.version }}/cost-based-optimizer.md %}#full-statistics) and [partial]({% link {{ page.version.version }}/cost-based-optimizer.md %}#partial-statistics) statistics for this table. | Boolean | `true` | +| `sql_stats_automatic_collection_min_stale_rows` | Minimum number of stale rows in this table that will trigger a full statistics refresh. 
| Integer | 500 | +| `sql_stats_automatic_collection_fraction_stale_rows` | Fraction of stale rows in this table that will trigger a full statistics refresh. | Float | 0.2 | +| `sql_stats_automatic_full_collection_enabled` | {% include_cached new-in.html version="v25.2" %} Enable automatic collection of [full statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#full-statistics) for this table. | Boolean | `true` | +| `sql_stats_forecasts_enabled` | Enable [forecasted statistics]({% link {{ page.version.version }}/show-statistics.md %}#display-forecasted-statistics) collection for this table. | Boolean | `true` | +| `sql_stats_automatic_partial_collection_enabled` | Enable automatic collection of [partial statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#automatically-collect-partial-statistics) for this table. | Boolean | `true` | +| `sql_stats_automatic_partial_collection_min_stale_rows` | Minimum number of stale rows that triggers [partial statistics collection]({% link {{ page.version.version }}/cost-based-optimizer.md %}#automatically-collect-partial-statistics) for this table. | Integer | 100 | +| `sql_stats_automatic_partial_collection_fraction_stale_rows` | Target fraction of stale rows that triggers [partial statistics collection]({% link {{ page.version.version }}/cost-based-optimizer.md %}#automatically-collect-partial-statistics) for this table. | Float | 0.05 | +| `infer_rbr_region_col_using_constraint` | For [`REGIONAL BY ROW`]({% link {{ page.version.version }}/table-localities.md %}#regional-by-row-tables) tables, automatically populate the hidden `crdb_region` column on `INSERT`, `UPDATE`, and `UPSERT` by looking up the region of the referenced parent row. Set this parameter to the name of a [foreign key]({% link {{ page.version.version }}/foreign-key.md %}) constraint on the table that includes the `crdb_region` column. The foreign key cannot be dropped while the parameter is set. 
| String | `NULL` | The following parameters are included for PostgreSQL compatibility and do not affect how CockroachDB runs: - `autovacuum_enabled` - `fillfactor` -For the list of storage parameters that affect how [Row-Level TTL]({% link {{ page.version.version }}/row-level-ttl.md %}) works, see the list of [TTL storage parameters]({% link {{ page.version.version }}/row-level-ttl.md %}#ttl-storage-parameters). \ No newline at end of file +For the list of storage parameters that affect how [Row-Level TTL]({% link {{ page.version.version }}/row-level-ttl.md %}) works, see the list of [TTL storage parameters]({% link {{ page.version.version }}/row-level-ttl.md %}#ttl-storage-parameters). diff --git a/src/current/_includes/v25.3/misc/session-vars.md b/src/current/_includes/v25.3/misc/session-vars.md index 5d5d657ccd1..703d0e16804 100644 --- a/src/current/_includes/v25.3/misc/session-vars.md +++ b/src/current/_includes/v25.3/misc/session-vars.md @@ -26,6 +26,7 @@ | `distribute_sort_row_count_threshold` | Minimum number of rows that a sort operation must process in order to be [distributed]({% link {{ page.version.version }}/architecture/sql-layer.md %}#distsql). | `1000` | Yes | Yes | | `distsql` | The query distribution mode for the session. By default, CockroachDB determines which queries are faster to execute if distributed across multiple nodes. Distribution preferences for `GROUP BY`, scan, and sort operations are set with [`distribute_group_by_row_count_threshold`](#distribute-group-by-row-count-threshold), [`distribute_scan_row_count_threshold.`](#distribute-scan-row-count-threshold) and [`distribute_sort_row_count_threshold.`](#distribute-sort-row-count-threshold), respectively. All other queries are run through the gateway node. 
| `auto` | Yes | Yes | | `enable_auto_rehoming` | When enabled, the [home regions]({% link {{ page.version.version }}/alter-table.md %}#crdb_region) of rows in [`REGIONAL BY ROW`]({% link {{ page.version.version }}/alter-table.md %}#set-the-table-locality-to-regional-by-row) tables are automatically set to the region of the [gateway node]({% link {{ page.version.version }}/ui-sessions-page.md %}#session-details-gateway-node) from which any [`UPDATE`]({% link {{ page.version.version }}/update.md %}) or [`UPSERT`]({% link {{ page.version.version }}/upsert.md %}) statements that operate on those rows originate. | `off` | Yes | Yes | +| `enable_create_stats_using_extremes` | If `on`, allows manual creation of [partial statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#partial-statistics) using the [`CREATE STATISTICS ... USING EXTREMES`]({% link {{ page.version.version }}/create-statistics.md %}#create-partial-statistics-using-extremes) syntax. | `on` | Yes | Yes | | `enable_durable_locking_for_serializable` | Indicates whether CockroachDB replicates [`FOR UPDATE` and `FOR SHARE`]({% link {{ page.version.version }}/select-for-update.md %}#lock-strengths) locks via [Raft]({% link {{ page.version.version }}/architecture/replication-layer.md %}#raft), allowing locks to be preserved when leases are transferred. Note that replicating `FOR UPDATE` and `FOR SHARE` locks will add latency to those statements. This setting only affects `SERIALIZABLE` transactions and matches the default `READ COMMITTED` behavior when enabled. | `off` | Yes | Yes | | `enable_implicit_fk_locking_for_serializable` | Indicates whether CockroachDB uses [shared locks]({% link {{ page.version.version }}/select-for-update.md %}#lock-strengths) to perform [foreign key]({% link {{ page.version.version }}/foreign-key.md %}) checks. To take effect, the [`enable_shared_locking_for_serializable`](#enable-shared-locking-for-serializable) setting must also be enabled. 
This setting only affects `SERIALIZABLE` transactions and matches the default `READ COMMITTED` behavior when enabled. | `off` | Yes | Yes | | `enable_implicit_select_for_update` | Indicates whether [`UPDATE`]({% link {{ page.version.version }}/update.md %}), [`UPSERT`]({% link {{ page.version.version }}/upsert.md %}), and [`DELETE`]({% link {{ page.version.version }}/delete.md %}) statements acquire locks using the `FOR UPDATE` locking mode during their initial row scan, which improves performance for contended workloads.

For more information about how `FOR UPDATE` locking works, see the documentation for [`SELECT FOR UPDATE`]({% link {{ page.version.version }}/select-for-update.md %}). | `on` | Yes | Yes | @@ -60,6 +61,7 @@ | `optimizer_use_improved_multi_column_selectivity_estimate` | If `on`, the optimizer uses an improved selectivity estimate for multi-column predicates. | `on` | Yes | Yes | | `optimizer_use_improved_zigzag_join_costing` | If `on`, the cost of [zigzag joins]({% link {{ page.version.version }}/cost-based-optimizer.md %}#zigzag-joins) is updated so they will be never be chosen over scans unless they produce fewer rows. To take effect, the [`enable_zigzag_join`](#enable-zigzag-join) setting must also be enabled. | `on` | Yes | Yes | | `optimizer_use_lock_op_for_serializable` | If `on`, the optimizer uses a `Lock` operator to construct query plans for `SELECT` statements using the [`FOR UPDATE` and `FOR SHARE`]({% link {{ page.version.version }}/select-for-update.md %}) clauses. This setting only affects `SERIALIZABLE` transactions. `READ COMMITTED` transactions are evaluated with the `Lock` operator regardless of the setting. | `off` | Yes | Yes | +| `optimizer_use_merged_partial_statistics` | If `on`, the optimizer uses [partial statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#partial-statistics) merged with existing full [table statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#table-statistics) to produce more accurate cardinality estimates. | `on` | Yes | Yes | | `optimizer_use_multicol_stats` | If `on`, the optimizer uses collected multi-column statistics for cardinality estimation. | `on` | No | Yes | | `optimizer_use_not_visible_indexes` | If `on`, the optimizer uses not visible indexes for planning. 
| `off` | No | Yes | | `optimizer_use_virtual_computed_column_stats` | If `on`, the optimizer uses table statistics on [virtual computed columns]({% link {{ page.version.version }}/computed-columns.md %}#virtual-computed-columns). | `on` | Yes | Yes | diff --git a/src/current/_includes/v25.3/misc/table-storage-parameters.md b/src/current/_includes/v25.3/misc/table-storage-parameters.md index d8f99706496..673ef075a3c 100644 --- a/src/current/_includes/v25.3/misc/table-storage-parameters.md +++ b/src/current/_includes/v25.3/misc/table-storage-parameters.md @@ -1,11 +1,15 @@ | Parameter name | Description | Data type | Default value | |----------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------|---------------| | `exclude_data_from_backup` | Exclude the data in this table from any future backups. | Boolean | `false` | -| `schema_locked` | Indicates that a [schema change]({% link {{ page.version.version }}/online-schema-changes.md %}) is not currently ongoing on this table. CockroachDB automatically unsets this parameter before performing a schema change and reapplies it when done. Enabling `schema_locked` can help [improve performance of changefeeds]({% link {{ page.version.version }}/create-changefeed.md %}#disallow-schema-changes-on-tables-to-improve-changefeed-performance) running on this table. 
| Boolean | `false` | -| `sql_stats_automatic_collection_enabled` | Enable [automatic statistics collection]({% link {{ page.version.version }}/cost-based-optimizer.md %}#enable-and-disable-automatic-statistics-collection-for-tables) for this table. | Boolean | `true` | -| `sql_stats_automatic_collection_min_stale_rows` | Minimum number of stale rows in this table that will trigger a statistics refresh. | Integer | 500 | -| `sql_stats_automatic_collection_fraction_stale_rows` | Fraction of stale rows in this table that will trigger a statistics refresh. | Float | 0.2 | +| `schema_locked` | Indicates that a [schema change]({% link {{ page.version.version }}/online-schema-changes.md %}) is not currently ongoing on this table. CockroachDB automatically unsets this parameter before performing a schema change and reapplies it when done. Enabling `schema_locked` can help [improve performance of changefeeds]({% link {{ page.version.version }}/create-changefeed.md %}#disallow-schema-changes-on-tables-to-improve-changefeed-performance) running on this table. | Boolean | `false` | +| `sql_stats_automatic_collection_enabled` | Enable automatic collection of [full]({% link {{ page.version.version }}/cost-based-optimizer.md %}#full-statistics) and [partial]({% link {{ page.version.version }}/cost-based-optimizer.md %}#partial-statistics) statistics for this table. | Boolean | `true` | +| `sql_stats_automatic_collection_min_stale_rows` | Minimum number of stale rows in this table that will trigger a full statistics refresh. | Integer | 500 | +| `sql_stats_automatic_collection_fraction_stale_rows` | Fraction of stale rows in this table that will trigger a full statistics refresh. | Float | 0.2 | +| `sql_stats_automatic_full_collection_enabled` | Enable automatic collection of [full statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#full-statistics) for this table. 
| Boolean | `true` | | `sql_stats_forecasts_enabled` | Enable [forecasted statistics]({% link {{ page.version.version }}/show-statistics.md %}#display-forecasted-statistics) collection for this table. | Boolean | `true` | +| `sql_stats_automatic_partial_collection_enabled` | Enable automatic collection of [partial statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#automatically-collect-partial-statistics) for this table. | Boolean | `true` | +| `sql_stats_automatic_partial_collection_min_stale_rows` | Minimum number of stale rows that triggers [partial statistics collection]({% link {{ page.version.version }}/cost-based-optimizer.md %}#automatically-collect-partial-statistics) for this table. | Integer | 100 | +| `sql_stats_automatic_partial_collection_fraction_stale_rows` | Target fraction of stale rows that triggers [partial statistics collection]({% link {{ page.version.version }}/cost-based-optimizer.md %}#automatically-collect-partial-statistics) for this table. | Float | 0.05 | | `infer_rbr_region_col_using_constraint` | For [`REGIONAL BY ROW`]({% link {{ page.version.version }}/table-localities.md %}#regional-by-row-tables) tables, automatically populate the hidden `crdb_region` column on `INSERT`, `UPDATE`, and `UPSERT` by looking up the region of the referenced parent row. Set this parameter to the name of a [foreign key]({% link {{ page.version.version }}/foreign-key.md %}) constraint on the table that includes the `crdb_region` column. The foreign key cannot be dropped while the parameter is set. 
| String | `NULL` | The following parameters are included for PostgreSQL compatibility and do not affect how CockroachDB runs: diff --git a/src/current/_includes/v25.4/misc/session-vars.md b/src/current/_includes/v25.4/misc/session-vars.md index bc2485ec5b9..f34fd9ffd45 100644 --- a/src/current/_includes/v25.4/misc/session-vars.md +++ b/src/current/_includes/v25.4/misc/session-vars.md @@ -34,6 +34,7 @@ | `enable_insert_fast_path` | Indicates whether CockroachDB will use a specialized execution operator for inserting into a table. We recommend leaving this setting `on`. | `on` | Yes | Yes | | `enable_shared_locking_for_serializable` | Indicates whether [shared locks]({% link {{ page.version.version }}/select-for-update.md %}#lock-strengths) are enabled for `SERIALIZABLE` transactions. When `off`, `SELECT` statements using `FOR SHARE` are still permitted under `SERIALIZABLE` isolation, but silently do not lock. | `off` | Yes | Yes | | `enable_super_regions` | When enabled, you can define a super region: a set of [database regions]({% link {{ page.version.version }}/multiregion-overview.md %}#super-regions) on a multi-region cluster such that your [schema objects]({% link {{ page.version.version }}/schema-design-overview.md %}#database-schema-objects) will have all of their [replicas]({% link {{ page.version.version }}/architecture/overview.md %}#architecture-replica) stored _only_ in regions that are members of the super region. | `off` | Yes | Yes | +| `enable_create_stats_using_extremes` | If `on`, allows manual creation of [partial statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#partial-statistics) using the [`CREATE STATISTICS ... USING EXTREMES`]({% link {{ page.version.version }}/create-statistics.md %}#create-partial-statistics-using-extremes) syntax. 
| `on` | Yes | Yes | | `enable_zigzag_join` | Indicates whether the [cost-based optimizer]({% link {{ page.version.version }}/cost-based-optimizer.md %}) will plan certain queries using a [zigzag merge join algorithm]({% link {{ page.version.version }}/cost-based-optimizer.md %}#zigzag-joins), which searches for the desired intersection by jumping back and forth between the indexes based on the fact that after constraining indexes, they share an ordering. | `on` | Yes | Yes | | `enforce_home_region` | If set to `on`, queries return an error and in some cases a suggested resolution if they cannot run entirely in their home region. This can occur if a query has no home region (for example, if it reads from different home regions in a [regional by row table]({% link {{ page.version.version }}/table-localities.md %}#regional-by-row-tables)) or a query's home region differs from the [gateway]({% link {{ page.version.version }}/architecture/life-of-a-distributed-transaction.md %}#gateway) region. Note that only tables with `ZONE` [survivability]({% link {{ page.version.version }}/multiregion-survival-goals.md %}#when-to-use-zone-vs-region-survival-goals) can be scanned without error when this is enabled. For more information about home regions, see [Table localities]({% link {{ page.version.version }}/multiregion-overview.md %}#table-localities).

This feature is in preview. It is subject to change. | `off` | Yes | Yes | | `enforce_home_region_follower_reads_enabled` | If `on` while the [`enforce_home_region`]({% link {{ page.version.version }}/cost-based-optimizer.md %}#control-whether-queries-are-limited-to-a-single-region) setting is `on`, allows `enforce_home_region` to perform `AS OF SYSTEM TIME` [follower reads]({% link {{ page.version.version }}/follower-reads.md %}) to detect and report a query's [home region]({% link {{ page.version.version }}/multiregion-overview.md %}#table-localities), if any.

This feature is in preview. It is subject to change. | `off` | Yes | Yes | @@ -61,6 +62,7 @@ | `optimizer_use_improved_multi_column_selectivity_estimate` | If `on`, the optimizer uses an improved selectivity estimate for multi-column predicates. | `on` | Yes | Yes | | `optimizer_use_improved_zigzag_join_costing` | If `on`, the cost of [zigzag joins]({% link {{ page.version.version }}/cost-based-optimizer.md %}#zigzag-joins) is updated so they will be never be chosen over scans unless they produce fewer rows. To take effect, the [`enable_zigzag_join`](#enable-zigzag-join) setting must also be enabled. | `on` | Yes | Yes | | `optimizer_use_lock_op_for_serializable` | If `on`, the optimizer uses a `Lock` operator to construct query plans for `SELECT` statements using the [`FOR UPDATE` and `FOR SHARE`]({% link {{ page.version.version }}/select-for-update.md %}) clauses. This setting only affects `SERIALIZABLE` transactions. `READ COMMITTED` transactions are evaluated with the `Lock` operator regardless of the setting. | `off` | Yes | Yes | +| `optimizer_use_merged_partial_statistics` | If `on`, the optimizer uses [partial statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#partial-statistics) merged with existing full [table statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#table-statistics) to produce more accurate cardinality estimates. | `on` | Yes | Yes | | `optimizer_use_multicol_stats` | If `on`, the optimizer uses collected multi-column statistics for cardinality estimation. | `on` | No | Yes | | `optimizer_use_not_visible_indexes` | If `on`, the optimizer uses not visible indexes for planning. | `off` | No | Yes | | `optimizer_use_virtual_computed_column_stats` | If `on`, the optimizer uses table statistics on [virtual computed columns]({% link {{ page.version.version }}/computed-columns.md %}#virtual-computed-columns). 
| `on` | Yes | Yes | diff --git a/src/current/_includes/v25.4/misc/table-storage-parameters.md b/src/current/_includes/v25.4/misc/table-storage-parameters.md index d8f99706496..673ef075a3c 100644 --- a/src/current/_includes/v25.4/misc/table-storage-parameters.md +++ b/src/current/_includes/v25.4/misc/table-storage-parameters.md @@ -1,11 +1,15 @@ | Parameter name | Description | Data type | Default value | |----------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------|---------------| | `exclude_data_from_backup` | Exclude the data in this table from any future backups. | Boolean | `false` | -| `schema_locked` | Indicates that a [schema change]({% link {{ page.version.version }}/online-schema-changes.md %}) is not currently ongoing on this table. CockroachDB automatically unsets this parameter before performing a schema change and reapplies it when done. Enabling `schema_locked` can help [improve performance of changefeeds]({% link {{ page.version.version }}/create-changefeed.md %}#disallow-schema-changes-on-tables-to-improve-changefeed-performance) running on this table. | Boolean | `false` | -| `sql_stats_automatic_collection_enabled` | Enable [automatic statistics collection]({% link {{ page.version.version }}/cost-based-optimizer.md %}#enable-and-disable-automatic-statistics-collection-for-tables) for this table. 
| Boolean | `true` | -| `sql_stats_automatic_collection_min_stale_rows` | Minimum number of stale rows in this table that will trigger a statistics refresh. | Integer | 500 | -| `sql_stats_automatic_collection_fraction_stale_rows` | Fraction of stale rows in this table that will trigger a statistics refresh. | Float | 0.2 | +| `schema_locked` | Indicates that a [schema change]({% link {{ page.version.version }}/online-schema-changes.md %}) is not currently ongoing on this table. CockroachDB automatically unsets this parameter before performing a schema change and reapplies it when done. Enabling `schema_locked` can help [improve performance of changefeeds]({% link {{ page.version.version }}/create-changefeed.md %}#disallow-schema-changes-on-tables-to-improve-changefeed-performance) running on this table. | Boolean | `false` | +| `sql_stats_automatic_collection_enabled` | Enable automatic collection of [full]({% link {{ page.version.version }}/cost-based-optimizer.md %}#full-statistics) and [partial]({% link {{ page.version.version }}/cost-based-optimizer.md %}#partial-statistics) statistics for this table. | Boolean | `true` | +| `sql_stats_automatic_collection_min_stale_rows` | Minimum number of stale rows in this table that will trigger a full statistics refresh. | Integer | 500 | +| `sql_stats_automatic_collection_fraction_stale_rows` | Fraction of stale rows in this table that will trigger a full statistics refresh. | Float | 0.2 | +| `sql_stats_automatic_full_collection_enabled` | Enable automatic collection of [full statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#full-statistics) for this table. | Boolean | `true` | | `sql_stats_forecasts_enabled` | Enable [forecasted statistics]({% link {{ page.version.version }}/show-statistics.md %}#display-forecasted-statistics) collection for this table. 
| Boolean | `true` | +| `sql_stats_automatic_partial_collection_enabled` | Enable automatic collection of [partial statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#automatically-collect-partial-statistics) for this table. | Boolean | `true` | +| `sql_stats_automatic_partial_collection_min_stale_rows` | Minimum number of stale rows that triggers [partial statistics collection]({% link {{ page.version.version }}/cost-based-optimizer.md %}#automatically-collect-partial-statistics) for this table. | Integer | 100 | +| `sql_stats_automatic_partial_collection_fraction_stale_rows` | Target fraction of stale rows that triggers [partial statistics collection]({% link {{ page.version.version }}/cost-based-optimizer.md %}#automatically-collect-partial-statistics) for this table. | Float | 0.05 | | `infer_rbr_region_col_using_constraint` | For [`REGIONAL BY ROW`]({% link {{ page.version.version }}/table-localities.md %}#regional-by-row-tables) tables, automatically populate the hidden `crdb_region` column on `INSERT`, `UPDATE`, and `UPSERT` by looking up the region of the referenced parent row. Set this parameter to the name of a [foreign key]({% link {{ page.version.version }}/foreign-key.md %}) constraint on the table that includes the `crdb_region` column. The foreign key cannot be dropped while the parameter is set. | String | `NULL` | The following parameters are included for PostgreSQL compatibility and do not affect how CockroachDB runs: diff --git a/src/current/_includes/v26.1/misc/session-vars.md b/src/current/_includes/v26.1/misc/session-vars.md index 39c8e0e75e3..1e78e7b1cf5 100644 --- a/src/current/_includes/v26.1/misc/session-vars.md +++ b/src/current/_includes/v26.1/misc/session-vars.md @@ -34,6 +34,7 @@ | `enable_insert_fast_path` | Indicates whether CockroachDB will use a specialized execution operator for inserting into a table. We recommend leaving this setting `on`. 
| `on` | Yes | Yes | | `enable_shared_locking_for_serializable` | Indicates whether [shared locks]({% link {{ page.version.version }}/select-for-update.md %}#lock-strengths) are enabled for `SERIALIZABLE` transactions. When `off`, `SELECT` statements using `FOR SHARE` are still permitted under `SERIALIZABLE` isolation, but silently do not lock. | `off` | Yes | Yes | | `enable_super_regions` | When enabled, you can define a super region: a set of [database regions]({% link {{ page.version.version }}/multiregion-overview.md %}#super-regions) on a multi-region cluster such that your [schema objects]({% link {{ page.version.version }}/schema-design-overview.md %}#database-schema-objects) will have all of their [replicas]({% link {{ page.version.version }}/architecture/overview.md %}#architecture-replica) stored _only_ in regions that are members of the super region. | `off` | Yes | Yes | +| `enable_create_stats_using_extremes` | If `on`, allows manual creation of [partial statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#partial-statistics) using the [`CREATE STATISTICS ... USING EXTREMES`]({% link {{ page.version.version }}/create-statistics.md %}#create-partial-statistics-using-extremes) syntax. | `on` | Yes | Yes | | `enable_zigzag_join` | Indicates whether the [cost-based optimizer]({% link {{ page.version.version }}/cost-based-optimizer.md %}) will plan certain queries using a [zigzag merge join algorithm]({% link {{ page.version.version }}/cost-based-optimizer.md %}#zigzag-joins), which searches for the desired intersection by jumping back and forth between the indexes based on the fact that after constraining indexes, they share an ordering. | `on` | Yes | Yes | | `enforce_home_region` | If set to `on`, queries return an error and in some cases a suggested resolution if they cannot run entirely in their home region. 
This can occur if a query has no home region (for example, if it reads from different home regions in a [regional by row table]({% link {{ page.version.version }}/table-localities.md %}#regional-by-row-tables)) or a query's home region differs from the [gateway]({% link {{ page.version.version }}/architecture/life-of-a-distributed-transaction.md %}#gateway) region. Note that only tables with `ZONE` [survivability]({% link {{ page.version.version }}/multiregion-survival-goals.md %}#when-to-use-zone-vs-region-survival-goals) can be scanned without error when this is enabled. For more information about home regions, see [Table localities]({% link {{ page.version.version }}/multiregion-overview.md %}#table-localities).

This feature is in preview. It is subject to change. | `off` | Yes | Yes | | `enforce_home_region_follower_reads_enabled` | If `on` while the [`enforce_home_region`]({% link {{ page.version.version }}/cost-based-optimizer.md %}#control-whether-queries-are-limited-to-a-single-region) setting is `on`, allows `enforce_home_region` to perform `AS OF SYSTEM TIME` [follower reads]({% link {{ page.version.version }}/follower-reads.md %}) to detect and report a query's [home region]({% link {{ page.version.version }}/multiregion-overview.md %}#table-localities), if any.

This feature is in preview. It is subject to change. | `off` | Yes | Yes | @@ -61,6 +62,7 @@ | `optimizer_use_improved_multi_column_selectivity_estimate` | If `on`, the optimizer uses an improved selectivity estimate for multi-column predicates. | `on` | Yes | Yes | | `optimizer_use_improved_zigzag_join_costing` | If `on`, the cost of [zigzag joins]({% link {{ page.version.version }}/cost-based-optimizer.md %}#zigzag-joins) is updated so they will be never be chosen over scans unless they produce fewer rows. To take effect, the [`enable_zigzag_join`](#enable-zigzag-join) setting must also be enabled. | `on` | Yes | Yes | | `optimizer_use_lock_op_for_serializable` | If `on`, the optimizer uses a `Lock` operator to construct query plans for `SELECT` statements using the [`FOR UPDATE` and `FOR SHARE`]({% link {{ page.version.version }}/select-for-update.md %}) clauses. This setting only affects `SERIALIZABLE` transactions. `READ COMMITTED` transactions are evaluated with the `Lock` operator regardless of the setting. | `off` | Yes | Yes | +| `optimizer_use_merged_partial_statistics` | If `on`, the optimizer uses [partial statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#partial-statistics) merged with existing full [table statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#table-statistics) to produce more accurate cardinality estimates. | `on` | Yes | Yes | | `optimizer_use_multicol_stats` | If `on`, the optimizer uses collected multi-column statistics for cardinality estimation. | `on` | No | Yes | | `optimizer_use_not_visible_indexes` | If `on`, the optimizer uses not visible indexes for planning. | `off` | No | Yes | | `optimizer_use_virtual_computed_column_stats` | If `on`, the optimizer uses table statistics on [virtual computed columns]({% link {{ page.version.version }}/computed-columns.md %}#virtual-computed-columns). 
| `on` | Yes | Yes | diff --git a/src/current/_includes/v26.1/misc/table-storage-parameters.md b/src/current/_includes/v26.1/misc/table-storage-parameters.md index d8f99706496..ffc13dd7cf1 100644 --- a/src/current/_includes/v26.1/misc/table-storage-parameters.md +++ b/src/current/_includes/v26.1/misc/table-storage-parameters.md @@ -1,11 +1,15 @@ | Parameter name | Description | Data type | Default value | |----------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------|---------------| | `exclude_data_from_backup` | Exclude the data in this table from any future backups. | Boolean | `false` | -| `schema_locked` | Indicates that a [schema change]({% link {{ page.version.version }}/online-schema-changes.md %}) is not currently ongoing on this table. CockroachDB automatically unsets this parameter before performing a schema change and reapplies it when done. Enabling `schema_locked` can help [improve performance of changefeeds]({% link {{ page.version.version }}/create-changefeed.md %}#disallow-schema-changes-on-tables-to-improve-changefeed-performance) running on this table. | Boolean | `false` | -| `sql_stats_automatic_collection_enabled` | Enable [automatic statistics collection]({% link {{ page.version.version }}/cost-based-optimizer.md %}#enable-and-disable-automatic-statistics-collection-for-tables) for this table. 
| Boolean | `true` | -| `sql_stats_automatic_collection_min_stale_rows` | Minimum number of stale rows in this table that will trigger a statistics refresh. | Integer | 500 | -| `sql_stats_automatic_collection_fraction_stale_rows` | Fraction of stale rows in this table that will trigger a statistics refresh. | Float | 0.2 | +| `schema_locked` | Indicates that a [schema change]({% link {{ page.version.version }}/online-schema-changes.md %}) is not currently ongoing on this table. CockroachDB automatically unsets this parameter before performing a schema change and reapplies it when done. Enabling `schema_locked` can help [improve performance of changefeeds]({% link {{ page.version.version }}/create-changefeed.md %}#disallow-schema-changes-on-tables-to-improve-changefeed-performance) running on this table. | Boolean | `false` | +| `sql_stats_automatic_collection_enabled` | Enable automatic collection of [full]({% link {{ page.version.version }}/cost-based-optimizer.md %}#full-statistics) and [partial]({% link {{ page.version.version }}/cost-based-optimizer.md %}#partial-statistics) statistics for this table. | Boolean | `true` | +| `sql_stats_automatic_collection_min_stale_rows` | Minimum number of stale rows in this table that will trigger a full statistics refresh. | Integer | 500 | +| `sql_stats_automatic_collection_fraction_stale_rows` | Fraction of stale rows in this table that will trigger a full statistics refresh. | Float | 0.2 | +| `sql_stats_automatic_full_collection_enabled` | Enable automatic collection of [full statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#full-statistics) for this table. `sql_stats_automatic_collection_enabled` must be `true`. | Boolean | `true` | | `sql_stats_forecasts_enabled` | Enable [forecasted statistics]({% link {{ page.version.version }}/show-statistics.md %}#display-forecasted-statistics) collection for this table. 
| Boolean | `true` | +| `sql_stats_automatic_partial_collection_enabled` | Enable automatic collection of [partial statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#automatically-collect-partial-statistics) for this table. `sql_stats_automatic_collection_enabled` must be `true`. | Boolean | `true` | +| `sql_stats_automatic_partial_collection_min_stale_rows` | Minimum number of stale rows that triggers [partial statistics collection]({% link {{ page.version.version }}/cost-based-optimizer.md %}#automatically-collect-partial-statistics) for this table. | Integer | 100 | +| `sql_stats_automatic_partial_collection_fraction_stale_rows` | Target fraction of stale rows that triggers [partial statistics collection]({% link {{ page.version.version }}/cost-based-optimizer.md %}#automatically-collect-partial-statistics) for this table. | Float | 0.05 | | `infer_rbr_region_col_using_constraint` | For [`REGIONAL BY ROW`]({% link {{ page.version.version }}/table-localities.md %}#regional-by-row-tables) tables, automatically populate the hidden `crdb_region` column on `INSERT`, `UPDATE`, and `UPSERT` by looking up the region of the referenced parent row. Set this parameter to the name of a [foreign key]({% link {{ page.version.version }}/foreign-key.md %}) constraint on the table that includes the `crdb_region` column. The foreign key cannot be dropped while the parameter is set. 
| String | `NULL` | The following parameters are included for PostgreSQL compatibility and do not affect how CockroachDB runs: diff --git a/src/current/v23.2/cost-based-optimizer.md b/src/current/v23.2/cost-based-optimizer.md index 294d9e75f2a..43f8b5d8b84 100644 --- a/src/current/v23.2/cost-based-optimizer.md +++ b/src/current/v23.2/cost-based-optimizer.md @@ -34,8 +34,28 @@ By default, CockroachDB also automatically collects [multi-column statistics]({% [Schema changes]({% link {{ page.version.version }}/online-schema-changes.md %}) trigger automatic statistics collection for the affected table(s). {{site.data.alerts.end}} +The optimizer can use three types of statistics to plan queries: + +- [Full statistics](#full-statistics) +- [Partial statistics](#partial-statistics) +- [Forecasted statistics](#forecasted-statistics) + For best query performance, most users should leave automatic statistics enabled with the default settings. Advanced users can follow the steps provided in this section for performance tuning and troubleshooting. +### Full statistics + +By default, CockroachDB automatically generates full statistics when tables are [created]({% link {{ page.version.version }}/create-table.md %}) and during [schema changes]({% link {{ page.version.version }}/online-schema-changes.md %}). Full statistics for a table are automatically refreshed when approximately 20% of its rows are updated. + +### Partial statistics + +*Partial statistics* are collected on a subset of table data without scanning the full table. Partial statistics can improve query performance in large tables where only a portion is regularly updated or queried. + +{{site.data.alerts.callout_info}} +Partial statistics can only be collected if full statistics already exist for the table. +{{site.data.alerts.end}} + +You can manually collect partial statistics on the highest and lowest index values using [`CREATE STATISTICS ... 
USING EXTREMES`]({% link {{ page.version.version }}/create-statistics.md %}#create-partial-statistics-using-extremes). + ### Control statistics refresh rate Statistics are refreshed in the following cases: @@ -101,7 +121,7 @@ To learn how to manually generate statistics, see the [`CREATE STATISTICS` examp Statistics collection can be expensive for large tables, and you may prefer to defer collection until after data is finished loading or during off-peak hours. Tables that are frequently updated, including small tables, may trigger statistics collection more often, which can lead to unnecessary overhead and unpredictable query plan changes. -You can enable and disable automatic statistics collection for individual tables using the `sql_stats_automatic_collection_enabled` storage parameter. This table setting **takes precedence** over the `sql.stats.automatic_collection.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}) described in [Enable and disable automatic statistics collection for clusters](#enable-and-disable-automatic-statistics-collection-for-clusters). +You can enable and disable automatic statistics collection for individual tables using the `sql_stats_automatic_collection_enabled` [table storage parameter]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters). This table setting **takes precedence** over the `sql.stats.automatic_collection.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}) described in [Enable and disable automatic statistics collection for clusters](#enable-and-disable-automatic-statistics-collection-for-clusters). You can either configure this setting during table creation: @@ -157,9 +177,16 @@ sql_stats_automatic_collection_min_stale_rows = 2000); Automatic statistics rules are checked once per minute. 
While altered automatic statistics table settings take immediate effect for any subsequent DML statements on a table, running row mutations that started prior to modifying the table settings may still trigger statistics collection based on the settings that existed before you ran the `ALTER TABLE ... SET` statement. -### Enable and disable forecasted statistics for tables +### Forecasted statistics + +*Forecasted statistics* use a simple regression model that predicts how the statistics have changed since they were last collected. CockroachDB generates forecasted statistics when the following conditions are met: + +- There have been at least 3 historical statistics collections. +- The historical statistics closely fit a linear pattern. + +By default, the optimizer uses forecasts that closely match the historical statistics. -You can enable and disable [forecasted statistics]({% link {{ page.version.version }}/show-statistics.md %}#display-forecasted-statistics) collection for individual tables using the `sql_stats_forecasts_enabled` table parameter. This table setting **takes precedence** over the `sql.stats.forecasts.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}). +You can enable and disable forecasted statistics collection for individual tables using the `sql_stats_forecasts_enabled` [table parameter]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters). This table setting **takes precedence** over the `sql.stats.forecasts.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}). 
You can either configure this setting during table creation: diff --git a/src/current/v23.2/create-statistics.md b/src/current/v23.2/create-statistics.md index c99f607bf6a..0aa2927d1b5 100644 --- a/src/current/v23.2/create-statistics.md +++ b/src/current/v23.2/create-statistics.md @@ -5,7 +5,7 @@ toc: true docs_area: reference.sql --- -Use the `CREATE STATISTICS` [statement]({% link {{ page.version.version }}/sql-statements.md %}) to generate table statistics for the [cost-based optimizer]({% link {{ page.version.version }}/cost-based-optimizer.md %}) to use. +Use the `CREATE STATISTICS` [statement]({% link {{ page.version.version }}/sql-statements.md %}) to [generate table statistics for the cost-based optimizer]({% link {{ page.version.version }}/cost-based-optimizer.md %}#table-statistics) to use. Once you [create a table]({% link {{ page.version.version }}/create-table.md %}) and load data into it (e.g., [`INSERT`]({% link {{ page.version.version }}/insert.md %}), [`IMPORT`]({% link {{ page.version.version }}/import.md %})), table statistics can be generated. Table statistics help the cost-based optimizer determine the cardinality of the rows used in each query, which helps to predict more accurate costs. @@ -166,6 +166,29 @@ To create statistics as of a given time (in this example, 1 minute ago to avoid For more information about how the `AS OF SYSTEM TIME` clause works, including supported time formats, see [`AS OF SYSTEM TIME`]({% link {{ page.version.version }}/as-of-system-time.md %}). 
+### Create partial statistics using extremes + +To create [partial statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#partial-statistics) that collect statistics on the highest and lowest index values: + +{% include_cached copy-clipboard.html %} +~~~ sql +SET enable_create_stats_using_extremes = true; +~~~ + +{% include_cached copy-clipboard.html %} +~~~ sql +CREATE STATISTICS rides_extremes_stats FROM rides USING EXTREMES; +~~~ + +This creates partial statistics on all single column prefixes of forward indexes in the `rides` table by scanning only the highest and lowest index values, providing updated statistics without performing a full table scan. + +You can also create extremes statistics on specific columns: + +{% include_cached copy-clipboard.html %} +~~~ sql +CREATE STATISTICS revenue_extremes_stats ON revenue FROM rides USING EXTREMES; +~~~ + ### Delete statistics {% include {{ page.version.version }}/misc/delete-statistics.md %} diff --git a/src/current/v23.2/show-statistics.md b/src/current/v23.2/show-statistics.md index 4b50d855b6b..df2014abaac 100644 --- a/src/current/v23.2/show-statistics.md +++ b/src/current/v23.2/show-statistics.md @@ -76,18 +76,13 @@ Parameter | Description ### Display forecasted statistics -The `WITH FORECAST` option calculates and displays forecasted statistics along with the existing table statistics. The forecast is a simple regression model that predicts how the statistics have changed since they were last collected. Forecasts that closely match the historical statistics are used by the [cost-based optimizer]({% link {{ page.version.version }}/cost-based-optimizer.md %}). - -CockroachDB generates forecasted statistics when the following conditions are met: - -- There have been at least 3 historical statistics collections. -- The historical statistics closely fit a linear pattern. 
+The `WITH FORECAST` option calculates and displays [forecasted statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#forecasted-statistics) along with the existing table statistics. The following example shows 3 historical statistics collections and the subsequent forecast: {% include_cached copy-clipboard.html %} ~~~ sql -> SHOW STATISTICS FOR TABLE rides WITH FORECAST; +SHOW STATISTICS FOR TABLE rides WITH FORECAST; ~~~ ~~~ diff --git a/src/current/v24.1/cost-based-optimizer.md b/src/current/v24.1/cost-based-optimizer.md index f680fdb8a80..774dcbb8737 100644 --- a/src/current/v24.1/cost-based-optimizer.md +++ b/src/current/v24.1/cost-based-optimizer.md @@ -34,8 +34,28 @@ By default, CockroachDB also automatically collects [multi-column statistics]({% [Schema changes]({% link {{ page.version.version }}/online-schema-changes.md %}) trigger automatic statistics collection for the affected table(s). {{site.data.alerts.end}} +The optimizer can use three types of statistics to plan queries: + +- [Full statistics](#full-statistics) +- [Partial statistics](#partial-statistics) +- [Forecasted statistics](#forecasted-statistics) + For best query performance, most users should leave automatic statistics enabled with the default settings. Advanced users can follow the steps provided in this section for performance tuning and troubleshooting. +### Full statistics + +By default, CockroachDB automatically generates full statistics when tables are [created]({% link {{ page.version.version }}/create-table.md %}) and during [schema changes]({% link {{ page.version.version }}/online-schema-changes.md %}). Full statistics for a table are automatically refreshed when approximately 20% of its rows are updated. + +### Partial statistics + +*Partial statistics* are collected on a subset of table data without scanning the full table. Partial statistics can improve query performance in large tables where only a portion is regularly updated or queried. 
+ +{{site.data.alerts.callout_info}} +Partial statistics can only be collected if full statistics already exist for the table. +{{site.data.alerts.end}} + +You can manually collect partial statistics on the highest and lowest index values using [`CREATE STATISTICS ... USING EXTREMES`]({% link {{ page.version.version }}/create-statistics.md %}#create-partial-statistics-using-extremes). + ### Control statistics refresh rate Statistics are refreshed in the following cases: @@ -101,7 +121,7 @@ To learn how to manually generate statistics, see the [`CREATE STATISTICS` examp Statistics collection can be expensive for large tables, and you may prefer to defer collection until after data is finished loading or during off-peak hours. Tables that are frequently updated, including small tables, may trigger statistics collection more often, which can lead to unnecessary overhead and unpredictable query plan changes. -You can enable and disable automatic statistics collection for individual tables using the `sql_stats_automatic_collection_enabled` storage parameter. This table setting **takes precedence** over the `sql.stats.automatic_collection.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}) described in [Enable and disable automatic statistics collection for clusters](#enable-and-disable-automatic-statistics-collection-for-clusters). +You can enable and disable automatic statistics collection for individual tables using the `sql_stats_automatic_collection_enabled` [table storage parameter]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters). This table setting **takes precedence** over the `sql.stats.automatic_collection.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}) described in [Enable and disable automatic statistics collection for clusters](#enable-and-disable-automatic-statistics-collection-for-clusters). 
You can either configure this setting during table creation: @@ -157,9 +177,16 @@ sql_stats_automatic_collection_min_stale_rows = 2000); Automatic statistics rules are checked once per minute. While altered automatic statistics table settings take immediate effect for any subsequent DML statements on a table, running row mutations that started prior to modifying the table settings may still trigger statistics collection based on the settings that existed before you ran the `ALTER TABLE ... SET` statement. -### Enable and disable forecasted statistics for tables +### Forecasted statistics + +*Forecasted statistics* use a simple regression model that predicts how the statistics have changed since they were last collected. CockroachDB generates forecasted statistics when the following conditions are met: + +- There have been at least 3 historical statistics collections. +- The historical statistics closely fit a linear pattern. + +By default, the optimizer uses forecasts that closely match the historical statistics. -You can enable and disable [forecasted statistics]({% link {{ page.version.version }}/show-statistics.md %}#display-forecasted-statistics) collection for individual tables using the `sql_stats_forecasts_enabled` table parameter. This table setting **takes precedence** over the `sql.stats.forecasts.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}). +You can enable and disable forecasted statistics collection for individual tables using the `sql_stats_forecasts_enabled` [table parameter]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters). This table setting **takes precedence** over the `sql.stats.forecasts.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}). 
You can either configure this setting during table creation: diff --git a/src/current/v24.1/create-statistics.md b/src/current/v24.1/create-statistics.md index 1dfa622c46b..719b4451280 100644 --- a/src/current/v24.1/create-statistics.md +++ b/src/current/v24.1/create-statistics.md @@ -5,7 +5,7 @@ toc: true docs_area: reference.sql --- -Use the `CREATE STATISTICS` [statement]({% link {{ page.version.version }}/sql-statements.md %}) to generate table statistics for the [cost-based optimizer]({% link {{ page.version.version }}/cost-based-optimizer.md %}) to use. +Use the `CREATE STATISTICS` [statement]({% link {{ page.version.version }}/sql-statements.md %}) to [generate table statistics for the cost-based optimizer]({% link {{ page.version.version }}/cost-based-optimizer.md %}#table-statistics) to use. Once you [create a table]({% link {{ page.version.version }}/create-table.md %}) and load data into it (e.g., [`INSERT`]({% link {{ page.version.version }}/insert.md %}), [`IMPORT INTO`]({% link {{ page.version.version }}/import-into.md %})), table statistics can be generated. Table statistics help the cost-based optimizer determine the cardinality of the rows used in each query, which helps to predict more accurate costs. @@ -166,6 +166,29 @@ To create statistics as of a given time (in this example, 1 minute ago to avoid For more information about how the `AS OF SYSTEM TIME` clause works, including supported time formats, see [`AS OF SYSTEM TIME`]({% link {{ page.version.version }}/as-of-system-time.md %}). 
+### Create partial statistics using extremes + +To create [partial statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#partial-statistics) that collect statistics on the highest and lowest index values: + +{% include_cached copy-clipboard.html %} +~~~ sql +SET enable_create_stats_using_extremes = true; +~~~ + +{% include_cached copy-clipboard.html %} +~~~ sql +CREATE STATISTICS rides_extremes_stats FROM rides USING EXTREMES; +~~~ + +This creates partial statistics on all single column prefixes of forward indexes in the `rides` table by scanning only the highest and lowest index values, providing updated statistics without performing a full table scan. + +You can also create extremes statistics on specific columns: + +{% include_cached copy-clipboard.html %} +~~~ sql +CREATE STATISTICS revenue_extremes_stats ON revenue FROM rides USING EXTREMES; +~~~ + ### Delete statistics {% include {{ page.version.version }}/misc/delete-statistics.md %} diff --git a/src/current/v24.1/show-statistics.md b/src/current/v24.1/show-statistics.md index 6d85f5b9594..59143af8946 100644 --- a/src/current/v24.1/show-statistics.md +++ b/src/current/v24.1/show-statistics.md @@ -76,18 +76,13 @@ Parameter | Description ### Display forecasted statistics -The `WITH FORECAST` option calculates and displays forecasted statistics along with the existing table statistics. The forecast is a simple regression model that predicts how the statistics have changed since they were last collected. Forecasts that closely match the historical statistics are used by the [cost-based optimizer]({% link {{ page.version.version }}/cost-based-optimizer.md %}). - -CockroachDB generates forecasted statistics when the following conditions are met: - -- There have been at least 3 historical statistics collections. -- The historical statistics closely fit a linear pattern. 
+The `WITH FORECAST` option calculates and displays [forecasted statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#forecasted-statistics) along with the existing table statistics. The following example shows 3 historical statistics collections and the subsequent forecast: {% include_cached copy-clipboard.html %} ~~~ sql -> SHOW STATISTICS FOR TABLE rides WITH FORECAST; +SHOW STATISTICS FOR TABLE rides WITH FORECAST; ~~~ ~~~ diff --git a/src/current/v24.2/cost-based-optimizer.md b/src/current/v24.2/cost-based-optimizer.md index b7eef38e3e4..80dcb76a0ed 100644 --- a/src/current/v24.2/cost-based-optimizer.md +++ b/src/current/v24.2/cost-based-optimizer.md @@ -34,8 +34,28 @@ By default, CockroachDB also automatically collects [multi-column statistics]({% [Schema changes]({% link {{ page.version.version }}/online-schema-changes.md %}) trigger automatic statistics collection for the affected table(s). {{site.data.alerts.end}} +The optimizer can use three types of statistics to plan queries: + +- [Full statistics](#full-statistics) +- [Partial statistics](#partial-statistics) +- [Forecasted statistics](#forecasted-statistics) + For best query performance, most users should leave automatic statistics enabled with the default settings. Advanced users can follow the steps provided in this section for performance tuning and troubleshooting. +### Full statistics + +By default, CockroachDB automatically generates full statistics when tables are [created]({% link {{ page.version.version }}/create-table.md %}) and during [schema changes]({% link {{ page.version.version }}/online-schema-changes.md %}). Full statistics for a table are automatically refreshed when approximately 20% of its rows are updated. + +### Partial statistics + +*Partial statistics* are collected on a subset of table data without scanning the full table. Partial statistics can improve query performance in large tables where only a portion is regularly updated or queried. 
+ +{{site.data.alerts.callout_info}} +Partial statistics can only be collected if full statistics already exist for the table. +{{site.data.alerts.end}} + +You can manually collect partial statistics on the highest and lowest index values using [`CREATE STATISTICS ... USING EXTREMES`]({% link {{ page.version.version }}/create-statistics.md %}#create-partial-statistics-using-extremes). + ### Control statistics refresh rate Statistics are refreshed in the following cases: @@ -101,7 +121,7 @@ To learn how to manually generate statistics, see the [`CREATE STATISTICS` examp Statistics collection can be expensive for large tables, and you may prefer to defer collection until after data is finished loading or during off-peak hours. Tables that are frequently updated, including small tables, may trigger statistics collection more often, which can lead to unnecessary overhead and unpredictable query plan changes. -You can enable and disable automatic statistics collection for individual tables using the `sql_stats_automatic_collection_enabled` storage parameter. This table setting **takes precedence** over the `sql.stats.automatic_collection.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}) described in [Enable and disable automatic statistics collection for clusters](#enable-and-disable-automatic-statistics-collection-for-clusters). +You can enable and disable automatic statistics collection for individual tables using the `sql_stats_automatic_collection_enabled` [table storage parameter]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters). This table setting **takes precedence** over the `sql.stats.automatic_collection.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}) described in [Enable and disable automatic statistics collection for clusters](#enable-and-disable-automatic-statistics-collection-for-clusters). 
You can either configure this setting during table creation: @@ -157,9 +177,16 @@ sql_stats_automatic_collection_min_stale_rows = 2000); Automatic statistics rules are checked once per minute. While altered automatic statistics table settings take immediate effect for any subsequent DML statements on a table, running row mutations that started prior to modifying the table settings may still trigger statistics collection based on the settings that existed before you ran the `ALTER TABLE ... SET` statement. -### Enable and disable forecasted statistics for tables +### Forecasted statistics + +*Forecasted statistics* use a simple regression model that predicts how the statistics have changed since they were last collected. CockroachDB generates forecasted statistics when the following conditions are met: + +- There have been at least 3 historical statistics collections. +- The historical statistics closely fit a linear pattern. + +By default, the optimizer uses forecasts that closely match the historical statistics. -You can enable and disable [forecasted statistics]({% link {{ page.version.version }}/show-statistics.md %}#display-forecasted-statistics) collection for individual tables using the `sql_stats_forecasts_enabled` table parameter. This table setting **takes precedence** over the `sql.stats.forecasts.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}). +You can enable and disable forecasted statistics collection for individual tables using the `sql_stats_forecasts_enabled` [table parameter]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters). This table setting **takes precedence** over the `sql.stats.forecasts.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}). 
You can either configure this setting during table creation: diff --git a/src/current/v24.2/create-statistics.md b/src/current/v24.2/create-statistics.md index 1dfa622c46b..719b4451280 100644 --- a/src/current/v24.2/create-statistics.md +++ b/src/current/v24.2/create-statistics.md @@ -5,7 +5,7 @@ toc: true docs_area: reference.sql --- -Use the `CREATE STATISTICS` [statement]({% link {{ page.version.version }}/sql-statements.md %}) to generate table statistics for the [cost-based optimizer]({% link {{ page.version.version }}/cost-based-optimizer.md %}) to use. +Use the `CREATE STATISTICS` [statement]({% link {{ page.version.version }}/sql-statements.md %}) to [generate table statistics for the cost-based optimizer]({% link {{ page.version.version }}/cost-based-optimizer.md %}#table-statistics) to use. Once you [create a table]({% link {{ page.version.version }}/create-table.md %}) and load data into it (e.g., [`INSERT`]({% link {{ page.version.version }}/insert.md %}), [`IMPORT INTO`]({% link {{ page.version.version }}/import-into.md %})), table statistics can be generated. Table statistics help the cost-based optimizer determine the cardinality of the rows used in each query, which helps to predict more accurate costs. @@ -166,6 +166,29 @@ To create statistics as of a given time (in this example, 1 minute ago to avoid For more information about how the `AS OF SYSTEM TIME` clause works, including supported time formats, see [`AS OF SYSTEM TIME`]({% link {{ page.version.version }}/as-of-system-time.md %}). 
+### Create partial statistics using extremes + +To create [partial statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#partial-statistics) that collect statistics on the highest and lowest index values: + +{% include_cached copy-clipboard.html %} +~~~ sql +SET enable_create_stats_using_extremes = true; +~~~ + +{% include_cached copy-clipboard.html %} +~~~ sql +CREATE STATISTICS rides_extremes_stats FROM rides USING EXTREMES; +~~~ + +This creates partial statistics on all single-column prefixes of forward indexes in the `rides` table by scanning only the highest and lowest index values, providing updated statistics without performing a full table scan. + +You can also create extremes statistics on specific columns: + +{% include_cached copy-clipboard.html %} +~~~ sql +CREATE STATISTICS revenue_extremes_stats ON revenue FROM rides USING EXTREMES; +~~~ + ### Delete statistics {% include {{ page.version.version }}/misc/delete-statistics.md %} diff --git a/src/current/v24.2/show-statistics.md b/src/current/v24.2/show-statistics.md index 6d85f5b9594..59143af8946 100644 --- a/src/current/v24.2/show-statistics.md +++ b/src/current/v24.2/show-statistics.md @@ -76,18 +76,13 @@ Parameter | Description ### Display forecasted statistics -The `WITH FORECAST` option calculates and displays forecasted statistics along with the existing table statistics. The forecast is a simple regression model that predicts how the statistics have changed since they were last collected. Forecasts that closely match the historical statistics are used by the [cost-based optimizer]({% link {{ page.version.version }}/cost-based-optimizer.md %}). - -CockroachDB generates forecasted statistics when the following conditions are met: - -- There have been at least 3 historical statistics collections. -- The historical statistics closely fit a linear pattern. 
+The `WITH FORECAST` option calculates and displays [forecasted statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#forecasted-statistics) along with the existing table statistics. The following example shows 3 historical statistics collections and the subsequent forecast: {% include_cached copy-clipboard.html %} ~~~ sql -> SHOW STATISTICS FOR TABLE rides WITH FORECAST; +SHOW STATISTICS FOR TABLE rides WITH FORECAST; ~~~ ~~~ diff --git a/src/current/v24.3/cost-based-optimizer.md b/src/current/v24.3/cost-based-optimizer.md index 024b6d883af..80576d0a4ab 100644 --- a/src/current/v24.3/cost-based-optimizer.md +++ b/src/current/v24.3/cost-based-optimizer.md @@ -34,8 +34,30 @@ By default, CockroachDB also automatically collects [multi-column statistics]({% [Schema changes]({% link {{ page.version.version }}/online-schema-changes.md %}) trigger automatic statistics collection for the affected table(s). {{site.data.alerts.end}} +The optimizer can use three types of statistics to plan queries: + +- [Full statistics](#full-statistics) +- [Partial statistics](#partial-statistics) +- [Forecasted statistics](#forecasted-statistics) + For best query performance, most users should leave automatic statistics enabled with the default settings. Advanced users can follow the steps provided in this section for performance tuning and troubleshooting. +### Full statistics + +By default, CockroachDB automatically generates full statistics when tables are [created]({% link {{ page.version.version }}/create-table.md %}) and during [schema changes]({% link {{ page.version.version }}/online-schema-changes.md %}). Full statistics for a table are automatically refreshed when approximately 20% of its rows are updated. + +### Partial statistics + +*Partial statistics* are collected on a subset of table data without scanning the full table. Partial statistics can improve query performance in large tables where only a portion is regularly updated or queried. 
+ +{{site.data.alerts.callout_info}} +Partial statistics can only be collected if full statistics already exist for the table. +{{site.data.alerts.end}} + +You can manually collect partial statistics on the highest and lowest index values using [`CREATE STATISTICS ... USING EXTREMES`]({% link {{ page.version.version }}/create-statistics.md %}#create-partial-statistics-using-extremes). + +{% include_cached new-in.html version="v24.3" %} The optimizer uses partial statistics for query planning when the [`optimizer_use_merged_partial_statistics`]({% link {{ page.version.version }}/session-variables.md %}#optimizer-use-merged-partial-statistics) session variable is enabled. It merges partial statistics with existing full statistics to produce more accurate cardinality estimates. + ### Control statistics refresh rate Statistics are refreshed in the following cases: @@ -101,7 +123,7 @@ To learn how to manually generate statistics, see the [`CREATE STATISTICS` examp Statistics collection can be expensive for large tables, and you may prefer to defer collection until after data is finished loading or during off-peak hours. Tables that are frequently updated, including small tables, may trigger statistics collection more often, which can lead to unnecessary overhead and unpredictable query plan changes. -You can enable and disable automatic statistics collection for individual tables using the `sql_stats_automatic_collection_enabled` storage parameter. This table setting **takes precedence** over the `sql.stats.automatic_collection.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}) described in [Enable and disable automatic statistics collection for clusters](#enable-and-disable-automatic-statistics-collection-for-clusters). 
+You can enable and disable automatic statistics collection for individual tables using the `sql_stats_automatic_collection_enabled` [table storage parameter]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters). This table setting **takes precedence** over the `sql.stats.automatic_collection.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}) described in [Enable and disable automatic statistics collection for clusters](#enable-and-disable-automatic-statistics-collection-for-clusters). You can either configure this setting during table creation: @@ -157,9 +179,16 @@ sql_stats_automatic_collection_min_stale_rows = 2000); Automatic statistics rules are checked once per minute. While altered automatic statistics table settings take immediate effect for any subsequent DML statements on a table, running row mutations that started prior to modifying the table settings may still trigger statistics collection based on the settings that existed before you ran the `ALTER TABLE ... SET` statement. -### Enable and disable forecasted statistics for tables +### Forecasted statistics + +*Forecasted statistics* use a simple regression model that predicts how the statistics have changed since they were last collected. CockroachDB generates forecasted statistics when the following conditions are met: + +- There have been at least 3 historical statistics collections. +- The historical statistics closely fit a linear pattern. + +By default, the optimizer uses forecasts that closely match the historical statistics. -You can enable and disable [forecasted statistics]({% link {{ page.version.version }}/show-statistics.md %}#display-forecasted-statistics) collection for individual tables using the `sql_stats_forecasts_enabled` table parameter. This table setting **takes precedence** over the `sql.stats.forecasts.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}). 
+You can enable and disable forecasted statistics collection for individual tables using the `sql_stats_forecasts_enabled` [table parameter]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters). This table setting **takes precedence** over the `sql.stats.forecasts.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}). You can either configure this setting during table creation: diff --git a/src/current/v24.3/create-statistics.md b/src/current/v24.3/create-statistics.md index 1dfa622c46b..719b4451280 100644 --- a/src/current/v24.3/create-statistics.md +++ b/src/current/v24.3/create-statistics.md @@ -5,7 +5,7 @@ toc: true docs_area: reference.sql --- -Use the `CREATE STATISTICS` [statement]({% link {{ page.version.version }}/sql-statements.md %}) to generate table statistics for the [cost-based optimizer]({% link {{ page.version.version }}/cost-based-optimizer.md %}) to use. +Use the `CREATE STATISTICS` [statement]({% link {{ page.version.version }}/sql-statements.md %}) to [generate table statistics for the cost-based optimizer]({% link {{ page.version.version }}/cost-based-optimizer.md %}#table-statistics) to use. Once you [create a table]({% link {{ page.version.version }}/create-table.md %}) and load data into it (e.g., [`INSERT`]({% link {{ page.version.version }}/insert.md %}), [`IMPORT INTO`]({% link {{ page.version.version }}/import-into.md %})), table statistics can be generated. Table statistics help the cost-based optimizer determine the cardinality of the rows used in each query, which helps to predict more accurate costs. @@ -166,6 +166,29 @@ To create statistics as of a given time (in this example, 1 minute ago to avoid For more information about how the `AS OF SYSTEM TIME` clause works, including supported time formats, see [`AS OF SYSTEM TIME`]({% link {{ page.version.version }}/as-of-system-time.md %}). 
+### Create partial statistics using extremes + +To create [partial statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#partial-statistics) that collect statistics on the highest and lowest index values: + +{% include_cached copy-clipboard.html %} +~~~ sql +SET enable_create_stats_using_extremes = true; +~~~ + +{% include_cached copy-clipboard.html %} +~~~ sql +CREATE STATISTICS rides_extremes_stats FROM rides USING EXTREMES; +~~~ + +This creates partial statistics on all single-column prefixes of forward indexes in the `rides` table by scanning only the highest and lowest index values, providing updated statistics without performing a full table scan. + +You can also create extremes statistics on specific columns: + +{% include_cached copy-clipboard.html %} +~~~ sql +CREATE STATISTICS revenue_extremes_stats ON revenue FROM rides USING EXTREMES; +~~~ + ### Delete statistics {% include {{ page.version.version }}/misc/delete-statistics.md %} diff --git a/src/current/v24.3/show-statistics.md b/src/current/v24.3/show-statistics.md index 6d85f5b9594..59143af8946 100644 --- a/src/current/v24.3/show-statistics.md +++ b/src/current/v24.3/show-statistics.md @@ -76,18 +76,13 @@ Parameter | Description ### Display forecasted statistics -The `WITH FORECAST` option calculates and displays forecasted statistics along with the existing table statistics. The forecast is a simple regression model that predicts how the statistics have changed since they were last collected. Forecasts that closely match the historical statistics are used by the [cost-based optimizer]({% link {{ page.version.version }}/cost-based-optimizer.md %}). - -CockroachDB generates forecasted statistics when the following conditions are met: - -- There have been at least 3 historical statistics collections. -- The historical statistics closely fit a linear pattern. 
+The `WITH FORECAST` option calculates and displays [forecasted statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#forecasted-statistics) along with the existing table statistics. The following example shows 3 historical statistics collections and the subsequent forecast: {% include_cached copy-clipboard.html %} ~~~ sql -> SHOW STATISTICS FOR TABLE rides WITH FORECAST; +SHOW STATISTICS FOR TABLE rides WITH FORECAST; ~~~ ~~~ diff --git a/src/current/v25.1/cost-based-optimizer.md b/src/current/v25.1/cost-based-optimizer.md index 024b6d883af..d6ed0ce0c84 100644 --- a/src/current/v25.1/cost-based-optimizer.md +++ b/src/current/v25.1/cost-based-optimizer.md @@ -23,22 +23,28 @@ The most important factor in determining the quality of a plan is cardinality (i The cost-based optimizer can often find more performant query plans if it has access to statistical data on the contents of your tables. This data needs to be generated from scratch for new tables, and [refreshed periodically](#control-statistics-refresh-rate) for existing tables. -By default, CockroachDB automatically generates table statistics when tables are [created]({% link {{ page.version.version }}/create-table.md %}), and as they are [updated]({% link {{ page.version.version }}/update.md %}). It does this using a [background job]({% link {{ page.version.version }}/create-statistics.md %}#view-statistics-jobs) that automatically determines which columns to get statistics on. Specifically, the optimizer chooses: +The optimizer can use three types of statistics to plan queries: + +- [Full statistics](#full-statistics) +- [Partial statistics](#partial-statistics) +- [Forecasted statistics](#forecasted-statistics) + +For best query performance, most users should leave automatic statistics enabled with the default settings. Advanced users can follow the steps provided in the following sections for performance tuning and troubleshooting. 
+ +### Full statistics + +By default, CockroachDB automatically generates full statistics when tables are [created]({% link {{ page.version.version }}/create-table.md %}) and during [schema changes]({% link {{ page.version.version }}/online-schema-changes.md %}). Full statistics for a table are automatically refreshed when approximately 20% of its rows are updated. + +A [background job]({% link {{ page.version.version }}/create-statistics.md %}#view-statistics-jobs) automatically determines which columns to get statistics on. Specifically, the optimizer chooses: - Columns that are part of the primary key or an index (in other words, all indexed columns). - Up to 100 non-indexed columns. By default, CockroachDB also automatically collects [multi-column statistics]({% link {{ page.version.version }}/create-statistics.md %}#create-statistics-on-multiple-columns) on columns that prefix an index. -{{site.data.alerts.callout_info}} -[Schema changes]({% link {{ page.version.version }}/online-schema-changes.md %}) trigger automatic statistics collection for the affected table(s). -{{site.data.alerts.end}} - -For best query performance, most users should leave automatic statistics enabled with the default settings. Advanced users can follow the steps provided in this section for performance tuning and troubleshooting. - -### Control statistics refresh rate +#### Control statistics refresh rate -Statistics are refreshed in the following cases: +Full statistics are refreshed in the following cases: - When there are no statistics. - When it has been a long time since the last refresh, where "long time" is based on a moving average of the time across the last several refreshes. @@ -55,9 +61,9 @@ Statistics are refreshed in the following cases: Because the formula for statistics refreshes is probabilistic, you will not see statistics update immediately after changing these settings, or immediately after exactly 500 rows have been updated. 
{{site.data.alerts.end}} -#### Small versus large table examples +##### Small versus large table examples -Suppose the [clusters settings]({% link {{ page.version.version }}/cluster-settings.md %}) `sql.stats.automatic_collection.fraction_stale_rows` and `sql.stats.automatic_collection.min_stale_rows` have the default values .2 and 500 as shown in the preceding table. +Suppose the [cluster settings]({% link {{ page.version.version }}/cluster-settings.md %}) `sql.stats.automatic_collection.fraction_stale_rows` and `sql.stats.automatic_collection.min_stale_rows` have the default values .2 and 500 as shown in the preceding table. If a table has 100 rows and 20 became stale, a re-collection would not be triggered because, even though 20% of the rows are stale, they do not meet the 500-row minimum. @@ -65,7 +71,7 @@ On the other hand, if a table has 1,500,000,000 rows, then 20% of that, or 300,0 In such cases, we recommend that you use the [`sql_stats_automatic_collection_enabled` storage parameter](#enable-and-disable-automatic-statistics-collection-for-tables), which lets you configure automatic statistics collection on a per-table basis. -#### Configure non-default statistics retention +##### Configure non-default statistics retention By default, when CockroachDB refreshes statistics for a column, it deletes the previous statistics for the column (while leaving the most recent 4-5 historical statistics). When CockroachDB refreshes statistics, it also deletes the statistics for any "non-default" column sets, or columns for which statistics are not [collected by default](#table-statistics). @@ -73,9 +79,44 @@ Historical statistics on non-default column sets should not be retained indefini CockroachDB deletes statistics on non-default columns according to the `sql.stats.non_default_columns.min_retention_period` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}), which defaults to a 24-hour retention period. 
+### Partial statistics + +*Partial statistics* are collected on a subset of table data without scanning the full table. Partial statistics can improve query performance in large tables where only a portion of rows are regularly updated or queried. + +Whereas [full statistics](#full-statistics) refresh infrequently and can allow stale rows to accumulate, partial statistics [automatically refresh](#automatically-collect-partial-statistics) when the number of stale rows reaches a threshold. Partial statistics are automatically collected on the highest and lowest index values, which is particularly valuable for timestamp indexes where workloads commonly access the most recent data. They can also be [collected manually](#manually-collect-partial-statistics). + +Partial statistics have the following requirements: + +- Partial statistics can only be collected if [full statistics](#full-statistics) already exist for the table. +- Partial statistics are collected on all single-column prefixes of non-inverted indexes. Indexes that are [partial]({% link {{ page.version.version }}/partial-indexes.md %}), [hash-sharded]({% link {{ page.version.version }}/hash-sharded-indexes.md %}), or implicitly partitioned (such as in [`REGIONAL BY ROW` tables]({% link {{ page.version.version }}/regional-tables.md %}#regional-by-row-tables)) are excluded. +- For [manual collection](#manually-collect-partial-statistics) with specific columns, an index must exist with a prefix matching those columns. If no matching index exists or if statistics were not previously collected on the specified column, the statement will return an error. + +The optimizer uses partial statistics for query planning when the [`optimizer_use_merged_partial_statistics`]({% link {{ page.version.version }}/session-variables.md %}#optimizer-use-merged-partial-statistics) session variable is enabled. It merges partial statistics with existing full statistics to produce more accurate cardinality estimates. 
+ +#### Automatically collect partial statistics + +{% include_cached new-in.html version="v25.1" %} Partial statistics are automatically collected on the highest and lowest index values when: + +- Automatic collection is enabled. +- The number of stale rows in a table reaches a specified threshold. + +This is particularly beneficial for large tables where only a portion is regularly updated or queried, such as tables with timestamp columns where recent data is frequently accessed. + +To control automatic collection of partial statistics, use the following [cluster settings]({% link {{ page.version.version }}/cluster-settings.md %}) and [table storage parameters]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters). Each table parameter overrides the corresponding cluster setting when applied to a specific table. + +| Cluster setting | Table storage parameter | Description | +|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| [`sql.stats.automatic_partial_collection.enabled`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-sql-stats-automatic-partial-collection-enabled) | [`sql_stats_automatic_partial_collection_enabled`]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters) | Enable automatic collection of partial table statistics. 
| +| [`sql.stats.automatic_partial_collection.min_stale_rows`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-sql-stats-automatic-partial-collection-min-stale-rows) | [`sql_stats_automatic_partial_collection_min_stale_rows`]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters) | Minimum number of stale rows that triggers partial statistics collection. | +| [`sql.stats.automatic_partial_collection.fraction_stale_rows`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-sql-stats-automatic-partial-collection-fraction-stale-rows) | [`sql_stats_automatic_partial_collection_fraction_stale_rows`]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters) | Target fraction of stale rows that triggers partial statistics collection. If lower than the `0.2` threshold for full statistics, partial statistics refresh more frequently than full statistics. | + +#### Manually collect partial statistics + +When the [`enable_create_stats_using_extremes`]({% link {{ page.version.version }}/session-variables.md %}#enable-create-stats-using-extremes) session variable is enabled, you can manually create partial statistics on the highest and lowest index values with the `USING EXTREMES` clause: [`CREATE STATISTICS stats FROM table USING EXTREMES`]({% link {{ page.version.version }}/create-statistics.md %}#create-partial-statistics-using-extremes). + ### Enable and disable automatic statistics collection for clusters -Automatic statistics collection is enabled by default. To disable automatic statistics collection, follow these steps: +Automatic statistics collection is enabled by default. To disable automatic [full](#full-statistics) and [partial](#partial-statistics) statistics collection, follow these steps: 1. 
Set the `sql.stats.automatic_collection.enabled` cluster setting to `false`: @@ -99,9 +140,9 @@ To learn how to manually generate statistics, see the [`CREATE STATISTICS` examp ### Enable and disable automatic statistics collection for tables -Statistics collection can be expensive for large tables, and you may prefer to defer collection until after data is finished loading or during off-peak hours. Tables that are frequently updated, including small tables, may trigger statistics collection more often, which can lead to unnecessary overhead and unpredictable query plan changes. +Automatic statistics collection can be expensive for large tables, and you may prefer to defer collection until after data is finished loading or during off-peak hours. Tables that are frequently updated, including small tables, may trigger statistics collection more often, which can lead to unnecessary overhead and unpredictable query plan changes. -You can enable and disable automatic statistics collection for individual tables using the `sql_stats_automatic_collection_enabled` storage parameter. This table setting **takes precedence** over the `sql.stats.automatic_collection.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}) described in [Enable and disable automatic statistics collection for clusters](#enable-and-disable-automatic-statistics-collection-for-clusters). +You can enable and disable automatic [full](#full-statistics) and [partial](#partial-statistics) statistics collection for individual tables using the `sql_stats_automatic_collection_enabled` [storage parameter]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters). 
This table setting **takes precedence** over the `sql.stats.automatic_collection.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}) described in [Enable and disable automatic statistics collection for clusters](#enable-and-disable-automatic-statistics-collection-for-clusters). You can either configure this setting during table creation: @@ -157,9 +198,16 @@ sql_stats_automatic_collection_min_stale_rows = 2000); Automatic statistics rules are checked once per minute. While altered automatic statistics table settings take immediate effect for any subsequent DML statements on a table, running row mutations that started prior to modifying the table settings may still trigger statistics collection based on the settings that existed before you ran the `ALTER TABLE ... SET` statement. -### Enable and disable forecasted statistics for tables +### Forecasted statistics + +*Forecasted statistics* use a simple regression model that predicts how the statistics have changed since they were last collected. CockroachDB generates forecasted statistics when the following conditions are met: + +- There have been at least 3 historical statistics collections. +- The historical statistics closely fit a linear pattern. -You can enable and disable [forecasted statistics]({% link {{ page.version.version }}/show-statistics.md %}#display-forecasted-statistics) collection for individual tables using the `sql_stats_forecasts_enabled` table parameter. This table setting **takes precedence** over the `sql.stats.forecasts.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}). +By default, the optimizer uses forecasts that closely match the historical statistics. + +You can enable and disable forecasted statistics collection for individual tables using the `sql_stats_forecasts_enabled` [table parameter]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters). 
This table setting **takes precedence** over the `sql.stats.forecasts.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}). You can either configure this setting during table creation: @@ -196,8 +244,6 @@ The current table settings are shown in the `WITH` clause output of `SHOW CREATE `ALTER TABLE accounts RESET (sql_stats_forecasts_enabled)` removes the table setting, in which case the `sql.stats.forecasts.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}) is in effect for the table. -For details on forecasted statistics, see [Display forecasted statistics]({% link {{ page.version.version }}/show-statistics.md %}#display-forecasted-statistics). - ### Control histogram collection By default, the optimizer collects histograms for all index columns (specifically the first column in each index) during automatic statistics collection. If a single column statistic is explicitly requested using manual invocation of [`CREATE STATISTICS`]({% link {{ page.version.version }}/create-statistics.md %}), a histogram will be collected, regardless of whether or not the column is part of an index. @@ -312,8 +358,6 @@ Two types of plans can be cached: Generic plans are **not** included in the plan cache, but are cached per session. This means that they must still be re-optimized each time a session prepares a statement using a generic plan. To reuse generic query plans for maximum performance, a prepared statement should be executed multiple times instead of prepared and executed once. - This feature is in [preview]({% link {{ page.version.version }}/cockroachdb-feature-availability.md %}) and is subject to change. 
- {{site.data.alerts.callout_success}} Generic query plans will only benefit workloads that use prepared statements, which are issued via explicit `PREPARE` statements or by client libraries using the [PostgreSQL extended wire protocol](https://www.postgresql.org/docs/current/protocol-flow.html#PROTOCOL-FLOW-EXT-QUERY). Generic query plans are most beneficial for queries with high planning times, such as queries with many [joins]({% link {{ page.version.version }}/joins.md %}). For more information on reducing planning time for such queries, refer to [Reduce planning time for queries with many joins](#reduce-planning-time-for-queries-with-many-joins). {{site.data.alerts.end}} @@ -322,15 +366,15 @@ To change the type of plan that is cached, use the [`plan_cache_mode`]({% link { The following modes can be set: -- `force_custom_plan` (default): Force the use of custom plans. +- `auto` (default): Automatically determine whether to use custom or generic query plans for prepared statements. Custom plans are used for the first five statement executions. Subsequent executions use a generic plan if its estimated cost is not significantly higher than the average cost of the preceding custom plans. +- `force_custom_plan`: Force the use of custom plans. - `force_generic_plan`: Force the use of generic plans. -- `auto`: Automatically determine whether to use custom or generic query plans for prepared statements. Custom plans are used for the first five statement executions. Subsequent executions use a generic plan if its estimated cost is not significantly higher than the average cost of the preceding custom plans. {{site.data.alerts.callout_info}} Generic plans are always used for non-prepared statements that do not contain placeholders or [stable functions]({% link {{ page.version.version }}/functions-and-operators.md %}#function-volatility), regardless of the `plan_cache_mode` setting. 
{{site.data.alerts.end}} -In some cases, generic query plans are less efficient than custom plans. For this reason, Cockroach Labs recommends setting `plan_cache_mode` to `auto` instead of `force_generic_plan`. Under the `auto` setting, the optimizer avoids bad generic plans by falling back to custom plans. For example: +In some cases, generic query plans are less efficient than custom plans. For this reason, Cockroach Labs recommends setting `plan_cache_mode` to `auto` (the default mode) instead of `force_generic_plan`. Under the `auto` setting, the optimizer avoids bad generic plans by falling back to custom plans. For example: Set `plan_cache_mode` to `auto` at the session level: diff --git a/src/current/v25.1/create-statistics.md b/src/current/v25.1/create-statistics.md index 1dfa622c46b..69cc07120fd 100644 --- a/src/current/v25.1/create-statistics.md +++ b/src/current/v25.1/create-statistics.md @@ -5,7 +5,7 @@ toc: true docs_area: reference.sql --- -Use the `CREATE STATISTICS` [statement]({% link {{ page.version.version }}/sql-statements.md %}) to generate table statistics for the [cost-based optimizer]({% link {{ page.version.version }}/cost-based-optimizer.md %}) to use. +Use the `CREATE STATISTICS` [statement]({% link {{ page.version.version }}/sql-statements.md %}) to [generate table statistics for the cost-based optimizer]({% link {{ page.version.version }}/cost-based-optimizer.md %}#table-statistics) to use. Once you [create a table]({% link {{ page.version.version }}/create-table.md %}) and load data into it (e.g., [`INSERT`]({% link {{ page.version.version }}/insert.md %}), [`IMPORT INTO`]({% link {{ page.version.version }}/import-into.md %})), table statistics can be generated. Table statistics help the cost-based optimizer determine the cardinality of the rows used in each query, which helps to predict more accurate costs. 
@@ -166,6 +166,26 @@ To create statistics as of a given time (in this example, 1 minute ago to avoid For more information about how the `AS OF SYSTEM TIME` clause works, including supported time formats, see [`AS OF SYSTEM TIME`]({% link {{ page.version.version }}/as-of-system-time.md %}). +### Create partial statistics using extremes + +CockroachDB supports [partial statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#partial-statistics), which collect statistics on a subset of table data to provide more up-to-date information without scanning the entire table. + +To create [partial statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#partial-statistics) that collect statistics on the highest and lowest index values: + +{% include_cached copy-clipboard.html %} +~~~ sql +CREATE STATISTICS rides_extremes_stats FROM rides USING EXTREMES; +~~~ + +This creates partial statistics on all single-column prefixes of non-inverted indexes in the `rides` table by scanning only the highest and lowest index values, providing updated statistics without performing a full table scan. + +You can also create extremes statistics on specific columns, as long as [the column is indexed]({% link {{ page.version.version }}/cost-based-optimizer.md %}#partial-statistics): + +{% include_cached copy-clipboard.html %} +~~~ sql +CREATE STATISTICS city_extremes_stats ON city FROM rides USING EXTREMES; +~~~ + ### Delete statistics {% include {{ page.version.version }}/misc/delete-statistics.md %} diff --git a/src/current/v25.1/show-statistics.md b/src/current/v25.1/show-statistics.md index 6d85f5b9594..59143af8946 100644 --- a/src/current/v25.1/show-statistics.md +++ b/src/current/v25.1/show-statistics.md @@ -76,18 +76,13 @@ Parameter | Description ### Display forecasted statistics -The `WITH FORECAST` option calculates and displays forecasted statistics along with the existing table statistics. 
The forecast is a simple regression model that predicts how the statistics have changed since they were last collected. Forecasts that closely match the historical statistics are used by the [cost-based optimizer]({% link {{ page.version.version }}/cost-based-optimizer.md %}). - -CockroachDB generates forecasted statistics when the following conditions are met: - -- There have been at least 3 historical statistics collections. -- The historical statistics closely fit a linear pattern. +The `WITH FORECAST` option calculates and displays [forecasted statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#forecasted-statistics) along with the existing table statistics. The following example shows 3 historical statistics collections and the subsequent forecast: {% include_cached copy-clipboard.html %} ~~~ sql -> SHOW STATISTICS FOR TABLE rides WITH FORECAST; +SHOW STATISTICS FOR TABLE rides WITH FORECAST; ~~~ ~~~ diff --git a/src/current/v25.2/cost-based-optimizer.md b/src/current/v25.2/cost-based-optimizer.md index f17e44bf8cb..7840d7db870 100644 --- a/src/current/v25.2/cost-based-optimizer.md +++ b/src/current/v25.2/cost-based-optimizer.md @@ -23,22 +23,34 @@ The most important factor in determining the quality of a plan is cardinality (i The cost-based optimizer can often find more performant query plans if it has access to statistical data on the contents of your tables. This data needs to be generated from scratch for new tables, and [refreshed periodically](#control-statistics-refresh-rate) for existing tables. -By default, CockroachDB automatically generates table statistics when tables are [created]({% link {{ page.version.version }}/create-table.md %}), and as they are [updated]({% link {{ page.version.version }}/update.md %}). It does this using a [background job]({% link {{ page.version.version }}/create-statistics.md %}#view-statistics-jobs) that automatically determines which columns to get statistics on. 
Specifically, the optimizer chooses: +The optimizer can use three types of statistics to plan queries: + +- [Full statistics](#full-statistics) +- [Partial statistics](#partial-statistics) +- [Forecasted statistics](#forecasted-statistics) + +For best query performance, most users should leave automatic statistics enabled with the default settings. Advanced users can follow the steps provided in the following sections for performance tuning and troubleshooting. + +### Full statistics + +By default, CockroachDB automatically generates full statistics when tables are [created]({% link {{ page.version.version }}/create-table.md %}) and during [schema changes]({% link {{ page.version.version }}/online-schema-changes.md %}). Full statistics for a table are automatically refreshed when approximately 20% of its rows are updated. + +A [background job]({% link {{ page.version.version }}/create-statistics.md %}#view-statistics-jobs) automatically determines which columns to get statistics on. Specifically, the optimizer chooses: - Columns that are part of the primary key or an index (in other words, all indexed columns). - Up to 100 non-indexed columns. By default, CockroachDB also automatically collects [multi-column statistics]({% link {{ page.version.version }}/create-statistics.md %}#create-statistics-on-multiple-columns) on columns that prefix an index. -{{site.data.alerts.callout_info}} -[Schema changes]({% link {{ page.version.version }}/online-schema-changes.md %}) trigger automatic statistics collection for the affected table(s). -{{site.data.alerts.end}} +{% include_cached new-in.html version="v25.2" %} To control automatic collection of full statistics, use the following settings. The table storage parameter overrides the cluster setting when applied to a specific table. -For best query performance, most users should leave automatic statistics enabled with the default settings. 
Advanced users can follow the steps provided in this section for performance tuning and troubleshooting. +| Cluster setting | Table storage parameter | Description | +|----------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------|-------------------------------------------------------| +| [`sql.stats.automatic_full_collection.enabled`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-sql-stats-automatic-full-collection-enabled) | `sql_stats_automatic_full_collection_enabled` | Enable automatic collection of full table statistics. | -### Control statistics refresh rate +#### Control statistics refresh rate -Statistics are refreshed in the following cases: +Full statistics are refreshed in the following cases: - When there are no statistics. - When it has been a long time since the last refresh, where "long time" is based on a moving average of the time across the last several refreshes. @@ -55,9 +67,9 @@ Statistics are refreshed in the following cases: Because the formula for statistics refreshes is probabilistic, you will not see statistics update immediately after changing these settings, or immediately after exactly 500 rows have been updated. {{site.data.alerts.end}} -#### Small versus large table examples +##### Small versus large table examples -Suppose the [clusters settings]({% link {{ page.version.version }}/cluster-settings.md %}) `sql.stats.automatic_collection.fraction_stale_rows` and `sql.stats.automatic_collection.min_stale_rows` have the default values .2 and 500 as shown in the preceding table. +Suppose the [cluster settings]({% link {{ page.version.version }}/cluster-settings.md %}) `sql.stats.automatic_collection.fraction_stale_rows` and `sql.stats.automatic_collection.min_stale_rows` have the default values .2 and 500 as shown in the preceding table. 
If a table has 100 rows and 20 became stale, a re-collection would not be triggered because, even though 20% of the rows are stale, they do not meet the 500-row minimum. @@ -65,7 +77,7 @@ On the other hand, if a table has 1,500,000,000 rows, then 20% of that, or 300,0 In such cases, we recommend that you use the [`sql_stats_automatic_collection_enabled` storage parameter](#enable-and-disable-automatic-statistics-collection-for-tables), which lets you configure automatic statistics collection on a per-table basis. -#### Configure non-default statistics retention +##### Configure non-default statistics retention By default, when CockroachDB refreshes statistics for a column, it deletes the previous statistics for the column (while leaving the most recent 4-5 historical statistics). When CockroachDB refreshes statistics, it also deletes the statistics for any "non-default" column sets, or columns for which statistics are not [collected by default](#table-statistics). @@ -73,9 +85,44 @@ Historical statistics on non-default column sets should not be retained indefini CockroachDB deletes statistics on non-default columns according to the `sql.stats.non_default_columns.min_retention_period` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}), which defaults to a 24-hour retention period. +### Partial statistics + +*Partial statistics* are collected on a subset of table data without scanning the full table. Partial statistics can improve query performance in large tables where only a portion of rows are regularly updated or queried. + +Whereas [full statistics](#full-statistics) refresh infrequently and can allow stale rows to accumulate, partial statistics [automatically refresh](#automatically-collect-partial-statistics) when the number of stale rows reaches a threshold. Partial statistics automatically collect on extreme index values, which is particularly valuable for timestamp indexes where workloads commonly access the most recent data. 
They can also be [collected manually](#manually-collect-partial-statistics). + +Partial statistics have the following requirements: + +- Partial statistics can only be collected if [full statistics](#full-statistics) already exist for the table. +- Partial statistics are collected on all single-column prefixes of non-inverted indexes. Indexes that are [partial]({% link {{ page.version.version }}/partial-indexes.md %}), [hash-sharded]({% link {{ page.version.version }}/hash-sharded-indexes.md %}), or implicitly partitioned (such as in [`REGIONAL BY ROW` tables]({% link {{ page.version.version }}/regional-tables.md %}#regional-by-row-tables)) are excluded. +- For [manual collection](#manually-collect-partial-statistics) with specific columns, an index must exist with a prefix matching those columns. If no matching index exists or if statistics were not previously collected on the specified column, the statement will return an error. + +The optimizer uses partial statistics for query planning when the [`optimizer_use_merged_partial_statistics`]({% link {{ page.version.version }}/session-variables.md %}#optimizer-use-merged-partial-statistics) session variable is enabled. It merges partial statistics with existing full statistics to produce more accurate cardinality estimates. + +#### Automatically collect partial statistics + +Partial statistics are automatically collected on the highest and lowest index values when: + +- Automatic collection is enabled. +- The number of stale rows in a table reaches a specified threshold. + +This is particularly beneficial for large tables where only a portion is regularly updated or queried, such as tables with timestamp columns where recent data is frequently accessed. + +To control automatic collection of partial statistics, use the following [cluster settings]({% link {{ page.version.version }}/cluster-settings.md %}) and [table storage parameters]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters). 
Each table parameter overrides the corresponding cluster setting when applied to a specific table. + +| Cluster setting | Table storage parameter | Description | +|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| [`sql.stats.automatic_partial_collection.enabled`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-sql-stats-automatic-partial-collection-enabled) | [`sql_stats_automatic_partial_collection_enabled`]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters) | Enable automatic collection of partial table statistics. | +| [`sql.stats.automatic_partial_collection.min_stale_rows`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-sql-stats-automatic-partial-collection-min-stale-rows) | [`sql_stats_automatic_partial_collection_min_stale_rows`]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters) | Minimum number of stale rows that triggers partial statistics collection. | +| [`sql.stats.automatic_partial_collection.fraction_stale_rows`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-sql-stats-automatic-partial-collection-fraction-stale-rows) | [`sql_stats_automatic_partial_collection_fraction_stale_rows`]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters) | Target fraction of stale rows that triggers partial statistics collection. 
If lower than the `0.2` threshold for full statistics, partial statistics refresh more frequently than full statistics. | + +#### Manually collect partial statistics + +You can manually create partial statistics on the highest and lowest index values, when [`enable_create_stats_using_extremes`]({% link {{ page.version.version }}/session-variables.md %}#enable-create-stats-using-extremes) session variable is enabled, using the `USING EXTREMES` clause: [`CREATE STATISTICS stats FROM table USING EXTREMES`]({% link {{ page.version.version }}/create-statistics.md %}#create-partial-statistics-using-extremes). + ### Enable and disable automatic statistics collection for clusters -Automatic statistics collection is enabled by default. To disable automatic statistics collection, follow these steps: +Automatic statistics collection is enabled by default. To disable automatic [full](#full-statistics) and [partial](#partial-statistics) statistics collection, follow these steps: 1. Set the `sql.stats.automatic_collection.enabled` cluster setting to `false`: @@ -99,9 +146,9 @@ To learn how to manually generate statistics, see the [`CREATE STATISTICS` examp ### Enable and disable automatic statistics collection for tables -Statistics collection can be expensive for large tables, and you may prefer to defer collection until after data is finished loading or during off-peak hours. Tables that are frequently updated, including small tables, may trigger statistics collection more often, which can lead to unnecessary overhead and unpredictable query plan changes. +Automatic statistics collection can be expensive for large tables, and you may prefer to defer collection until after data is finished loading or during off-peak hours. Tables that are frequently updated, including small tables, may trigger statistics collection more often, which can lead to unnecessary overhead and unpredictable query plan changes. 
-You can enable and disable automatic statistics collection for individual tables using the `sql_stats_automatic_collection_enabled` storage parameter. This table setting **takes precedence** over the `sql.stats.automatic_collection.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}) described in [Enable and disable automatic statistics collection for clusters](#enable-and-disable-automatic-statistics-collection-for-clusters). +You can enable and disable automatic [full](#full-statistics) and [partial](#partial-statistics) statistics collection for individual tables using the `sql_stats_automatic_collection_enabled` [storage parameter]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters). This table setting **takes precedence** over the `sql.stats.automatic_collection.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}) described in [Enable and disable automatic statistics collection for clusters](#enable-and-disable-automatic-statistics-collection-for-clusters). You can either configure this setting during table creation: @@ -157,9 +204,16 @@ sql_stats_automatic_collection_min_stale_rows = 2000); Automatic statistics rules are checked once per minute. While altered automatic statistics table settings take immediate effect for any subsequent DML statements on a table, running row mutations that started prior to modifying the table settings may still trigger statistics collection based on the settings that existed before you ran the `ALTER TABLE ... SET` statement. -### Enable and disable forecasted statistics for tables +### Forecasted statistics + +*Forecasted statistics* use a simple regression model that predicts how the statistics have changed since they were last collected. 
CockroachDB generates forecasted statistics when the following conditions are met: -You can enable and disable [forecasted statistics]({% link {{ page.version.version }}/show-statistics.md %}#display-forecasted-statistics) collection for individual tables using the `sql_stats_forecasts_enabled` table parameter. This table setting **takes precedence** over the `sql.stats.forecasts.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}). +- There have been at least 3 historical statistics collections. +- The historical statistics closely fit a linear pattern. + +By default, the optimizer uses forecasts that closely match the historical statistics. + +You can enable and disable forecasted statistics collection for individual tables using the `sql_stats_forecasts_enabled` [table parameter]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters). This table setting **takes precedence** over the `sql.stats.forecasts.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}). You can either configure this setting during table creation: @@ -196,8 +250,6 @@ The current table settings are shown in the `WITH` clause output of `SHOW CREATE `ALTER TABLE accounts RESET (sql_stats_forecasts_enabled)` removes the table setting, in which case the `sql.stats.forecasts.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}) is in effect for the table. -For details on forecasted statistics, see [Display forecasted statistics]({% link {{ page.version.version }}/show-statistics.md %}#display-forecasted-statistics). - ### Control histogram collection By default, the optimizer collects histograms for all index columns (specifically the first column in each index) during automatic statistics collection. 
If a single column statistic is explicitly requested using manual invocation of [`CREATE STATISTICS`]({% link {{ page.version.version }}/create-statistics.md %}), a histogram will be collected, regardless of whether or not the column is part of an index. diff --git a/src/current/v25.2/create-statistics.md b/src/current/v25.2/create-statistics.md index 1dfa622c46b..69cc07120fd 100644 --- a/src/current/v25.2/create-statistics.md +++ b/src/current/v25.2/create-statistics.md @@ -5,7 +5,7 @@ toc: true docs_area: reference.sql --- -Use the `CREATE STATISTICS` [statement]({% link {{ page.version.version }}/sql-statements.md %}) to generate table statistics for the [cost-based optimizer]({% link {{ page.version.version }}/cost-based-optimizer.md %}) to use. +Use the `CREATE STATISTICS` [statement]({% link {{ page.version.version }}/sql-statements.md %}) to [generate table statistics for the cost-based optimizer]({% link {{ page.version.version }}/cost-based-optimizer.md %}#table-statistics) to use. Once you [create a table]({% link {{ page.version.version }}/create-table.md %}) and load data into it (e.g., [`INSERT`]({% link {{ page.version.version }}/insert.md %}), [`IMPORT INTO`]({% link {{ page.version.version }}/import-into.md %})), table statistics can be generated. Table statistics help the cost-based optimizer determine the cardinality of the rows used in each query, which helps to predict more accurate costs. @@ -166,6 +166,26 @@ To create statistics as of a given time (in this example, 1 minute ago to avoid For more information about how the `AS OF SYSTEM TIME` clause works, including supported time formats, see [`AS OF SYSTEM TIME`]({% link {{ page.version.version }}/as-of-system-time.md %}). 
+### Create partial statistics using extremes + +CockroachDB supports [partial statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#partial-statistics), which collect statistics on a subset of table data to provide more up-to-date information without scanning the entire table. + +To create [partial statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#partial-statistics) that collect statistics on the highest and lowest index values: + +{% include_cached copy-clipboard.html %} +~~~ sql +CREATE STATISTICS rides_extremes_stats FROM rides USING EXTREMES; +~~~ + +This creates partial statistics on all single-column prefixes of non-inverted indexes in the `rides` table by scanning only the highest and lowest index values, providing updated statistics without performing a full table scan. + +You can also create extremes statistics on specific columns, as long as [the column is indexed]({% link {{ page.version.version }}/cost-based-optimizer.md %}#partial-statistics): + +{% include_cached copy-clipboard.html %} +~~~ sql +CREATE STATISTICS city_extremes_stats ON city FROM rides USING EXTREMES; +~~~ + ### Delete statistics {% include {{ page.version.version }}/misc/delete-statistics.md %} diff --git a/src/current/v25.2/show-statistics.md b/src/current/v25.2/show-statistics.md index 6d85f5b9594..59143af8946 100644 --- a/src/current/v25.2/show-statistics.md +++ b/src/current/v25.2/show-statistics.md @@ -76,18 +76,13 @@ Parameter | Description ### Display forecasted statistics -The `WITH FORECAST` option calculates and displays forecasted statistics along with the existing table statistics. The forecast is a simple regression model that predicts how the statistics have changed since they were last collected. Forecasts that closely match the historical statistics are used by the [cost-based optimizer]({% link {{ page.version.version }}/cost-based-optimizer.md %}). 
- -CockroachDB generates forecasted statistics when the following conditions are met: - -- There have been at least 3 historical statistics collections. -- The historical statistics closely fit a linear pattern. +The `WITH FORECAST` option calculates and displays [forecasted statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#forecasted-statistics) along with the existing table statistics. The following example shows 3 historical statistics collections and the subsequent forecast: {% include_cached copy-clipboard.html %} ~~~ sql -> SHOW STATISTICS FOR TABLE rides WITH FORECAST; +SHOW STATISTICS FOR TABLE rides WITH FORECAST; ~~~ ~~~ diff --git a/src/current/v25.3/cost-based-optimizer.md b/src/current/v25.3/cost-based-optimizer.md index f17e44bf8cb..bf7cddf3e7b 100644 --- a/src/current/v25.3/cost-based-optimizer.md +++ b/src/current/v25.3/cost-based-optimizer.md @@ -23,22 +23,34 @@ The most important factor in determining the quality of a plan is cardinality (i The cost-based optimizer can often find more performant query plans if it has access to statistical data on the contents of your tables. This data needs to be generated from scratch for new tables, and [refreshed periodically](#control-statistics-refresh-rate) for existing tables. -By default, CockroachDB automatically generates table statistics when tables are [created]({% link {{ page.version.version }}/create-table.md %}), and as they are [updated]({% link {{ page.version.version }}/update.md %}). It does this using a [background job]({% link {{ page.version.version }}/create-statistics.md %}#view-statistics-jobs) that automatically determines which columns to get statistics on. 
Specifically, the optimizer chooses: +The optimizer can use three types of statistics to plan queries: + +- [Full statistics](#full-statistics) +- [Partial statistics](#partial-statistics) +- [Forecasted statistics](#forecasted-statistics) + +For best query performance, most users should leave automatic statistics enabled with the default settings. Advanced users can follow the steps provided in the following sections for performance tuning and troubleshooting. + +### Full statistics + +By default, CockroachDB automatically generates full statistics when tables are [created]({% link {{ page.version.version }}/create-table.md %}) and during [schema changes]({% link {{ page.version.version }}/online-schema-changes.md %}). Full statistics for a table are automatically refreshed when approximately 20% of its rows are updated. + +A [background job]({% link {{ page.version.version }}/create-statistics.md %}#view-statistics-jobs) automatically determines which columns to get statistics on. Specifically, the optimizer chooses: - Columns that are part of the primary key or an index (in other words, all indexed columns). - Up to 100 non-indexed columns. By default, CockroachDB also automatically collects [multi-column statistics]({% link {{ page.version.version }}/create-statistics.md %}#create-statistics-on-multiple-columns) on columns that prefix an index. -{{site.data.alerts.callout_info}} -[Schema changes]({% link {{ page.version.version }}/online-schema-changes.md %}) trigger automatic statistics collection for the affected table(s). -{{site.data.alerts.end}} +To control automatic collection of full statistics, use the following settings. The table storage parameter overrides the cluster setting when applied to a specific table. -For best query performance, most users should leave automatic statistics enabled with the default settings. Advanced users can follow the steps provided in this section for performance tuning and troubleshooting. 
+| Cluster setting | Table storage parameter | Description | +|----------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------|-------------------------------------------------------| +| [`sql.stats.automatic_full_collection.enabled`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-sql-stats-automatic-full-collection-enabled) | `sql_stats_automatic_full_collection_enabled` | Enable automatic collection of full table statistics. | -### Control statistics refresh rate +#### Control statistics refresh rate -Statistics are refreshed in the following cases: +Full statistics are refreshed in the following cases: - When there are no statistics. - When it has been a long time since the last refresh, where "long time" is based on a moving average of the time across the last several refreshes. @@ -55,9 +67,9 @@ Statistics are refreshed in the following cases: Because the formula for statistics refreshes is probabilistic, you will not see statistics update immediately after changing these settings, or immediately after exactly 500 rows have been updated. {{site.data.alerts.end}} -#### Small versus large table examples +##### Small versus large table examples -Suppose the [clusters settings]({% link {{ page.version.version }}/cluster-settings.md %}) `sql.stats.automatic_collection.fraction_stale_rows` and `sql.stats.automatic_collection.min_stale_rows` have the default values .2 and 500 as shown in the preceding table. +Suppose the [cluster settings]({% link {{ page.version.version }}/cluster-settings.md %}) `sql.stats.automatic_collection.fraction_stale_rows` and `sql.stats.automatic_collection.min_stale_rows` have the default values .2 and 500 as shown in the preceding table. 
If a table has 100 rows and 20 became stale, a re-collection would not be triggered because, even though 20% of the rows are stale, they do not meet the 500-row minimum. @@ -65,7 +77,7 @@ On the other hand, if a table has 1,500,000,000 rows, then 20% of that, or 300,0 In such cases, we recommend that you use the [`sql_stats_automatic_collection_enabled` storage parameter](#enable-and-disable-automatic-statistics-collection-for-tables), which lets you configure automatic statistics collection on a per-table basis. -#### Configure non-default statistics retention +##### Configure non-default statistics retention By default, when CockroachDB refreshes statistics for a column, it deletes the previous statistics for the column (while leaving the most recent 4-5 historical statistics). When CockroachDB refreshes statistics, it also deletes the statistics for any "non-default" column sets, or columns for which statistics are not [collected by default](#table-statistics). @@ -73,9 +85,44 @@ Historical statistics on non-default column sets should not be retained indefini CockroachDB deletes statistics on non-default columns according to the `sql.stats.non_default_columns.min_retention_period` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}), which defaults to a 24-hour retention period. +### Partial statistics + +*Partial statistics* are collected on a subset of table data without scanning the full table. Partial statistics can improve query performance in large tables where only a portion of rows are regularly updated or queried. + +Whereas [full statistics](#full-statistics) refresh infrequently and can allow stale rows to accumulate, partial statistics [automatically refresh](#automatically-collect-partial-statistics) when the number of stale rows reaches a threshold. Partial statistics are automatically collected on extreme index values, which is particularly valuable for timestamp indexes where workloads commonly access the most recent data. 
They can also be [collected manually](#manually-collect-partial-statistics). + +Partial statistics have the following requirements: + +- Partial statistics can only be collected if [full statistics](#full-statistics) already exist for the table. +- Partial statistics are collected on all single-column prefixes of non-inverted indexes. Indexes that are [partial]({% link {{ page.version.version }}/partial-indexes.md %}), [hash-sharded]({% link {{ page.version.version }}/hash-sharded-indexes.md %}), or implicitly partitioned (such as in [`REGIONAL BY ROW` tables]({% link {{ page.version.version }}/regional-tables.md %}#regional-by-row-tables)) are excluded. +- For [manual collection](#manually-collect-partial-statistics) with specific columns, an index must exist with a prefix matching those columns. If no matching index exists or if statistics were not previously collected on the specified column, the statement will return an error. + +The optimizer uses partial statistics for query planning when the [`optimizer_use_merged_partial_statistics`]({% link {{ page.version.version }}/session-variables.md %}#optimizer-use-merged-partial-statistics) session variable is enabled. It merges partial statistics with existing full statistics to produce more accurate cardinality estimates. + +#### Automatically collect partial statistics + +Partial statistics are automatically collected on the highest and lowest index values when: + +- Automatic collection is enabled. +- The number of stale rows in a table reaches a specified threshold. + +This is particularly beneficial for large tables where only a portion is regularly updated or queried, such as tables with timestamp columns where recent data is frequently accessed. + +To control automatic collection of partial statistics, use the following [cluster settings]({% link {{ page.version.version }}/cluster-settings.md %}) and [table storage parameters]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters). 
Each table parameter overrides the corresponding cluster setting when applied to a specific table. + +| Cluster setting | Table storage parameter | Description | +|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| [`sql.stats.automatic_partial_collection.enabled`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-sql-stats-automatic-partial-collection-enabled) | [`sql_stats_automatic_partial_collection_enabled`]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters) | Enable automatic collection of partial table statistics. | +| [`sql.stats.automatic_partial_collection.min_stale_rows`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-sql-stats-automatic-partial-collection-min-stale-rows) | [`sql_stats_automatic_partial_collection_min_stale_rows`]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters) | Minimum number of stale rows that triggers partial statistics collection. | +| [`sql.stats.automatic_partial_collection.fraction_stale_rows`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-sql-stats-automatic-partial-collection-fraction-stale-rows) | [`sql_stats_automatic_partial_collection_fraction_stale_rows`]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters) | Target fraction of stale rows that triggers partial statistics collection. 
If lower than the `0.2` threshold for full statistics, partial statistics refresh more frequently than full statistics. | + +#### Manually collect partial statistics + +You can manually create partial statistics on the highest and lowest index values, when the [`enable_create_stats_using_extremes`]({% link {{ page.version.version }}/session-variables.md %}#enable-create-stats-using-extremes) session variable is enabled, using the `USING EXTREMES` clause: [`CREATE STATISTICS stats FROM table USING EXTREMES`]({% link {{ page.version.version }}/create-statistics.md %}#create-partial-statistics-using-extremes). + ### Enable and disable automatic statistics collection for clusters -Automatic statistics collection is enabled by default. To disable automatic statistics collection, follow these steps: +Automatic statistics collection is enabled by default. To disable automatic [full](#full-statistics) and [partial](#partial-statistics) statistics collection, follow these steps: 1. Set the `sql.stats.automatic_collection.enabled` cluster setting to `false`: @@ -99,9 +146,9 @@ To learn how to manually generate statistics, see the [`CREATE STATISTICS` examp ### Enable and disable automatic statistics collection for tables -Statistics collection can be expensive for large tables, and you may prefer to defer collection until after data is finished loading or during off-peak hours. Tables that are frequently updated, including small tables, may trigger statistics collection more often, which can lead to unnecessary overhead and unpredictable query plan changes. +Automatic statistics collection can be expensive for large tables, and you may prefer to defer collection until after data is finished loading or during off-peak hours. Tables that are frequently updated, including small tables, may trigger statistics collection more often, which can lead to unnecessary overhead and unpredictable query plan changes. 
-You can enable and disable automatic statistics collection for individual tables using the `sql_stats_automatic_collection_enabled` storage parameter. This table setting **takes precedence** over the `sql.stats.automatic_collection.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}) described in [Enable and disable automatic statistics collection for clusters](#enable-and-disable-automatic-statistics-collection-for-clusters). +You can enable and disable automatic [full](#full-statistics) and [partial](#partial-statistics) statistics collection for individual tables using the `sql_stats_automatic_collection_enabled` [storage parameter]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters). This table setting **takes precedence** over the `sql.stats.automatic_collection.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}) described in [Enable and disable automatic statistics collection for clusters](#enable-and-disable-automatic-statistics-collection-for-clusters). You can either configure this setting during table creation: @@ -157,9 +204,16 @@ sql_stats_automatic_collection_min_stale_rows = 2000); Automatic statistics rules are checked once per minute. While altered automatic statistics table settings take immediate effect for any subsequent DML statements on a table, running row mutations that started prior to modifying the table settings may still trigger statistics collection based on the settings that existed before you ran the `ALTER TABLE ... SET` statement. -### Enable and disable forecasted statistics for tables +### Forecasted statistics + +*Forecasted statistics* use a simple regression model that predicts how the statistics have changed since they were last collected. 
CockroachDB generates forecasted statistics when the following conditions are met: -You can enable and disable [forecasted statistics]({% link {{ page.version.version }}/show-statistics.md %}#display-forecasted-statistics) collection for individual tables using the `sql_stats_forecasts_enabled` table parameter. This table setting **takes precedence** over the `sql.stats.forecasts.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}). +- There have been at least 3 historical statistics collections. +- The historical statistics closely fit a linear pattern. + +By default, the optimizer uses forecasts that closely match the historical statistics. + +You can enable and disable forecasted statistics collection for individual tables using the `sql_stats_forecasts_enabled` [table parameter]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters). This table setting **takes precedence** over the `sql.stats.forecasts.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}). You can either configure this setting during table creation: @@ -196,8 +250,6 @@ The current table settings are shown in the `WITH` clause output of `SHOW CREATE `ALTER TABLE accounts RESET (sql_stats_forecasts_enabled)` removes the table setting, in which case the `sql.stats.forecasts.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}) is in effect for the table. -For details on forecasted statistics, see [Display forecasted statistics]({% link {{ page.version.version }}/show-statistics.md %}#display-forecasted-statistics). - ### Control histogram collection By default, the optimizer collects histograms for all index columns (specifically the first column in each index) during automatic statistics collection. 
If a single column statistic is explicitly requested using manual invocation of [`CREATE STATISTICS`]({% link {{ page.version.version }}/create-statistics.md %}), a histogram will be collected, regardless of whether or not the column is part of an index. diff --git a/src/current/v25.3/create-statistics.md b/src/current/v25.3/create-statistics.md index 1dfa622c46b..69cc07120fd 100644 --- a/src/current/v25.3/create-statistics.md +++ b/src/current/v25.3/create-statistics.md @@ -5,7 +5,7 @@ toc: true docs_area: reference.sql --- -Use the `CREATE STATISTICS` [statement]({% link {{ page.version.version }}/sql-statements.md %}) to generate table statistics for the [cost-based optimizer]({% link {{ page.version.version }}/cost-based-optimizer.md %}) to use. +Use the `CREATE STATISTICS` [statement]({% link {{ page.version.version }}/sql-statements.md %}) to [generate table statistics for the cost-based optimizer]({% link {{ page.version.version }}/cost-based-optimizer.md %}#table-statistics) to use. Once you [create a table]({% link {{ page.version.version }}/create-table.md %}) and load data into it (e.g., [`INSERT`]({% link {{ page.version.version }}/insert.md %}), [`IMPORT INTO`]({% link {{ page.version.version }}/import-into.md %})), table statistics can be generated. Table statistics help the cost-based optimizer determine the cardinality of the rows used in each query, which helps to predict more accurate costs. @@ -166,6 +166,26 @@ To create statistics as of a given time (in this example, 1 minute ago to avoid For more information about how the `AS OF SYSTEM TIME` clause works, including supported time formats, see [`AS OF SYSTEM TIME`]({% link {{ page.version.version }}/as-of-system-time.md %}). 
+### Create partial statistics using extremes + +CockroachDB supports [partial statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#partial-statistics), which collect statistics on a subset of table data to provide more up-to-date information without scanning the entire table. + +To create [partial statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#partial-statistics) that collect statistics on the highest and lowest index values: + +{% include_cached copy-clipboard.html %} +~~~ sql +CREATE STATISTICS rides_extremes_stats FROM rides USING EXTREMES; +~~~ + +This creates partial statistics on all single-column prefixes of non-inverted indexes in the `rides` table by scanning only the highest and lowest index values, providing updated statistics without performing a full table scan. + +You can also create extremes statistics on specific columns, as long as [the column is indexed]({% link {{ page.version.version }}/cost-based-optimizer.md %}#partial-statistics): + +{% include_cached copy-clipboard.html %} +~~~ sql +CREATE STATISTICS city_extremes_stats ON city FROM rides USING EXTREMES; +~~~ + ### Delete statistics {% include {{ page.version.version }}/misc/delete-statistics.md %} diff --git a/src/current/v25.3/show-statistics.md b/src/current/v25.3/show-statistics.md index 6d85f5b9594..59143af8946 100644 --- a/src/current/v25.3/show-statistics.md +++ b/src/current/v25.3/show-statistics.md @@ -76,18 +76,13 @@ Parameter | Description ### Display forecasted statistics -The `WITH FORECAST` option calculates and displays forecasted statistics along with the existing table statistics. The forecast is a simple regression model that predicts how the statistics have changed since they were last collected. Forecasts that closely match the historical statistics are used by the [cost-based optimizer]({% link {{ page.version.version }}/cost-based-optimizer.md %}). 
- -CockroachDB generates forecasted statistics when the following conditions are met: - -- There have been at least 3 historical statistics collections. -- The historical statistics closely fit a linear pattern. +The `WITH FORECAST` option calculates and displays [forecasted statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#forecasted-statistics) along with the existing table statistics. The following example shows 3 historical statistics collections and the subsequent forecast: {% include_cached copy-clipboard.html %} ~~~ sql -> SHOW STATISTICS FOR TABLE rides WITH FORECAST; +SHOW STATISTICS FOR TABLE rides WITH FORECAST; ~~~ ~~~ diff --git a/src/current/v25.4/cost-based-optimizer.md b/src/current/v25.4/cost-based-optimizer.md index f17e44bf8cb..fe35018f9e2 100644 --- a/src/current/v25.4/cost-based-optimizer.md +++ b/src/current/v25.4/cost-based-optimizer.md @@ -23,22 +23,34 @@ The most important factor in determining the quality of a plan is cardinality (i The cost-based optimizer can often find more performant query plans if it has access to statistical data on the contents of your tables. This data needs to be generated from scratch for new tables, and [refreshed periodically](#control-statistics-refresh-rate) for existing tables. -By default, CockroachDB automatically generates table statistics when tables are [created]({% link {{ page.version.version }}/create-table.md %}), and as they are [updated]({% link {{ page.version.version }}/update.md %}). It does this using a [background job]({% link {{ page.version.version }}/create-statistics.md %}#view-statistics-jobs) that automatically determines which columns to get statistics on. 
Specifically, the optimizer chooses: +The optimizer can use three types of statistics to plan queries: + +- [Full statistics](#full-statistics) +- [Partial statistics](#partial-statistics) +- [Forecasted statistics](#forecasted-statistics) + +For best query performance, most users should leave automatic statistics enabled with the default settings. Advanced users can follow the steps provided in the following sections for performance tuning and troubleshooting. + +### Full statistics + +By default, CockroachDB automatically generates full statistics when tables are [created]({% link {{ page.version.version }}/create-table.md %}) and during [schema changes]({% link {{ page.version.version }}/online-schema-changes.md %}). Full statistics for a table are automatically refreshed when approximately 20% of its rows are updated. + +A [background job]({% link {{ page.version.version }}/create-statistics.md %}#view-statistics-jobs) automatically determines which columns to get statistics on. Specifically, the optimizer chooses: - Columns that are part of the primary key or an index (in other words, all indexed columns). - Up to 100 non-indexed columns. By default, CockroachDB also automatically collects [multi-column statistics]({% link {{ page.version.version }}/create-statistics.md %}#create-statistics-on-multiple-columns) on columns that prefix an index. -{{site.data.alerts.callout_info}} -[Schema changes]({% link {{ page.version.version }}/online-schema-changes.md %}) trigger automatic statistics collection for the affected table(s). -{{site.data.alerts.end}} +To control automatic collection of full statistics, use the following settings. The table storage parameter overrides the cluster setting when applied to a specific table. -For best query performance, most users should leave automatic statistics enabled with the default settings. Advanced users can follow the steps provided in this section for performance tuning and troubleshooting. 
+| Cluster setting | Table storage parameter | Description | +|----------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------|-------------------------------------------------------| +| [`sql.stats.automatic_full_collection.enabled`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-sql-stats-automatic-full-collection-enabled) | `sql_stats_automatic_full_collection_enabled` | Enable automatic collection of full table statistics. | -### Control statistics refresh rate +#### Control statistics refresh rate -Statistics are refreshed in the following cases: +Full statistics are refreshed in the following cases: - When there are no statistics. - When it has been a long time since the last refresh, where "long time" is based on a moving average of the time across the last several refreshes. @@ -55,9 +67,9 @@ Statistics are refreshed in the following cases: Because the formula for statistics refreshes is probabilistic, you will not see statistics update immediately after changing these settings, or immediately after exactly 500 rows have been updated. {{site.data.alerts.end}} -#### Small versus large table examples +##### Small versus large table examples -Suppose the [clusters settings]({% link {{ page.version.version }}/cluster-settings.md %}) `sql.stats.automatic_collection.fraction_stale_rows` and `sql.stats.automatic_collection.min_stale_rows` have the default values .2 and 500 as shown in the preceding table. +Suppose the [cluster settings]({% link {{ page.version.version }}/cluster-settings.md %}) `sql.stats.automatic_collection.fraction_stale_rows` and `sql.stats.automatic_collection.min_stale_rows` have the default values .2 and 500 as shown in the preceding table. 
If a table has 100 rows and 20 became stale, a re-collection would not be triggered because, even though 20% of the rows are stale, they do not meet the 500-row minimum. @@ -65,7 +77,7 @@ On the other hand, if a table has 1,500,000,000 rows, then 20% of that, or 300,0 In such cases, we recommend that you use the [`sql_stats_automatic_collection_enabled` storage parameter](#enable-and-disable-automatic-statistics-collection-for-tables), which lets you configure automatic statistics collection on a per-table basis. -#### Configure non-default statistics retention +##### Configure non-default statistics retention By default, when CockroachDB refreshes statistics for a column, it deletes the previous statistics for the column (while leaving the most recent 4-5 historical statistics). When CockroachDB refreshes statistics, it also deletes the statistics for any "non-default" column sets, or columns for which statistics are not [collected by default](#table-statistics). @@ -73,9 +85,47 @@ Historical statistics on non-default column sets should not be retained indefini CockroachDB deletes statistics on non-default columns according to the `sql.stats.non_default_columns.min_retention_period` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}), which defaults to a 24-hour retention period. +### Partial statistics + +*Partial statistics* are collected on a subset of table data without scanning the full table. Partial statistics can improve query performance in large tables where only a portion of rows are regularly updated or queried. + +Whereas [full statistics](#full-statistics) refresh infrequently and can allow stale rows to accumulate, partial statistics [automatically refresh](#automatically-collect-partial-statistics) when the number of stale rows reaches a threshold. Partial statistics are automatically collected on extreme index values, which is particularly valuable for timestamp indexes where workloads commonly access the most recent data. 
They can also be [collected manually](#manually-collect-partial-statistics) at extremes and on specific data. + +Partial statistics have the following requirements: + +- Partial statistics can only be collected if [full statistics](#full-statistics) already exist for the table. +- Partial statistics are collected on all single-column prefixes of non-inverted indexes. Indexes that are [partial]({% link {{ page.version.version }}/partial-indexes.md %}), [hash-sharded]({% link {{ page.version.version }}/hash-sharded-indexes.md %}), or implicitly partitioned (such as in [`REGIONAL BY ROW` tables]({% link {{ page.version.version }}/regional-tables.md %}#regional-by-row-tables)) are excluded. +- For [manual collection](#manually-collect-partial-statistics) with specific columns, an index must exist with a prefix matching those columns. If no matching index exists or if statistics were not previously collected on the specified column, the statement will return an error. + +The optimizer uses partial statistics for query planning when the [`optimizer_use_merged_partial_statistics`]({% link {{ page.version.version }}/session-variables.md %}#optimizer-use-merged-partial-statistics) session variable is enabled. It merges partial statistics with existing full statistics to produce more accurate cardinality estimates. + +#### Automatically collect partial statistics + +Partial statistics are automatically collected on the highest and lowest index values when: + +- Automatic collection is enabled. +- The number of stale rows in a table reaches a specified threshold. + +This is particularly beneficial for large tables where only a portion is regularly updated or queried, such as tables with timestamp columns where recent data is frequently accessed. 
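As a rough sketch of how these two conditions map to configuration, the following statements enable automatic partial collection for the cluster and adjust the stale-row thresholds for a single table. The setting and parameter names are the ones listed in the table in this section. The `rides` table and the numeric values are illustrative assumptions, not recommended values.

{% include_cached copy-clipboard.html %}
~~~ sql
-- Condition 1: automatic partial statistics collection is enabled (cluster-wide).
SET CLUSTER SETTING sql.stats.automatic_partial_collection.enabled = true;

-- Condition 2: tune the stale-row thresholds for one table.
-- The values 100 and 0.05 are assumptions chosen for illustration only.
ALTER TABLE rides SET (
    sql_stats_automatic_partial_collection_min_stale_rows = 100,
    sql_stats_automatic_partial_collection_fraction_stale_rows = 0.05
);

-- Remove one table-level override so the cluster setting applies again.
ALTER TABLE rides RESET (sql_stats_automatic_partial_collection_min_stale_rows);
~~~

Setting the thresholds per table rather than per cluster keeps the change scoped to tables where recent rows dominate the workload.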
+ +To control automatic collection of partial statistics, use the following [cluster settings]({% link {{ page.version.version }}/cluster-settings.md %}) and [table storage parameters]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters). Each table parameter overrides the corresponding cluster setting when applied to a specific table. + +| Cluster setting | Table storage parameter | Description | +|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| [`sql.stats.automatic_partial_collection.enabled`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-sql-stats-automatic-partial-collection-enabled) | [`sql_stats_automatic_partial_collection_enabled`]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters) | Enable automatic collection of partial table statistics. | +| [`sql.stats.automatic_partial_collection.min_stale_rows`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-sql-stats-automatic-partial-collection-min-stale-rows) | [`sql_stats_automatic_partial_collection_min_stale_rows`]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters) | Minimum number of stale rows that triggers partial statistics collection. 
| +| [`sql.stats.automatic_partial_collection.fraction_stale_rows`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-sql-stats-automatic-partial-collection-fraction-stale-rows) | [`sql_stats_automatic_partial_collection_fraction_stale_rows`]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters) | Target fraction of stale rows that triggers partial statistics collection. If lower than the `0.2` threshold for full statistics, partial statistics refresh more frequently than full statistics. | + +#### Manually collect partial statistics + +You can manually create partial statistics on: + +- The highest and lowest index values, when the [`enable_create_stats_using_extremes`]({% link {{ page.version.version }}/session-variables.md %}#enable-create-stats-using-extremes) session variable is enabled, using the `USING EXTREMES` clause: [`CREATE STATISTICS stats FROM table USING EXTREMES`]({% link {{ page.version.version }}/create-statistics.md %}#create-partial-statistics-using-extremes) +- {% include_cached new-in.html version="v25.4" %} Specific columns and values, using the `WHERE` clause: [`CREATE STATISTICS stats ON column FROM table WHERE condition`]({% link {{ page.version.version }}/create-statistics.md %}#create-partial-statistics-on-specific-data) + ### Enable and disable automatic statistics collection for clusters -Automatic statistics collection is enabled by default. To disable automatic statistics collection, follow these steps: +Automatic statistics collection is enabled by default. To disable automatic [full](#full-statistics) and [partial](#partial-statistics) statistics collection, follow these steps: 1. 
Set the `sql.stats.automatic_collection.enabled` cluster setting to `false`: @@ -99,9 +149,9 @@ To learn how to manually generate statistics, see the [`CREATE STATISTICS` examp ### Enable and disable automatic statistics collection for tables -Statistics collection can be expensive for large tables, and you may prefer to defer collection until after data is finished loading or during off-peak hours. Tables that are frequently updated, including small tables, may trigger statistics collection more often, which can lead to unnecessary overhead and unpredictable query plan changes. +Automatic statistics collection can be expensive for large tables, and you may prefer to defer collection until after data is finished loading or during off-peak hours. Tables that are frequently updated, including small tables, may trigger statistics collection more often, which can lead to unnecessary overhead and unpredictable query plan changes. -You can enable and disable automatic statistics collection for individual tables using the `sql_stats_automatic_collection_enabled` storage parameter. This table setting **takes precedence** over the `sql.stats.automatic_collection.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}) described in [Enable and disable automatic statistics collection for clusters](#enable-and-disable-automatic-statistics-collection-for-clusters). +You can enable and disable automatic [full](#full-statistics) and [partial](#partial-statistics) statistics collection for individual tables using the `sql_stats_automatic_collection_enabled` [storage parameter]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters). 
This table setting **takes precedence** over the `sql.stats.automatic_collection.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}) described in [Enable and disable automatic statistics collection for clusters](#enable-and-disable-automatic-statistics-collection-for-clusters). You can either configure this setting during table creation: @@ -157,9 +207,16 @@ sql_stats_automatic_collection_min_stale_rows = 2000); Automatic statistics rules are checked once per minute. While altered automatic statistics table settings take immediate effect for any subsequent DML statements on a table, running row mutations that started prior to modifying the table settings may still trigger statistics collection based on the settings that existed before you ran the `ALTER TABLE ... SET` statement. -### Enable and disable forecasted statistics for tables +### Forecasted statistics + +*Forecasted statistics* use a simple regression model that predicts how the statistics have changed since they were last collected. CockroachDB generates forecasted statistics when the following conditions are met: -You can enable and disable [forecasted statistics]({% link {{ page.version.version }}/show-statistics.md %}#display-forecasted-statistics) collection for individual tables using the `sql_stats_forecasts_enabled` table parameter. This table setting **takes precedence** over the `sql.stats.forecasts.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}). +- There have been at least 3 historical statistics collections. +- The historical statistics closely fit a linear pattern. + +By default, the optimizer uses forecasts that closely match the historical statistics. + +You can enable and disable forecasted statistics collection for individual tables using the `sql_stats_forecasts_enabled` [table parameter]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters). 
This table setting **takes precedence** over the `sql.stats.forecasts.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}). You can either configure this setting during table creation: @@ -196,8 +253,6 @@ The current table settings are shown in the `WITH` clause output of `SHOW CREATE `ALTER TABLE accounts RESET (sql_stats_forecasts_enabled)` removes the table setting, in which case the `sql.stats.forecasts.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}) is in effect for the table. -For details on forecasted statistics, see [Display forecasted statistics]({% link {{ page.version.version }}/show-statistics.md %}#display-forecasted-statistics). - ### Control histogram collection By default, the optimizer collects histograms for all index columns (specifically the first column in each index) during automatic statistics collection. If a single column statistic is explicitly requested using manual invocation of [`CREATE STATISTICS`]({% link {{ page.version.version }}/create-statistics.md %}), a histogram will be collected, regardless of whether or not the column is part of an index. diff --git a/src/current/v25.4/create-statistics.md b/src/current/v25.4/create-statistics.md index 1dfa622c46b..7e3eb52bbab 100644 --- a/src/current/v25.4/create-statistics.md +++ b/src/current/v25.4/create-statistics.md @@ -5,7 +5,7 @@ toc: true docs_area: reference.sql --- -Use the `CREATE STATISTICS` [statement]({% link {{ page.version.version }}/sql-statements.md %}) to generate table statistics for the [cost-based optimizer]({% link {{ page.version.version }}/cost-based-optimizer.md %}) to use. +Use the `CREATE STATISTICS` [statement]({% link {{ page.version.version }}/sql-statements.md %}) to [generate table statistics for the cost-based optimizer]({% link {{ page.version.version }}/cost-based-optimizer.md %}#table-statistics) to use. 
Once you [create a table]({% link {{ page.version.version }}/create-table.md %}) and load data into it (e.g., [`INSERT`]({% link {{ page.version.version }}/insert.md %}), [`IMPORT INTO`]({% link {{ page.version.version }}/import-into.md %})), table statistics can be generated. Table statistics help the cost-based optimizer determine the cardinality of the rows used in each query, which helps to predict more accurate costs. @@ -166,6 +166,44 @@ To create statistics as of a given time (in this example, 1 minute ago to avoid For more information about how the `AS OF SYSTEM TIME` clause works, including supported time formats, see [`AS OF SYSTEM TIME`]({% link {{ page.version.version }}/as-of-system-time.md %}). +### Create partial statistics using extremes + +CockroachDB supports [partial statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#partial-statistics), which collect statistics on a subset of table data to provide more up-to-date information without scanning the entire table. + +To create [partial statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#partial-statistics) that collect statistics on the highest and lowest index values: + +{% include_cached copy-clipboard.html %} +~~~ sql +CREATE STATISTICS rides_extremes_stats FROM rides USING EXTREMES; +~~~ + +This creates partial statistics on all single-column prefixes of non-inverted indexes in the `rides` table by scanning only the highest and lowest index values, providing updated statistics without performing a full table scan. 
+ +You can also create extremes statistics on specific columns, as long as [the column is indexed]({% link {{ page.version.version }}/cost-based-optimizer.md %}#partial-statistics): + +{% include_cached copy-clipboard.html %} +~~~ sql +CREATE STATISTICS city_extremes_stats ON city FROM rides USING EXTREMES; +~~~ + +### Create partial statistics on specific data + +{% include_cached new-in.html version="v25.4" %} To create [partial statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#partial-statistics) on a specific column and values matching specific conditions, ensure the column is indexed: + +{% include_cached copy-clipboard.html %} +~~~ sql +CREATE INDEX ON rides (start_time); +~~~ + +Partial statistics are particularly valuable for timestamp columns where workloads commonly access the most recent data: + +{% include_cached copy-clipboard.html %} +~~~ sql +CREATE STATISTICS recent_rides_stats ON start_time FROM rides WHERE start_time > '2023-01-01'; +~~~ + +This creates statistics only on rides that started after January 1, 2023, allowing the optimizer to have accurate statistics for recent data without scanning the entire table. + ### Delete statistics {% include {{ page.version.version }}/misc/delete-statistics.md %} diff --git a/src/current/v25.4/show-statistics.md b/src/current/v25.4/show-statistics.md index 6d85f5b9594..59143af8946 100644 --- a/src/current/v25.4/show-statistics.md +++ b/src/current/v25.4/show-statistics.md @@ -76,18 +76,13 @@ Parameter | Description ### Display forecasted statistics -The `WITH FORECAST` option calculates and displays forecasted statistics along with the existing table statistics. The forecast is a simple regression model that predicts how the statistics have changed since they were last collected. Forecasts that closely match the historical statistics are used by the [cost-based optimizer]({% link {{ page.version.version }}/cost-based-optimizer.md %}). 
- -CockroachDB generates forecasted statistics when the following conditions are met: - -- There have been at least 3 historical statistics collections. -- The historical statistics closely fit a linear pattern. +The `WITH FORECAST` option calculates and displays [forecasted statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#forecasted-statistics) along with the existing table statistics. The following example shows 3 historical statistics collections and the subsequent forecast: {% include_cached copy-clipboard.html %} ~~~ sql -> SHOW STATISTICS FOR TABLE rides WITH FORECAST; +SHOW STATISTICS FOR TABLE rides WITH FORECAST; ~~~ ~~~ diff --git a/src/current/v26.1/cost-based-optimizer.md b/src/current/v26.1/cost-based-optimizer.md index b2b78ce8aae..36c1309be6d 100644 --- a/src/current/v26.1/cost-based-optimizer.md +++ b/src/current/v26.1/cost-based-optimizer.md @@ -23,22 +23,34 @@ The most important factor in determining the quality of a plan is cardinality (i The cost-based optimizer can often find more performant query plans if it has access to statistical data on the contents of your tables. This data needs to be generated from scratch for new tables, and [refreshed periodically](#control-statistics-refresh-rate) for existing tables. -By default, CockroachDB automatically generates table statistics when tables are [created]({% link {{ page.version.version }}/create-table.md %}), and as they are [updated]({% link {{ page.version.version }}/update.md %}). It does this using a [background job]({% link {{ page.version.version }}/create-statistics.md %}#view-statistics-jobs) that automatically determines which columns to get statistics on. 
Specifically, the optimizer chooses: +The optimizer can use three types of statistics to plan queries: + +- [Full statistics](#full-statistics) +- [Partial statistics](#partial-statistics) +- [Forecasted statistics](#forecasted-statistics) + +For best query performance, most users should leave automatic statistics enabled with the default settings. Advanced users can follow the steps provided in the following sections for performance tuning and troubleshooting. + +### Full statistics + +By default, CockroachDB automatically generates full statistics when tables are [created]({% link {{ page.version.version }}/create-table.md %}) and during [schema changes]({% link {{ page.version.version }}/online-schema-changes.md %}). Full statistics for a table are automatically refreshed when approximately 20% of its rows are updated. + +A [background job]({% link {{ page.version.version }}/create-statistics.md %}#view-statistics-jobs) automatically determines which columns to get statistics on. Specifically, the optimizer chooses: - Columns that are part of the primary key or an index (in other words, all indexed columns). - Up to 100 non-indexed columns. By default, CockroachDB also automatically collects [multi-column statistics]({% link {{ page.version.version }}/create-statistics.md %}#create-statistics-on-multiple-columns) on columns that prefix an index. -{{site.data.alerts.callout_info}} -[Schema changes]({% link {{ page.version.version }}/online-schema-changes.md %}) trigger automatic statistics collection for the affected table(s). -{{site.data.alerts.end}} +To control automatic collection of full statistics, use the following settings. The table storage parameter overrides the cluster setting when applied to a specific table. -For best query performance, most users should leave automatic statistics enabled with the default settings. Advanced users can follow the steps provided in this section for performance tuning and troubleshooting. 
+| Cluster setting | Table storage parameter | Description | +|----------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------|-------------------------------------------------------| +| [`sql.stats.automatic_full_collection.enabled`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-sql-stats-automatic-full-collection-enabled) | `sql_stats_automatic_full_collection_enabled` | Enable automatic collection of full table statistics. | -### Control statistics refresh rate +#### Control statistics refresh rate -Statistics are refreshed in the following cases: +Full statistics are refreshed in the following cases: - When there are no statistics. - When it has been a long time since the last refresh, where "long time" is based on a moving average of the time across the last several refreshes. @@ -55,9 +67,9 @@ Statistics are refreshed in the following cases: Because the formula for statistics refreshes is probabilistic, you will not see statistics update immediately after changing these settings, or immediately after exactly 500 rows have been updated. {{site.data.alerts.end}} -#### Small versus large table examples +##### Small versus large table examples -Suppose the [clusters settings]({% link {{ page.version.version }}/cluster-settings.md %}) `sql.stats.automatic_collection.fraction_stale_rows` and `sql.stats.automatic_collection.min_stale_rows` have the default values .2 and 500 as shown in the preceding table. +Suppose the [cluster settings]({% link {{ page.version.version }}/cluster-settings.md %}) `sql.stats.automatic_collection.fraction_stale_rows` and `sql.stats.automatic_collection.min_stale_rows` have the default values .2 and 500 as shown in the preceding table. 
If a table has 100 rows and 20 became stale, a re-collection would not be triggered because, even though 20% of the rows are stale, they do not meet the 500-row minimum. @@ -65,7 +77,7 @@ On the other hand, if a table has 1,500,000,000 rows, then 20% of that, or 300,0 In such cases, we recommend that you use the [`sql_stats_automatic_collection_enabled` storage parameter](#enable-and-disable-automatic-statistics-collection-for-tables), which lets you configure automatic statistics collection on a per-table basis. -#### Configure non-default statistics retention +##### Configure non-default statistics retention By default, when CockroachDB refreshes statistics for a column, it deletes the previous statistics for the column (while leaving the most recent 4-5 historical statistics). When CockroachDB refreshes statistics, it also deletes the statistics for any "non-default" column sets, or columns for which statistics are not [collected by default](#table-statistics). @@ -73,9 +85,47 @@ Historical statistics on non-default column sets should not be retained indefini CockroachDB deletes statistics on non-default columns according to the `sql.stats.non_default_columns.min_retention_period` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}), which defaults to a 24-hour retention period. +### Partial statistics + +*Partial statistics* are collected on a subset of table data without scanning the full table. Partial statistics can improve query performance in large tables where only a portion of rows are regularly updated or queried. + +Whereas [full statistics](#full-statistics) refresh infrequently and can allow stale rows to accumulate, partial statistics [automatically refresh](#automatically-collect-partial-statistics) when the number of stale rows reaches a threshold. Partial statistics are automatically collected on extreme index values, which is particularly valuable for timestamp indexes where workloads commonly access the most recent data. 
They can also be [collected manually](#manually-collect-partial-statistics) at extremes and on specific data. + +Partial statistics have the following requirements: + +- Partial statistics can only be collected if [full statistics](#full-statistics) already exist for the table. +- Partial statistics are collected on all single-column prefixes of non-inverted indexes. Indexes that are [partial]({% link {{ page.version.version }}/partial-indexes.md %}), [hash-sharded]({% link {{ page.version.version }}/hash-sharded-indexes.md %}), or implicitly partitioned (such as in [`REGIONAL BY ROW` tables]({% link {{ page.version.version }}/regional-tables.md %}#regional-by-row-tables)) are excluded. +- For [manual collection](#manually-collect-partial-statistics) with specific columns, an index must exist with a prefix matching those columns. If no matching index exists or if statistics were not previously collected on the specified column, the statement will return an error. + +The optimizer uses partial statistics for query planning when the [`optimizer_use_merged_partial_statistics`]({% link {{ page.version.version }}/session-variables.md %}#optimizer-use-merged-partial-statistics) session variable is enabled. It merges partial statistics with existing full statistics to produce more accurate cardinality estimates. + +#### Automatically collect partial statistics + +Partial statistics are automatically collected on the highest and lowest index values when: + +- Automatic collection is enabled. +- The number of stale rows in a table reaches a specified threshold. + +This is particularly beneficial for large tables where only a portion is regularly updated or queried, such as tables with timestamp columns where recent data is frequently accessed. 
+ +To control automatic collection of partial statistics, use the following [cluster settings]({% link {{ page.version.version }}/cluster-settings.md %}) and [table storage parameters]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters). Each table parameter overrides the corresponding cluster setting when applied to a specific table. + +| Cluster setting | Table storage parameter | Description | +|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| [`sql.stats.automatic_partial_collection.enabled`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-sql-stats-automatic-partial-collection-enabled) | [`sql_stats_automatic_partial_collection_enabled`]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters) | Enable automatic collection of partial table statistics. | +| [`sql.stats.automatic_partial_collection.min_stale_rows`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-sql-stats-automatic-partial-collection-min-stale-rows) | [`sql_stats_automatic_partial_collection_min_stale_rows`]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters) | Minimum number of stale rows that triggers partial statistics collection. 
| +| [`sql.stats.automatic_partial_collection.fraction_stale_rows`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-sql-stats-automatic-partial-collection-fraction-stale-rows) | [`sql_stats_automatic_partial_collection_fraction_stale_rows`]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters) | Target fraction of stale rows that triggers partial statistics collection. If lower than the `0.2` threshold for full statistics, partial statistics refresh more frequently than full statistics. | + +#### Manually collect partial statistics + +You can manually create partial statistics on: + +- The highest and lowest index values, when the [`enable_create_stats_using_extremes`]({% link {{ page.version.version }}/session-variables.md %}#enable-create-stats-using-extremes) session variable is enabled, using the `USING EXTREMES` clause: [`CREATE STATISTICS stats FROM table USING EXTREMES`]({% link {{ page.version.version }}/create-statistics.md %}#create-partial-statistics-using-extremes) +- Specific columns and values, using the `WHERE` clause: [`CREATE STATISTICS stats ON column FROM table WHERE condition`]({% link {{ page.version.version }}/create-statistics.md %}#create-partial-statistics-on-specific-data) + ### Enable and disable automatic statistics collection for clusters -Automatic statistics collection is enabled by default. To disable automatic statistics collection, follow these steps: +Automatic statistics collection is enabled by default. To disable automatic [full](#full-statistics) and [partial](#partial-statistics) statistics collection, follow these steps: 1. 
Set the `sql.stats.automatic_collection.enabled` cluster setting to `false`: @@ -99,9 +149,9 @@ To learn how to manually generate statistics, see the [`CREATE STATISTICS` examp ### Enable and disable automatic statistics collection for tables -Statistics collection can be expensive for large tables, and you may prefer to defer collection until after data is finished loading or during off-peak hours. Tables that are frequently updated, including small tables, may trigger statistics collection more often, which can lead to unnecessary overhead and unpredictable query plan changes. +Automatic statistics collection can be expensive for large tables, and you may prefer to defer collection until after data is finished loading or during off-peak hours. Tables that are frequently updated, including small tables, may trigger statistics collection more often, which can lead to unnecessary overhead and unpredictable query plan changes. -You can enable and disable automatic statistics collection for individual tables using the `sql_stats_automatic_collection_enabled` storage parameter. This table setting **takes precedence** over the `sql.stats.automatic_collection.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}) described in [Enable and disable automatic statistics collection for clusters](#enable-and-disable-automatic-statistics-collection-for-clusters). +You can enable and disable automatic [full](#full-statistics) and [partial](#partial-statistics) statistics collection for individual tables using the `sql_stats_automatic_collection_enabled` [storage parameter]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters). 
This table setting **takes precedence** over the `sql.stats.automatic_collection.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}) described in [Enable and disable automatic statistics collection for clusters](#enable-and-disable-automatic-statistics-collection-for-clusters). You can either configure this setting during table creation: @@ -157,9 +207,16 @@ sql_stats_automatic_collection_min_stale_rows = 2000); Automatic statistics rules are checked once per minute. While altered automatic statistics table settings take immediate effect for any subsequent DML statements on a table, running row mutations that started prior to modifying the table settings may still trigger statistics collection based on the settings that existed before you ran the `ALTER TABLE ... SET` statement. -### Enable and disable forecasted statistics for tables +### Forecasted statistics + +*Forecasted statistics* use a simple regression model that predicts how the statistics have changed since they were last collected. CockroachDB generates forecasted statistics when the following conditions are met: -You can enable and disable [forecasted statistics]({% link {{ page.version.version }}/show-statistics.md %}#display-forecasted-statistics) collection for individual tables using the `sql_stats_forecasts_enabled` table parameter. This table setting **takes precedence** over the `sql.stats.forecasts.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}). +- There have been at least 3 historical statistics collections. +- The historical statistics closely fit a linear pattern. + +By default, the optimizer uses forecasts that closely match the historical statistics. + +You can enable and disable forecasted statistics collection for individual tables using the `sql_stats_forecasts_enabled` [table parameter]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters). 
This table setting **takes precedence** over the `sql.stats.forecasts.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}). You can either configure this setting during table creation: @@ -196,8 +253,6 @@ The current table settings are shown in the `WITH` clause output of `SHOW CREATE `ALTER TABLE accounts RESET (sql_stats_forecasts_enabled)` removes the table setting, in which case the `sql.stats.forecasts.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}) is in effect for the table. -For details on forecasted statistics, see [Display forecasted statistics]({% link {{ page.version.version }}/show-statistics.md %}#display-forecasted-statistics). - ### Control histogram collection By default, the optimizer collects histograms for all index columns (specifically the first column in each index) during automatic statistics collection. If a single column statistic is explicitly requested using manual invocation of [`CREATE STATISTICS`]({% link {{ page.version.version }}/create-statistics.md %}), a histogram will be collected, regardless of whether or not the column is part of an index. diff --git a/src/current/v26.1/create-statistics.md b/src/current/v26.1/create-statistics.md index 1dfa622c46b..c94252fa694 100644 --- a/src/current/v26.1/create-statistics.md +++ b/src/current/v26.1/create-statistics.md @@ -5,7 +5,7 @@ toc: true docs_area: reference.sql --- -Use the `CREATE STATISTICS` [statement]({% link {{ page.version.version }}/sql-statements.md %}) to generate table statistics for the [cost-based optimizer]({% link {{ page.version.version }}/cost-based-optimizer.md %}) to use. +Use the `CREATE STATISTICS` [statement]({% link {{ page.version.version }}/sql-statements.md %}) to [generate table statistics for the cost-based optimizer]({% link {{ page.version.version }}/cost-based-optimizer.md %}#table-statistics) to use. 
Once you [create a table]({% link {{ page.version.version }}/create-table.md %}) and load data into it (e.g., [`INSERT`]({% link {{ page.version.version }}/insert.md %}), [`IMPORT INTO`]({% link {{ page.version.version }}/import-into.md %})), table statistics can be generated. Table statistics help the cost-based optimizer determine the cardinality of the rows used in each query, which helps to predict more accurate costs. @@ -166,6 +166,44 @@ To create statistics as of a given time (in this example, 1 minute ago to avoid For more information about how the `AS OF SYSTEM TIME` clause works, including supported time formats, see [`AS OF SYSTEM TIME`]({% link {{ page.version.version }}/as-of-system-time.md %}). +### Create partial statistics using extremes + +CockroachDB supports [partial statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#partial-statistics), which collect statistics on a subset of table data to provide more up-to-date information without scanning the entire table. + +To create [partial statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#partial-statistics) that collect statistics on the highest and lowest index values: + +{% include_cached copy-clipboard.html %} +~~~ sql +CREATE STATISTICS rides_extremes_stats FROM rides USING EXTREMES; +~~~ + +This creates partial statistics on all single-column prefixes of non-inverted indexes in the `rides` table by scanning only the highest and lowest index values, providing updated statistics without performing a full table scan. 
+ +You can also create extremes statistics on specific columns, as long as [the column is indexed]({% link {{ page.version.version }}/cost-based-optimizer.md %}#partial-statistics): + +{% include_cached copy-clipboard.html %} +~~~ sql +CREATE STATISTICS city_extremes_stats ON city FROM rides USING EXTREMES; +~~~ + +### Create partial statistics on specific data + +To create [partial statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#partial-statistics) on a specific column and values matching specific conditions, ensure the column is indexed: + +{% include_cached copy-clipboard.html %} +~~~ sql +CREATE INDEX ON rides (start_time); +~~~ + +Partial statistics are particularly valuable for timestamp columns where workloads commonly access the most recent data: + +{% include_cached copy-clipboard.html %} +~~~ sql +CREATE STATISTICS recent_rides_stats ON start_time FROM rides WHERE start_time > '2023-01-01'; +~~~ + +This creates partial statistics covering only rows where `start_time` is greater than `2023-01-01`, providing focused statistics on the most recently accessed data. + ### Delete statistics {% include {{ page.version.version }}/misc/delete-statistics.md %} diff --git a/src/current/v26.1/show-statistics.md b/src/current/v26.1/show-statistics.md index 6d85f5b9594..59143af8946 100644 --- a/src/current/v26.1/show-statistics.md +++ b/src/current/v26.1/show-statistics.md @@ -76,18 +76,13 @@ Parameter | Description ### Display forecasted statistics -The `WITH FORECAST` option calculates and displays forecasted statistics along with the existing table statistics. The forecast is a simple regression model that predicts how the statistics have changed since they were last collected. Forecasts that closely match the historical statistics are used by the [cost-based optimizer]({% link {{ page.version.version }}/cost-based-optimizer.md %}). 
- -CockroachDB generates forecasted statistics when the following conditions are met: - -- There have been at least 3 historical statistics collections. -- The historical statistics closely fit a linear pattern. +The `WITH FORECAST` option calculates and displays [forecasted statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#forecasted-statistics) along with the existing table statistics. The following example shows 3 historical statistics collections and the subsequent forecast: {% include_cached copy-clipboard.html %} ~~~ sql -> SHOW STATISTICS FOR TABLE rides WITH FORECAST; +SHOW STATISTICS FOR TABLE rides WITH FORECAST; ~~~ ~~~ From 4260af3827c04474f8367707911adb1cef6f73b3 Mon Sep 17 00:00:00 2001 From: Ryan Kuo Date: Thu, 8 Jan 2026 16:30:13 -0500 Subject: [PATCH 2/7] follow-up structural revisions --- src/current/v23.2/cost-based-optimizer.md | 67 +++++++---------- src/current/v24.1/cost-based-optimizer.md | 65 ++++++----------- src/current/v24.2/cost-based-optimizer.md | 65 ++++++----------- src/current/v24.3/cost-based-optimizer.md | 87 +++++++++++------------ src/current/v25.1/cost-based-optimizer.md | 57 +++++++-------- src/current/v25.2/cost-based-optimizer.md | 59 +++++++-------- src/current/v25.3/cost-based-optimizer.md | 59 +++++++-------- src/current/v25.4/cost-based-optimizer.md | 61 ++++++++-------- src/current/v26.1/cost-based-optimizer.md | 61 ++++++++-------- 9 files changed, 264 insertions(+), 317 deletions(-) diff --git a/src/current/v23.2/cost-based-optimizer.md b/src/current/v23.2/cost-based-optimizer.md index 43f8b5d8b84..388f69443d3 100644 --- a/src/current/v23.2/cost-based-optimizer.md +++ b/src/current/v23.2/cost-based-optimizer.md @@ -23,17 +23,6 @@ The most important factor in determining the quality of a plan is cardinality (i The cost-based optimizer can often find more performant query plans if it has access to statistical data on the contents of your tables. 
This data needs to be generated from scratch for new tables, and [refreshed periodically](#control-statistics-refresh-rate) for existing tables. -By default, CockroachDB automatically generates table statistics when tables are [created]({% link {{ page.version.version }}/create-table.md %}), and as they are [updated]({% link {{ page.version.version }}/update.md %}). It does this using a [background job]({% link {{ page.version.version }}/create-statistics.md %}#view-statistics-jobs) that automatically determines which columns to get statistics on. Specifically, the optimizer chooses: - -- Columns that are part of the primary key or an index (in other words, all indexed columns). -- Up to 100 non-indexed columns. - -By default, CockroachDB also automatically collects [multi-column statistics]({% link {{ page.version.version }}/create-statistics.md %}#create-statistics-on-multiple-columns) on columns that prefix an index. - -{{site.data.alerts.callout_info}} -[Schema changes]({% link {{ page.version.version }}/online-schema-changes.md %}) trigger automatic statistics collection for the affected table(s). -{{site.data.alerts.end}} - The optimizer can use three types of statistics to plan queries: - [Full statistics](#full-statistics) @@ -46,19 +35,16 @@ For best query performance, most users should leave automatic statistics enabled By default, CockroachDB automatically generates full statistics when tables are [created]({% link {{ page.version.version }}/create-table.md %}) and during [schema changes]({% link {{ page.version.version }}/online-schema-changes.md %}). Full statistics for a table are automatically refreshed when approximately 20% of its rows are updated. -### Partial statistics - -*Partial statistics* are collected on a subset of table data without scanning the full table. Partial statistics can improve query performance in large tables where only a portion is regularly updated or queried. 
+A [background job]({% link {{ page.version.version }}/create-statistics.md %}#view-statistics-jobs) automatically determines which columns to get statistics on. Specifically, the optimizer chooses: -{{site.data.alerts.callout_info}} -Partial statistics can only be collected if full statistics already exist for the table. -{{site.data.alerts.end}} +- Columns that are part of the primary key or an index (in other words, all indexed columns). +- Up to 100 non-indexed columns. -You can manually collect partial statistics on the highest and lowest index values using [`CREATE STATISTICS ... USING EXTREMES`]({% link {{ page.version.version }}/create-statistics.md %}#create-partial-statistics-using-extremes). +By default, CockroachDB also automatically collects [multi-column statistics]({% link {{ page.version.version }}/create-statistics.md %}#create-statistics-on-multiple-columns) on columns that prefix an index. -### Control statistics refresh rate +#### Control statistics refresh rate -Statistics are refreshed in the following cases: +Full statistics are refreshed in the following cases: - When there are no statistics. - When it has been a long time since the last refresh, where "long time" is based on a moving average of the time across the last several refreshes. @@ -75,7 +61,7 @@ Statistics are refreshed in the following cases: Because the formula for statistics refreshes is probabilistic, you will not see statistics update immediately after changing these settings, or immediately after exactly 500 rows have been updated. {{site.data.alerts.end}} -#### Small versus large table examples +##### Small versus large table examples Suppose the [clusters settings]({% link {{ page.version.version }}/cluster-settings.md %}) `sql.stats.automatic_collection.fraction_stale_rows` and `sql.stats.automatic_collection.min_stale_rows` have the default values .2 and 500 as shown in the preceding table. 
@@ -85,17 +71,23 @@ On the other hand, if a table has 1,500,000,000 rows, then 20% of that, or 300,0 In such cases, we recommend that you use the [`sql_stats_automatic_collection_enabled` storage parameter](#enable-and-disable-automatic-statistics-collection-for-tables), which lets you configure automatic statistics collection on a per-table basis. -#### Configure non-default statistics retention +### Partial statistics -By default, when CockroachDB refreshes statistics for a column, it deletes the previous statistics for the column (while leaving the most recent 4-5 historical statistics). When CockroachDB refreshes statistics, it also deletes the statistics for any "non-default" column sets, or columns for which statistics are not [collected by default](#table-statistics). +*Partial statistics* are collected on a subset of table data without scanning the full table. Partial statistics can improve query performance in large tables where only a portion is regularly updated or queried. -Historical statistics on non-default column sets should not be retained indefinitely, because they will not be refreshed automatically and could cause the optimizer to choose a suboptimal plan if they become stale. Such non-default historical statistics may exist because columns were deleted or removed from an index, and are therefore no longer part of a multi-column statistic. +You can manually collect partial statistics on the highest and lowest index values using [`CREATE STATISTICS ... USING EXTREMES`]({% link {{ page.version.version }}/create-statistics.md %}#create-partial-statistics-using-extremes). -CockroachDB deletes statistics on non-default columns according to the `sql.stats.non_default_columns.min_retention_period` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}), which defaults to a 24-hour retention period. 
+Partial statistics have the following constraints: + +- Partial statistics can only be collected if [full statistics](#full-statistics) already exist for the table. +- Partial statistics created with `USING EXTREMES` and no `ON` clause are collected on all single-column prefixes of non-inverted indexes. Indexes that are [partial]({% link {{ page.version.version }}/partial-indexes.md %}), [hash-sharded]({% link {{ page.version.version }}/hash-sharded-indexes.md %}), or implicitly partitioned (such as in [`REGIONAL BY ROW` tables]({% link {{ page.version.version }}/regional-tables.md %}#regional-by-row-tables)) are excluded. +- For [manual collection with specific columns]({% link {{ page.version.version }}/create-statistics.md %}#enable-create-stats-using-extremes), an index must exist with a prefix matching those columns. If no matching index exists or if full statistics were not previously collected on the specified column, the statement returns an error. -### Enable and disable automatic statistics collection for clusters +### Toggle automatic statistics collection -Automatic statistics collection is enabled by default. To disable automatic statistics collection, follow these steps: +#### Enable and disable automatic statistics collection for clusters + +Automatic statistics collection is enabled by default. To disable automatic [full](#full-statistics) and [partial](#partial-statistics) statistics collection, follow these steps: 1. Set the `sql.stats.automatic_collection.enabled` cluster setting to `false`: @@ -117,7 +109,7 @@ Automatic statistics collection is enabled by default. To disable automatic stat To learn how to manually generate statistics, see the [`CREATE STATISTICS` examples]({% link {{ page.version.version }}/create-statistics.md %}#examples). 
-### Enable and disable automatic statistics collection for tables +#### Enable and disable automatic statistics collection for tables Statistics collection can be expensive for large tables, and you may prefer to defer collection until after data is finished loading or during off-peak hours. Tables that are frequently updated, including small tables, may trigger statistics collection more often, which can lead to unnecessary overhead and unpredictable query plan changes. @@ -158,24 +150,13 @@ The current table settings are shown in the `WITH` clause output of `SHOW CREATE `ALTER TABLE accounts RESET (sql_stats_automatic_collection_enabled)` removes the table setting, in which case the `sql.stats.automatic_collection.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}) is in effect for the table. -The "stale row" cluster settings discussed in [Control statistics refresh rate](#control-statistics-refresh-rate) have table -setting counterparts `sql_stats_automatic_collection_fraction_stale_rows` and `sql_stats_automatic_collection_min_stale_rows`. For example: +### Configure non-default statistics retention -~~~ sql -CREATE TABLE accounts ( - id INT PRIMARY KEY, - balance DECIMAL) -WITH (sql_stats_automatic_collection_enabled = true, -sql_stats_automatic_collection_min_stale_rows = 1000000, -sql_stats_automatic_collection_fraction_stale_rows= 0.05 -); +By default, when CockroachDB refreshes statistics for a column, it deletes the previous statistics for the column (while leaving the most recent 4-5 historical statistics). When CockroachDB refreshes statistics, it also deletes the statistics for any "non-default" column sets, or columns for which statistics are not [collected by default](#table-statistics). 
-ALTER TABLE accounts -SET (sql_stats_automatic_collection_fraction_stale_rows = 0.1, -sql_stats_automatic_collection_min_stale_rows = 2000); -~~~ +Historical statistics on non-default column sets should not be retained indefinitely, because they will not be refreshed automatically and could cause the optimizer to choose a suboptimal plan if they become stale. Such non-default historical statistics may exist because columns were deleted or removed from an index, and are therefore no longer part of a multi-column statistic. -Automatic statistics rules are checked once per minute. While altered automatic statistics table settings take immediate effect for any subsequent DML statements on a table, running row mutations that started prior to modifying the table settings may still trigger statistics collection based on the settings that existed before you ran the `ALTER TABLE ... SET` statement. +CockroachDB deletes statistics on non-default columns according to the `sql.stats.non_default_columns.min_retention_period` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}), which defaults to a 24-hour retention period. ### Forecasted statistics diff --git a/src/current/v24.1/cost-based-optimizer.md b/src/current/v24.1/cost-based-optimizer.md index 774dcbb8737..a441da6d19a 100644 --- a/src/current/v24.1/cost-based-optimizer.md +++ b/src/current/v24.1/cost-based-optimizer.md @@ -23,17 +23,6 @@ The most important factor in determining the quality of a plan is cardinality (i The cost-based optimizer can often find more performant query plans if it has access to statistical data on the contents of your tables. This data needs to be generated from scratch for new tables, and [refreshed periodically](#control-statistics-refresh-rate) for existing tables. 
-By default, CockroachDB automatically generates table statistics when tables are [created]({% link {{ page.version.version }}/create-table.md %}), and as they are [updated]({% link {{ page.version.version }}/update.md %}). It does this using a [background job]({% link {{ page.version.version }}/create-statistics.md %}#view-statistics-jobs) that automatically determines which columns to get statistics on. Specifically, the optimizer chooses: - -- Columns that are part of the primary key or an index (in other words, all indexed columns). -- Up to 100 non-indexed columns. - -By default, CockroachDB also automatically collects [multi-column statistics]({% link {{ page.version.version }}/create-statistics.md %}#create-statistics-on-multiple-columns) on columns that prefix an index. - -{{site.data.alerts.callout_info}} -[Schema changes]({% link {{ page.version.version }}/online-schema-changes.md %}) trigger automatic statistics collection for the affected table(s). -{{site.data.alerts.end}} - The optimizer can use three types of statistics to plan queries: - [Full statistics](#full-statistics) @@ -46,17 +35,14 @@ For best query performance, most users should leave automatic statistics enabled By default, CockroachDB automatically generates full statistics when tables are [created]({% link {{ page.version.version }}/create-table.md %}) and during [schema changes]({% link {{ page.version.version }}/online-schema-changes.md %}). Full statistics for a table are automatically refreshed when approximately 20% of its rows are updated. -### Partial statistics - -*Partial statistics* are collected on a subset of table data without scanning the full table. Partial statistics can improve query performance in large tables where only a portion is regularly updated or queried. +A [background job]({% link {{ page.version.version }}/create-statistics.md %}#view-statistics-jobs) automatically determines which columns to get statistics on. 
Specifically, the optimizer chooses: -{{site.data.alerts.callout_info}} -Partial statistics can only be collected if full statistics already exist for the table. -{{site.data.alerts.end}} +- Columns that are part of the primary key or an index (in other words, all indexed columns). +- Up to 100 non-indexed columns. -You can manually collect partial statistics on the highest and lowest index values using [`CREATE STATISTICS ... USING EXTREMES`]({% link {{ page.version.version }}/create-statistics.md %}#create-partial-statistics-using-extremes). +By default, CockroachDB also automatically collects [multi-column statistics]({% link {{ page.version.version }}/create-statistics.md %}#create-statistics-on-multiple-columns) on columns that prefix an index. -### Control statistics refresh rate +#### Control statistics refresh rate Statistics are refreshed in the following cases: @@ -75,7 +61,7 @@ Statistics are refreshed in the following cases: Because the formula for statistics refreshes is probabilistic, you will not see statistics update immediately after changing these settings, or immediately after exactly 500 rows have been updated. {{site.data.alerts.end}} -#### Small versus large table examples +##### Small versus large table examples Suppose the [clusters settings]({% link {{ page.version.version }}/cluster-settings.md %}) `sql.stats.automatic_collection.fraction_stale_rows` and `sql.stats.automatic_collection.min_stale_rows` have the default values .2 and 500 as shown in the preceding table. @@ -85,17 +71,23 @@ On the other hand, if a table has 1,500,000,000 rows, then 20% of that, or 300,0 In such cases, we recommend that you use the [`sql_stats_automatic_collection_enabled` storage parameter](#enable-and-disable-automatic-statistics-collection-for-tables), which lets you configure automatic statistics collection on a per-table basis. 
-#### Configure non-default statistics retention +### Partial statistics -By default, when CockroachDB refreshes statistics for a column, it deletes the previous statistics for the column (while leaving the most recent 4-5 historical statistics). When CockroachDB refreshes statistics, it also deletes the statistics for any "non-default" column sets, or columns for which statistics are not [collected by default](#table-statistics). +*Partial statistics* are collected on a subset of table data without scanning the full table. Partial statistics can improve query performance in large tables where only a portion is regularly updated or queried. -Historical statistics on non-default column sets should not be retained indefinitely, because they will not be refreshed automatically and could cause the optimizer to choose a suboptimal plan if they become stale. Such non-default historical statistics may exist because columns were deleted or removed from an index, and are therefore no longer part of a multi-column statistic. +You can manually collect partial statistics on the highest and lowest index values using [`CREATE STATISTICS ... USING EXTREMES`]({% link {{ page.version.version }}/create-statistics.md %}#create-partial-statistics-using-extremes). -CockroachDB deletes statistics on non-default columns according to the `sql.stats.non_default_columns.min_retention_period` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}), which defaults to a 24-hour retention period. +Partial statistics have the following constraints: + +- Partial statistics can only be collected if [full statistics](#full-statistics) already exist for the table. +- Partial statistics created with `USING EXTREMES` and no `ON` clause are collected on all single-column prefixes of non-inverted indexes. 
Indexes that are [partial]({% link {{ page.version.version }}/partial-indexes.md %}), [hash-sharded]({% link {{ page.version.version }}/hash-sharded-indexes.md %}), or implicitly partitioned (such as in [`REGIONAL BY ROW` tables]({% link {{ page.version.version }}/regional-tables.md %}#regional-by-row-tables)) are excluded. +- For [manual collection with specific columns]({% link {{ page.version.version }}/create-statistics.md %}#enable-create-stats-using-extremes), an index must exist with a prefix matching those columns. If no matching index exists or if full statistics were not previously collected on the specified column, the statement returns an error. -### Enable and disable automatic statistics collection for clusters +### Toggle automatic statistics collection -Automatic statistics collection is enabled by default. To disable automatic statistics collection, follow these steps: +#### Enable and disable automatic statistics collection for clusters + +Automatic statistics collection is enabled by default. To disable automatic [full](#full-statistics) and [partial](#partial-statistics) statistics collection, follow these steps: 1. Set the `sql.stats.automatic_collection.enabled` cluster setting to `false`: @@ -117,7 +109,7 @@ Automatic statistics collection is enabled by default. To disable automatic stat To learn how to manually generate statistics, see the [`CREATE STATISTICS` examples]({% link {{ page.version.version }}/create-statistics.md %}#examples). -### Enable and disable automatic statistics collection for tables +#### Enable and disable automatic statistics collection for tables Statistics collection can be expensive for large tables, and you may prefer to defer collection until after data is finished loading or during off-peak hours. Tables that are frequently updated, including small tables, may trigger statistics collection more often, which can lead to unnecessary overhead and unpredictable query plan changes. 
@@ -158,24 +150,13 @@ The current table settings are shown in the `WITH` clause output of `SHOW CREATE `ALTER TABLE accounts RESET (sql_stats_automatic_collection_enabled)` removes the table setting, in which case the `sql.stats.automatic_collection.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}) is in effect for the table. -The "stale row" cluster settings discussed in [Control statistics refresh rate](#control-statistics-refresh-rate) have table -setting counterparts `sql_stats_automatic_collection_fraction_stale_rows` and `sql_stats_automatic_collection_min_stale_rows`. For example: +### Configure non-default statistics retention -~~~ sql -CREATE TABLE accounts ( - id INT PRIMARY KEY, - balance DECIMAL) -WITH (sql_stats_automatic_collection_enabled = true, -sql_stats_automatic_collection_min_stale_rows = 1000000, -sql_stats_automatic_collection_fraction_stale_rows= 0.05 -); +By default, when CockroachDB refreshes statistics for a column, it deletes the previous statistics for the column (while leaving the most recent 4-5 historical statistics). When CockroachDB refreshes statistics, it also deletes the statistics for any "non-default" column sets, or columns for which statistics are not [collected by default](#table-statistics). -ALTER TABLE accounts -SET (sql_stats_automatic_collection_fraction_stale_rows = 0.1, -sql_stats_automatic_collection_min_stale_rows = 2000); -~~~ +Historical statistics on non-default column sets should not be retained indefinitely, because they will not be refreshed automatically and could cause the optimizer to choose a suboptimal plan if they become stale. Such non-default historical statistics may exist because columns were deleted or removed from an index, and are therefore no longer part of a multi-column statistic. -Automatic statistics rules are checked once per minute. 
While altered automatic statistics table settings take immediate effect for any subsequent DML statements on a table, running row mutations that started prior to modifying the table settings may still trigger statistics collection based on the settings that existed before you ran the `ALTER TABLE ... SET` statement. +CockroachDB deletes statistics on non-default columns according to the `sql.stats.non_default_columns.min_retention_period` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}), which defaults to a 24-hour retention period. ### Forecasted statistics diff --git a/src/current/v24.2/cost-based-optimizer.md b/src/current/v24.2/cost-based-optimizer.md index 80dcb76a0ed..b95fac0f42e 100644 --- a/src/current/v24.2/cost-based-optimizer.md +++ b/src/current/v24.2/cost-based-optimizer.md @@ -23,17 +23,6 @@ The most important factor in determining the quality of a plan is cardinality (i The cost-based optimizer can often find more performant query plans if it has access to statistical data on the contents of your tables. This data needs to be generated from scratch for new tables, and [refreshed periodically](#control-statistics-refresh-rate) for existing tables. -By default, CockroachDB automatically generates table statistics when tables are [created]({% link {{ page.version.version }}/create-table.md %}), and as they are [updated]({% link {{ page.version.version }}/update.md %}). It does this using a [background job]({% link {{ page.version.version }}/create-statistics.md %}#view-statistics-jobs) that automatically determines which columns to get statistics on. Specifically, the optimizer chooses: - -- Columns that are part of the primary key or an index (in other words, all indexed columns). -- Up to 100 non-indexed columns. - -By default, CockroachDB also automatically collects [multi-column statistics]({% link {{ page.version.version }}/create-statistics.md %}#create-statistics-on-multiple-columns) on columns that prefix an index. 
- -{{site.data.alerts.callout_info}} -[Schema changes]({% link {{ page.version.version }}/online-schema-changes.md %}) trigger automatic statistics collection for the affected table(s). -{{site.data.alerts.end}} - The optimizer can use three types of statistics to plan queries: - [Full statistics](#full-statistics) @@ -46,17 +35,14 @@ For best query performance, most users should leave automatic statistics enabled By default, CockroachDB automatically generates full statistics when tables are [created]({% link {{ page.version.version }}/create-table.md %}) and during [schema changes]({% link {{ page.version.version }}/online-schema-changes.md %}). Full statistics for a table are automatically refreshed when approximately 20% of its rows are updated. -### Partial statistics - -*Partial statistics* are collected on a subset of table data without scanning the full table. Partial statistics can improve query performance in large tables where only a portion is regularly updated or queried. +A [background job]({% link {{ page.version.version }}/create-statistics.md %}#view-statistics-jobs) automatically determines which columns to get statistics on. Specifically, the optimizer chooses: -{{site.data.alerts.callout_info}} -Partial statistics can only be collected if full statistics already exist for the table. -{{site.data.alerts.end}} +- Columns that are part of the primary key or an index (in other words, all indexed columns). +- Up to 100 non-indexed columns. -You can manually collect partial statistics on the highest and lowest index values using [`CREATE STATISTICS ... USING EXTREMES`]({% link {{ page.version.version }}/create-statistics.md %}#create-partial-statistics-using-extremes). +By default, CockroachDB also automatically collects [multi-column statistics]({% link {{ page.version.version }}/create-statistics.md %}#create-statistics-on-multiple-columns) on columns that prefix an index. 
-### Control statistics refresh rate +#### Control statistics refresh rate Statistics are refreshed in the following cases: @@ -75,7 +61,7 @@ Statistics are refreshed in the following cases: Because the formula for statistics refreshes is probabilistic, you will not see statistics update immediately after changing these settings, or immediately after exactly 500 rows have been updated. {{site.data.alerts.end}} -#### Small versus large table examples +##### Small versus large table examples Suppose the [clusters settings]({% link {{ page.version.version }}/cluster-settings.md %}) `sql.stats.automatic_collection.fraction_stale_rows` and `sql.stats.automatic_collection.min_stale_rows` have the default values .2 and 500 as shown in the preceding table. @@ -85,17 +71,23 @@ On the other hand, if a table has 1,500,000,000 rows, then 20% of that, or 300,0 In such cases, we recommend that you use the [`sql_stats_automatic_collection_enabled` storage parameter](#enable-and-disable-automatic-statistics-collection-for-tables), which lets you configure automatic statistics collection on a per-table basis. -#### Configure non-default statistics retention +### Partial statistics -By default, when CockroachDB refreshes statistics for a column, it deletes the previous statistics for the column (while leaving the most recent 4-5 historical statistics). When CockroachDB refreshes statistics, it also deletes the statistics for any "non-default" column sets, or columns for which statistics are not [collected by default](#table-statistics). +*Partial statistics* are collected on a subset of table data without scanning the full table. Partial statistics can improve query performance in large tables where only a portion is regularly updated or queried. -Historical statistics on non-default column sets should not be retained indefinitely, because they will not be refreshed automatically and could cause the optimizer to choose a suboptimal plan if they become stale. 
Such non-default historical statistics may exist because columns were deleted or removed from an index, and are therefore no longer part of a multi-column statistic. +You can manually collect partial statistics on the highest and lowest index values using [`CREATE STATISTICS ... USING EXTREMES`]({% link {{ page.version.version }}/create-statistics.md %}#create-partial-statistics-using-extremes). -CockroachDB deletes statistics on non-default columns according to the `sql.stats.non_default_columns.min_retention_period` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}), which defaults to a 24-hour retention period. +Partial statistics have the following constraints: + +- Partial statistics can only be collected if [full statistics](#full-statistics) already exist for the table. +- Partial statistics created with `USING EXTREMES` and no `ON` clause are collected on all single-column prefixes of non-inverted indexes. Indexes that are [partial]({% link {{ page.version.version }}/partial-indexes.md %}), [hash-sharded]({% link {{ page.version.version }}/hash-sharded-indexes.md %}), or implicitly partitioned (such as in [`REGIONAL BY ROW` tables]({% link {{ page.version.version }}/regional-tables.md %}#regional-by-row-tables)) are excluded. +- For [manual collection with specific columns]({% link {{ page.version.version }}/create-statistics.md %}#enable-create-stats-using-extremes), an index must exist with a prefix matching those columns. If no matching index exists or if full statistics were not previously collected on the specified column, the statement returns an error. -### Enable and disable automatic statistics collection for clusters +### Toggle automatic statistics collection -Automatic statistics collection is enabled by default. To disable automatic statistics collection, follow these steps: +#### Enable and disable automatic statistics collection for clusters + +Automatic statistics collection is enabled by default. 
To disable automatic [full](#full-statistics) and [partial](#partial-statistics) statistics collection, follow these steps: 1. Set the `sql.stats.automatic_collection.enabled` cluster setting to `false`: @@ -117,7 +109,7 @@ Automatic statistics collection is enabled by default. To disable automatic stat To learn how to manually generate statistics, see the [`CREATE STATISTICS` examples]({% link {{ page.version.version }}/create-statistics.md %}#examples). -### Enable and disable automatic statistics collection for tables +#### Enable and disable automatic statistics collection for tables Statistics collection can be expensive for large tables, and you may prefer to defer collection until after data is finished loading or during off-peak hours. Tables that are frequently updated, including small tables, may trigger statistics collection more often, which can lead to unnecessary overhead and unpredictable query plan changes. @@ -158,24 +150,13 @@ The current table settings are shown in the `WITH` clause output of `SHOW CREATE `ALTER TABLE accounts RESET (sql_stats_automatic_collection_enabled)` removes the table setting, in which case the `sql.stats.automatic_collection.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}) is in effect for the table. -The "stale row" cluster settings discussed in [Control statistics refresh rate](#control-statistics-refresh-rate) have table -setting counterparts `sql_stats_automatic_collection_fraction_stale_rows` and `sql_stats_automatic_collection_min_stale_rows`. 
For example: +### Configure non-default statistics retention -~~~ sql -CREATE TABLE accounts ( - id INT PRIMARY KEY, - balance DECIMAL) -WITH (sql_stats_automatic_collection_enabled = true, -sql_stats_automatic_collection_min_stale_rows = 1000000, -sql_stats_automatic_collection_fraction_stale_rows= 0.05 -); +By default, when CockroachDB refreshes statistics for a column, it deletes the previous statistics for the column (while leaving the most recent 4-5 historical statistics). When CockroachDB refreshes statistics, it also deletes the statistics for any "non-default" column sets, or columns for which statistics are not [collected by default](#table-statistics). -ALTER TABLE accounts -SET (sql_stats_automatic_collection_fraction_stale_rows = 0.1, -sql_stats_automatic_collection_min_stale_rows = 2000); -~~~ +Historical statistics on non-default column sets should not be retained indefinitely, because they will not be refreshed automatically and could cause the optimizer to choose a suboptimal plan if they become stale. Such non-default historical statistics may exist because columns were deleted or removed from an index, and are therefore no longer part of a multi-column statistic. -Automatic statistics rules are checked once per minute. While altered automatic statistics table settings take immediate effect for any subsequent DML statements on a table, running row mutations that started prior to modifying the table settings may still trigger statistics collection based on the settings that existed before you ran the `ALTER TABLE ... SET` statement. +CockroachDB deletes statistics on non-default columns according to the `sql.stats.non_default_columns.min_retention_period` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}), which defaults to a 24-hour retention period. 
### Forecasted statistics diff --git a/src/current/v24.3/cost-based-optimizer.md b/src/current/v24.3/cost-based-optimizer.md index 80576d0a4ab..c1ae0035a88 100644 --- a/src/current/v24.3/cost-based-optimizer.md +++ b/src/current/v24.3/cost-based-optimizer.md @@ -23,17 +23,6 @@ The most important factor in determining the quality of a plan is cardinality (i The cost-based optimizer can often find more performant query plans if it has access to statistical data on the contents of your tables. This data needs to be generated from scratch for new tables, and [refreshed periodically](#control-statistics-refresh-rate) for existing tables. -By default, CockroachDB automatically generates table statistics when tables are [created]({% link {{ page.version.version }}/create-table.md %}), and as they are [updated]({% link {{ page.version.version }}/update.md %}). It does this using a [background job]({% link {{ page.version.version }}/create-statistics.md %}#view-statistics-jobs) that automatically determines which columns to get statistics on. Specifically, the optimizer chooses: - -- Columns that are part of the primary key or an index (in other words, all indexed columns). -- Up to 100 non-indexed columns. - -By default, CockroachDB also automatically collects [multi-column statistics]({% link {{ page.version.version }}/create-statistics.md %}#create-statistics-on-multiple-columns) on columns that prefix an index. - -{{site.data.alerts.callout_info}} -[Schema changes]({% link {{ page.version.version }}/online-schema-changes.md %}) trigger automatic statistics collection for the affected table(s). 
-{{site.data.alerts.end}} - The optimizer can use three types of statistics to plan queries: - [Full statistics](#full-statistics) @@ -46,19 +35,14 @@ For best query performance, most users should leave automatic statistics enabled By default, CockroachDB automatically generates full statistics when tables are [created]({% link {{ page.version.version }}/create-table.md %}) and during [schema changes]({% link {{ page.version.version }}/online-schema-changes.md %}). Full statistics for a table are automatically refreshed when approximately 20% of its rows are updated. -### Partial statistics - -*Partial statistics* are collected on a subset of table data without scanning the full table. Partial statistics can improve query performance in large tables where only a portion is regularly updated or queried. - -{{site.data.alerts.callout_info}} -Partial statistics can only be collected if full statistics already exist for the table. -{{site.data.alerts.end}} +A [background job]({% link {{ page.version.version }}/create-statistics.md %}#view-statistics-jobs) automatically determines which columns to get statistics on. Specifically, the optimizer chooses: -You can manually collect partial statistics on the highest and lowest index values using [`CREATE STATISTICS ... USING EXTREMES`]({% link {{ page.version.version }}/create-statistics.md %}#create-partial-statistics-using-extremes). +- Columns that are part of the primary key or an index (in other words, all indexed columns). +- Up to 100 non-indexed columns. -{% include_cached new-in.html version="v24.3" %} The optimizer uses partial statistics for query planning when the [`optimizer_use_merged_partial_statistics`]({% link {{ page.version.version }}/session-variables.md %}#optimizer-use-merged-partial-statistics) session variable is enabled. It merges partial statistics with existing full statistics to produce more accurate cardinality estimates. 
+By default, CockroachDB also automatically collects [multi-column statistics]({% link {{ page.version.version }}/create-statistics.md %}#create-statistics-on-multiple-columns) on columns that prefix an index. -### Control statistics refresh rate +#### Control statistics refresh rate Statistics are refreshed in the following cases: @@ -77,7 +61,25 @@ Statistics are refreshed in the following cases: Because the formula for statistics refreshes is probabilistic, you will not see statistics update immediately after changing these settings, or immediately after exactly 500 rows have been updated. {{site.data.alerts.end}} -#### Small versus large table examples +The "stale row" cluster settings also have the table setting counterparts `sql_stats_automatic_collection_fraction_stale_rows` and `sql_stats_automatic_collection_min_stale_rows`. For example: + +~~~ sql +CREATE TABLE accounts ( + id INT PRIMARY KEY, + balance DECIMAL) +WITH (sql_stats_automatic_collection_enabled = true, +sql_stats_automatic_collection_min_stale_rows = 1000000, +sql_stats_automatic_collection_fraction_stale_rows= 0.05 +); + +ALTER TABLE accounts +SET (sql_stats_automatic_collection_fraction_stale_rows = 0.1, +sql_stats_automatic_collection_min_stale_rows = 2000); +~~~ + +Automatic statistics rules are checked once per minute. While altered automatic statistics table settings take immediate effect for any subsequent DML statements on a table, running row mutations that started prior to modifying the table settings may still trigger statistics collection based on the settings that existed before you ran the `ALTER TABLE ... SET` statement. + +##### Small versus large table examples Suppose the [clusters settings]({% link {{ page.version.version }}/cluster-settings.md %}) `sql.stats.automatic_collection.fraction_stale_rows` and `sql.stats.automatic_collection.min_stale_rows` have the default values .2 and 500 as shown in the preceding table. 
@@ -87,17 +89,25 @@ On the other hand, if a table has 1,500,000,000 rows, then 20% of that, or 300,0 In such cases, we recommend that you use the [`sql_stats_automatic_collection_enabled` storage parameter](#enable-and-disable-automatic-statistics-collection-for-tables), which lets you configure automatic statistics collection on a per-table basis. -#### Configure non-default statistics retention +### Partial statistics -By default, when CockroachDB refreshes statistics for a column, it deletes the previous statistics for the column (while leaving the most recent 4-5 historical statistics). When CockroachDB refreshes statistics, it also deletes the statistics for any "non-default" column sets, or columns for which statistics are not [collected by default](#table-statistics). +*Partial statistics* are collected on a subset of table data without scanning the full table. Partial statistics can improve query performance in large tables where only a portion is regularly updated or queried. -Historical statistics on non-default column sets should not be retained indefinitely, because they will not be refreshed automatically and could cause the optimizer to choose a suboptimal plan if they become stale. Such non-default historical statistics may exist because columns were deleted or removed from an index, and are therefore no longer part of a multi-column statistic. +You can manually collect partial statistics on the highest and lowest index values using [`CREATE STATISTICS ... USING EXTREMES`]({% link {{ page.version.version }}/create-statistics.md %}#create-partial-statistics-using-extremes). -CockroachDB deletes statistics on non-default columns according to the `sql.stats.non_default_columns.min_retention_period` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}), which defaults to a 24-hour retention period. 
+Partial statistics have the following constraints: + +- Partial statistics can only be collected if [full statistics](#full-statistics) already exist for the table. +- Partial statistics created with `USING EXTREMES` and no `ON` clause are collected on all single-column prefixes of non-inverted indexes. Indexes that are [partial]({% link {{ page.version.version }}/partial-indexes.md %}), [hash-sharded]({% link {{ page.version.version }}/hash-sharded-indexes.md %}), or implicitly partitioned (such as in [`REGIONAL BY ROW` tables]({% link {{ page.version.version }}/regional-tables.md %}#regional-by-row-tables)) are excluded. +- For [manual collection with specific columns]({% link {{ page.version.version }}/create-statistics.md %}#enable-create-stats-using-extremes), an index must exist with a prefix matching those columns. If no matching index exists or if full statistics were not previously collected on the specified column, the statement returns an error. -### Enable and disable automatic statistics collection for clusters +{% include_cached new-in.html version="v24.3" %} The optimizer uses partial statistics for query planning when the [`optimizer_use_merged_partial_statistics`]({% link {{ page.version.version }}/session-variables.md %}#optimizer-use-merged-partial-statistics) session variable is enabled. It merges partial statistics with existing full statistics to produce more accurate cardinality estimates. + +### Toggle automatic statistics collection -Automatic statistics collection is enabled by default. To disable automatic statistics collection, follow these steps: +#### Enable and disable automatic statistics collection for clusters + +Automatic statistics collection is enabled by default. To disable automatic [full](#full-statistics) and [partial](#partial-statistics) statistics collection, follow these steps: 1. 
Set the `sql.stats.automatic_collection.enabled` cluster setting to `false`: @@ -119,7 +129,7 @@ Automatic statistics collection is enabled by default. To disable automatic stat To learn how to manually generate statistics, see the [`CREATE STATISTICS` examples]({% link {{ page.version.version }}/create-statistics.md %}#examples). -### Enable and disable automatic statistics collection for tables +#### Enable and disable automatic statistics collection for tables Statistics collection can be expensive for large tables, and you may prefer to defer collection until after data is finished loading or during off-peak hours. Tables that are frequently updated, including small tables, may trigger statistics collection more often, which can lead to unnecessary overhead and unpredictable query plan changes. @@ -160,24 +170,13 @@ The current table settings are shown in the `WITH` clause output of `SHOW CREATE `ALTER TABLE accounts RESET (sql_stats_automatic_collection_enabled)` removes the table setting, in which case the `sql.stats.automatic_collection.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}) is in effect for the table. -The "stale row" cluster settings discussed in [Control statistics refresh rate](#control-statistics-refresh-rate) have table -setting counterparts `sql_stats_automatic_collection_fraction_stale_rows` and `sql_stats_automatic_collection_min_stale_rows`. For example: +### Configure non-default statistics retention -~~~ sql -CREATE TABLE accounts ( - id INT PRIMARY KEY, - balance DECIMAL) -WITH (sql_stats_automatic_collection_enabled = true, -sql_stats_automatic_collection_min_stale_rows = 1000000, -sql_stats_automatic_collection_fraction_stale_rows= 0.05 -); +By default, when CockroachDB refreshes statistics for a column, it deletes the previous statistics for the column (while leaving the most recent 4-5 historical statistics). 
When CockroachDB refreshes statistics, it also deletes the statistics for any "non-default" column sets, or columns for which statistics are not [collected by default](#table-statistics). -ALTER TABLE accounts -SET (sql_stats_automatic_collection_fraction_stale_rows = 0.1, -sql_stats_automatic_collection_min_stale_rows = 2000); -~~~ +Historical statistics on non-default column sets should not be retained indefinitely, because they will not be refreshed automatically and could cause the optimizer to choose a suboptimal plan if they become stale. Such non-default historical statistics may exist because columns were deleted or removed from an index, and are therefore no longer part of a multi-column statistic. -Automatic statistics rules are checked once per minute. While altered automatic statistics table settings take immediate effect for any subsequent DML statements on a table, running row mutations that started prior to modifying the table settings may still trigger statistics collection based on the settings that existed before you ran the `ALTER TABLE ... SET` statement. +CockroachDB deletes statistics on non-default columns according to the `sql.stats.non_default_columns.min_retention_period` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}), which defaults to a 24-hour retention period. ### Forecasted statistics diff --git a/src/current/v25.1/cost-based-optimizer.md b/src/current/v25.1/cost-based-optimizer.md index d6ed0ce0c84..a344a433666 100644 --- a/src/current/v25.1/cost-based-optimizer.md +++ b/src/current/v25.1/cost-based-optimizer.md @@ -61,6 +61,24 @@ Full statistics are refreshed in the following cases: Because the formula for statistics refreshes is probabilistic, you will not see statistics update immediately after changing these settings, or immediately after exactly 500 rows have been updated. 
{{site.data.alerts.end}} +The "stale row" cluster settings also have the table setting counterparts `sql_stats_automatic_collection_fraction_stale_rows` and `sql_stats_automatic_collection_min_stale_rows`. For example: + +~~~ sql +CREATE TABLE accounts ( + id INT PRIMARY KEY, + balance DECIMAL) +WITH (sql_stats_automatic_collection_enabled = true, +sql_stats_automatic_collection_min_stale_rows = 1000000, +sql_stats_automatic_collection_fraction_stale_rows= 0.05 +); + +ALTER TABLE accounts +SET (sql_stats_automatic_collection_fraction_stale_rows = 0.1, +sql_stats_automatic_collection_min_stale_rows = 2000); +~~~ + +Automatic statistics rules are checked once per minute. While altered automatic statistics table settings take immediate effect for any subsequent DML statements on a table, running row mutations that started prior to modifying the table settings may still trigger statistics collection based on the settings that existed before you ran the `ALTER TABLE ... SET` statement. + ##### Small versus large table examples Suppose the [cluster settings]({% link {{ page.version.version }}/cluster-settings.md %}) `sql.stats.automatic_collection.fraction_stale_rows` and `sql.stats.automatic_collection.min_stale_rows` have the default values .2 and 500 as shown in the preceding table. @@ -71,25 +89,17 @@ On the other hand, if a table has 1,500,000,000 rows, then 20% of that, or 300,0 In such cases, we recommend that you use the [`sql_stats_automatic_collection_enabled` storage parameter](#enable-and-disable-automatic-statistics-collection-for-tables), which lets you configure automatic statistics collection on a per-table basis. -##### Configure non-default statistics retention - -By default, when CockroachDB refreshes statistics for a column, it deletes the previous statistics for the column (while leaving the most recent 4-5 historical statistics). 
When CockroachDB refreshes statistics, it also deletes the statistics for any "non-default" column sets, or columns for which statistics are not [collected by default](#table-statistics). - -Historical statistics on non-default column sets should not be retained indefinitely, because they will not be refreshed automatically and could cause the optimizer to choose a suboptimal plan if they become stale. Such non-default historical statistics may exist because columns were deleted or removed from an index, and are therefore no longer part of a multi-column statistic. - -CockroachDB deletes statistics on non-default columns according to the `sql.stats.non_default_columns.min_retention_period` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}), which defaults to a 24-hour retention period. - ### Partial statistics *Partial statistics* are collected on a subset of table data without scanning the full table. Partial statistics can improve query performance in large tables where only a portion of rows are regularly updated or queried. Whereas [full statistics](#full-statistics) refresh infrequently and can allow stale rows to accumulate, partial statistics [automatically refresh](#automatically-collect-partial-statistics) when the number of stale rows reaches a threshold. Partial statistics automatically collect on extreme index values, which is particularly valuable for timestamp indexes where workloads commonly access the most recent data. They can also be [collected manually](#manually-collect-partial-statistics). -Partial statistics have the following requirements: +Partial statistics have the following constraints: - Partial statistics can only be collected if [full statistics](#full-statistics) already exist for the table. -- Partial statistics are collected on all single-column prefixes of non-inverted indexes. 
Indexes that are [partial]({% link {{ page.version.version }}/partial-indexes.md %}), [hash-sharded]({% link {{ page.version.version }}/hash-sharded-indexes.md %}), or implicitly partitioned (such as in [`REGIONAL BY ROW` tables]({% link {{ page.version.version }}/regional-tables.md %}#regional-by-row-tables)) are excluded. -- For [manual collection](#manually-collect-partial-statistics) with specific columns, an index must exist with a prefix matching those columns. If no matching index exists or if statistics were not previously collected on the specified column, the statement will return an error. +- Partial statistics created with `USING EXTREMES` and no `ON` clause are collected on all single-column prefixes of non-inverted indexes. Indexes that are [partial]({% link {{ page.version.version }}/partial-indexes.md %}), [hash-sharded]({% link {{ page.version.version }}/hash-sharded-indexes.md %}), or implicitly partitioned (such as in [`REGIONAL BY ROW` tables]({% link {{ page.version.version }}/regional-tables.md %}#regional-by-row-tables)) are excluded. +- For [manual collection](#manually-collect-partial-statistics) with specific columns, an index must exist with a prefix matching those columns. If no matching index exists or if full statistics were not previously collected on the specified column, the statement returns an error. The optimizer uses partial statistics for query planning when the [`optimizer_use_merged_partial_statistics`]({% link {{ page.version.version }}/session-variables.md %}#optimizer-use-merged-partial-statistics) session variable is enabled. It merges partial statistics with existing full statistics to produce more accurate cardinality estimates. 
@@ -114,7 +124,9 @@ To control automatic collection of partial statistics, use the following [cluste You can manually create partial statistics on the highest and lowest index values, when [`enable_create_stats_using_extremes`]({% link {{ page.version.version }}/session-variables.md %}#enable-create-stats-using-extremes) session variable is enabled, using the `USING EXTREMES` clause: [`CREATE STATISTICS stats FROM table USING EXTREMES`]({% link {{ page.version.version }}/create-statistics.md %}#create-partial-statistics-using-extremes). -### Enable and disable automatic statistics collection for clusters +### Toggle automatic statistics collection + +#### Enable and disable automatic statistics collection for clusters Automatic statistics collection is enabled by default. To disable automatic [full](#full-statistics) and [partial](#partial-statistics) statistics collection, follow these steps: @@ -138,7 +150,7 @@ Automatic statistics collection is enabled by default. To disable automatic [ful To learn how to manually generate statistics, see the [`CREATE STATISTICS` examples]({% link {{ page.version.version }}/create-statistics.md %}#examples). -### Enable and disable automatic statistics collection for tables +#### Enable and disable automatic statistics collection for tables Automatic statistics collection can be expensive for large tables, and you may prefer to defer collection until after data is finished loading or during off-peak hours. Tables that are frequently updated, including small tables, may trigger statistics collection more often, which can lead to unnecessary overhead and unpredictable query plan changes. 
@@ -179,24 +191,13 @@ The current table settings are shown in the `WITH` clause output of `SHOW CREATE `ALTER TABLE accounts RESET (sql_stats_automatic_collection_enabled)` removes the table setting, in which case the `sql.stats.automatic_collection.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}) is in effect for the table. -The "stale row" cluster settings discussed in [Control statistics refresh rate](#control-statistics-refresh-rate) have table -setting counterparts `sql_stats_automatic_collection_fraction_stale_rows` and `sql_stats_automatic_collection_min_stale_rows`. For example: +### Configure non-default statistics retention -~~~ sql -CREATE TABLE accounts ( - id INT PRIMARY KEY, - balance DECIMAL) -WITH (sql_stats_automatic_collection_enabled = true, -sql_stats_automatic_collection_min_stale_rows = 1000000, -sql_stats_automatic_collection_fraction_stale_rows= 0.05 -); +By default, when CockroachDB refreshes statistics for a column, it deletes the previous statistics for the column (while leaving the most recent 4-5 historical statistics). When CockroachDB refreshes statistics, it also deletes the statistics for any "non-default" column sets, or columns for which statistics are not [collected by default](#table-statistics). -ALTER TABLE accounts -SET (sql_stats_automatic_collection_fraction_stale_rows = 0.1, -sql_stats_automatic_collection_min_stale_rows = 2000); -~~~ +Historical statistics on non-default column sets should not be retained indefinitely, because they will not be refreshed automatically and could cause the optimizer to choose a suboptimal plan if they become stale. Such non-default historical statistics may exist because columns were deleted or removed from an index, and are therefore no longer part of a multi-column statistic. -Automatic statistics rules are checked once per minute. 
While altered automatic statistics table settings take immediate effect for any subsequent DML statements on a table, running row mutations that started prior to modifying the table settings may still trigger statistics collection based on the settings that existed before you ran the `ALTER TABLE ... SET` statement. +CockroachDB deletes statistics on non-default columns according to the `sql.stats.non_default_columns.min_retention_period` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}), which defaults to a 24-hour retention period. ### Forecasted statistics diff --git a/src/current/v25.2/cost-based-optimizer.md b/src/current/v25.2/cost-based-optimizer.md index 7840d7db870..e2dd3084647 100644 --- a/src/current/v25.2/cost-based-optimizer.md +++ b/src/current/v25.2/cost-based-optimizer.md @@ -67,6 +67,24 @@ Full statistics are refreshed in the following cases: Because the formula for statistics refreshes is probabilistic, you will not see statistics update immediately after changing these settings, or immediately after exactly 500 rows have been updated. {{site.data.alerts.end}} +The "stale row" cluster settings also have the table setting counterparts `sql_stats_automatic_collection_fraction_stale_rows` and `sql_stats_automatic_collection_min_stale_rows`. For example: + +~~~ sql +CREATE TABLE accounts ( + id INT PRIMARY KEY, + balance DECIMAL) +WITH (sql_stats_automatic_collection_enabled = true, +sql_stats_automatic_collection_min_stale_rows = 1000000, +sql_stats_automatic_collection_fraction_stale_rows= 0.05 +); + +ALTER TABLE accounts +SET (sql_stats_automatic_collection_fraction_stale_rows = 0.1, +sql_stats_automatic_collection_min_stale_rows = 2000); +~~~ + +Automatic statistics rules are checked once per minute. 
While altered automatic statistics table settings take immediate effect for any subsequent DML statements on a table, running row mutations that started prior to modifying the table settings may still trigger statistics collection based on the settings that existed before you ran the `ALTER TABLE ... SET` statement. + ##### Small versus large table examples Suppose the [cluster settings]({% link {{ page.version.version }}/cluster-settings.md %}) `sql.stats.automatic_collection.fraction_stale_rows` and `sql.stats.automatic_collection.min_stale_rows` have the default values .2 and 500 as shown in the preceding table. @@ -77,27 +95,19 @@ On the other hand, if a table has 1,500,000,000 rows, then 20% of that, or 300,0 In such cases, we recommend that you use the [`sql_stats_automatic_collection_enabled` storage parameter](#enable-and-disable-automatic-statistics-collection-for-tables), which lets you configure automatic statistics collection on a per-table basis. -##### Configure non-default statistics retention - -By default, when CockroachDB refreshes statistics for a column, it deletes the previous statistics for the column (while leaving the most recent 4-5 historical statistics). When CockroachDB refreshes statistics, it also deletes the statistics for any "non-default" column sets, or columns for which statistics are not [collected by default](#table-statistics). - -Historical statistics on non-default column sets should not be retained indefinitely, because they will not be refreshed automatically and could cause the optimizer to choose a suboptimal plan if they become stale. Such non-default historical statistics may exist because columns were deleted or removed from an index, and are therefore no longer part of a multi-column statistic. 
- -CockroachDB deletes statistics on non-default columns according to the `sql.stats.non_default_columns.min_retention_period` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}), which defaults to a 24-hour retention period. - ### Partial statistics *Partial statistics* are collected on a subset of table data without scanning the full table. Partial statistics can improve query performance in large tables where only a portion of rows are regularly updated or queried. Whereas [full statistics](#full-statistics) refresh infrequently and can allow stale rows to accumulate, partial statistics [automatically refresh](#automatically-collect-partial-statistics) when the number of stale rows reaches a threshold. Partial statistics automatically collect on extreme index values, which is particularly valuable for timestamp indexes where workloads commonly access the most recent data. They can also be [collected manually](#manually-collect-partial-statistics). -Partial statistics have the following requirements: +Partial statistics have the following constraints: - Partial statistics can only be collected if [full statistics](#full-statistics) already exist for the table. -- Partial statistics are collected on all single-column prefixes of non-inverted indexes. Indexes that are [partial]({% link {{ page.version.version }}/partial-indexes.md %}), [hash-sharded]({% link {{ page.version.version }}/hash-sharded-indexes.md %}), or implicitly partitioned (such as in [`REGIONAL BY ROW` tables]({% link {{ page.version.version }}/regional-tables.md %}#regional-by-row-tables)) are excluded. -- For [manual collection](#manually-collect-partial-statistics) with specific columns, an index must exist with a prefix matching those columns. If no matching index exists or if statistics were not previously collected on the specified column, the statement will return an error. 
+- Partial statistics created with `USING EXTREMES` and no `ON` clause are collected on all single-column prefixes of non-inverted indexes. Indexes that are [partial]({% link {{ page.version.version }}/partial-indexes.md %}), [hash-sharded]({% link {{ page.version.version }}/hash-sharded-indexes.md %}), or implicitly partitioned (such as in [`REGIONAL BY ROW` tables]({% link {{ page.version.version }}/regional-tables.md %}#regional-by-row-tables)) are excluded. +- For [manual collection](#manually-collect-partial-statistics) with specific columns, an index must exist with a prefix matching those columns. If no matching index exists or if full statistics were not previously collected on the specified column, the statement returns an error. -The optimizer uses partial statistics for query planning when the [`optimizer_use_merged_partial_statistics`]({% link {{ page.version.version }}/session-variables.md %}#optimizer-use-merged-partial-statistics) session variable is enabled. It merges partial statistics with existing full statistics to produce more accurate cardinality estimates. +By default, the optimizer uses partial statistics for query planning. It merges partial statistics with existing full statistics to produce more accurate cardinality estimates. #### Automatically collect partial statistics @@ -120,7 +130,9 @@ To control automatic collection of partial statistics, use the following [cluste You can manually create partial statistics on the highest and lowest index values, when [`enable_create_stats_using_extremes`]({% link {{ page.version.version }}/session-variables.md %}#enable-create-stats-using-extremes) session variable is enabled, using the `USING EXTREMES` clause: [`CREATE STATISTICS stats FROM table USING EXTREMES`]({% link {{ page.version.version }}/create-statistics.md %}#create-partial-statistics-using-extremes). 
-### Enable and disable automatic statistics collection for clusters +### Toggle automatic statistics collection + +#### Enable and disable automatic statistics collection for clusters Automatic statistics collection is enabled by default. To disable automatic [full](#full-statistics) and [partial](#partial-statistics) statistics collection, follow these steps: @@ -144,7 +156,7 @@ Automatic statistics collection is enabled by default. To disable automatic [ful To learn how to manually generate statistics, see the [`CREATE STATISTICS` examples]({% link {{ page.version.version }}/create-statistics.md %}#examples). -### Enable and disable automatic statistics collection for tables +#### Enable and disable automatic statistics collection for tables Automatic statistics collection can be expensive for large tables, and you may prefer to defer collection until after data is finished loading or during off-peak hours. Tables that are frequently updated, including small tables, may trigger statistics collection more often, which can lead to unnecessary overhead and unpredictable query plan changes. @@ -185,24 +197,13 @@ The current table settings are shown in the `WITH` clause output of `SHOW CREATE `ALTER TABLE accounts RESET (sql_stats_automatic_collection_enabled)` removes the table setting, in which case the `sql.stats.automatic_collection.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}) is in effect for the table. -The "stale row" cluster settings discussed in [Control statistics refresh rate](#control-statistics-refresh-rate) have table -setting counterparts `sql_stats_automatic_collection_fraction_stale_rows` and `sql_stats_automatic_collection_min_stale_rows`. 
For example: +### Configure non-default statistics retention -~~~ sql -CREATE TABLE accounts ( - id INT PRIMARY KEY, - balance DECIMAL) -WITH (sql_stats_automatic_collection_enabled = true, -sql_stats_automatic_collection_min_stale_rows = 1000000, -sql_stats_automatic_collection_fraction_stale_rows= 0.05 -); +By default, when CockroachDB refreshes statistics for a column, it deletes the previous statistics for the column (while leaving the most recent 4-5 historical statistics). When CockroachDB refreshes statistics, it also deletes the statistics for any "non-default" column sets, or columns for which statistics are not [collected by default](#table-statistics). -ALTER TABLE accounts -SET (sql_stats_automatic_collection_fraction_stale_rows = 0.1, -sql_stats_automatic_collection_min_stale_rows = 2000); -~~~ +Historical statistics on non-default column sets should not be retained indefinitely, because they will not be refreshed automatically and could cause the optimizer to choose a suboptimal plan if they become stale. Such non-default historical statistics may exist because columns were deleted or removed from an index, and are therefore no longer part of a multi-column statistic. -Automatic statistics rules are checked once per minute. While altered automatic statistics table settings take immediate effect for any subsequent DML statements on a table, running row mutations that started prior to modifying the table settings may still trigger statistics collection based on the settings that existed before you ran the `ALTER TABLE ... SET` statement. +CockroachDB deletes statistics on non-default columns according to the `sql.stats.non_default_columns.min_retention_period` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}), which defaults to a 24-hour retention period. 
### Forecasted statistics diff --git a/src/current/v25.3/cost-based-optimizer.md b/src/current/v25.3/cost-based-optimizer.md index bf7cddf3e7b..efbe4269fa0 100644 --- a/src/current/v25.3/cost-based-optimizer.md +++ b/src/current/v25.3/cost-based-optimizer.md @@ -67,6 +67,24 @@ Full statistics are refreshed in the following cases: Because the formula for statistics refreshes is probabilistic, you will not see statistics update immediately after changing these settings, or immediately after exactly 500 rows have been updated. {{site.data.alerts.end}} +The "stale row" cluster settings also have the table setting counterparts `sql_stats_automatic_collection_fraction_stale_rows` and `sql_stats_automatic_collection_min_stale_rows`. For example: + +~~~ sql +CREATE TABLE accounts ( + id INT PRIMARY KEY, + balance DECIMAL) +WITH (sql_stats_automatic_collection_enabled = true, +sql_stats_automatic_collection_min_stale_rows = 1000000, +sql_stats_automatic_collection_fraction_stale_rows= 0.05 +); + +ALTER TABLE accounts +SET (sql_stats_automatic_collection_fraction_stale_rows = 0.1, +sql_stats_automatic_collection_min_stale_rows = 2000); +~~~ + +Automatic statistics rules are checked once per minute. While altered automatic statistics table settings take immediate effect for any subsequent DML statements on a table, running row mutations that started prior to modifying the table settings may still trigger statistics collection based on the settings that existed before you ran the `ALTER TABLE ... SET` statement. + ##### Small versus large table examples Suppose the [cluster settings]({% link {{ page.version.version }}/cluster-settings.md %}) `sql.stats.automatic_collection.fraction_stale_rows` and `sql.stats.automatic_collection.min_stale_rows` have the default values .2 and 500 as shown in the preceding table. 
@@ -77,27 +95,19 @@ On the other hand, if a table has 1,500,000,000 rows, then 20% of that, or 300,0 In such cases, we recommend that you use the [`sql_stats_automatic_collection_enabled` storage parameter](#enable-and-disable-automatic-statistics-collection-for-tables), which lets you configure automatic statistics collection on a per-table basis. -##### Configure non-default statistics retention - -By default, when CockroachDB refreshes statistics for a column, it deletes the previous statistics for the column (while leaving the most recent 4-5 historical statistics). When CockroachDB refreshes statistics, it also deletes the statistics for any "non-default" column sets, or columns for which statistics are not [collected by default](#table-statistics). - -Historical statistics on non-default column sets should not be retained indefinitely, because they will not be refreshed automatically and could cause the optimizer to choose a suboptimal plan if they become stale. Such non-default historical statistics may exist because columns were deleted or removed from an index, and are therefore no longer part of a multi-column statistic. - -CockroachDB deletes statistics on non-default columns according to the `sql.stats.non_default_columns.min_retention_period` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}), which defaults to a 24-hour retention period. - ### Partial statistics *Partial statistics* are collected on a subset of table data without scanning the full table. Partial statistics can improve query performance in large tables where only a portion of rows are regularly updated or queried. Whereas [full statistics](#full-statistics) refresh infrequently and can allow stale rows to accumulate, partial statistics [automatically refresh](#automatically-collect-partial-statistics) when the number of stale rows reaches a threshold. 
Partial statistics automatically collect on extreme index values, which is particularly valuable for timestamp indexes where workloads commonly access the most recent data. They can also be [collected manually](#manually-collect-partial-statistics). -Partial statistics have the following requirements: +Partial statistics have the following constraints: - Partial statistics can only be collected if [full statistics](#full-statistics) already exist for the table. -- Partial statistics are collected on all single-column prefixes of non-inverted indexes. Indexes that are [partial]({% link {{ page.version.version }}/partial-indexes.md %}), [hash-sharded]({% link {{ page.version.version }}/hash-sharded-indexes.md %}), or implicitly partitioned (such as in [`REGIONAL BY ROW` tables]({% link {{ page.version.version }}/regional-tables.md %}#regional-by-row-tables)) are excluded. -- For [manual collection](#manually-collect-partial-statistics) with specific columns, an index must exist with a prefix matching those columns. If no matching index exists or if statistics were not previously collected on the specified column, the statement will return an error. +- Partial statistics created with `USING EXTREMES` and no `ON` clause are collected on all single-column prefixes of non-inverted indexes. Indexes that are [partial]({% link {{ page.version.version }}/partial-indexes.md %}), [hash-sharded]({% link {{ page.version.version }}/hash-sharded-indexes.md %}), or implicitly partitioned (such as in [`REGIONAL BY ROW` tables]({% link {{ page.version.version }}/regional-tables.md %}#regional-by-row-tables)) are excluded. +- For [manual collection](#manually-collect-partial-statistics) with specific columns, an index must exist with a prefix matching those columns. If no matching index exists or if full statistics were not previously collected on the specified column, the statement returns an error. 
-The optimizer uses partial statistics for query planning when the [`optimizer_use_merged_partial_statistics`]({% link {{ page.version.version }}/session-variables.md %}#optimizer-use-merged-partial-statistics) session variable is enabled. It merges partial statistics with existing full statistics to produce more accurate cardinality estimates. +By default, the optimizer uses partial statistics for query planning. It merges partial statistics with existing full statistics to produce more accurate cardinality estimates. #### Automatically collect partial statistics @@ -120,7 +130,9 @@ To control automatic collection of partial statistics, use the following [cluste You can manually create partial statistics on the highest and lowest index values, when [`enable_create_stats_using_extremes`]({% link {{ page.version.version }}/session-variables.md %}#enable-create-stats-using-extremes) session variable is enabled, using the `USING EXTREMES` clause: [`CREATE STATISTICS stats FROM table USING EXTREMES`]({% link {{ page.version.version }}/create-statistics.md %}#create-partial-statistics-using-extremes). -### Enable and disable automatic statistics collection for clusters +### Toggle automatic statistics collection + +#### Enable and disable automatic statistics collection for clusters Automatic statistics collection is enabled by default. To disable automatic [full](#full-statistics) and [partial](#partial-statistics) statistics collection, follow these steps: @@ -144,7 +156,7 @@ Automatic statistics collection is enabled by default. To disable automatic [ful To learn how to manually generate statistics, see the [`CREATE STATISTICS` examples]({% link {{ page.version.version }}/create-statistics.md %}#examples). 
-### Enable and disable automatic statistics collection for tables +#### Enable and disable automatic statistics collection for tables Automatic statistics collection can be expensive for large tables, and you may prefer to defer collection until after data is finished loading or during off-peak hours. Tables that are frequently updated, including small tables, may trigger statistics collection more often, which can lead to unnecessary overhead and unpredictable query plan changes. @@ -185,24 +197,13 @@ The current table settings are shown in the `WITH` clause output of `SHOW CREATE `ALTER TABLE accounts RESET (sql_stats_automatic_collection_enabled)` removes the table setting, in which case the `sql.stats.automatic_collection.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}) is in effect for the table. -The "stale row" cluster settings discussed in [Control statistics refresh rate](#control-statistics-refresh-rate) have table -setting counterparts `sql_stats_automatic_collection_fraction_stale_rows` and `sql_stats_automatic_collection_min_stale_rows`. For example: +### Configure non-default statistics retention -~~~ sql -CREATE TABLE accounts ( - id INT PRIMARY KEY, - balance DECIMAL) -WITH (sql_stats_automatic_collection_enabled = true, -sql_stats_automatic_collection_min_stale_rows = 1000000, -sql_stats_automatic_collection_fraction_stale_rows= 0.05 -); +By default, when CockroachDB refreshes statistics for a column, it deletes the previous statistics for the column (while leaving the most recent 4-5 historical statistics). When CockroachDB refreshes statistics, it also deletes the statistics for any "non-default" column sets, or columns for which statistics are not [collected by default](#table-statistics). 
-ALTER TABLE accounts -SET (sql_stats_automatic_collection_fraction_stale_rows = 0.1, -sql_stats_automatic_collection_min_stale_rows = 2000); -~~~ +Historical statistics on non-default column sets should not be retained indefinitely, because they will not be refreshed automatically and could cause the optimizer to choose a suboptimal plan if they become stale. Such non-default historical statistics may exist because columns were deleted or removed from an index, and are therefore no longer part of a multi-column statistic. -Automatic statistics rules are checked once per minute. While altered automatic statistics table settings take immediate effect for any subsequent DML statements on a table, running row mutations that started prior to modifying the table settings may still trigger statistics collection based on the settings that existed before you ran the `ALTER TABLE ... SET` statement. +CockroachDB deletes statistics on non-default columns according to the `sql.stats.non_default_columns.min_retention_period` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}), which defaults to a 24-hour retention period. ### Forecasted statistics diff --git a/src/current/v25.4/cost-based-optimizer.md b/src/current/v25.4/cost-based-optimizer.md index fe35018f9e2..eed89e15e23 100644 --- a/src/current/v25.4/cost-based-optimizer.md +++ b/src/current/v25.4/cost-based-optimizer.md @@ -67,6 +67,24 @@ Full statistics are refreshed in the following cases: Because the formula for statistics refreshes is probabilistic, you will not see statistics update immediately after changing these settings, or immediately after exactly 500 rows have been updated. {{site.data.alerts.end}} +The "stale row" cluster settings also have the table setting counterparts `sql_stats_automatic_collection_fraction_stale_rows` and `sql_stats_automatic_collection_min_stale_rows`. 
For example: + +~~~ sql +CREATE TABLE accounts ( + id INT PRIMARY KEY, + balance DECIMAL) +WITH (sql_stats_automatic_collection_enabled = true, +sql_stats_automatic_collection_min_stale_rows = 1000000, +sql_stats_automatic_collection_fraction_stale_rows= 0.05 +); + +ALTER TABLE accounts +SET (sql_stats_automatic_collection_fraction_stale_rows = 0.1, +sql_stats_automatic_collection_min_stale_rows = 2000); +~~~ + +Automatic statistics rules are checked once per minute. While altered automatic statistics table settings take immediate effect for any subsequent DML statements on a table, running row mutations that started prior to modifying the table settings may still trigger statistics collection based on the settings that existed before you ran the `ALTER TABLE ... SET` statement. + ##### Small versus large table examples Suppose the [cluster settings]({% link {{ page.version.version }}/cluster-settings.md %}) `sql.stats.automatic_collection.fraction_stale_rows` and `sql.stats.automatic_collection.min_stale_rows` have the default values .2 and 500 as shown in the preceding table. @@ -77,27 +95,19 @@ On the other hand, if a table has 1,500,000,000 rows, then 20% of that, or 300,0 In such cases, we recommend that you use the [`sql_stats_automatic_collection_enabled` storage parameter](#enable-and-disable-automatic-statistics-collection-for-tables), which lets you configure automatic statistics collection on a per-table basis. -##### Configure non-default statistics retention - -By default, when CockroachDB refreshes statistics for a column, it deletes the previous statistics for the column (while leaving the most recent 4-5 historical statistics). When CockroachDB refreshes statistics, it also deletes the statistics for any "non-default" column sets, or columns for which statistics are not [collected by default](#table-statistics). 
- -Historical statistics on non-default column sets should not be retained indefinitely, because they will not be refreshed automatically and could cause the optimizer to choose a suboptimal plan if they become stale. Such non-default historical statistics may exist because columns were deleted or removed from an index, and are therefore no longer part of a multi-column statistic. - -CockroachDB deletes statistics on non-default columns according to the `sql.stats.non_default_columns.min_retention_period` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}), which defaults to a 24-hour retention period. - ### Partial statistics *Partial statistics* are collected on a subset of table data without scanning the full table. Partial statistics can improve query performance in large tables where only a portion of rows are regularly updated or queried. -Whereas [full statistics](#full-statistics) refresh infrequently and can allow stale rows to accumulate, partial statistics [automatically refresh](#automatically-collect-partial-statistics) when the number of stale rows reaches a threshold. Partial statistics automatically collect on extreme index values, which is particularly valuable for timestamp indexes where workloads commonly access the most recent data. They can also be [collected manually](#manually-collect-partial-statistics) at extremes and on specific data. +Whereas [full statistics](#full-statistics) refresh infrequently and can allow stale rows to accumulate, partial statistics [automatically refresh](#automatically-collect-partial-statistics) when the number of stale rows reaches a threshold. Partial statistics automatically collect on extreme index values, which is particularly valuable for timestamp indexes where workloads commonly access the most recent data. They can also be [collected manually](#manually-collect-partial-statistics). 
-Partial statistics have the following requirements: +Partial statistics have the following constraints: - Partial statistics can only be collected if [full statistics](#full-statistics) already exist for the table. -- Partial statistics are collected on all single-column prefixes of non-inverted indexes. Indexes that are [partial]({% link {{ page.version.version }}/partial-indexes.md %}), [hash-sharded]({% link {{ page.version.version }}/hash-sharded-indexes.md %}), or implicitly partitioned (such as in [`REGIONAL BY ROW` tables]({% link {{ page.version.version }}/regional-tables.md %}#regional-by-row-tables)) are excluded. -- For [manual collection](#manually-collect-partial-statistics) with specific columns, an index must exist with a prefix matching those columns. If no matching index exists or if statistics were not previously collected on the specified column, the statement will return an error. +- Partial statistics created with `USING EXTREMES` and no `ON` clause are collected on all single-column prefixes of non-inverted indexes. Indexes that are [partial]({% link {{ page.version.version }}/partial-indexes.md %}), [hash-sharded]({% link {{ page.version.version }}/hash-sharded-indexes.md %}), or implicitly partitioned (such as in [`REGIONAL BY ROW` tables]({% link {{ page.version.version }}/regional-tables.md %}#regional-by-row-tables)) are excluded. +- For [manual collection](#manually-collect-partial-statistics) with specific columns, each specified column must be the first key column of a non-inverted index. When using the `WHERE` clause, the predicate must also filter on the index column. If no matching index exists or if full statistics were not previously collected on the specified column, the statement returns an error. 
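As an illustration of the `WHERE` form described in these constraints, the following sketch assumes a hypothetical `events` table that already has full statistics and an index whose first key column is `created_at`. The table name, statistics name, and cutoff timestamp are assumptions chosen for the example.

~~~ sql
-- The predicate filters on created_at, the same column named in the ON clause
-- and the first key column of the (assumed) index.
CREATE STATISTICS events_recent
    ON created_at
    FROM events
    WHERE created_at > '2025-01-01 00:00:00+00';
~~~

A predicate on a column other than the indexed one would not satisfy the constraint that the `WHERE` clause filter on the index column.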
-The optimizer uses partial statistics for query planning when the [`optimizer_use_merged_partial_statistics`]({% link {{ page.version.version }}/session-variables.md %}#optimizer-use-merged-partial-statistics) session variable is enabled. It merges partial statistics with existing full statistics to produce more accurate cardinality estimates. +By default, the optimizer uses partial statistics for query planning. It merges partial statistics with existing full statistics to produce more accurate cardinality estimates. #### Automatically collect partial statistics @@ -123,7 +133,9 @@ You can manually create partial statistics on: - The highest and lowest index values, when [`enable_create_stats_using_extremes`]({% link {{ page.version.version }}/session-variables.md %}#enable-create-stats-using-extremes) session variable is enabled, using the `USING EXTREMES` clause: [`CREATE STATISTICS stats FROM table USING EXTREMES`]({% link {{ page.version.version }}/create-statistics.md %}#create-partial-statistics-using-extremes) - {% include_cached new-in.html version="v25.4" %} Specific columns and values, using the `WHERE` clause: [`CREATE STATISTICS stats ON column FROM table WHERE condition`]({% link {{ page.version.version }}/create-statistics.md %}#create-partial-statistics-on-specific-data) -### Enable and disable automatic statistics collection for clusters +### Toggle automatic statistics collection + +#### Enable and disable automatic statistics collection for clusters Automatic statistics collection is enabled by default. To disable automatic [full](#full-statistics) and [partial](#partial-statistics) statistics collection, follow these steps: @@ -147,7 +159,7 @@ Automatic statistics collection is enabled by default. To disable automatic [ful To learn how to manually generate statistics, see the [`CREATE STATISTICS` examples]({% link {{ page.version.version }}/create-statistics.md %}#examples). 
-### Enable and disable automatic statistics collection for tables +#### Enable and disable automatic statistics collection for tables Automatic statistics collection can be expensive for large tables, and you may prefer to defer collection until after data is finished loading or during off-peak hours. Tables that are frequently updated, including small tables, may trigger statistics collection more often, which can lead to unnecessary overhead and unpredictable query plan changes. @@ -188,24 +200,13 @@ The current table settings are shown in the `WITH` clause output of `SHOW CREATE `ALTER TABLE accounts RESET (sql_stats_automatic_collection_enabled)` removes the table setting, in which case the `sql.stats.automatic_collection.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}) is in effect for the table. -The "stale row" cluster settings discussed in [Control statistics refresh rate](#control-statistics-refresh-rate) have table -setting counterparts `sql_stats_automatic_collection_fraction_stale_rows` and `sql_stats_automatic_collection_min_stale_rows`. For example: +### Configure non-default statistics retention -~~~ sql -CREATE TABLE accounts ( - id INT PRIMARY KEY, - balance DECIMAL) -WITH (sql_stats_automatic_collection_enabled = true, -sql_stats_automatic_collection_min_stale_rows = 1000000, -sql_stats_automatic_collection_fraction_stale_rows= 0.05 -); +By default, when CockroachDB refreshes statistics for a column, it deletes the previous statistics for the column (while leaving the most recent 4-5 historical statistics). When CockroachDB refreshes statistics, it also deletes the statistics for any "non-default" column sets, or columns for which statistics are not [collected by default](#table-statistics). 
-ALTER TABLE accounts -SET (sql_stats_automatic_collection_fraction_stale_rows = 0.1, -sql_stats_automatic_collection_min_stale_rows = 2000); -~~~ +Historical statistics on non-default column sets should not be retained indefinitely, because they will not be refreshed automatically and could cause the optimizer to choose a suboptimal plan if they become stale. Such non-default historical statistics may exist because columns were deleted or removed from an index, and are therefore no longer part of a multi-column statistic. -Automatic statistics rules are checked once per minute. While altered automatic statistics table settings take immediate effect for any subsequent DML statements on a table, running row mutations that started prior to modifying the table settings may still trigger statistics collection based on the settings that existed before you ran the `ALTER TABLE ... SET` statement. +CockroachDB deletes statistics on non-default columns according to the `sql.stats.non_default_columns.min_retention_period` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}), which defaults to a 24-hour retention period. ### Forecasted statistics diff --git a/src/current/v26.1/cost-based-optimizer.md b/src/current/v26.1/cost-based-optimizer.md index 36c1309be6d..7975db99459 100644 --- a/src/current/v26.1/cost-based-optimizer.md +++ b/src/current/v26.1/cost-based-optimizer.md @@ -67,6 +67,24 @@ Full statistics are refreshed in the following cases: Because the formula for statistics refreshes is probabilistic, you will not see statistics update immediately after changing these settings, or immediately after exactly 500 rows have been updated. {{site.data.alerts.end}} +The "stale row" cluster settings also have the table setting counterparts `sql_stats_automatic_collection_fraction_stale_rows` and `sql_stats_automatic_collection_min_stale_rows`. 
For example: + +~~~ sql +CREATE TABLE accounts ( + id INT PRIMARY KEY, + balance DECIMAL) +WITH (sql_stats_automatic_collection_enabled = true, +sql_stats_automatic_collection_min_stale_rows = 1000000, +sql_stats_automatic_collection_fraction_stale_rows= 0.05 +); + +ALTER TABLE accounts +SET (sql_stats_automatic_collection_fraction_stale_rows = 0.1, +sql_stats_automatic_collection_min_stale_rows = 2000); +~~~ + +Automatic statistics rules are checked once per minute. While altered automatic statistics table settings take immediate effect for any subsequent DML statements on a table, running row mutations that started prior to modifying the table settings may still trigger statistics collection based on the settings that existed before you ran the `ALTER TABLE ... SET` statement. + ##### Small versus large table examples Suppose the [cluster settings]({% link {{ page.version.version }}/cluster-settings.md %}) `sql.stats.automatic_collection.fraction_stale_rows` and `sql.stats.automatic_collection.min_stale_rows` have the default values .2 and 500 as shown in the preceding table. @@ -77,27 +95,19 @@ On the other hand, if a table has 1,500,000,000 rows, then 20% of that, or 300,0 In such cases, we recommend that you use the [`sql_stats_automatic_collection_enabled` storage parameter](#enable-and-disable-automatic-statistics-collection-for-tables), which lets you configure automatic statistics collection on a per-table basis. -##### Configure non-default statistics retention - -By default, when CockroachDB refreshes statistics for a column, it deletes the previous statistics for the column (while leaving the most recent 4-5 historical statistics). When CockroachDB refreshes statistics, it also deletes the statistics for any "non-default" column sets, or columns for which statistics are not [collected by default](#table-statistics). 
- -Historical statistics on non-default column sets should not be retained indefinitely, because they will not be refreshed automatically and could cause the optimizer to choose a suboptimal plan if they become stale. Such non-default historical statistics may exist because columns were deleted or removed from an index, and are therefore no longer part of a multi-column statistic. - -CockroachDB deletes statistics on non-default columns according to the `sql.stats.non_default_columns.min_retention_period` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}), which defaults to a 24-hour retention period. - ### Partial statistics *Partial statistics* are collected on a subset of table data without scanning the full table. Partial statistics can improve query performance in large tables where only a portion of rows are regularly updated or queried. -Whereas [full statistics](#full-statistics) refresh infrequently and can allow stale rows to accumulate, partial statistics [automatically refresh](#automatically-collect-partial-statistics) when the number of stale rows reaches a threshold. Partial statistics automatically collect on extreme index values, which is particularly valuable for timestamp indexes where workloads commonly access the most recent data. They can also be [collected manually](#manually-collect-partial-statistics) at extremes and on specific data. +Whereas [full statistics](#full-statistics) refresh infrequently and can allow stale rows to accumulate, partial statistics [automatically refresh](#automatically-collect-partial-statistics) when the number of stale rows reaches a threshold. Partial statistics automatically collect on extreme index values, which is particularly valuable for timestamp indexes where workloads commonly access the most recent data. They can also be [collected manually](#manually-collect-partial-statistics). 
-Partial statistics have the following requirements: +Partial statistics have the following constraints: - Partial statistics can only be collected if [full statistics](#full-statistics) already exist for the table. -- Partial statistics are collected on all single-column prefixes of non-inverted indexes. Indexes that are [partial]({% link {{ page.version.version }}/partial-indexes.md %}), [hash-sharded]({% link {{ page.version.version }}/hash-sharded-indexes.md %}), or implicitly partitioned (such as in [`REGIONAL BY ROW` tables]({% link {{ page.version.version }}/regional-tables.md %}#regional-by-row-tables)) are excluded. -- For [manual collection](#manually-collect-partial-statistics) with specific columns, an index must exist with a prefix matching those columns. If no matching index exists or if statistics were not previously collected on the specified column, the statement will return an error. +- Partial statistics [collected automatically](#automatically-collect-partial-statistics), or with `USING EXTREMES` and no `ON` clause, are collected on all single-column prefixes of non-inverted indexes. Indexes that are [partial]({% link {{ page.version.version }}/partial-indexes.md %}), [hash-sharded]({% link {{ page.version.version }}/hash-sharded-indexes.md %}), or implicitly partitioned (such as in [`REGIONAL BY ROW` tables]({% link {{ page.version.version }}/regional-tables.md %}#regional-by-row-tables)) are excluded. +- For [manual collection](#manually-collect-partial-statistics) with specific columns, each specified column must be the first key column of a non-inverted index. When using the `WHERE` clause, the predicate must also filter on the index column. If no matching index exists or if full statistics were not previously collected on the specified column, the statement returns an error. 
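Because partial statistics depend on existing full statistics, it can help to check what has already been collected before running a manual collection. A minimal check, assuming a hypothetical table named `events`:

~~~ sql
-- Lists the statistics currently stored for the table.
-- Look for an existing entry covering the column you plan to target.
SHOW STATISTICS FOR TABLE events;
~~~

If no full statistics appear for the target column, collecting them first (for example, with `CREATE STATISTICS ... FROM events`) avoids the error described above.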
-The optimizer uses partial statistics for query planning when the [`optimizer_use_merged_partial_statistics`]({% link {{ page.version.version }}/session-variables.md %}#optimizer-use-merged-partial-statistics) session variable is enabled. It merges partial statistics with existing full statistics to produce more accurate cardinality estimates. +By default, the optimizer uses partial statistics for query planning. It merges partial statistics with existing full statistics to produce more accurate cardinality estimates. #### Automatically collect partial statistics @@ -123,7 +133,9 @@ You can manually create partial statistics on: - The highest and lowest index values, when [`enable_create_stats_using_extremes`]({% link {{ page.version.version }}/session-variables.md %}#enable-create-stats-using-extremes) session variable is enabled, using the `USING EXTREMES` clause: [`CREATE STATISTICS stats FROM table USING EXTREMES`]({% link {{ page.version.version }}/create-statistics.md %}#create-partial-statistics-using-extremes) - Specific columns and values, using the `WHERE` clause: [`CREATE STATISTICS stats ON column FROM table WHERE condition`]({% link {{ page.version.version }}/create-statistics.md %}#create-partial-statistics-on-specific-data) -### Enable and disable automatic statistics collection for clusters +### Toggle automatic statistics collection + +#### Enable and disable automatic statistics collection for clusters Automatic statistics collection is enabled by default. To disable automatic [full](#full-statistics) and [partial](#partial-statistics) statistics collection, follow these steps: @@ -147,7 +159,7 @@ Automatic statistics collection is enabled by default. To disable automatic [ful To learn how to manually generate statistics, see the [`CREATE STATISTICS` examples]({% link {{ page.version.version }}/create-statistics.md %}#examples). 
-### Enable and disable automatic statistics collection for tables +#### Enable and disable automatic statistics collection for tables Automatic statistics collection can be expensive for large tables, and you may prefer to defer collection until after data is finished loading or during off-peak hours. Tables that are frequently updated, including small tables, may trigger statistics collection more often, which can lead to unnecessary overhead and unpredictable query plan changes. @@ -188,24 +200,13 @@ The current table settings are shown in the `WITH` clause output of `SHOW CREATE `ALTER TABLE accounts RESET (sql_stats_automatic_collection_enabled)` removes the table setting, in which case the `sql.stats.automatic_collection.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}) is in effect for the table. -The "stale row" cluster settings discussed in [Control statistics refresh rate](#control-statistics-refresh-rate) have table -setting counterparts `sql_stats_automatic_collection_fraction_stale_rows` and `sql_stats_automatic_collection_min_stale_rows`. For example: +### Configure non-default statistics retention -~~~ sql -CREATE TABLE accounts ( - id INT PRIMARY KEY, - balance DECIMAL) -WITH (sql_stats_automatic_collection_enabled = true, -sql_stats_automatic_collection_min_stale_rows = 1000000, -sql_stats_automatic_collection_fraction_stale_rows= 0.05 -); +By default, when CockroachDB refreshes statistics for a column, it deletes the previous statistics for the column (while leaving the most recent 4-5 historical statistics). When CockroachDB refreshes statistics, it also deletes the statistics for any "non-default" column sets, or columns for which statistics are not [collected by default](#table-statistics). 
-ALTER TABLE accounts -SET (sql_stats_automatic_collection_fraction_stale_rows = 0.1, -sql_stats_automatic_collection_min_stale_rows = 2000); -~~~ +Historical statistics on non-default column sets should not be retained indefinitely, because they will not be refreshed automatically and could cause the optimizer to choose a suboptimal plan if they become stale. Such non-default historical statistics may exist because columns were deleted or removed from an index, and are therefore no longer part of a multi-column statistic. -Automatic statistics rules are checked once per minute. While altered automatic statistics table settings take immediate effect for any subsequent DML statements on a table, running row mutations that started prior to modifying the table settings may still trigger statistics collection based on the settings that existed before you ran the `ALTER TABLE ... SET` statement. +CockroachDB deletes statistics on non-default columns according to the `sql.stats.non_default_columns.min_retention_period` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}), which defaults to a 24-hour retention period. 
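For reference, the retention period can be inspected and changed like any other cluster setting. The `48h` value below is an arbitrary illustration, not a recommendation:

~~~ sql
-- Inspect the current retention period (24 hours by default).
SHOW CLUSTER SETTING sql.stats.non_default_columns.min_retention_period;

-- Keep statistics on non-default column sets for a longer period.
SET CLUSTER SETTING sql.stats.non_default_columns.min_retention_period = '48h';

-- Return to the default.
RESET CLUSTER SETTING sql.stats.non_default_columns.min_retention_period;
~~~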
### Forecasted statistics From 1c2a50f016429b57c98699368316d238a9d387e7 Mon Sep 17 00:00:00 2001 From: Ryan Kuo Date: Fri, 9 Jan 2026 11:53:04 -0500 Subject: [PATCH 3/7] partial stats are not used by optimizer pre-v24.3 --- src/current/v23.2/cost-based-optimizer.md | 21 ++++++--------------- src/current/v24.1/cost-based-optimizer.md | 21 ++++++--------------- src/current/v24.2/cost-based-optimizer.md | 21 ++++++--------------- 3 files changed, 18 insertions(+), 45 deletions(-) diff --git a/src/current/v23.2/cost-based-optimizer.md b/src/current/v23.2/cost-based-optimizer.md index 388f69443d3..cda13b27050 100644 --- a/src/current/v23.2/cost-based-optimizer.md +++ b/src/current/v23.2/cost-based-optimizer.md @@ -23,10 +23,9 @@ The most important factor in determining the quality of a plan is cardinality (i The cost-based optimizer can often find more performant query plans if it has access to statistical data on the contents of your tables. This data needs to be generated from scratch for new tables, and [refreshed periodically](#control-statistics-refresh-rate) for existing tables. -The optimizer can use three types of statistics to plan queries: +The optimizer can use two types of statistics to plan queries: - [Full statistics](#full-statistics) -- [Partial statistics](#partial-statistics) - [Forecasted statistics](#forecasted-statistics) For best query performance, most users should leave automatic statistics enabled with the default settings. Advanced users can follow the steps provided in this section for performance tuning and troubleshooting. @@ -35,6 +34,10 @@ For best query performance, most users should leave automatic statistics enabled By default, CockroachDB automatically generates full statistics when tables are [created]({% link {{ page.version.version }}/create-table.md %}) and during [schema changes]({% link {{ page.version.version }}/online-schema-changes.md %}). 
Full statistics for a table are automatically refreshed when approximately 20% of its rows are updated. +{{site.data.alerts.callout_success}} +You can manually collect *partial statistics* on a subset of table data without scanning the full table. Refer to [Create partial statistics using extremes]({% link {{ page.version.version }}/create-statistics.md %}#create-partial-statistics-using-extremes). +{{site.data.alerts.end}} + A [background job]({% link {{ page.version.version }}/create-statistics.md %}#view-statistics-jobs) automatically determines which columns to get statistics on. Specifically, the optimizer chooses: - Columns that are part of the primary key or an index (in other words, all indexed columns). @@ -71,23 +74,11 @@ On the other hand, if a table has 1,500,000,000 rows, then 20% of that, or 300,0 In such cases, we recommend that you use the [`sql_stats_automatic_collection_enabled` storage parameter](#enable-and-disable-automatic-statistics-collection-for-tables), which lets you configure automatic statistics collection on a per-table basis. -### Partial statistics - -*Partial statistics* are collected on a subset of table data without scanning the full table. Partial statistics can improve query performance in large tables where only a portion is regularly updated or queried. - -You can manually collect partial statistics on the highest and lowest index values using [`CREATE STATISTICS ... USING EXTREMES`]({% link {{ page.version.version }}/create-statistics.md %}#create-partial-statistics-using-extremes). - -Partial statistics have the following constraints: - -- Partial statistics can only be collected if [full statistics](#full-statistics) already exist for the table. -- Partial statistics created with `USING EXTREMES` and no `ON` clause are collected on all single-column prefixes of non-inverted indexes. 
Indexes that are [partial]({% link {{ page.version.version }}/partial-indexes.md %}), [hash-sharded]({% link {{ page.version.version }}/hash-sharded-indexes.md %}), or implicitly partitioned (such as in [`REGIONAL BY ROW` tables]({% link {{ page.version.version }}/regional-tables.md %}#regional-by-row-tables)) are excluded. -- For [manual collection with specific columns]({% link {{ page.version.version }}/create-statistics.md %}#enable-create-stats-using-extremes), an index must exist with a prefix matching those columns. If no matching index exists or if full statistics were not previously collected on the specified column, the statement returns an error. - ### Toggle automatic statistics collection #### Enable and disable automatic statistics collection for clusters -Automatic statistics collection is enabled by default. To disable automatic [full](#full-statistics) and [partial](#partial-statistics) statistics collection, follow these steps: +Automatic statistics collection is enabled by default. To disable automatic statistics collection, follow these steps: 1. Set the `sql.stats.automatic_collection.enabled` cluster setting to `false`: diff --git a/src/current/v24.1/cost-based-optimizer.md b/src/current/v24.1/cost-based-optimizer.md index a441da6d19a..35dd8b7ef4d 100644 --- a/src/current/v24.1/cost-based-optimizer.md +++ b/src/current/v24.1/cost-based-optimizer.md @@ -23,10 +23,9 @@ The most important factor in determining the quality of a plan is cardinality (i The cost-based optimizer can often find more performant query plans if it has access to statistical data on the contents of your tables. This data needs to be generated from scratch for new tables, and [refreshed periodically](#control-statistics-refresh-rate) for existing tables. 
-The optimizer can use three types of statistics to plan queries: +The optimizer can use two types of statistics to plan queries: - [Full statistics](#full-statistics) -- [Partial statistics](#partial-statistics) - [Forecasted statistics](#forecasted-statistics) For best query performance, most users should leave automatic statistics enabled with the default settings. Advanced users can follow the steps provided in this section for performance tuning and troubleshooting. @@ -35,6 +34,10 @@ For best query performance, most users should leave automatic statistics enabled By default, CockroachDB automatically generates full statistics when tables are [created]({% link {{ page.version.version }}/create-table.md %}) and during [schema changes]({% link {{ page.version.version }}/online-schema-changes.md %}). Full statistics for a table are automatically refreshed when approximately 20% of its rows are updated. +{{site.data.alerts.callout_success}} +You can manually collect *partial statistics* on a subset of table data without scanning the full table. Refer to [Create partial statistics using extremes]({% link {{ page.version.version }}/create-statistics.md %}#create-partial-statistics-using-extremes). +{{site.data.alerts.end}} + A [background job]({% link {{ page.version.version }}/create-statistics.md %}#view-statistics-jobs) automatically determines which columns to get statistics on. Specifically, the optimizer chooses: - Columns that are part of the primary key or an index (in other words, all indexed columns). @@ -71,23 +74,11 @@ On the other hand, if a table has 1,500,000,000 rows, then 20% of that, or 300,0 In such cases, we recommend that you use the [`sql_stats_automatic_collection_enabled` storage parameter](#enable-and-disable-automatic-statistics-collection-for-tables), which lets you configure automatic statistics collection on a per-table basis. 
-### Partial statistics
-
-*Partial statistics* are collected on a subset of table data without scanning the full table. Partial statistics can improve query performance in large tables where only a portion is regularly updated or queried.
-
-You can manually collect partial statistics on the highest and lowest index values using [`CREATE STATISTICS ... USING EXTREMES`]({% link {{ page.version.version }}/create-statistics.md %}#create-partial-statistics-using-extremes).
-
-Partial statistics have the following constraints:
-
-- Partial statistics can only be collected if [full statistics](#full-statistics) already exist for the table.
-- Partial statistics created with `USING EXTREMES` and no `ON` clause are collected on all single-column prefixes of non-inverted indexes. Indexes that are [partial]({% link {{ page.version.version }}/partial-indexes.md %}), [hash-sharded]({% link {{ page.version.version }}/hash-sharded-indexes.md %}), or implicitly partitioned (such as in [`REGIONAL BY ROW` tables]({% link {{ page.version.version }}/regional-tables.md %}#regional-by-row-tables)) are excluded.
-- For [manual collection with specific columns]({% link {{ page.version.version }}/create-statistics.md %}#enable-create-stats-using-extremes), an index must exist with a prefix matching those columns. If no matching index exists or if full statistics were not previously collected on the specified column, the statement returns an error.
-
 ### Toggle automatic statistics collection
 
 #### Enable and disable automatic statistics collection for clusters
 
-Automatic statistics collection is enabled by default. To disable automatic [full](#full-statistics) and [partial](#partial-statistics) statistics collection, follow these steps:
+Automatic statistics collection is enabled by default. To disable automatic statistics collection, follow these steps:
 
 1.
Set the `sql.stats.automatic_collection.enabled` cluster setting to `false`:
 
diff --git a/src/current/v24.2/cost-based-optimizer.md b/src/current/v24.2/cost-based-optimizer.md
index b95fac0f42e..6e4c9d97f68 100644
--- a/src/current/v24.2/cost-based-optimizer.md
+++ b/src/current/v24.2/cost-based-optimizer.md
@@ -23,10 +23,9 @@ The most important factor in determining the quality of a plan is cardinality (i
 
 The cost-based optimizer can often find more performant query plans if it has access to statistical data on the contents of your tables. This data needs to be generated from scratch for new tables, and [refreshed periodically](#control-statistics-refresh-rate) for existing tables.
 
-The optimizer can use three types of statistics to plan queries:
+The optimizer can use two types of statistics to plan queries:
 
 - [Full statistics](#full-statistics)
-- [Partial statistics](#partial-statistics)
 - [Forecasted statistics](#forecasted-statistics)
 
 For best query performance, most users should leave automatic statistics enabled with the default settings. Advanced users can follow the steps provided in this section for performance tuning and troubleshooting.
@@ -35,6 +34,10 @@ For best query performance, most users should leave automatic statistics enabled
 
 By default, CockroachDB automatically generates full statistics when tables are [created]({% link {{ page.version.version }}/create-table.md %}) and during [schema changes]({% link {{ page.version.version }}/online-schema-changes.md %}). Full statistics for a table are automatically refreshed when approximately 20% of its rows are updated.
 
+{{site.data.alerts.callout_success}}
+You can manually collect *partial statistics* on a subset of table data without scanning the full table. Refer to [Create partial statistics using extremes]({% link {{ page.version.version }}/create-statistics.md %}#create-partial-statistics-using-extremes).
+{{site.data.alerts.end}}
+
 A [background job]({% link {{ page.version.version }}/create-statistics.md %}#view-statistics-jobs) automatically determines which columns to get statistics on. Specifically, the optimizer chooses:
 
 - Columns that are part of the primary key or an index (in other words, all indexed columns).
@@ -71,23 +74,11 @@ On the other hand, if a table has 1,500,000,000 rows, then 20% of that, or 300,0
 
 In such cases, we recommend that you use the [`sql_stats_automatic_collection_enabled` storage parameter](#enable-and-disable-automatic-statistics-collection-for-tables), which lets you configure automatic statistics collection on a per-table basis.
 
-### Partial statistics
-
-*Partial statistics* are collected on a subset of table data without scanning the full table. Partial statistics can improve query performance in large tables where only a portion is regularly updated or queried.
-
-You can manually collect partial statistics on the highest and lowest index values using [`CREATE STATISTICS ... USING EXTREMES`]({% link {{ page.version.version }}/create-statistics.md %}#create-partial-statistics-using-extremes).
-
-Partial statistics have the following constraints:
-
-- Partial statistics can only be collected if [full statistics](#full-statistics) already exist for the table.
-- Partial statistics created with `USING EXTREMES` and no `ON` clause are collected on all single-column prefixes of non-inverted indexes. Indexes that are [partial]({% link {{ page.version.version }}/partial-indexes.md %}), [hash-sharded]({% link {{ page.version.version }}/hash-sharded-indexes.md %}), or implicitly partitioned (such as in [`REGIONAL BY ROW` tables]({% link {{ page.version.version }}/regional-tables.md %}#regional-by-row-tables)) are excluded.
-- For [manual collection with specific columns]({% link {{ page.version.version }}/create-statistics.md %}#enable-create-stats-using-extremes), an index must exist with a prefix matching those columns.
If no matching index exists or if full statistics were not previously collected on the specified column, the statement returns an error.
-
 ### Toggle automatic statistics collection
 
 #### Enable and disable automatic statistics collection for clusters
 
-Automatic statistics collection is enabled by default. To disable automatic [full](#full-statistics) and [partial](#partial-statistics) statistics collection, follow these steps:
+Automatic statistics collection is enabled by default. To disable automatic statistics collection, follow these steps:
 
 1. Set the `sql.stats.automatic_collection.enabled` cluster setting to `false`:
 

From cd1266cc8a8588006013b5115170f1c5900a7a71 Mon Sep 17 00:00:00 2001
From: Ryan Kuo
Date: Fri, 9 Jan 2026 12:34:07 -0500
Subject: [PATCH 4/7] improve setting format

---
 src/current/v25.1/cost-based-optimizer.md | 20 ++++++++++++-----
 src/current/v25.2/cost-based-optimizer.md | 27 ++++++++++++++---------
 src/current/v25.3/cost-based-optimizer.md | 27 ++++++++++++++---------
 src/current/v25.4/cost-based-optimizer.md | 27 ++++++++++++++---------
 src/current/v26.1/cost-based-optimizer.md | 27 ++++++++++++++---------
 5 files changed, 82 insertions(+), 46 deletions(-)

diff --git a/src/current/v25.1/cost-based-optimizer.md b/src/current/v25.1/cost-based-optimizer.md
index a344a433666..9fdb173d5d5 100644
--- a/src/current/v25.1/cost-based-optimizer.md
+++ b/src/current/v25.1/cost-based-optimizer.md
@@ -112,13 +112,21 @@ The optimizer uses partial statistics for query planning when the [`optimizer_us
 
 This is particularly beneficial for large tables where only a portion is regularly updated or queried, such as tables with timestamp columns where recent data is frequently accessed.
 
-To control automatic collection of partial statistics, use the following [cluster settings]({% link {{ page.version.version }}/cluster-settings.md %}) and [table storage parameters]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters).
Each table parameter overrides the corresponding cluster setting when applied to a specific table. +To control automatic collection of partial statistics, use the following [cluster settings]({% link {{ page.version.version }}/cluster-settings.md %}) to configure behavior across all tables in the cluster: -| Cluster setting | Table storage parameter | Description | -|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| [`sql.stats.automatic_partial_collection.enabled`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-sql-stats-automatic-partial-collection-enabled) | [`sql_stats_automatic_partial_collection_enabled`]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters) | Enable automatic collection of partial table statistics. | -| [`sql.stats.automatic_partial_collection.min_stale_rows`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-sql-stats-automatic-partial-collection-min-stale-rows) | [`sql_stats_automatic_partial_collection_min_stale_rows`]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters) | Minimum number of stale rows that triggers partial statistics collection. 
| -| [`sql.stats.automatic_partial_collection.fraction_stale_rows`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-sql-stats-automatic-partial-collection-fraction-stale-rows) | [`sql_stats_automatic_partial_collection_fraction_stale_rows`]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters) | Target fraction of stale rows that triggers partial statistics collection. If lower than the `0.2` threshold for full statistics, partial statistics refresh more frequently than full statistics. | +| Cluster setting | Description | +|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| [`sql.stats.automatic_partial_collection.enabled`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-sql-stats-automatic-partial-collection-enabled) | Enable automatic collection of partial table statistics. | +| [`sql.stats.automatic_partial_collection.min_stale_rows`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-sql-stats-automatic-partial-collection-min-stale-rows) | Minimum number of stale rows that triggers partial statistics collection. | +| [`sql.stats.automatic_partial_collection.fraction_stale_rows`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-sql-stats-automatic-partial-collection-fraction-stale-rows) | Target fraction of stale rows that triggers partial statistics collection. If lower than the `0.2` threshold for full statistics, partial statistics refresh more frequently than full statistics. 
| + +Override cluster settings for specific tables using the following [table storage parameters]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters): + +| Table storage parameter | Description | +|--------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| [`sql_stats_automatic_partial_collection_enabled`]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters) | Enable automatic collection of partial statistics on the table. | +| [`sql_stats_automatic_partial_collection_min_stale_rows`]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters) | Minimum number of stale rows on the table that triggers partial statistics collection. | +| [`sql_stats_automatic_partial_collection_fraction_stale_rows`]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters) | Target fraction of stale rows on the table that triggers partial statistics collection. If lower than the `0.2` threshold for full statistics, partial statistics refresh more frequently than full statistics. | #### Manually collect partial statistics diff --git a/src/current/v25.2/cost-based-optimizer.md b/src/current/v25.2/cost-based-optimizer.md index e2dd3084647..fb52d0620fa 100644 --- a/src/current/v25.2/cost-based-optimizer.md +++ b/src/current/v25.2/cost-based-optimizer.md @@ -42,11 +42,10 @@ A [background job]({% link {{ page.version.version }}/create-statistics.md %}#vi By default, CockroachDB also automatically collects [multi-column statistics]({% link {{ page.version.version }}/create-statistics.md %}#create-statistics-on-multiple-columns) on columns that prefix an index. 
-{% include_cached new-in.html version="v25.2" %} To control automatic collection of full statistics, use the following settings. The table storage parameter overrides the cluster setting when applied to a specific table. +{% include_cached new-in.html version="v25.2" %} To control automatic collection of full statistics, use the following settings: -| Cluster setting | Table storage parameter | Description | -|----------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------|-------------------------------------------------------| -| [`sql.stats.automatic_full_collection.enabled`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-sql-stats-automatic-full-collection-enabled) | `sql_stats_automatic_full_collection_enabled` | Enable automatic collection of full table statistics. | +- [`sql.stats.automatic_full_collection.enabled`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-sql-stats-automatic-full-collection-enabled): Cluster setting that enables automatic collection of full table statistics across all tables in the cluster. +- [`sql_stats_automatic_full_collection_enabled`]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters): Table storage parameter that overrides the cluster setting when applied to a specific table. #### Control statistics refresh rate @@ -118,13 +117,21 @@ Partial statistics are automatically collected on the highest and lowest index v This is particularly beneficial for large tables where only a portion is regularly updated or queried, such as tables with timestamp columns where recent data is frequently accessed. 
-To control automatic collection of partial statistics, use the following [cluster settings]({% link {{ page.version.version }}/cluster-settings.md %}) and [table storage parameters]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters). Each table parameter overrides the corresponding cluster setting when applied to a specific table. +To control automatic collection of partial statistics, use the following [cluster settings]({% link {{ page.version.version }}/cluster-settings.md %}) to configure behavior across all tables in the cluster: -| Cluster setting | Table storage parameter | Description | -|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| [`sql.stats.automatic_partial_collection.enabled`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-sql-stats-automatic-partial-collection-enabled) | [`sql_stats_automatic_partial_collection_enabled`]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters) | Enable automatic collection of partial table statistics. | -| [`sql.stats.automatic_partial_collection.min_stale_rows`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-sql-stats-automatic-partial-collection-min-stale-rows) | [`sql_stats_automatic_partial_collection_min_stale_rows`]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters) | Minimum number of stale rows that triggers partial statistics collection. 
| -| [`sql.stats.automatic_partial_collection.fraction_stale_rows`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-sql-stats-automatic-partial-collection-fraction-stale-rows) | [`sql_stats_automatic_partial_collection_fraction_stale_rows`]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters) | Target fraction of stale rows that triggers partial statistics collection. If lower than the `0.2` threshold for full statistics, partial statistics refresh more frequently than full statistics. | +| Cluster setting | Description | +|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| [`sql.stats.automatic_partial_collection.enabled`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-sql-stats-automatic-partial-collection-enabled) | Enable automatic collection of partial table statistics. | +| [`sql.stats.automatic_partial_collection.min_stale_rows`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-sql-stats-automatic-partial-collection-min-stale-rows) | Minimum number of stale rows that triggers partial statistics collection. | +| [`sql.stats.automatic_partial_collection.fraction_stale_rows`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-sql-stats-automatic-partial-collection-fraction-stale-rows) | Target fraction of stale rows that triggers partial statistics collection. If lower than the `0.2` threshold for full statistics, partial statistics refresh more frequently than full statistics. 
| + +Override cluster settings for specific tables using the following [table storage parameters]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters): + +| Table storage parameter | Description | +|--------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| [`sql_stats_automatic_partial_collection_enabled`]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters) | Enable automatic collection of partial statistics on the table. | +| [`sql_stats_automatic_partial_collection_min_stale_rows`]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters) | Minimum number of stale rows on the table that triggers partial statistics collection. | +| [`sql_stats_automatic_partial_collection_fraction_stale_rows`]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters) | Target fraction of stale rows on the table that triggers partial statistics collection. If lower than the `0.2` threshold for full statistics, partial statistics refresh more frequently than full statistics. | #### Manually collect partial statistics diff --git a/src/current/v25.3/cost-based-optimizer.md b/src/current/v25.3/cost-based-optimizer.md index efbe4269fa0..968eb98c94a 100644 --- a/src/current/v25.3/cost-based-optimizer.md +++ b/src/current/v25.3/cost-based-optimizer.md @@ -42,11 +42,10 @@ A [background job]({% link {{ page.version.version }}/create-statistics.md %}#vi By default, CockroachDB also automatically collects [multi-column statistics]({% link {{ page.version.version }}/create-statistics.md %}#create-statistics-on-multiple-columns) on columns that prefix an index. 
-To control automatic collection of full statistics, use the following settings. The table storage parameter overrides the cluster setting when applied to a specific table. +To control automatic collection of full statistics, use the following settings: -| Cluster setting | Table storage parameter | Description | -|----------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------|-------------------------------------------------------| -| [`sql.stats.automatic_full_collection.enabled`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-sql-stats-automatic-full-collection-enabled) | `sql_stats_automatic_full_collection_enabled` | Enable automatic collection of full table statistics. | +- [`sql.stats.automatic_full_collection.enabled`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-sql-stats-automatic-full-collection-enabled): Cluster setting that enables automatic collection of full table statistics across all tables in the cluster. +- [`sql_stats_automatic_full_collection_enabled`]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters): Table storage parameter that overrides the cluster setting when applied to a specific table. #### Control statistics refresh rate @@ -118,13 +117,21 @@ Partial statistics are automatically collected on the highest and lowest index v This is particularly beneficial for large tables where only a portion is regularly updated or queried, such as tables with timestamp columns where recent data is frequently accessed. -To control automatic collection of partial statistics, use the following [cluster settings]({% link {{ page.version.version }}/cluster-settings.md %}) and [table storage parameters]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters). 
Each table parameter overrides the corresponding cluster setting when applied to a specific table. +To control automatic collection of partial statistics, use the following [cluster settings]({% link {{ page.version.version }}/cluster-settings.md %}) to configure behavior across all tables in the cluster: -| Cluster setting | Table storage parameter | Description | -|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| [`sql.stats.automatic_partial_collection.enabled`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-sql-stats-automatic-partial-collection-enabled) | [`sql_stats_automatic_partial_collection_enabled`]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters) | Enable automatic collection of partial table statistics. | -| [`sql.stats.automatic_partial_collection.min_stale_rows`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-sql-stats-automatic-partial-collection-min-stale-rows) | [`sql_stats_automatic_partial_collection_min_stale_rows`]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters) | Minimum number of stale rows that triggers partial statistics collection. 
| -| [`sql.stats.automatic_partial_collection.fraction_stale_rows`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-sql-stats-automatic-partial-collection-fraction-stale-rows) | [`sql_stats_automatic_partial_collection_fraction_stale_rows`]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters) | Target fraction of stale rows that triggers partial statistics collection. If lower than the `0.2` threshold for full statistics, partial statistics refresh more frequently than full statistics. | +| Cluster setting | Description | +|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| [`sql.stats.automatic_partial_collection.enabled`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-sql-stats-automatic-partial-collection-enabled) | Enable automatic collection of partial table statistics. | +| [`sql.stats.automatic_partial_collection.min_stale_rows`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-sql-stats-automatic-partial-collection-min-stale-rows) | Minimum number of stale rows that triggers partial statistics collection. | +| [`sql.stats.automatic_partial_collection.fraction_stale_rows`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-sql-stats-automatic-partial-collection-fraction-stale-rows) | Target fraction of stale rows that triggers partial statistics collection. If lower than the `0.2` threshold for full statistics, partial statistics refresh more frequently than full statistics. 
| + +Override cluster settings for specific tables using the following [table storage parameters]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters): + +| Table storage parameter | Description | +|--------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| [`sql_stats_automatic_partial_collection_enabled`]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters) | Enable automatic collection of partial statistics on the table. | +| [`sql_stats_automatic_partial_collection_min_stale_rows`]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters) | Minimum number of stale rows on the table that triggers partial statistics collection. | +| [`sql_stats_automatic_partial_collection_fraction_stale_rows`]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters) | Target fraction of stale rows on the table that triggers partial statistics collection. If lower than the `0.2` threshold for full statistics, partial statistics refresh more frequently than full statistics. | #### Manually collect partial statistics diff --git a/src/current/v25.4/cost-based-optimizer.md b/src/current/v25.4/cost-based-optimizer.md index eed89e15e23..ee903c7b623 100644 --- a/src/current/v25.4/cost-based-optimizer.md +++ b/src/current/v25.4/cost-based-optimizer.md @@ -42,11 +42,10 @@ A [background job]({% link {{ page.version.version }}/create-statistics.md %}#vi By default, CockroachDB also automatically collects [multi-column statistics]({% link {{ page.version.version }}/create-statistics.md %}#create-statistics-on-multiple-columns) on columns that prefix an index. 
-To control automatic collection of full statistics, use the following settings. The table storage parameter overrides the cluster setting when applied to a specific table. +To control automatic collection of full statistics, use the following settings: -| Cluster setting | Table storage parameter | Description | -|----------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------|-------------------------------------------------------| -| [`sql.stats.automatic_full_collection.enabled`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-sql-stats-automatic-full-collection-enabled) | `sql_stats_automatic_full_collection_enabled` | Enable automatic collection of full table statistics. | +- [`sql.stats.automatic_full_collection.enabled`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-sql-stats-automatic-full-collection-enabled): Cluster setting that enables automatic collection of full table statistics across all tables in the cluster. +- [`sql_stats_automatic_full_collection_enabled`]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters): Table storage parameter that overrides the cluster setting when applied to a specific table. #### Control statistics refresh rate @@ -118,13 +117,21 @@ Partial statistics are automatically collected on the highest and lowest index v This is particularly beneficial for large tables where only a portion is regularly updated or queried, such as tables with timestamp columns where recent data is frequently accessed. -To control automatic collection of partial statistics, use the following [cluster settings]({% link {{ page.version.version }}/cluster-settings.md %}) and [table storage parameters]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters). 
Each table parameter overrides the corresponding cluster setting when applied to a specific table. +To control automatic collection of partial statistics, use the following [cluster settings]({% link {{ page.version.version }}/cluster-settings.md %}) to configure behavior across all tables in the cluster: -| Cluster setting | Table storage parameter | Description | -|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| [`sql.stats.automatic_partial_collection.enabled`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-sql-stats-automatic-partial-collection-enabled) | [`sql_stats_automatic_partial_collection_enabled`]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters) | Enable automatic collection of partial table statistics. | -| [`sql.stats.automatic_partial_collection.min_stale_rows`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-sql-stats-automatic-partial-collection-min-stale-rows) | [`sql_stats_automatic_partial_collection_min_stale_rows`]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters) | Minimum number of stale rows that triggers partial statistics collection. 
| -| [`sql.stats.automatic_partial_collection.fraction_stale_rows`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-sql-stats-automatic-partial-collection-fraction-stale-rows) | [`sql_stats_automatic_partial_collection_fraction_stale_rows`]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters) | Target fraction of stale rows that triggers partial statistics collection. If lower than the `0.2` threshold for full statistics, partial statistics refresh more frequently than full statistics. | +| Cluster setting | Description | +|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| [`sql.stats.automatic_partial_collection.enabled`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-sql-stats-automatic-partial-collection-enabled) | Enable automatic collection of partial table statistics. | +| [`sql.stats.automatic_partial_collection.min_stale_rows`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-sql-stats-automatic-partial-collection-min-stale-rows) | Minimum number of stale rows that triggers partial statistics collection. | +| [`sql.stats.automatic_partial_collection.fraction_stale_rows`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-sql-stats-automatic-partial-collection-fraction-stale-rows) | Target fraction of stale rows that triggers partial statistics collection. If lower than the `0.2` threshold for full statistics, partial statistics refresh more frequently than full statistics. 
| + +Override cluster settings for specific tables using the following [table storage parameters]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters): + +| Table storage parameter | Description | +|--------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| [`sql_stats_automatic_partial_collection_enabled`]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters) | Enable automatic collection of partial statistics on the table. | +| [`sql_stats_automatic_partial_collection_min_stale_rows`]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters) | Minimum number of stale rows on the table that triggers partial statistics collection. | +| [`sql_stats_automatic_partial_collection_fraction_stale_rows`]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters) | Target fraction of stale rows on the table that triggers partial statistics collection. If lower than the `0.2` threshold for full statistics, partial statistics refresh more frequently than full statistics. | #### Manually collect partial statistics diff --git a/src/current/v26.1/cost-based-optimizer.md b/src/current/v26.1/cost-based-optimizer.md index 7975db99459..2f70a93defc 100644 --- a/src/current/v26.1/cost-based-optimizer.md +++ b/src/current/v26.1/cost-based-optimizer.md @@ -42,11 +42,10 @@ A [background job]({% link {{ page.version.version }}/create-statistics.md %}#vi By default, CockroachDB also automatically collects [multi-column statistics]({% link {{ page.version.version }}/create-statistics.md %}#create-statistics-on-multiple-columns) on columns that prefix an index. 
-To control automatic collection of full statistics, use the following settings. The table storage parameter overrides the cluster setting when applied to a specific table. +To control automatic collection of full statistics, use the following settings: -| Cluster setting | Table storage parameter | Description | -|----------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------|-------------------------------------------------------| -| [`sql.stats.automatic_full_collection.enabled`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-sql-stats-automatic-full-collection-enabled) | `sql_stats_automatic_full_collection_enabled` | Enable automatic collection of full table statistics. | +- [`sql.stats.automatic_full_collection.enabled`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-sql-stats-automatic-full-collection-enabled): Cluster setting that enables automatic collection of full table statistics across all tables in the cluster. +- [`sql_stats_automatic_full_collection_enabled`]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters): Table storage parameter that overrides the cluster setting when applied to a specific table. #### Control statistics refresh rate @@ -118,13 +117,21 @@ Partial statistics are automatically collected on the highest and lowest index v This is particularly beneficial for large tables where only a portion is regularly updated or queried, such as tables with timestamp columns where recent data is frequently accessed. -To control automatic collection of partial statistics, use the following [cluster settings]({% link {{ page.version.version }}/cluster-settings.md %}) and [table storage parameters]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters). 
Each table parameter overrides the corresponding cluster setting when applied to a specific table. +To control automatic collection of partial statistics, use the following [cluster settings]({% link {{ page.version.version }}/cluster-settings.md %}) to configure behavior across all tables in the cluster: -| Cluster setting | Table storage parameter | Description | -|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| [`sql.stats.automatic_partial_collection.enabled`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-sql-stats-automatic-partial-collection-enabled) | [`sql_stats_automatic_partial_collection_enabled`]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters) | Enable automatic collection of partial table statistics. | -| [`sql.stats.automatic_partial_collection.min_stale_rows`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-sql-stats-automatic-partial-collection-min-stale-rows) | [`sql_stats_automatic_partial_collection_min_stale_rows`]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters) | Minimum number of stale rows that triggers partial statistics collection. 
| -| [`sql.stats.automatic_partial_collection.fraction_stale_rows`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-sql-stats-automatic-partial-collection-fraction-stale-rows) | [`sql_stats_automatic_partial_collection_fraction_stale_rows`]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters) | Target fraction of stale rows that triggers partial statistics collection. If lower than the `0.2` threshold for full statistics, partial statistics refresh more frequently than full statistics. | +| Cluster setting | Description | +|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| [`sql.stats.automatic_partial_collection.enabled`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-sql-stats-automatic-partial-collection-enabled) | Enable automatic collection of partial table statistics. | +| [`sql.stats.automatic_partial_collection.min_stale_rows`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-sql-stats-automatic-partial-collection-min-stale-rows) | Minimum number of stale rows that triggers partial statistics collection. | +| [`sql.stats.automatic_partial_collection.fraction_stale_rows`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-sql-stats-automatic-partial-collection-fraction-stale-rows) | Target fraction of stale rows that triggers partial statistics collection. If lower than the `0.2` threshold for full statistics, partial statistics refresh more frequently than full statistics. 
| + +Override cluster settings for specific tables using the following [table storage parameters]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters): + +| Table storage parameter | Description | +|--------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| [`sql_stats_automatic_partial_collection_enabled`]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters) | Enable automatic collection of partial statistics on the table. | +| [`sql_stats_automatic_partial_collection_min_stale_rows`]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters) | Minimum number of stale rows on the table that triggers partial statistics collection. | +| [`sql_stats_automatic_partial_collection_fraction_stale_rows`]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters) | Target fraction of stale rows on the table that triggers partial statistics collection. If lower than the `0.2` threshold for full statistics, partial statistics refresh more frequently than full statistics. 
| #### Manually collect partial statistics From 8ec84827ce9f950f5c4c7f86b7733c572002f8ce Mon Sep 17 00:00:00 2001 From: Ryan Kuo Date: Fri, 9 Jan 2026 13:22:21 -0500 Subject: [PATCH 5/7] fix links --- src/current/_includes/v23.2/misc/session-vars.md | 2 +- src/current/_includes/v24.1/misc/session-vars.md | 2 +- src/current/_includes/v24.2/misc/session-vars.md | 2 +- src/current/v23.2/create-statistics.md | 2 +- src/current/v24.1/create-statistics.md | 2 +- src/current/v24.2/create-statistics.md | 2 +- src/current/v24.3/cost-based-optimizer.md | 2 +- 7 files changed, 7 insertions(+), 7 deletions(-) diff --git a/src/current/_includes/v23.2/misc/session-vars.md b/src/current/_includes/v23.2/misc/session-vars.md index 29b1fcb2bda..99c8b1b0f00 100644 --- a/src/current/_includes/v23.2/misc/session-vars.md +++ b/src/current/_includes/v23.2/misc/session-vars.md @@ -20,7 +20,7 @@ | `disallow_full_table_scans` | If set to `on`, queries on "large" tables with a row count greater than [`large_full_scan_rows`](#large-full-scan-rows) will not use full table or index scans. If no other query plan is possible, queries will return an error message. This setting does not apply to internal queries, which may plan full table or index scans without checking the session variable. | `off` | Yes | Yes | | `distsql` | The query distribution mode for the session. By default, CockroachDB determines which queries are faster to execute if distributed across multiple nodes, and all other queries are run through the gateway node. 
| `auto` | Yes | Yes | | `enable_auto_rehoming` | When enabled, the [home regions]({% link {{ page.version.version }}/alter-table.md %}#crdb_region) of rows in [`REGIONAL BY ROW`]({% link {{ page.version.version }}/alter-table.md %}#set-the-table-locality-to-regional-by-row) tables are automatically set to the region of the [gateway node]({% link {{ page.version.version }}/ui-sessions-page.md %}#session-details-gateway-node) from which any [`UPDATE`]({% link {{ page.version.version }}/update.md %}) or [`UPSERT`]({% link {{ page.version.version }}/upsert.md %}) statements that operate on those rows originate. | `off` | Yes | Yes | -| `enable_create_stats_using_extremes` | If `on`, allows manual creation of [partial statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#partial-statistics) using the [`CREATE STATISTICS ... USING EXTREMES`]({% link {{ page.version.version }}/create-statistics.md %}#create-partial-statistics-using-extremes) syntax. | `off` | Yes | Yes | +| `enable_create_stats_using_extremes` | If `on`, allows manual creation of partial statistics using the [`CREATE STATISTICS ... USING EXTREMES`]({% link {{ page.version.version }}/create-statistics.md %}#create-partial-statistics-using-extremes) syntax. | `off` | Yes | Yes | | `enable_durable_locking_for_serializable` | Indicates whether CockroachDB replicates [`FOR UPDATE` and `FOR SHARE`]({% link {{ page.version.version }}/select-for-update.md %}#lock-strengths) locks via [Raft]({% link {{ page.version.version }}/architecture/replication-layer.md %}#raft), allowing locks to be preserved when leases are transferred. Note that replicating `FOR UPDATE` and `FOR SHARE` locks will add latency to those statements. This setting only affects `SERIALIZABLE` transactions and matches the default `READ COMMITTED` behavior when enabled. 
| `off` | Yes | Yes | | `enable_experimental_alter_column_type_general` | If `on`, it is possible to [alter column data types]({% link {{ page.version.version }}/alter-table.md %}#alter-column-data-types). | `off` | Yes | Yes | | `enable_implicit_fk_locking_for_serializable` | Indicates whether CockroachDB uses [shared locks]({% link {{ page.version.version }}/select-for-update.md %}#lock-strengths) to perform [foreign key]({% link {{ page.version.version }}/foreign-key.md %}) checks. To take effect, the [`enable_shared_locking_for_serializable`](#enable-shared-locking-for-serializable) setting must also be enabled. This setting only affects `SERIALIZABLE` transactions and matches the default `READ COMMITTED` behavior when enabled. | `off` | Yes | Yes | diff --git a/src/current/_includes/v24.1/misc/session-vars.md b/src/current/_includes/v24.1/misc/session-vars.md index b89d13e0976..fbbdb01d8a3 100644 --- a/src/current/_includes/v24.1/misc/session-vars.md +++ b/src/current/_includes/v24.1/misc/session-vars.md @@ -20,7 +20,7 @@ | `disable_changefeed_replication` | When `true`, [changefeeds]({% link {{ page.version.version }}/changefeed-messages.md %}#filtering-changefeed-messages) will not emit messages for any changes (e.g., `INSERT`, `UPDATE`) issued to watched tables during that session. | `false` | Yes | Yes | | `disallow_full_table_scans` | If set to `on`, queries on "large" tables with a row count greater than [`large_full_scan_rows`](#large-full-scan-rows) will not use full table or index scans. If no other query plan is possible, queries will return an error message. This setting does not apply to internal queries, which may plan full table or index scans without checking the session variable. | `off` | Yes | Yes || `distsql` | The query distribution mode for the session. By default, CockroachDB determines which queries are faster to execute if distributed across multiple nodes, and all other queries are run through the gateway node. 
| `auto` | Yes | Yes | | `enable_auto_rehoming` | When enabled, the [home regions]({% link {{ page.version.version }}/alter-table.md %}#crdb_region) of rows in [`REGIONAL BY ROW`]({% link {{ page.version.version }}/alter-table.md %}#set-the-table-locality-to-regional-by-row) tables are automatically set to the region of the [gateway node]({% link {{ page.version.version }}/ui-sessions-page.md %}#session-details-gateway-node) from which any [`UPDATE`]({% link {{ page.version.version }}/update.md %}) or [`UPSERT`]({% link {{ page.version.version }}/upsert.md %}) statements that operate on those rows originate. | `off` | Yes | Yes | -| `enable_create_stats_using_extremes` | If `on`, allows manual creation of [partial statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#partial-statistics) using the [`CREATE STATISTICS ... USING EXTREMES`]({% link {{ page.version.version }}/create-statistics.md %}#create-partial-statistics-using-extremes) syntax. | `off` | Yes | Yes | +| `enable_create_stats_using_extremes` | If `on`, allows manual creation of partial statistics using the [`CREATE STATISTICS ... USING EXTREMES`]({% link {{ page.version.version }}/create-statistics.md %}#create-partial-statistics-using-extremes) syntax. | `off` | Yes | Yes | | `enable_durable_locking_for_serializable` | Indicates whether CockroachDB replicates [`FOR UPDATE` and `FOR SHARE`]({% link {{ page.version.version }}/select-for-update.md %}#lock-strengths) locks via [Raft]({% link {{ page.version.version }}/architecture/replication-layer.md %}#raft), allowing locks to be preserved when leases are transferred. Note that replicating `FOR UPDATE` and `FOR SHARE` locks will add latency to those statements. This setting only affects `SERIALIZABLE` transactions and matches the default `READ COMMITTED` behavior when enabled. 
| `off` | Yes | Yes | | `enable_experimental_alter_column_type_general` | If `on`, it is possible to [alter column data types]({% link {{ page.version.version }}/alter-table.md %}#alter-column-data-types). | `off` | Yes | Yes | | `enable_implicit_fk_locking_for_serializable` | Indicates whether CockroachDB uses [shared locks]({% link {{ page.version.version }}/select-for-update.md %}#lock-strengths) to perform [foreign key]({% link {{ page.version.version }}/foreign-key.md %}) checks. To take effect, the [`enable_shared_locking_for_serializable`](#enable-shared-locking-for-serializable) setting must also be enabled. This setting only affects `SERIALIZABLE` transactions and matches the default `READ COMMITTED` behavior when enabled. | `off` | Yes | Yes | diff --git a/src/current/_includes/v24.2/misc/session-vars.md b/src/current/_includes/v24.2/misc/session-vars.md index 55927e4a74a..06b181e1a13 100644 --- a/src/current/_includes/v24.2/misc/session-vars.md +++ b/src/current/_includes/v24.2/misc/session-vars.md @@ -20,7 +20,7 @@ | `disable_changefeed_replication` | When `true`, [changefeeds]({% link {{ page.version.version }}/changefeed-messages.md %}#filtering-changefeed-messages) will not emit messages for any changes (e.g., `INSERT`, `UPDATE`) issued to watched tables during that session. | `false` | Yes | Yes | | `disallow_full_table_scans` | If set to `on`, queries on "large" tables with a row count greater than [`large_full_scan_rows`](#large-full-scan-rows) will not use full table or index scans. If no other query plan is possible, queries will return an error message. This setting does not apply to internal queries, which may plan full table or index scans without checking the session variable. | `off` | Yes | Yes || `distsql` | The query distribution mode for the session. By default, CockroachDB determines which queries are faster to execute if distributed across multiple nodes, and all other queries are run through the gateway node. 
| `auto` | Yes | Yes | | `enable_auto_rehoming` | When enabled, the [home regions]({% link {{ page.version.version }}/alter-table.md %}#crdb_region) of rows in [`REGIONAL BY ROW`]({% link {{ page.version.version }}/alter-table.md %}#set-the-table-locality-to-regional-by-row) tables are automatically set to the region of the [gateway node]({% link {{ page.version.version }}/ui-sessions-page.md %}#session-details-gateway-node) from which any [`UPDATE`]({% link {{ page.version.version }}/update.md %}) or [`UPSERT`]({% link {{ page.version.version }}/upsert.md %}) statements that operate on those rows originate. | `off` | Yes | Yes | -| `enable_create_stats_using_extremes` | If `on`, allows manual creation of [partial statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#partial-statistics) using the [`CREATE STATISTICS ... USING EXTREMES`]({% link {{ page.version.version }}/create-statistics.md %}#create-partial-statistics-using-extremes) syntax. | `off` | Yes | Yes | +| `enable_create_stats_using_extremes` | If `on`, allows manual creation of partial statistics using the [`CREATE STATISTICS ... USING EXTREMES`]({% link {{ page.version.version }}/create-statistics.md %}#create-partial-statistics-using-extremes) syntax. | `off` | Yes | Yes | | `enable_durable_locking_for_serializable` | Indicates whether CockroachDB replicates [`FOR UPDATE` and `FOR SHARE`]({% link {{ page.version.version }}/select-for-update.md %}#lock-strengths) locks via [Raft]({% link {{ page.version.version }}/architecture/replication-layer.md %}#raft), allowing locks to be preserved when leases are transferred. Note that replicating `FOR UPDATE` and `FOR SHARE` locks will add latency to those statements. This setting only affects `SERIALIZABLE` transactions and matches the default `READ COMMITTED` behavior when enabled. 
| `off` | Yes | Yes | | `enable_experimental_alter_column_type_general` | If `on`, it is possible to [alter column data types]({% link {{ page.version.version }}/alter-table.md %}#alter-column-data-types). | `off` | Yes | Yes | | `enable_implicit_fk_locking_for_serializable` | Indicates whether CockroachDB uses [shared locks]({% link {{ page.version.version }}/select-for-update.md %}#lock-strengths) to perform [foreign key]({% link {{ page.version.version }}/foreign-key.md %}) checks. To take effect, the [`enable_shared_locking_for_serializable`](#enable-shared-locking-for-serializable) setting must also be enabled. This setting only affects `SERIALIZABLE` transactions and matches the default `READ COMMITTED` behavior when enabled. | `off` | Yes | Yes | diff --git a/src/current/v23.2/create-statistics.md b/src/current/v23.2/create-statistics.md index 0aa2927d1b5..0d2a6d8e798 100644 --- a/src/current/v23.2/create-statistics.md +++ b/src/current/v23.2/create-statistics.md @@ -168,7 +168,7 @@ For more information about how the `AS OF SYSTEM TIME` clause works, including s ### Create partial statistics using extremes -To create [partial statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#partial-statistics) that collect statistics on the highest and lowest index values: +To create partial statistics that collect statistics on the highest and lowest index values: {% include_cached copy-clipboard.html %} ~~~ sql diff --git a/src/current/v24.1/create-statistics.md b/src/current/v24.1/create-statistics.md index 719b4451280..236de606043 100644 --- a/src/current/v24.1/create-statistics.md +++ b/src/current/v24.1/create-statistics.md @@ -168,7 +168,7 @@ For more information about how the `AS OF SYSTEM TIME` clause works, including s ### Create partial statistics using extremes -To create [partial statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#partial-statistics) that collect statistics on the highest and lowest index values: 
+To create partial statistics that collect statistics on the highest and lowest index values: {% include_cached copy-clipboard.html %} ~~~ sql diff --git a/src/current/v24.2/create-statistics.md b/src/current/v24.2/create-statistics.md index 719b4451280..236de606043 100644 --- a/src/current/v24.2/create-statistics.md +++ b/src/current/v24.2/create-statistics.md @@ -168,7 +168,7 @@ For more information about how the `AS OF SYSTEM TIME` clause works, including s ### Create partial statistics using extremes -To create [partial statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#partial-statistics) that collect statistics on the highest and lowest index values: +To create partial statistics that collect statistics on the highest and lowest index values: {% include_cached copy-clipboard.html %} ~~~ sql diff --git a/src/current/v24.3/cost-based-optimizer.md b/src/current/v24.3/cost-based-optimizer.md index c1ae0035a88..be5d1fc15f5 100644 --- a/src/current/v24.3/cost-based-optimizer.md +++ b/src/current/v24.3/cost-based-optimizer.md @@ -99,7 +99,7 @@ Partial statistics have the following constraints: - Partial statistics can only be collected if [full statistics](#full-statistics) already exist for the table. - Partial statistics created with `USING EXTREMES` and no `ON` clause are collected on all single-column prefixes of non-inverted indexes. Indexes that are [partial]({% link {{ page.version.version }}/partial-indexes.md %}), [hash-sharded]({% link {{ page.version.version }}/hash-sharded-indexes.md %}), or implicitly partitioned (such as in [`REGIONAL BY ROW` tables]({% link {{ page.version.version }}/regional-tables.md %}#regional-by-row-tables)) are excluded. -- For [manual collection with specific columns]({% link {{ page.version.version }}/create-statistics.md %}#enable-create-stats-using-extremes), an index must exist with a prefix matching those columns. 
If no matching index exists or if full statistics were not previously collected on the specified column, the statement returns an error. +- For [manual collection with specific columns]({% link {{ page.version.version }}/create-statistics.md %}#create-partial-statistics-using-extremes), an index must exist with a prefix matching those columns. If no matching index exists or if full statistics were not previously collected on the specified column, the statement returns an error. {% include_cached new-in.html version="v24.3" %} The optimizer uses partial statistics for query planning when the [`optimizer_use_merged_partial_statistics`]({% link {{ page.version.version }}/session-variables.md %}#optimizer-use-merged-partial-statistics) session variable is enabled. It merges partial statistics with existing full statistics to produce more accurate cardinality estimates. From 6ece21cadadb781e802339864af0f14d7de160c8 Mon Sep 17 00:00:00 2001 From: Ryan Kuo Date: Mon, 2 Feb 2026 12:15:27 -0500 Subject: [PATCH 6/7] address review comments --- src/current/v23.2/cost-based-optimizer.md | 4 +--- src/current/v23.2/create-statistics.md | 4 ++-- src/current/v24.1/cost-based-optimizer.md | 4 +--- src/current/v24.1/create-statistics.md | 4 ++-- src/current/v24.2/cost-based-optimizer.md | 4 +--- src/current/v24.2/create-statistics.md | 4 ++-- src/current/v24.3/cost-based-optimizer.md | 4 +--- src/current/v24.3/create-statistics.md | 4 ++-- src/current/v25.1/cost-based-optimizer.md | 6 ++---- src/current/v25.1/create-statistics.md | 4 ++-- src/current/v25.2/cost-based-optimizer.md | 6 ++---- src/current/v25.2/create-statistics.md | 4 ++-- src/current/v25.3/cost-based-optimizer.md | 6 ++---- src/current/v25.3/create-statistics.md | 4 ++-- src/current/v25.4/cost-based-optimizer.md | 6 ++---- src/current/v25.4/create-statistics.md | 12 ++++++------ src/current/v26.1/cost-based-optimizer.md | 6 ++---- src/current/v26.1/create-statistics.md | 12 ++++++------ 18 files changed, 40 
insertions(+), 58 deletions(-) diff --git a/src/current/v23.2/cost-based-optimizer.md b/src/current/v23.2/cost-based-optimizer.md index cda13b27050..0ed287c6a43 100644 --- a/src/current/v23.2/cost-based-optimizer.md +++ b/src/current/v23.2/cost-based-optimizer.md @@ -32,7 +32,7 @@ For best query performance, most users should leave automatic statistics enabled ### Full statistics -By default, CockroachDB automatically generates full statistics when tables are [created]({% link {{ page.version.version }}/create-table.md %}) and during [schema changes]({% link {{ page.version.version }}/online-schema-changes.md %}). Full statistics for a table are automatically refreshed when approximately 20% of its rows are updated. +By default, CockroachDB automatically generates full statistics when tables are [created]({% link {{ page.version.version }}/create-table.md %}) and after [schema changes]({% link {{ page.version.version }}/online-schema-changes.md %}). Full statistics for a table are automatically refreshed when approximately 20% of its rows are updated. {{site.data.alerts.callout_success}} You can manually collect *partial statistics* on a subset of table data without scanning the full table. Refer to [Create partial statistics using extremes]({% link {{ page.version.version }}/create-statistics.md %}#create-partial-statistics-using-extremes). @@ -156,8 +156,6 @@ CockroachDB deletes statistics on non-default columns according to the `sql.stat - There have been at least 3 historical statistics collections. - The historical statistics closely fit a linear pattern. -By default, the optimizer uses forecasts that closely match the historical statistics. - You can enable and disable forecasted statistics collection for individual tables using the `sql_stats_forecasts_enabled` [table parameter]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters). 
This table setting **takes precedence** over the `sql.stats.forecasts.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}). You can either configure this setting during table creation: diff --git a/src/current/v23.2/create-statistics.md b/src/current/v23.2/create-statistics.md index 0d2a6d8e798..38bd6868988 100644 --- a/src/current/v23.2/create-statistics.md +++ b/src/current/v23.2/create-statistics.md @@ -180,9 +180,9 @@ SET enable_create_stats_using_extremes = true; CREATE STATISTICS rides_extremes_stats FROM rides USING EXTREMES; ~~~ -This creates partial statistics on all single column prefixes of forward indexes in the `rides` table by scanning only the highest and lowest index values, providing updated statistics without performing a full table scan. +This creates partial statistics on all single column prefixes of forward indexes in the `rides` table by scanning only the highest and lowest index values, rather than performing a full table scan. -You can also create extremes statistics on specific columns: +You can also create extremes statistics on specific columns, provided there is an index with the specified column as the first key column: {% include_cached copy-clipboard.html %} ~~~ sql diff --git a/src/current/v24.1/cost-based-optimizer.md b/src/current/v24.1/cost-based-optimizer.md index 35dd8b7ef4d..229b5ff5359 100644 --- a/src/current/v24.1/cost-based-optimizer.md +++ b/src/current/v24.1/cost-based-optimizer.md @@ -32,7 +32,7 @@ For best query performance, most users should leave automatic statistics enabled ### Full statistics -By default, CockroachDB automatically generates full statistics when tables are [created]({% link {{ page.version.version }}/create-table.md %}) and during [schema changes]({% link {{ page.version.version }}/online-schema-changes.md %}). Full statistics for a table are automatically refreshed when approximately 20% of its rows are updated. 
+By default, CockroachDB automatically generates full statistics when tables are [created]({% link {{ page.version.version }}/create-table.md %}) and after [schema changes]({% link {{ page.version.version }}/online-schema-changes.md %}). Full statistics for a table are automatically refreshed when approximately 20% of its rows are updated. {{site.data.alerts.callout_success}} You can manually collect *partial statistics* on a subset of table data without scanning the full table. Refer to [Create partial statistics using extremes]({% link {{ page.version.version }}/create-statistics.md %}#create-partial-statistics-using-extremes). @@ -156,8 +156,6 @@ CockroachDB deletes statistics on non-default columns according to the `sql.stat - There have been at least 3 historical statistics collections. - The historical statistics closely fit a linear pattern. -By default, the optimizer uses forecasts that closely match the historical statistics. - You can enable and disable forecasted statistics collection for individual tables using the `sql_stats_forecasts_enabled` [table parameter]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters). This table setting **takes precedence** over the `sql.stats.forecasts.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}). You can either configure this setting during table creation: diff --git a/src/current/v24.1/create-statistics.md b/src/current/v24.1/create-statistics.md index 236de606043..a6d427e2c17 100644 --- a/src/current/v24.1/create-statistics.md +++ b/src/current/v24.1/create-statistics.md @@ -180,9 +180,9 @@ SET enable_create_stats_using_extremes = true; CREATE STATISTICS rides_extremes_stats FROM rides USING EXTREMES; ~~~ -This creates partial statistics on all single column prefixes of forward indexes in the `rides` table by scanning only the highest and lowest index values, providing updated statistics without performing a full table scan. 
+This creates partial statistics on all single column prefixes of forward indexes in the `rides` table by scanning only the highest and lowest index values, rather than performing a full table scan. -You can also create extremes statistics on specific columns: +You can also create extremes statistics on specific columns, provided there is an index with the specified column as the first key column: {% include_cached copy-clipboard.html %} ~~~ sql diff --git a/src/current/v24.2/cost-based-optimizer.md b/src/current/v24.2/cost-based-optimizer.md index 6e4c9d97f68..d24585c1612 100644 --- a/src/current/v24.2/cost-based-optimizer.md +++ b/src/current/v24.2/cost-based-optimizer.md @@ -32,7 +32,7 @@ For best query performance, most users should leave automatic statistics enabled ### Full statistics -By default, CockroachDB automatically generates full statistics when tables are [created]({% link {{ page.version.version }}/create-table.md %}) and during [schema changes]({% link {{ page.version.version }}/online-schema-changes.md %}). Full statistics for a table are automatically refreshed when approximately 20% of its rows are updated. +By default, CockroachDB automatically generates full statistics when tables are [created]({% link {{ page.version.version }}/create-table.md %}) and after [schema changes]({% link {{ page.version.version }}/online-schema-changes.md %}). Full statistics for a table are automatically refreshed when approximately 20% of its rows are updated. {{site.data.alerts.callout_success}} You can manually collect *partial statistics* on a subset of table data without scanning the full table. Refer to [Create partial statistics using extremes]({% link {{ page.version.version }}/create-statistics.md %}#create-partial-statistics-using-extremes). @@ -156,8 +156,6 @@ CockroachDB deletes statistics on non-default columns according to the `sql.stat - There have been at least 3 historical statistics collections. 
- The historical statistics closely fit a linear pattern. -By default, the optimizer uses forecasts that closely match the historical statistics. - You can enable and disable forecasted statistics collection for individual tables using the `sql_stats_forecasts_enabled` [table parameter]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters). This table setting **takes precedence** over the `sql.stats.forecasts.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}). You can either configure this setting during table creation: diff --git a/src/current/v24.2/create-statistics.md b/src/current/v24.2/create-statistics.md index 236de606043..a6d427e2c17 100644 --- a/src/current/v24.2/create-statistics.md +++ b/src/current/v24.2/create-statistics.md @@ -180,9 +180,9 @@ SET enable_create_stats_using_extremes = true; CREATE STATISTICS rides_extremes_stats FROM rides USING EXTREMES; ~~~ -This creates partial statistics on all single column prefixes of forward indexes in the `rides` table by scanning only the highest and lowest index values, providing updated statistics without performing a full table scan. +This creates partial statistics on all single column prefixes of forward indexes in the `rides` table by scanning only the highest and lowest index values, rather than performing a full table scan. 
-You can also create extremes statistics on specific columns: +You can also create extremes statistics on specific columns, provided there is an index with the specified column as the first key column: {% include_cached copy-clipboard.html %} ~~~ sql diff --git a/src/current/v24.3/cost-based-optimizer.md b/src/current/v24.3/cost-based-optimizer.md index be5d1fc15f5..91ee4e8b0b3 100644 --- a/src/current/v24.3/cost-based-optimizer.md +++ b/src/current/v24.3/cost-based-optimizer.md @@ -33,7 +33,7 @@ For best query performance, most users should leave automatic statistics enabled ### Full statistics -By default, CockroachDB automatically generates full statistics when tables are [created]({% link {{ page.version.version }}/create-table.md %}) and during [schema changes]({% link {{ page.version.version }}/online-schema-changes.md %}). Full statistics for a table are automatically refreshed when approximately 20% of its rows are updated. +By default, CockroachDB automatically generates full statistics when tables are [created]({% link {{ page.version.version }}/create-table.md %}) and after [schema changes]({% link {{ page.version.version }}/online-schema-changes.md %}). Full statistics for a table are automatically refreshed when approximately 20% of its rows are updated. A [background job]({% link {{ page.version.version }}/create-statistics.md %}#view-statistics-jobs) automatically determines which columns to get statistics on. Specifically, the optimizer chooses: @@ -185,8 +185,6 @@ CockroachDB deletes statistics on non-default columns according to the `sql.stat - There have been at least 3 historical statistics collections. - The historical statistics closely fit a linear pattern. -By default, the optimizer uses forecasts that closely match the historical statistics. 
- You can enable and disable forecasted statistics collection for individual tables using the `sql_stats_forecasts_enabled` [table parameter]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters). This table setting **takes precedence** over the `sql.stats.forecasts.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}). You can either configure this setting during table creation: diff --git a/src/current/v24.3/create-statistics.md b/src/current/v24.3/create-statistics.md index 719b4451280..1633d10b6fd 100644 --- a/src/current/v24.3/create-statistics.md +++ b/src/current/v24.3/create-statistics.md @@ -180,9 +180,9 @@ SET enable_create_stats_using_extremes = true; CREATE STATISTICS rides_extremes_stats FROM rides USING EXTREMES; ~~~ -This creates partial statistics on all single column prefixes of forward indexes in the `rides` table by scanning only the highest and lowest index values, providing updated statistics without performing a full table scan. +This creates partial statistics on all single column prefixes of forward indexes in the `rides` table by scanning only the highest and lowest index values, rather than performing a full table scan. 
-You can also create extremes statistics on specific columns: +You can also create extremes statistics on specific columns, provided there is an index with the specified column as the first key column: {% include_cached copy-clipboard.html %} ~~~ sql diff --git a/src/current/v25.1/cost-based-optimizer.md b/src/current/v25.1/cost-based-optimizer.md index 9fdb173d5d5..1f81e68c674 100644 --- a/src/current/v25.1/cost-based-optimizer.md +++ b/src/current/v25.1/cost-based-optimizer.md @@ -33,7 +33,7 @@ For best query performance, most users should leave automatic statistics enabled ### Full statistics -By default, CockroachDB automatically generates full statistics when tables are [created]({% link {{ page.version.version }}/create-table.md %}) and during [schema changes]({% link {{ page.version.version }}/online-schema-changes.md %}). Full statistics for a table are automatically refreshed when approximately 20% of its rows are updated. +By default, CockroachDB automatically generates full statistics when tables are [created]({% link {{ page.version.version }}/create-table.md %}) and after [schema changes]({% link {{ page.version.version }}/online-schema-changes.md %}). Full statistics for a table are automatically refreshed when approximately 20% of its rows are updated. A [background job]({% link {{ page.version.version }}/create-statistics.md %}#view-statistics-jobs) automatically determines which columns to get statistics on. Specifically, the optimizer chooses: @@ -93,7 +93,7 @@ In such cases, we recommend that you use the [`sql_stats_automatic_collection_en *Partial statistics* are collected on a subset of table data without scanning the full table. Partial statistics can improve query performance in large tables where only a portion of rows are regularly updated or queried. 
-Whereas [full statistics](#full-statistics) refresh infrequently and can allow stale rows to accumulate, partial statistics [automatically refresh](#automatically-collect-partial-statistics) when the number of stale rows reaches a threshold. Partial statistics automatically collect on extreme index values, which is particularly valuable for timestamp indexes where workloads commonly access the most recent data. They can also be [collected manually](#manually-collect-partial-statistics). +Whereas [full statistics](#full-statistics) refresh infrequently and can allow stale rows to accumulate, partial statistics automatically refresh at a [lower threshold](#automatically-collect-partial-statistics) of stale rows. Partial statistics automatically collect on extreme index values, which is particularly valuable for timestamp indexes where workloads commonly access the most recent data. They can also be [collected manually](#manually-collect-partial-statistics). Partial statistics have the following constraints: @@ -214,8 +214,6 @@ CockroachDB deletes statistics on non-default columns according to the `sql.stat - There have been at least 3 historical statistics collections. - The historical statistics closely fit a linear pattern. -By default, the optimizer uses forecasts that closely match the historical statistics. - You can enable and disable forecasted statistics collection for individual tables using the `sql_stats_forecasts_enabled` [table parameter]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters). This table setting **takes precedence** over the `sql.stats.forecasts.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}). 
You can either configure this setting during table creation: diff --git a/src/current/v25.1/create-statistics.md b/src/current/v25.1/create-statistics.md index 69cc07120fd..fd2a7d50cde 100644 --- a/src/current/v25.1/create-statistics.md +++ b/src/current/v25.1/create-statistics.md @@ -177,9 +177,9 @@ To create [partial statistics]({% link {{ page.version.version }}/cost-based-opt CREATE STATISTICS rides_extremes_stats FROM rides USING EXTREMES; ~~~ -This creates partial statistics on all single-column prefixes of non-inverted indexes in the `rides` table by scanning only the highest and lowest index values, providing updated statistics without performing a full table scan. +This creates partial statistics on all single-column prefixes of non-inverted indexes in the `rides` table by scanning only the highest and lowest index values, rather than performing a full table scan. -You can also create extremes statistics on specific columns, as long as [the column is indexed]({% link {{ page.version.version }}/cost-based-optimizer.md %}#partial-statistics): +You can also create extremes statistics on specific columns, provided there is an index with the specified column as the first key column: {% include_cached copy-clipboard.html %} ~~~ sql diff --git a/src/current/v25.2/cost-based-optimizer.md b/src/current/v25.2/cost-based-optimizer.md index fb52d0620fa..69fcd3df6b1 100644 --- a/src/current/v25.2/cost-based-optimizer.md +++ b/src/current/v25.2/cost-based-optimizer.md @@ -33,7 +33,7 @@ For best query performance, most users should leave automatic statistics enabled ### Full statistics -By default, CockroachDB automatically generates full statistics when tables are [created]({% link {{ page.version.version }}/create-table.md %}) and during [schema changes]({% link {{ page.version.version }}/online-schema-changes.md %}). Full statistics for a table are automatically refreshed when approximately 20% of its rows are updated. 
+By default, CockroachDB automatically generates full statistics when tables are [created]({% link {{ page.version.version }}/create-table.md %}) and after [schema changes]({% link {{ page.version.version }}/online-schema-changes.md %}). Full statistics for a table are automatically refreshed when approximately 20% of its rows are updated. A [background job]({% link {{ page.version.version }}/create-statistics.md %}#view-statistics-jobs) automatically determines which columns to get statistics on. Specifically, the optimizer chooses: @@ -98,7 +98,7 @@ In such cases, we recommend that you use the [`sql_stats_automatic_collection_en *Partial statistics* are collected on a subset of table data without scanning the full table. Partial statistics can improve query performance in large tables where only a portion of rows are regularly updated or queried. -Whereas [full statistics](#full-statistics) refresh infrequently and can allow stale rows to accumulate, partial statistics [automatically refresh](#automatically-collect-partial-statistics) when the number of stale rows reaches a threshold. Partial statistics automatically collect on extreme index values, which is particularly valuable for timestamp indexes where workloads commonly access the most recent data. They can also be [collected manually](#manually-collect-partial-statistics). +Whereas [full statistics](#full-statistics) refresh infrequently and can allow stale rows to accumulate, partial statistics automatically refresh at a [lower threshold](#automatically-collect-partial-statistics) of stale rows. Partial statistics automatically collect on extreme index values, which is particularly valuable for timestamp indexes where workloads commonly access the most recent data. They can also be [collected manually](#manually-collect-partial-statistics). 
Partial statistics have the following constraints: @@ -219,8 +219,6 @@ CockroachDB deletes statistics on non-default columns according to the `sql.stat - There have been at least 3 historical statistics collections. - The historical statistics closely fit a linear pattern. -By default, the optimizer uses forecasts that closely match the historical statistics. - You can enable and disable forecasted statistics collection for individual tables using the `sql_stats_forecasts_enabled` [table parameter]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters). This table setting **takes precedence** over the `sql.stats.forecasts.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}). You can either configure this setting during table creation: diff --git a/src/current/v25.2/create-statistics.md b/src/current/v25.2/create-statistics.md index 69cc07120fd..fd2a7d50cde 100644 --- a/src/current/v25.2/create-statistics.md +++ b/src/current/v25.2/create-statistics.md @@ -177,9 +177,9 @@ To create [partial statistics]({% link {{ page.version.version }}/cost-based-opt CREATE STATISTICS rides_extremes_stats FROM rides USING EXTREMES; ~~~ -This creates partial statistics on all single-column prefixes of non-inverted indexes in the `rides` table by scanning only the highest and lowest index values, providing updated statistics without performing a full table scan. +This creates partial statistics on all single-column prefixes of non-inverted indexes in the `rides` table by scanning only the highest and lowest index values, rather than performing a full table scan. 
-You can also create extremes statistics on specific columns, as long as [the column is indexed]({% link {{ page.version.version }}/cost-based-optimizer.md %}#partial-statistics): +You can also create extremes statistics on specific columns, provided there is an index with the specified column as the first key column: {% include_cached copy-clipboard.html %} ~~~ sql diff --git a/src/current/v25.3/cost-based-optimizer.md b/src/current/v25.3/cost-based-optimizer.md index 968eb98c94a..6221050a511 100644 --- a/src/current/v25.3/cost-based-optimizer.md +++ b/src/current/v25.3/cost-based-optimizer.md @@ -33,7 +33,7 @@ For best query performance, most users should leave automatic statistics enabled ### Full statistics -By default, CockroachDB automatically generates full statistics when tables are [created]({% link {{ page.version.version }}/create-table.md %}) and during [schema changes]({% link {{ page.version.version }}/online-schema-changes.md %}). Full statistics for a table are automatically refreshed when approximately 20% of its rows are updated. +By default, CockroachDB automatically generates full statistics when tables are [created]({% link {{ page.version.version }}/create-table.md %}) and after [schema changes]({% link {{ page.version.version }}/online-schema-changes.md %}). Full statistics for a table are automatically refreshed when approximately 20% of its rows are updated. A [background job]({% link {{ page.version.version }}/create-statistics.md %}#view-statistics-jobs) automatically determines which columns to get statistics on. Specifically, the optimizer chooses: @@ -98,7 +98,7 @@ In such cases, we recommend that you use the [`sql_stats_automatic_collection_en *Partial statistics* are collected on a subset of table data without scanning the full table. Partial statistics can improve query performance in large tables where only a portion of rows are regularly updated or queried. 
-Whereas [full statistics](#full-statistics) refresh infrequently and can allow stale rows to accumulate, partial statistics [automatically refresh](#automatically-collect-partial-statistics) when the number of stale rows reaches a threshold. Partial statistics automatically collect on extreme index values, which is particularly valuable for timestamp indexes where workloads commonly access the most recent data. They can also be [collected manually](#manually-collect-partial-statistics). +Whereas [full statistics](#full-statistics) refresh infrequently and can allow stale rows to accumulate, partial statistics automatically refresh at a [lower threshold](#automatically-collect-partial-statistics) of stale rows. Partial statistics automatically collect on extreme index values, which is particularly valuable for timestamp indexes where workloads commonly access the most recent data. They can also be [collected manually](#manually-collect-partial-statistics). Partial statistics have the following constraints: @@ -219,8 +219,6 @@ CockroachDB deletes statistics on non-default columns according to the `sql.stat - There have been at least 3 historical statistics collections. - The historical statistics closely fit a linear pattern. -By default, the optimizer uses forecasts that closely match the historical statistics. - You can enable and disable forecasted statistics collection for individual tables using the `sql_stats_forecasts_enabled` [table parameter]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters). This table setting **takes precedence** over the `sql.stats.forecasts.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}). 
You can either configure this setting during table creation: diff --git a/src/current/v25.3/create-statistics.md b/src/current/v25.3/create-statistics.md index 69cc07120fd..fd2a7d50cde 100644 --- a/src/current/v25.3/create-statistics.md +++ b/src/current/v25.3/create-statistics.md @@ -177,9 +177,9 @@ To create [partial statistics]({% link {{ page.version.version }}/cost-based-opt CREATE STATISTICS rides_extremes_stats FROM rides USING EXTREMES; ~~~ -This creates partial statistics on all single-column prefixes of non-inverted indexes in the `rides` table by scanning only the highest and lowest index values, providing updated statistics without performing a full table scan. +This creates partial statistics on all single-column prefixes of non-inverted indexes in the `rides` table by scanning only the highest and lowest index values, rather than performing a full table scan. -You can also create extremes statistics on specific columns, as long as [the column is indexed]({% link {{ page.version.version }}/cost-based-optimizer.md %}#partial-statistics): +You can also create extremes statistics on specific columns, provided there is an index with the specified column as the first key column: {% include_cached copy-clipboard.html %} ~~~ sql diff --git a/src/current/v25.4/cost-based-optimizer.md b/src/current/v25.4/cost-based-optimizer.md index ee903c7b623..46b4b587502 100644 --- a/src/current/v25.4/cost-based-optimizer.md +++ b/src/current/v25.4/cost-based-optimizer.md @@ -33,7 +33,7 @@ For best query performance, most users should leave automatic statistics enabled ### Full statistics -By default, CockroachDB automatically generates full statistics when tables are [created]({% link {{ page.version.version }}/create-table.md %}) and during [schema changes]({% link {{ page.version.version }}/online-schema-changes.md %}). Full statistics for a table are automatically refreshed when approximately 20% of its rows are updated. 
+By default, CockroachDB automatically generates full statistics when tables are [created]({% link {{ page.version.version }}/create-table.md %}) and after [schema changes]({% link {{ page.version.version }}/online-schema-changes.md %}). Full statistics for a table are automatically refreshed when approximately 20% of its rows are updated. A [background job]({% link {{ page.version.version }}/create-statistics.md %}#view-statistics-jobs) automatically determines which columns to get statistics on. Specifically, the optimizer chooses: @@ -98,7 +98,7 @@ In such cases, we recommend that you use the [`sql_stats_automatic_collection_en *Partial statistics* are collected on a subset of table data without scanning the full table. Partial statistics can improve query performance in large tables where only a portion of rows are regularly updated or queried. -Whereas [full statistics](#full-statistics) refresh infrequently and can allow stale rows to accumulate, partial statistics [automatically refresh](#automatically-collect-partial-statistics) when the number of stale rows reaches a threshold. Partial statistics automatically collect on extreme index values, which is particularly valuable for timestamp indexes where workloads commonly access the most recent data. They can also be [collected manually](#manually-collect-partial-statistics). +Whereas [full statistics](#full-statistics) refresh infrequently and can allow stale rows to accumulate, partial statistics automatically refresh at a [lower threshold](#automatically-collect-partial-statistics) of stale rows. Partial statistics automatically collect on extreme index values, which is particularly valuable for timestamp indexes where workloads commonly access the most recent data. They can also be [collected manually](#manually-collect-partial-statistics). 
Partial statistics have the following constraints: @@ -222,8 +222,6 @@ CockroachDB deletes statistics on non-default columns according to the `sql.stat - There have been at least 3 historical statistics collections. - The historical statistics closely fit a linear pattern. -By default, the optimizer uses forecasts that closely match the historical statistics. - You can enable and disable forecasted statistics collection for individual tables using the `sql_stats_forecasts_enabled` [table parameter]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters). This table setting **takes precedence** over the `sql.stats.forecasts.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}). You can either configure this setting during table creation: diff --git a/src/current/v25.4/create-statistics.md b/src/current/v25.4/create-statistics.md index 7e3eb52bbab..423208dea38 100644 --- a/src/current/v25.4/create-statistics.md +++ b/src/current/v25.4/create-statistics.md @@ -177,9 +177,9 @@ To create [partial statistics]({% link {{ page.version.version }}/cost-based-opt CREATE STATISTICS rides_extremes_stats FROM rides USING EXTREMES; ~~~ -This creates partial statistics on all single-column prefixes of non-inverted indexes in the `rides` table by scanning only the highest and lowest index values, providing updated statistics without performing a full table scan. +This creates partial statistics on all single-column prefixes of non-inverted indexes in the `rides` table by scanning only the highest and lowest index values, rather than performing a full table scan. 
-You can also create extremes statistics on specific columns, as long as [the column is indexed]({% link {{ page.version.version }}/cost-based-optimizer.md %}#partial-statistics): +You can also create extremes statistics on specific columns, provided there is an index with the specified column as the first key column: {% include_cached copy-clipboard.html %} ~~~ sql @@ -192,17 +192,17 @@ CREATE STATISTICS city_extremes_stats ON city FROM rides USING EXTREMES; {% include_cached copy-clipboard.html %} ~~~ sql -CREATE INDEX ON rides (start_time); +CREATE INDEX ON rides (revenue); ~~~ -Partial statistics are particularly valuable for timestamp columns where workloads commonly access the most recent data: +Partial statistics can target any subset of data matching specific conditions. For example, to create statistics on high-value rides: {% include_cached copy-clipboard.html %} ~~~ sql -CREATE STATISTICS recent_rides_stats ON start_time FROM rides WHERE start_time > '2023-01-01'; +CREATE STATISTICS high_value_rides_stats ON revenue FROM rides WHERE revenue > 50; ~~~ -This creates statistics only on rides that started after January 1, 2023, allowing the optimizer to have accurate statistics for recent data without scanning the entire table. +This creates partial statistics covering only high-value rides. ### Delete statistics diff --git a/src/current/v26.1/cost-based-optimizer.md b/src/current/v26.1/cost-based-optimizer.md index 2f70a93defc..98bf14d8ea2 100644 --- a/src/current/v26.1/cost-based-optimizer.md +++ b/src/current/v26.1/cost-based-optimizer.md @@ -33,7 +33,7 @@ For best query performance, most users should leave automatic statistics enabled ### Full statistics -By default, CockroachDB automatically generates full statistics when tables are [created]({% link {{ page.version.version }}/create-table.md %}) and during [schema changes]({% link {{ page.version.version }}/online-schema-changes.md %}). 
Full statistics for a table are automatically refreshed when approximately 20% of its rows are updated. +By default, CockroachDB automatically generates full statistics when tables are [created]({% link {{ page.version.version }}/create-table.md %}) and after [schema changes]({% link {{ page.version.version }}/online-schema-changes.md %}). Full statistics for a table are automatically refreshed when approximately 20% of its rows are updated. A [background job]({% link {{ page.version.version }}/create-statistics.md %}#view-statistics-jobs) automatically determines which columns to get statistics on. Specifically, the optimizer chooses: @@ -98,7 +98,7 @@ In such cases, we recommend that you use the [`sql_stats_automatic_collection_en *Partial statistics* are collected on a subset of table data without scanning the full table. Partial statistics can improve query performance in large tables where only a portion of rows are regularly updated or queried. -Whereas [full statistics](#full-statistics) refresh infrequently and can allow stale rows to accumulate, partial statistics [automatically refresh](#automatically-collect-partial-statistics) when the number of stale rows reaches a threshold. Partial statistics automatically collect on extreme index values, which is particularly valuable for timestamp indexes where workloads commonly access the most recent data. They can also be [collected manually](#manually-collect-partial-statistics). +Whereas [full statistics](#full-statistics) refresh infrequently and can allow stale rows to accumulate, partial statistics automatically refresh at a [lower threshold](#automatically-collect-partial-statistics) of stale rows. Partial statistics automatically collect on extreme index values, which is particularly valuable for timestamp indexes where workloads commonly access the most recent data. They can also be [collected manually](#manually-collect-partial-statistics). 
Partial statistics have the following constraints: @@ -222,8 +222,6 @@ CockroachDB deletes statistics on non-default columns according to the `sql.stat - There have been at least 3 historical statistics collections. - The historical statistics closely fit a linear pattern. -By default, the optimizer uses forecasts that closely match the historical statistics. - You can enable and disable forecasted statistics collection for individual tables using the `sql_stats_forecasts_enabled` [table parameter]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters). This table setting **takes precedence** over the `sql.stats.forecasts.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}). You can either configure this setting during table creation: diff --git a/src/current/v26.1/create-statistics.md b/src/current/v26.1/create-statistics.md index c94252fa694..815bdd5d231 100644 --- a/src/current/v26.1/create-statistics.md +++ b/src/current/v26.1/create-statistics.md @@ -177,9 +177,9 @@ To create [partial statistics]({% link {{ page.version.version }}/cost-based-opt CREATE STATISTICS rides_extremes_stats FROM rides USING EXTREMES; ~~~ -This creates partial statistics on all single-column prefixes of non-inverted indexes in the `rides` table by scanning only the highest and lowest index values, providing updated statistics without performing a full table scan. +This creates partial statistics on all single-column prefixes of non-inverted indexes in the `rides` table by scanning only the highest and lowest index values, rather than performing a full table scan. 
-You can also create extremes statistics on specific columns, as long as [the column is indexed]({% link {{ page.version.version }}/cost-based-optimizer.md %}#partial-statistics): +You can also create extremes statistics on specific columns, provided there is an index with the specified column as the first key column: {% include_cached copy-clipboard.html %} ~~~ sql @@ -192,17 +192,17 @@ To create [partial statistics]({% link {{ page.version.version }}/cost-based-opt {% include_cached copy-clipboard.html %} ~~~ sql -CREATE INDEX ON rides (start_time); +CREATE INDEX ON rides (revenue); ~~~ -Partial statistics are particularly valuable for timestamp columns where workloads commonly access the most recent data: +Partial statistics can target any subset of data matching specific conditions. For example, to create statistics on high-value rides: {% include_cached copy-clipboard.html %} ~~~ sql -CREATE STATISTICS recent_rides_stats ON start_time FROM rides WHERE start_time > '2023-01-01'; +CREATE STATISTICS high_value_rides_stats ON revenue FROM rides WHERE revenue > 50; ~~~ -This creates partial statistics covering only rows where `start_time` is greater than `2023-01-01`, providing focused statistics on the most recently accessed data. +This creates partial statistics covering only high-value rides. 
### Delete statistics From 387721ff63bc21186dc3bd122f55228a5ca35600 Mon Sep 17 00:00:00 2001 From: Ryan Kuo Date: Thu, 12 Feb 2026 17:32:58 -0500 Subject: [PATCH 7/7] copy improvements per docs review --- src/current/v23.2/cost-based-optimizer.md | 6 +++--- src/current/v24.1/cost-based-optimizer.md | 6 +++--- src/current/v24.2/cost-based-optimizer.md | 6 +++--- src/current/v24.3/cost-based-optimizer.md | 8 ++++---- src/current/v25.1/cost-based-optimizer.md | 8 ++++---- src/current/v25.2/cost-based-optimizer.md | 8 ++++---- src/current/v25.3/cost-based-optimizer.md | 8 ++++---- src/current/v25.4/cost-based-optimizer.md | 8 ++++---- 8 files changed, 29 insertions(+), 29 deletions(-) diff --git a/src/current/v23.2/cost-based-optimizer.md b/src/current/v23.2/cost-based-optimizer.md index 0ed287c6a43..1a6a200f378 100644 --- a/src/current/v23.2/cost-based-optimizer.md +++ b/src/current/v23.2/cost-based-optimizer.md @@ -143,11 +143,11 @@ The current table settings are shown in the `WITH` clause output of `SHOW CREATE ### Configure non-default statistics retention -By default, when CockroachDB refreshes statistics for a column, it deletes the previous statistics for the column (while leaving the most recent 4-5 historical statistics). When CockroachDB refreshes statistics, it also deletes the statistics for any "non-default" column sets, or columns for which statistics are not [collected by default](#table-statistics). +By default, when CockroachDB refreshes statistics for a column, it deletes the previous statistics for the column while retaining the most recent four to five historical statistics. When CockroachDB refreshes statistics, it also deletes the statistics for any columns whose statistics are not [collected by default](#table-statistics). -Historical statistics on non-default column sets should not be retained indefinitely, because they will not be refreshed automatically and could cause the optimizer to choose a suboptimal plan if they become stale. 
Such non-default historical statistics may exist because columns were deleted or removed from an index, and are therefore no longer part of a multi-column statistic. +Do not retain historical statistics on non-default column sets indefinitely, because they are not refreshed automatically and can cause the optimizer to choose a suboptimal plan if they become stale. These non-default historical statistics can exist when columns are deleted or removed from an index and are no longer part of a multi-column statistic. -CockroachDB deletes statistics on non-default columns according to the `sql.stats.non_default_columns.min_retention_period` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}), which defaults to a 24-hour retention period. +CockroachDB deletes statistics on non-default columns according to the `sql.stats.non_default_columns.min_retention_period` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}), which defaults to 24 hours. ### Forecasted statistics diff --git a/src/current/v24.1/cost-based-optimizer.md b/src/current/v24.1/cost-based-optimizer.md index 229b5ff5359..c70cb177f1f 100644 --- a/src/current/v24.1/cost-based-optimizer.md +++ b/src/current/v24.1/cost-based-optimizer.md @@ -143,11 +143,11 @@ The current table settings are shown in the `WITH` clause output of `SHOW CREATE ### Configure non-default statistics retention -By default, when CockroachDB refreshes statistics for a column, it deletes the previous statistics for the column (while leaving the most recent 4-5 historical statistics). When CockroachDB refreshes statistics, it also deletes the statistics for any "non-default" column sets, or columns for which statistics are not [collected by default](#table-statistics). +By default, when CockroachDB refreshes statistics for a column, it deletes the previous statistics for the column while retaining the most recent four to five historical statistics. 
When CockroachDB refreshes statistics, it also deletes the statistics for any columns whose statistics are not [collected by default](#table-statistics). -Historical statistics on non-default column sets should not be retained indefinitely, because they will not be refreshed automatically and could cause the optimizer to choose a suboptimal plan if they become stale. Such non-default historical statistics may exist because columns were deleted or removed from an index, and are therefore no longer part of a multi-column statistic. +Do not retain historical statistics on non-default column sets indefinitely, because they are not refreshed automatically and can cause the optimizer to choose a suboptimal plan if they become stale. These non-default historical statistics can exist when columns are deleted or removed from an index and are no longer part of a multi-column statistic. -CockroachDB deletes statistics on non-default columns according to the `sql.stats.non_default_columns.min_retention_period` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}), which defaults to a 24-hour retention period. +CockroachDB deletes statistics on non-default columns according to the `sql.stats.non_default_columns.min_retention_period` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}), which defaults to 24 hours. ### Forecasted statistics diff --git a/src/current/v24.2/cost-based-optimizer.md b/src/current/v24.2/cost-based-optimizer.md index d24585c1612..f388174dca2 100644 --- a/src/current/v24.2/cost-based-optimizer.md +++ b/src/current/v24.2/cost-based-optimizer.md @@ -143,11 +143,11 @@ The current table settings are shown in the `WITH` clause output of `SHOW CREATE ### Configure non-default statistics retention -By default, when CockroachDB refreshes statistics for a column, it deletes the previous statistics for the column (while leaving the most recent 4-5 historical statistics). 
When CockroachDB refreshes statistics, it also deletes the statistics for any "non-default" column sets, or columns for which statistics are not [collected by default](#table-statistics). +By default, when CockroachDB refreshes statistics for a column, it deletes the previous statistics for the column while retaining the most recent four to five historical statistics. When CockroachDB refreshes statistics, it also deletes the statistics for any columns whose statistics are not [collected by default](#table-statistics). -Historical statistics on non-default column sets should not be retained indefinitely, because they will not be refreshed automatically and could cause the optimizer to choose a suboptimal plan if they become stale. Such non-default historical statistics may exist because columns were deleted or removed from an index, and are therefore no longer part of a multi-column statistic. +Do not retain historical statistics on non-default column sets indefinitely, because they are not refreshed automatically and can cause the optimizer to choose a suboptimal plan if they become stale. These non-default historical statistics can exist when columns are deleted or removed from an index and are no longer part of a multi-column statistic. -CockroachDB deletes statistics on non-default columns according to the `sql.stats.non_default_columns.min_retention_period` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}), which defaults to a 24-hour retention period. +CockroachDB deletes statistics on non-default columns according to the `sql.stats.non_default_columns.min_retention_period` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}), which defaults to 24 hours. 
### Forecasted statistics diff --git a/src/current/v24.3/cost-based-optimizer.md b/src/current/v24.3/cost-based-optimizer.md index 91ee4e8b0b3..42eb09337ff 100644 --- a/src/current/v24.3/cost-based-optimizer.md +++ b/src/current/v24.3/cost-based-optimizer.md @@ -77,7 +77,7 @@ SET (sql_stats_automatic_collection_fraction_stale_rows = 0.1, sql_stats_automatic_collection_min_stale_rows = 2000); ~~~ -Automatic statistics rules are checked once per minute. While altered automatic statistics table settings take immediate effect for any subsequent DML statements on a table, running row mutations that started prior to modifying the table settings may still trigger statistics collection based on the settings that existed before you ran the `ALTER TABLE ... SET` statement. +Automatic statistics rules are checked once per minute. Altered automatic statistics table settings take immediate effect for subsequent DML statements on a table. However, row mutations that started before you modified the table settings can still trigger statistics collection based on the previous settings. ##### Small versus large table examples @@ -172,11 +172,11 @@ The current table settings are shown in the `WITH` clause output of `SHOW CREATE ### Configure non-default statistics retention -By default, when CockroachDB refreshes statistics for a column, it deletes the previous statistics for the column (while leaving the most recent 4-5 historical statistics). When CockroachDB refreshes statistics, it also deletes the statistics for any "non-default" column sets, or columns for which statistics are not [collected by default](#table-statistics). +By default, when CockroachDB refreshes statistics for a column, it deletes the previous statistics for the column while retaining the most recent four to five historical statistics. When CockroachDB refreshes statistics, it also deletes the statistics for any columns whose statistics are not [collected by default](#table-statistics). 
-Historical statistics on non-default column sets should not be retained indefinitely, because they will not be refreshed automatically and could cause the optimizer to choose a suboptimal plan if they become stale. Such non-default historical statistics may exist because columns were deleted or removed from an index, and are therefore no longer part of a multi-column statistic. +Do not retain historical statistics on non-default column sets indefinitely, because they are not refreshed automatically and can cause the optimizer to choose a suboptimal plan if they become stale. These non-default historical statistics can exist when columns are deleted or removed from an index and are no longer part of a multi-column statistic. -CockroachDB deletes statistics on non-default columns according to the `sql.stats.non_default_columns.min_retention_period` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}), which defaults to a 24-hour retention period. +CockroachDB deletes statistics on non-default columns according to the `sql.stats.non_default_columns.min_retention_period` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}), which defaults to 24 hours. ### Forecasted statistics diff --git a/src/current/v25.1/cost-based-optimizer.md b/src/current/v25.1/cost-based-optimizer.md index 1f81e68c674..45eaacd642f 100644 --- a/src/current/v25.1/cost-based-optimizer.md +++ b/src/current/v25.1/cost-based-optimizer.md @@ -77,7 +77,7 @@ SET (sql_stats_automatic_collection_fraction_stale_rows = 0.1, sql_stats_automatic_collection_min_stale_rows = 2000); ~~~ -Automatic statistics rules are checked once per minute. While altered automatic statistics table settings take immediate effect for any subsequent DML statements on a table, running row mutations that started prior to modifying the table settings may still trigger statistics collection based on the settings that existed before you ran the `ALTER TABLE ... SET` statement. 
+Automatic statistics rules are checked once per minute. Altered automatic statistics table settings take immediate effect for subsequent DML statements on a table. However, row mutations that started before you modified the table settings can still trigger statistics collection based on the previous settings. ##### Small versus large table examples @@ -201,11 +201,11 @@ The current table settings are shown in the `WITH` clause output of `SHOW CREATE ### Configure non-default statistics retention -By default, when CockroachDB refreshes statistics for a column, it deletes the previous statistics for the column (while leaving the most recent 4-5 historical statistics). When CockroachDB refreshes statistics, it also deletes the statistics for any "non-default" column sets, or columns for which statistics are not [collected by default](#table-statistics). +By default, when CockroachDB refreshes statistics for a column, it deletes the previous statistics for the column while retaining the most recent four to five historical statistics. When CockroachDB refreshes statistics, it also deletes the statistics for any columns whose statistics are not [collected by default](#table-statistics). -Historical statistics on non-default column sets should not be retained indefinitely, because they will not be refreshed automatically and could cause the optimizer to choose a suboptimal plan if they become stale. Such non-default historical statistics may exist because columns were deleted or removed from an index, and are therefore no longer part of a multi-column statistic. +Do not retain historical statistics on non-default column sets indefinitely, because they are not refreshed automatically and can cause the optimizer to choose a suboptimal plan if they become stale. These non-default historical statistics can exist when columns are deleted or removed from an index and are no longer part of a multi-column statistic. 
-CockroachDB deletes statistics on non-default columns according to the `sql.stats.non_default_columns.min_retention_period` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}), which defaults to a 24-hour retention period. +CockroachDB deletes statistics on non-default columns according to the `sql.stats.non_default_columns.min_retention_period` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}), which defaults to 24 hours. ### Forecasted statistics diff --git a/src/current/v25.2/cost-based-optimizer.md b/src/current/v25.2/cost-based-optimizer.md index 69fcd3df6b1..b98823992ba 100644 --- a/src/current/v25.2/cost-based-optimizer.md +++ b/src/current/v25.2/cost-based-optimizer.md @@ -82,7 +82,7 @@ SET (sql_stats_automatic_collection_fraction_stale_rows = 0.1, sql_stats_automatic_collection_min_stale_rows = 2000); ~~~ -Automatic statistics rules are checked once per minute. While altered automatic statistics table settings take immediate effect for any subsequent DML statements on a table, running row mutations that started prior to modifying the table settings may still trigger statistics collection based on the settings that existed before you ran the `ALTER TABLE ... SET` statement. +Automatic statistics rules are checked once per minute. Altered automatic statistics table settings take immediate effect for subsequent DML statements on a table. However, row mutations that started before you modified the table settings can still trigger statistics collection based on the previous settings. ##### Small versus large table examples @@ -206,11 +206,11 @@ The current table settings are shown in the `WITH` clause output of `SHOW CREATE ### Configure non-default statistics retention -By default, when CockroachDB refreshes statistics for a column, it deletes the previous statistics for the column (while leaving the most recent 4-5 historical statistics). 
When CockroachDB refreshes statistics, it also deletes the statistics for any "non-default" column sets, or columns for which statistics are not [collected by default](#table-statistics). +By default, when CockroachDB refreshes statistics for a column, it deletes the previous statistics for the column while retaining the most recent four to five historical statistics. When CockroachDB refreshes statistics, it also deletes the statistics for any columns whose statistics are not [collected by default](#table-statistics). -Historical statistics on non-default column sets should not be retained indefinitely, because they will not be refreshed automatically and could cause the optimizer to choose a suboptimal plan if they become stale. Such non-default historical statistics may exist because columns were deleted or removed from an index, and are therefore no longer part of a multi-column statistic. +Do not retain historical statistics on non-default column sets indefinitely, because they are not refreshed automatically and can cause the optimizer to choose a suboptimal plan if they become stale. These non-default historical statistics can exist when columns are deleted or removed from an index and are no longer part of a multi-column statistic. -CockroachDB deletes statistics on non-default columns according to the `sql.stats.non_default_columns.min_retention_period` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}), which defaults to a 24-hour retention period. +CockroachDB deletes statistics on non-default columns according to the `sql.stats.non_default_columns.min_retention_period` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}), which defaults to 24 hours. 
### Forecasted statistics diff --git a/src/current/v25.3/cost-based-optimizer.md b/src/current/v25.3/cost-based-optimizer.md index 6221050a511..e343c84b199 100644 --- a/src/current/v25.3/cost-based-optimizer.md +++ b/src/current/v25.3/cost-based-optimizer.md @@ -82,7 +82,7 @@ SET (sql_stats_automatic_collection_fraction_stale_rows = 0.1, sql_stats_automatic_collection_min_stale_rows = 2000); ~~~ -Automatic statistics rules are checked once per minute. While altered automatic statistics table settings take immediate effect for any subsequent DML statements on a table, running row mutations that started prior to modifying the table settings may still trigger statistics collection based on the settings that existed before you ran the `ALTER TABLE ... SET` statement. +Automatic statistics rules are checked once per minute. Altered automatic statistics table settings take immediate effect for subsequent DML statements on a table. However, row mutations that started before you modified the table settings can still trigger statistics collection based on the previous settings. ##### Small versus large table examples @@ -206,11 +206,11 @@ The current table settings are shown in the `WITH` clause output of `SHOW CREATE ### Configure non-default statistics retention -By default, when CockroachDB refreshes statistics for a column, it deletes the previous statistics for the column (while leaving the most recent 4-5 historical statistics). When CockroachDB refreshes statistics, it also deletes the statistics for any "non-default" column sets, or columns for which statistics are not [collected by default](#table-statistics). +By default, when CockroachDB refreshes statistics for a column, it deletes the previous statistics for the column while retaining the most recent four to five historical statistics. When CockroachDB refreshes statistics, it also deletes the statistics for any columns whose statistics are not [collected by default](#table-statistics). 
-Historical statistics on non-default column sets should not be retained indefinitely, because they will not be refreshed automatically and could cause the optimizer to choose a suboptimal plan if they become stale. Such non-default historical statistics may exist because columns were deleted or removed from an index, and are therefore no longer part of a multi-column statistic. +Do not retain historical statistics on non-default column sets indefinitely, because they are not refreshed automatically and can cause the optimizer to choose a suboptimal plan if they become stale. These non-default historical statistics can exist when columns are deleted or removed from an index and are no longer part of a multi-column statistic. -CockroachDB deletes statistics on non-default columns according to the `sql.stats.non_default_columns.min_retention_period` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}), which defaults to a 24-hour retention period. +CockroachDB deletes statistics on non-default columns according to the `sql.stats.non_default_columns.min_retention_period` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}), which defaults to 24 hours. ### Forecasted statistics diff --git a/src/current/v25.4/cost-based-optimizer.md b/src/current/v25.4/cost-based-optimizer.md index 46b4b587502..4ba6fdc4e60 100644 --- a/src/current/v25.4/cost-based-optimizer.md +++ b/src/current/v25.4/cost-based-optimizer.md @@ -82,7 +82,7 @@ SET (sql_stats_automatic_collection_fraction_stale_rows = 0.1, sql_stats_automatic_collection_min_stale_rows = 2000); ~~~ -Automatic statistics rules are checked once per minute. While altered automatic statistics table settings take immediate effect for any subsequent DML statements on a table, running row mutations that started prior to modifying the table settings may still trigger statistics collection based on the settings that existed before you ran the `ALTER TABLE ... SET` statement. 
+Automatic statistics rules are checked once per minute. Altered automatic statistics table settings take immediate effect for subsequent DML statements on a table. However, row mutations that started before you modified the table settings can still trigger statistics collection based on the previous settings. ##### Small versus large table examples @@ -209,11 +209,11 @@ The current table settings are shown in the `WITH` clause output of `SHOW CREATE ### Configure non-default statistics retention -By default, when CockroachDB refreshes statistics for a column, it deletes the previous statistics for the column (while leaving the most recent 4-5 historical statistics). When CockroachDB refreshes statistics, it also deletes the statistics for any "non-default" column sets, or columns for which statistics are not [collected by default](#table-statistics). +By default, when CockroachDB refreshes statistics for a column, it deletes the previous statistics for the column while retaining the most recent four to five historical statistics. When CockroachDB refreshes statistics, it also deletes the statistics for any columns whose statistics are not [collected by default](#table-statistics). -Historical statistics on non-default column sets should not be retained indefinitely, because they will not be refreshed automatically and could cause the optimizer to choose a suboptimal plan if they become stale. Such non-default historical statistics may exist because columns were deleted or removed from an index, and are therefore no longer part of a multi-column statistic. +Do not retain historical statistics on non-default column sets indefinitely, because they are not refreshed automatically and can cause the optimizer to choose a suboptimal plan if they become stale. These non-default historical statistics can exist when columns are deleted or removed from an index and are no longer part of a multi-column statistic. 
-CockroachDB deletes statistics on non-default columns according to the `sql.stats.non_default_columns.min_retention_period` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}), which defaults to a 24-hour retention period. +CockroachDB deletes statistics on non-default columns according to the `sql.stats.non_default_columns.min_retention_period` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}), which defaults to 24 hours. ### Forecasted statistics