diff --git a/src/current/_includes/v23.2/misc/session-vars.md b/src/current/_includes/v23.2/misc/session-vars.md index b56c253ea9d..99c8b1b0f00 100644 --- a/src/current/_includes/v23.2/misc/session-vars.md +++ b/src/current/_includes/v23.2/misc/session-vars.md @@ -20,6 +20,7 @@ | `disallow_full_table_scans` | If set to `on`, queries on "large" tables with a row count greater than [`large_full_scan_rows`](#large-full-scan-rows) will not use full table or index scans. If no other query plan is possible, queries will return an error message. This setting does not apply to internal queries, which may plan full table or index scans without checking the session variable. | `off` | Yes | Yes | | `distsql` | The query distribution mode for the session. By default, CockroachDB determines which queries are faster to execute if distributed across multiple nodes, and all other queries are run through the gateway node. | `auto` | Yes | Yes | | `enable_auto_rehoming` | When enabled, the [home regions]({% link {{ page.version.version }}/alter-table.md %}#crdb_region) of rows in [`REGIONAL BY ROW`]({% link {{ page.version.version }}/alter-table.md %}#set-the-table-locality-to-regional-by-row) tables are automatically set to the region of the [gateway node]({% link {{ page.version.version }}/ui-sessions-page.md %}#session-details-gateway-node) from which any [`UPDATE`]({% link {{ page.version.version }}/update.md %}) or [`UPSERT`]({% link {{ page.version.version }}/upsert.md %}) statements that operate on those rows originate. | `off` | Yes | Yes | +| `enable_create_stats_using_extremes` | If `on`, allows manual creation of partial statistics using the [`CREATE STATISTICS ... USING EXTREMES`]({% link {{ page.version.version }}/create-statistics.md %}#create-partial-statistics-using-extremes) syntax. 
| `off` | Yes | Yes | | `enable_durable_locking_for_serializable` | Indicates whether CockroachDB replicates [`FOR UPDATE` and `FOR SHARE`]({% link {{ page.version.version }}/select-for-update.md %}#lock-strengths) locks via [Raft]({% link {{ page.version.version }}/architecture/replication-layer.md %}#raft), allowing locks to be preserved when leases are transferred. Note that replicating `FOR UPDATE` and `FOR SHARE` locks will add latency to those statements. This setting only affects `SERIALIZABLE` transactions and matches the default `READ COMMITTED` behavior when enabled. | `off` | Yes | Yes | | `enable_experimental_alter_column_type_general` | If `on`, it is possible to [alter column data types]({% link {{ page.version.version }}/alter-table.md %}#alter-column-data-types). | `off` | Yes | Yes | | `enable_implicit_fk_locking_for_serializable` | Indicates whether CockroachDB uses [shared locks]({% link {{ page.version.version }}/select-for-update.md %}#lock-strengths) to perform [foreign key]({% link {{ page.version.version }}/foreign-key.md %}) checks. To take effect, the [`enable_shared_locking_for_serializable`](#enable-shared-locking-for-serializable) setting must also be enabled. This setting only affects `SERIALIZABLE` transactions and matches the default `READ COMMITTED` behavior when enabled. | `off` | Yes | Yes | diff --git a/src/current/_includes/v23.2/misc/table-storage-parameters.md b/src/current/_includes/v23.2/misc/table-storage-parameters.md index 32dee5a9aaa..0e60bdb6f05 100644 --- a/src/current/_includes/v23.2/misc/table-storage-parameters.md +++ b/src/current/_includes/v23.2/misc/table-storage-parameters.md @@ -2,7 +2,7 @@ |------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------|-----------|---------------| | `exclude_data_from_backup` | Exclude the data in this table from any future backups. 
| Boolean | `false` | | New in v23.2.1: `schema_locked` | Disallow [schema changes]({% link {{ page.version.version }}/online-schema-changes.md %}) on this table. Enabling `schema_locked` can help [improve performance of changefeeds]({% link {{ page.version.version }}/create-changefeed.md %}#disallow-schema-changes-on-tables-to-improve-changefeed-performance) running on this table. | Boolean | `false` | -| `sql_stats_automatic_collection_enabled` | Enable [automatic statistics collection]({% link {{ page.version.version }}/cost-based-optimizer.md %}#enable-and-disable-automatic-statistics-collection-for-tables) for this table. | Boolean | `true` | +| `sql_stats_automatic_collection_enabled` | Enable automatic collection of [full statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#full-statistics) for this table. | Boolean | `true` | | `sql_stats_automatic_collection_min_stale_rows` | Minimum number of stale rows in this table that will trigger a statistics refresh. | Integer | 500 | | `sql_stats_automatic_collection_fraction_stale_rows` | Fraction of stale rows in this table that will trigger a statistics refresh. | Float | 0.2 | | `sql_stats_forecasts_enabled` | Enable [forecasted statistics]({% link {{ page.version.version }}/show-statistics.md %}#display-forecasted-statistics) collection for this table. | Boolean | `true` | diff --git a/src/current/_includes/v24.1/misc/session-vars.md b/src/current/_includes/v24.1/misc/session-vars.md index 72ed58d4178..fbbdb01d8a3 100644 --- a/src/current/_includes/v24.1/misc/session-vars.md +++ b/src/current/_includes/v24.1/misc/session-vars.md @@ -20,6 +20,7 @@ | `disable_changefeed_replication` | When `true`, [changefeeds]({% link {{ page.version.version }}/changefeed-messages.md %}#filtering-changefeed-messages) will not emit messages for any changes (e.g., `INSERT`, `UPDATE`) issued to watched tables during that session. 
| `false` | Yes | Yes | | `disallow_full_table_scans` | If set to `on`, queries on "large" tables with a row count greater than [`large_full_scan_rows`](#large-full-scan-rows) will not use full table or index scans. If no other query plan is possible, queries will return an error message. This setting does not apply to internal queries, which may plan full table or index scans without checking the session variable. | `off` | Yes | Yes || `distsql` | The query distribution mode for the session. By default, CockroachDB determines which queries are faster to execute if distributed across multiple nodes, and all other queries are run through the gateway node. | `auto` | Yes | Yes | | `enable_auto_rehoming` | When enabled, the [home regions]({% link {{ page.version.version }}/alter-table.md %}#crdb_region) of rows in [`REGIONAL BY ROW`]({% link {{ page.version.version }}/alter-table.md %}#set-the-table-locality-to-regional-by-row) tables are automatically set to the region of the [gateway node]({% link {{ page.version.version }}/ui-sessions-page.md %}#session-details-gateway-node) from which any [`UPDATE`]({% link {{ page.version.version }}/update.md %}) or [`UPSERT`]({% link {{ page.version.version }}/upsert.md %}) statements that operate on those rows originate. | `off` | Yes | Yes | +| `enable_create_stats_using_extremes` | If `on`, allows manual creation of partial statistics using the [`CREATE STATISTICS ... USING EXTREMES`]({% link {{ page.version.version }}/create-statistics.md %}#create-partial-statistics-using-extremes) syntax. | `off` | Yes | Yes | | `enable_durable_locking_for_serializable` | Indicates whether CockroachDB replicates [`FOR UPDATE` and `FOR SHARE`]({% link {{ page.version.version }}/select-for-update.md %}#lock-strengths) locks via [Raft]({% link {{ page.version.version }}/architecture/replication-layer.md %}#raft), allowing locks to be preserved when leases are transferred. 
Note that replicating `FOR UPDATE` and `FOR SHARE` locks will add latency to those statements. This setting only affects `SERIALIZABLE` transactions and matches the default `READ COMMITTED` behavior when enabled. | `off` | Yes | Yes | | `enable_experimental_alter_column_type_general` | If `on`, it is possible to [alter column data types]({% link {{ page.version.version }}/alter-table.md %}#alter-column-data-types). | `off` | Yes | Yes | | `enable_implicit_fk_locking_for_serializable` | Indicates whether CockroachDB uses [shared locks]({% link {{ page.version.version }}/select-for-update.md %}#lock-strengths) to perform [foreign key]({% link {{ page.version.version }}/foreign-key.md %}) checks. To take effect, the [`enable_shared_locking_for_serializable`](#enable-shared-locking-for-serializable) setting must also be enabled. This setting only affects `SERIALIZABLE` transactions and matches the default `READ COMMITTED` behavior when enabled. | `off` | Yes | Yes | diff --git a/src/current/_includes/v24.1/misc/table-storage-parameters.md b/src/current/_includes/v24.1/misc/table-storage-parameters.md index 3ca7f601648..51c4fd36db2 100644 --- a/src/current/_includes/v24.1/misc/table-storage-parameters.md +++ b/src/current/_includes/v24.1/misc/table-storage-parameters.md @@ -2,7 +2,7 @@ |------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------|-----------|---------------| | `exclude_data_from_backup` | Exclude the data in this table from any future backups. | Boolean | `false` | | `schema_locked` | Disallow [schema changes]({% link {{ page.version.version }}/online-schema-changes.md %}) on this table. Enabling `schema_locked` can help [improve performance of changefeeds]({% link {{ page.version.version }}/create-changefeed.md %}#disallow-schema-changes-on-tables-to-improve-changefeed-performance) running on this table. 
| Boolean | `false` | -| `sql_stats_automatic_collection_enabled` | Enable [automatic statistics collection]({% link {{ page.version.version }}/cost-based-optimizer.md %}#enable-and-disable-automatic-statistics-collection-for-tables) for this table. | Boolean | `true` | +| `sql_stats_automatic_collection_enabled` | Enable automatic collection of [full statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#full-statistics) for this table. | Boolean | `true` | | `sql_stats_automatic_collection_min_stale_rows` | Minimum number of stale rows in this table that will trigger a statistics refresh. | Integer | 500 | | `sql_stats_automatic_collection_fraction_stale_rows` | Fraction of stale rows in this table that will trigger a statistics refresh. | Float | 0.2 | | `sql_stats_forecasts_enabled` | Enable [forecasted statistics]({% link {{ page.version.version }}/show-statistics.md %}#display-forecasted-statistics) collection for this table. | Boolean | `true` | diff --git a/src/current/_includes/v24.2/misc/session-vars.md b/src/current/_includes/v24.2/misc/session-vars.md index 5a028f4cf5e..06b181e1a13 100644 --- a/src/current/_includes/v24.2/misc/session-vars.md +++ b/src/current/_includes/v24.2/misc/session-vars.md @@ -20,6 +20,7 @@ | `disable_changefeed_replication` | When `true`, [changefeeds]({% link {{ page.version.version }}/changefeed-messages.md %}#filtering-changefeed-messages) will not emit messages for any changes (e.g., `INSERT`, `UPDATE`) issued to watched tables during that session. | `false` | Yes | Yes | | `disallow_full_table_scans` | If set to `on`, queries on "large" tables with a row count greater than [`large_full_scan_rows`](#large-full-scan-rows) will not use full table or index scans. If no other query plan is possible, queries will return an error message. This setting does not apply to internal queries, which may plan full table or index scans without checking the session variable. 
| `off` | Yes | Yes || `distsql` | The query distribution mode for the session. By default, CockroachDB determines which queries are faster to execute if distributed across multiple nodes, and all other queries are run through the gateway node. | `auto` | Yes | Yes | | `enable_auto_rehoming` | When enabled, the [home regions]({% link {{ page.version.version }}/alter-table.md %}#crdb_region) of rows in [`REGIONAL BY ROW`]({% link {{ page.version.version }}/alter-table.md %}#set-the-table-locality-to-regional-by-row) tables are automatically set to the region of the [gateway node]({% link {{ page.version.version }}/ui-sessions-page.md %}#session-details-gateway-node) from which any [`UPDATE`]({% link {{ page.version.version }}/update.md %}) or [`UPSERT`]({% link {{ page.version.version }}/upsert.md %}) statements that operate on those rows originate. | `off` | Yes | Yes | +| `enable_create_stats_using_extremes` | If `on`, allows manual creation of partial statistics using the [`CREATE STATISTICS ... USING EXTREMES`]({% link {{ page.version.version }}/create-statistics.md %}#create-partial-statistics-using-extremes) syntax. | `off` | Yes | Yes | | `enable_durable_locking_for_serializable` | Indicates whether CockroachDB replicates [`FOR UPDATE` and `FOR SHARE`]({% link {{ page.version.version }}/select-for-update.md %}#lock-strengths) locks via [Raft]({% link {{ page.version.version }}/architecture/replication-layer.md %}#raft), allowing locks to be preserved when leases are transferred. Note that replicating `FOR UPDATE` and `FOR SHARE` locks will add latency to those statements. This setting only affects `SERIALIZABLE` transactions and matches the default `READ COMMITTED` behavior when enabled. | `off` | Yes | Yes | | `enable_experimental_alter_column_type_general` | If `on`, it is possible to [alter column data types]({% link {{ page.version.version }}/alter-table.md %}#alter-column-data-types). 
| `off` | Yes | Yes | | `enable_implicit_fk_locking_for_serializable` | Indicates whether CockroachDB uses [shared locks]({% link {{ page.version.version }}/select-for-update.md %}#lock-strengths) to perform [foreign key]({% link {{ page.version.version }}/foreign-key.md %}) checks. To take effect, the [`enable_shared_locking_for_serializable`](#enable-shared-locking-for-serializable) setting must also be enabled. This setting only affects `SERIALIZABLE` transactions and matches the default `READ COMMITTED` behavior when enabled. | `off` | Yes | Yes | diff --git a/src/current/_includes/v24.2/misc/table-storage-parameters.md b/src/current/_includes/v24.2/misc/table-storage-parameters.md index 3ca7f601648..51c4fd36db2 100644 --- a/src/current/_includes/v24.2/misc/table-storage-parameters.md +++ b/src/current/_includes/v24.2/misc/table-storage-parameters.md @@ -2,7 +2,7 @@ |------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------|-----------|---------------| | `exclude_data_from_backup` | Exclude the data in this table from any future backups. | Boolean | `false` | | `schema_locked` | Disallow [schema changes]({% link {{ page.version.version }}/online-schema-changes.md %}) on this table. Enabling `schema_locked` can help [improve performance of changefeeds]({% link {{ page.version.version }}/create-changefeed.md %}#disallow-schema-changes-on-tables-to-improve-changefeed-performance) running on this table. | Boolean | `false` | -| `sql_stats_automatic_collection_enabled` | Enable [automatic statistics collection]({% link {{ page.version.version }}/cost-based-optimizer.md %}#enable-and-disable-automatic-statistics-collection-for-tables) for this table. 
| Boolean | `true` | +| `sql_stats_automatic_collection_enabled` | Enable automatic collection of [full statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#full-statistics) for this table. | Boolean | `true` | | `sql_stats_automatic_collection_min_stale_rows` | Minimum number of stale rows in this table that will trigger a statistics refresh. | Integer | 500 | | `sql_stats_automatic_collection_fraction_stale_rows` | Fraction of stale rows in this table that will trigger a statistics refresh. | Float | 0.2 | | `sql_stats_forecasts_enabled` | Enable [forecasted statistics]({% link {{ page.version.version }}/show-statistics.md %}#display-forecasted-statistics) collection for this table. | Boolean | `true` | diff --git a/src/current/_includes/v24.3/misc/session-vars.md b/src/current/_includes/v24.3/misc/session-vars.md index 15c5994c010..793e353cb36 100644 --- a/src/current/_includes/v24.3/misc/session-vars.md +++ b/src/current/_includes/v24.3/misc/session-vars.md @@ -20,6 +20,7 @@ | `disable_changefeed_replication` | When `true`, [changefeeds]({% link {{ page.version.version }}/changefeed-messages.md %}#filtering-changefeed-messages) will not emit messages for any changes (e.g., `INSERT`, `UPDATE`) issued to watched tables during that session. | `false` | Yes | Yes | | `disallow_full_table_scans` | If set to `on`, queries on "large" tables with a row count greater than [`large_full_scan_rows`](#large-full-scan-rows) will not use full table or index scans. If no other query plan is possible, queries will return an error message. This setting does not apply to internal queries, which may plan full table or index scans without checking the session variable. | `off` | Yes | Yes || `distsql` | The query distribution mode for the session. By default, CockroachDB determines which queries are faster to execute if distributed across multiple nodes, and all other queries are run through the gateway node. 
| `auto` | Yes | Yes | | `enable_auto_rehoming` | When enabled, the [home regions]({% link {{ page.version.version }}/alter-table.md %}#crdb_region) of rows in [`REGIONAL BY ROW`]({% link {{ page.version.version }}/alter-table.md %}#set-the-table-locality-to-regional-by-row) tables are automatically set to the region of the [gateway node]({% link {{ page.version.version }}/ui-sessions-page.md %}#session-details-gateway-node) from which any [`UPDATE`]({% link {{ page.version.version }}/update.md %}) or [`UPSERT`]({% link {{ page.version.version }}/upsert.md %}) statements that operate on those rows originate. | `off` | Yes | Yes | +| `enable_create_stats_using_extremes` | If `on`, allows manual creation of [partial statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#partial-statistics) using the [`CREATE STATISTICS ... USING EXTREMES`]({% link {{ page.version.version }}/create-statistics.md %}#create-partial-statistics-using-extremes) syntax. | `on` | Yes | Yes | | `enable_durable_locking_for_serializable` | Indicates whether CockroachDB replicates [`FOR UPDATE` and `FOR SHARE`]({% link {{ page.version.version }}/select-for-update.md %}#lock-strengths) locks via [Raft]({% link {{ page.version.version }}/architecture/replication-layer.md %}#raft), allowing locks to be preserved when leases are transferred. Note that replicating `FOR UPDATE` and `FOR SHARE` locks will add latency to those statements. This setting only affects `SERIALIZABLE` transactions and matches the default `READ COMMITTED` behavior when enabled. | `off` | Yes | Yes | | `enable_experimental_alter_column_type_general` | If `on`, it is possible to [alter column data types]({% link {{ page.version.version }}/alter-table.md %}#alter-column-data-types). 
| `off` | Yes | Yes | | `enable_implicit_fk_locking_for_serializable` | Indicates whether CockroachDB uses [shared locks]({% link {{ page.version.version }}/select-for-update.md %}#lock-strengths) to perform [foreign key]({% link {{ page.version.version }}/foreign-key.md %}) checks. To take effect, the [`enable_shared_locking_for_serializable`](#enable-shared-locking-for-serializable) setting must also be enabled. This setting only affects `SERIALIZABLE` transactions and matches the default `READ COMMITTED` behavior when enabled. | `off` | Yes | Yes | @@ -55,6 +56,7 @@ | `optimizer_use_improved_multi_column_selectivity_estimate` | If `on`, the optimizer uses an improved selectivity estimate for multi-column predicates. | `on` | Yes | Yes | | `optimizer_use_improved_zigzag_join_costing` | If `on`, the cost of [zigzag joins]({% link {{ page.version.version }}/cost-based-optimizer.md %}#zigzag-joins) is updated so they will be never be chosen over scans unless they produce fewer rows. To take effect, the [`enable_zigzag_join`](#enable-zigzag-join) setting must also be enabled. | `on` | Yes | Yes | | `optimizer_use_lock_op_for_serializable` | If `on`, the optimizer uses a `Lock` operator to construct query plans for `SELECT` statements using the [`FOR UPDATE` and `FOR SHARE`]({% link {{ page.version.version }}/select-for-update.md %}) clauses. This setting only affects `SERIALIZABLE` transactions. `READ COMMITTED` transactions are evaluated with the `Lock` operator regardless of the setting. | `off` | Yes | Yes | +| `optimizer_use_merged_partial_statistics` | If `on`, the optimizer uses [partial statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#partial-statistics) merged with existing full [table statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#table-statistics) for cardinality estimation. 
| `off` | Yes | Yes | | `optimizer_use_multicol_stats` | If `on`, the optimizer uses collected multi-column statistics for cardinality estimation. | `on` | No | Yes | | `optimizer_use_not_visible_indexes` | If `on`, the optimizer uses not visible indexes for planning. | `off` | No | Yes | | `optimizer_use_virtual_computed_column_stats` | If `on`, the optimizer uses table statistics on [virtual computed columns]({% link {{ page.version.version }}/computed-columns.md %}#virtual-computed-columns). | `on` | Yes | Yes | diff --git a/src/current/_includes/v24.3/misc/table-storage-parameters.md b/src/current/_includes/v24.3/misc/table-storage-parameters.md index 3ca7f601648..51c4fd36db2 100644 --- a/src/current/_includes/v24.3/misc/table-storage-parameters.md +++ b/src/current/_includes/v24.3/misc/table-storage-parameters.md @@ -2,7 +2,7 @@ |------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------|-----------|---------------| | `exclude_data_from_backup` | Exclude the data in this table from any future backups. | Boolean | `false` | | `schema_locked` | Disallow [schema changes]({% link {{ page.version.version }}/online-schema-changes.md %}) on this table. Enabling `schema_locked` can help [improve performance of changefeeds]({% link {{ page.version.version }}/create-changefeed.md %}#disallow-schema-changes-on-tables-to-improve-changefeed-performance) running on this table. | Boolean | `false` | -| `sql_stats_automatic_collection_enabled` | Enable [automatic statistics collection]({% link {{ page.version.version }}/cost-based-optimizer.md %}#enable-and-disable-automatic-statistics-collection-for-tables) for this table. | Boolean | `true` | +| `sql_stats_automatic_collection_enabled` | Enable automatic collection of [full statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#full-statistics) for this table. 
| Boolean | `true` | | `sql_stats_automatic_collection_min_stale_rows` | Minimum number of stale rows in this table that will trigger a statistics refresh. | Integer | 500 | | `sql_stats_automatic_collection_fraction_stale_rows` | Fraction of stale rows in this table that will trigger a statistics refresh. | Float | 0.2 | | `sql_stats_forecasts_enabled` | Enable [forecasted statistics]({% link {{ page.version.version }}/show-statistics.md %}#display-forecasted-statistics) collection for this table. | Boolean | `true` | diff --git a/src/current/_includes/v25.1/misc/session-vars.md b/src/current/_includes/v25.1/misc/session-vars.md index 5c4e5892373..6e700e7e2c6 100644 --- a/src/current/_includes/v25.1/misc/session-vars.md +++ b/src/current/_includes/v25.1/misc/session-vars.md @@ -25,6 +25,7 @@ | `distribute_sort_row_count_threshold` | **New in v25.1:** Minimum number of rows that a sort operation must process in order to be [distributed]({% link {{ page.version.version }}/architecture/sql-layer.md %}#distsql). | `1000` | Yes | Yes | | `distsql` | The query distribution mode for the session. By default, CockroachDB determines which queries are faster to execute if distributed across multiple nodes. Distribution preferences for `GROUP BY`, scan, and sort operations are set with [`distribute_group_by_row_count_threshold`](#distribute-group-by-row-count-threshold), [`distribute_scan_row_count_threshold.`](#distribute-scan-row-count-threshold) and [`distribute_sort_row_count_threshold.`](#distribute-sort-row-count-threshold), respectively. All other queries are run through the gateway node. 
| `auto` | Yes | Yes | | `enable_auto_rehoming` | When enabled, the [home regions]({% link {{ page.version.version }}/alter-table.md %}#crdb_region) of rows in [`REGIONAL BY ROW`]({% link {{ page.version.version }}/alter-table.md %}#set-the-table-locality-to-regional-by-row) tables are automatically set to the region of the [gateway node]({% link {{ page.version.version }}/ui-sessions-page.md %}#session-details-gateway-node) from which any [`UPDATE`]({% link {{ page.version.version }}/update.md %}) or [`UPSERT`]({% link {{ page.version.version }}/upsert.md %}) statements that operate on those rows originate. | `off` | Yes | Yes | +| `enable_create_stats_using_extremes` | If `on`, allows manual creation of [partial statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#partial-statistics) using the [`CREATE STATISTICS ... USING EXTREMES`]({% link {{ page.version.version }}/create-statistics.md %}#create-partial-statistics-using-extremes) syntax. | `on` | Yes | Yes | | `enable_durable_locking_for_serializable` | Indicates whether CockroachDB replicates [`FOR UPDATE` and `FOR SHARE`]({% link {{ page.version.version }}/select-for-update.md %}#lock-strengths) locks via [Raft]({% link {{ page.version.version }}/architecture/replication-layer.md %}#raft), allowing locks to be preserved when leases are transferred. Note that replicating `FOR UPDATE` and `FOR SHARE` locks will add latency to those statements. This setting only affects `SERIALIZABLE` transactions and matches the default `READ COMMITTED` behavior when enabled. | `off` | Yes | Yes | | `enable_implicit_fk_locking_for_serializable` | Indicates whether CockroachDB uses [shared locks]({% link {{ page.version.version }}/select-for-update.md %}#lock-strengths) to perform [foreign key]({% link {{ page.version.version }}/foreign-key.md %}) checks. To take effect, the [`enable_shared_locking_for_serializable`](#enable-shared-locking-for-serializable) setting must also be enabled. 
This setting only affects `SERIALIZABLE` transactions and matches the default `READ COMMITTED` behavior when enabled. | `off` | Yes | Yes | | `enable_implicit_select_for_update` | Indicates whether [`UPDATE`]({% link {{ page.version.version }}/update.md %}), [`UPSERT`]({% link {{ page.version.version }}/upsert.md %}), and [`DELETE`]({% link {{ page.version.version }}/delete.md %}) statements acquire locks using the `FOR UPDATE` locking mode during their initial row scan, which improves performance for contended workloads.

For more information about how `FOR UPDATE` locking works, see the documentation for [`SELECT FOR UPDATE`]({% link {{ page.version.version }}/select-for-update.md %}). | `on` | Yes | Yes | @@ -59,6 +60,7 @@ | `optimizer_use_improved_multi_column_selectivity_estimate` | If `on`, the optimizer uses an improved selectivity estimate for multi-column predicates. | `on` | Yes | Yes | | `optimizer_use_improved_zigzag_join_costing` | If `on`, the cost of [zigzag joins]({% link {{ page.version.version }}/cost-based-optimizer.md %}#zigzag-joins) is updated so they will be never be chosen over scans unless they produce fewer rows. To take effect, the [`enable_zigzag_join`](#enable-zigzag-join) setting must also be enabled. | `on` | Yes | Yes | | `optimizer_use_lock_op_for_serializable` | If `on`, the optimizer uses a `Lock` operator to construct query plans for `SELECT` statements using the [`FOR UPDATE` and `FOR SHARE`]({% link {{ page.version.version }}/select-for-update.md %}) clauses. This setting only affects `SERIALIZABLE` transactions. `READ COMMITTED` transactions are evaluated with the `Lock` operator regardless of the setting. | `off` | Yes | Yes | +| `optimizer_use_merged_partial_statistics` | If `on`, the optimizer uses [partial statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#partial-statistics) merged with existing full [table statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#table-statistics) for cardinality estimation. | `off` | Yes | Yes | | `optimizer_use_multicol_stats` | If `on`, the optimizer uses collected multi-column statistics for cardinality estimation. | `on` | No | Yes | | `optimizer_use_not_visible_indexes` | If `on`, the optimizer uses not visible indexes for planning. | `off` | No | Yes | | `optimizer_use_virtual_computed_column_stats` | If `on`, the optimizer uses table statistics on [virtual computed columns]({% link {{ page.version.version }}/computed-columns.md %}#virtual-computed-columns). 
| `on` | Yes | Yes | diff --git a/src/current/_includes/v25.1/misc/table-storage-parameters.md b/src/current/_includes/v25.1/misc/table-storage-parameters.md index 3ca7f601648..f5b6ccf9fe4 100644 --- a/src/current/_includes/v25.1/misc/table-storage-parameters.md +++ b/src/current/_includes/v25.1/misc/table-storage-parameters.md @@ -1,15 +1,19 @@ -| Parameter name | Description | Data type | Default value | -|------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------|-----------|---------------| -| `exclude_data_from_backup` | Exclude the data in this table from any future backups. | Boolean | `false` | -| `schema_locked` | Disallow [schema changes]({% link {{ page.version.version }}/online-schema-changes.md %}) on this table. Enabling `schema_locked` can help [improve performance of changefeeds]({% link {{ page.version.version }}/create-changefeed.md %}#disallow-schema-changes-on-tables-to-improve-changefeed-performance) running on this table. | Boolean | `false` | -| `sql_stats_automatic_collection_enabled` | Enable [automatic statistics collection]({% link {{ page.version.version }}/cost-based-optimizer.md %}#enable-and-disable-automatic-statistics-collection-for-tables) for this table. | Boolean | `true` | -| `sql_stats_automatic_collection_min_stale_rows` | Minimum number of stale rows in this table that will trigger a statistics refresh. | Integer | 500 | -| `sql_stats_automatic_collection_fraction_stale_rows` | Fraction of stale rows in this table that will trigger a statistics refresh. | Float | 0.2 | -| `sql_stats_forecasts_enabled` | Enable [forecasted statistics]({% link {{ page.version.version }}/show-statistics.md %}#display-forecasted-statistics) collection for this table. 
| Boolean | `true` | +| Parameter name | Description | Data type | Default value | +|----------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------|---------------| +| `exclude_data_from_backup` | Exclude the data in this table from any future backups. | Boolean | `false` | +| `schema_locked` | Indicates that a [schema change]({% link {{ page.version.version }}/online-schema-changes.md %}) is not currently ongoing on this table. CockroachDB automatically unsets this parameter before performing a schema change and reapplies it when done. Enabling `schema_locked` can help [improve performance of changefeeds]({% link {{ page.version.version }}/create-changefeed.md %}#disallow-schema-changes-on-tables-to-improve-changefeed-performance) running on this table. | Boolean | `false` | +| `sql_stats_automatic_collection_enabled` | Enable automatic collection of [full]({% link {{ page.version.version }}/cost-based-optimizer.md %}#full-statistics) and [partial]({% link {{ page.version.version }}/cost-based-optimizer.md %}#partial-statistics) statistics for this table. | Boolean | `true` | +| `sql_stats_automatic_collection_min_stale_rows` | Minimum number of stale rows in this table that will trigger a full statistics refresh. | Integer | 500 | +| `sql_stats_automatic_collection_fraction_stale_rows` | Fraction of stale rows in this table that will trigger a full statistics refresh. 
| Float | 0.2 | +| `sql_stats_forecasts_enabled` | Enable [forecasted statistics]({% link {{ page.version.version }}/show-statistics.md %}#display-forecasted-statistics) collection for this table. | Boolean | `true` | +| `sql_stats_automatic_partial_collection_enabled` | {% include_cached new-in.html version="v25.1" %}Enable automatic collection of [partial statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#automatically-collect-partial-statistics) for this table. | Boolean | `true` | +| `sql_stats_automatic_partial_collection_min_stale_rows` | {% include_cached new-in.html version="v25.1" %}Minimum number of stale rows that triggers [partial statistics collection]({% link {{ page.version.version }}/cost-based-optimizer.md %}#automatically-collect-partial-statistics) for this table. | Integer | 100 | +| `sql_stats_automatic_partial_collection_fraction_stale_rows` | {% include_cached new-in.html version="v25.1" %}Target fraction of stale rows that triggers [partial statistics collection]({% link {{ page.version.version }}/cost-based-optimizer.md %}#automatically-collect-partial-statistics) for this table. | Float | 0.05 | +| `infer_rbr_region_col_using_constraint` | For [`REGIONAL BY ROW`]({% link {{ page.version.version }}/table-localities.md %}#regional-by-row-tables) tables, automatically populate the hidden `crdb_region` column on `INSERT`, `UPDATE`, and `UPSERT` by looking up the region of the referenced parent row. Set this parameter to the name of a [foreign key]({% link {{ page.version.version }}/foreign-key.md %}) constraint on the table that includes the `crdb_region` column. The foreign key cannot be dropped while the parameter is set. 
| String | `NULL` | The following parameters are included for PostgreSQL compatibility and do not affect how CockroachDB runs: - `autovacuum_enabled` - `fillfactor` -For the list of storage parameters that affect how [Row-Level TTL]({% link {{ page.version.version }}/row-level-ttl.md %}) works, see the list of [TTL storage parameters]({% link {{ page.version.version }}/row-level-ttl.md %}#ttl-storage-parameters). \ No newline at end of file +For the list of storage parameters that affect how [Row-Level TTL]({% link {{ page.version.version }}/row-level-ttl.md %}) works, see the list of [TTL storage parameters]({% link {{ page.version.version }}/row-level-ttl.md %}#ttl-storage-parameters). diff --git a/src/current/_includes/v25.2/misc/session-vars.md b/src/current/_includes/v25.2/misc/session-vars.md index b700a4ed1f4..df75e86cfea 100644 --- a/src/current/_includes/v25.2/misc/session-vars.md +++ b/src/current/_includes/v25.2/misc/session-vars.md @@ -25,6 +25,7 @@ | `distribute_sort_row_count_threshold` | Minimum number of rows that a sort operation must process in order to be [distributed]({% link {{ page.version.version }}/architecture/sql-layer.md %}#distsql). | `1000` | Yes | Yes | | `distsql` | The query distribution mode for the session. By default, CockroachDB determines which queries are faster to execute if distributed across multiple nodes. Distribution preferences for `GROUP BY`, scan, and sort operations are set with [`distribute_group_by_row_count_threshold`](#distribute-group-by-row-count-threshold), [`distribute_scan_row_count_threshold.`](#distribute-scan-row-count-threshold) and [`distribute_sort_row_count_threshold.`](#distribute-sort-row-count-threshold), respectively. All other queries are run through the gateway node. 
| `auto` | Yes | Yes | | `enable_auto_rehoming` | When enabled, the [home regions]({% link {{ page.version.version }}/alter-table.md %}#crdb_region) of rows in [`REGIONAL BY ROW`]({% link {{ page.version.version }}/alter-table.md %}#set-the-table-locality-to-regional-by-row) tables are automatically set to the region of the [gateway node]({% link {{ page.version.version }}/ui-sessions-page.md %}#session-details-gateway-node) from which any [`UPDATE`]({% link {{ page.version.version }}/update.md %}) or [`UPSERT`]({% link {{ page.version.version }}/upsert.md %}) statements that operate on those rows originate. | `off` | Yes | Yes | +| `enable_create_stats_using_extremes` | If `on`, allows manual creation of [partial statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#partial-statistics) using the [`CREATE STATISTICS ... USING EXTREMES`]({% link {{ page.version.version }}/create-statistics.md %}#create-partial-statistics-using-extremes) syntax. | `on` | Yes | Yes | | `enable_durable_locking_for_serializable` | Indicates whether CockroachDB replicates [`FOR UPDATE` and `FOR SHARE`]({% link {{ page.version.version }}/select-for-update.md %}#lock-strengths) locks via [Raft]({% link {{ page.version.version }}/architecture/replication-layer.md %}#raft), allowing locks to be preserved when leases are transferred. Note that replicating `FOR UPDATE` and `FOR SHARE` locks will add latency to those statements. This setting only affects `SERIALIZABLE` transactions and matches the default `READ COMMITTED` behavior when enabled. | `off` | Yes | Yes | | `enable_implicit_fk_locking_for_serializable` | Indicates whether CockroachDB uses [shared locks]({% link {{ page.version.version }}/select-for-update.md %}#lock-strengths) to perform [foreign key]({% link {{ page.version.version }}/foreign-key.md %}) checks. To take effect, the [`enable_shared_locking_for_serializable`](#enable-shared-locking-for-serializable) setting must also be enabled. 
This setting only affects `SERIALIZABLE` transactions and matches the default `READ COMMITTED` behavior when enabled. | `off` | Yes | Yes | | `enable_implicit_select_for_update` | Indicates whether [`UPDATE`]({% link {{ page.version.version }}/update.md %}), [`UPSERT`]({% link {{ page.version.version }}/upsert.md %}), and [`DELETE`]({% link {{ page.version.version }}/delete.md %}) statements acquire locks using the `FOR UPDATE` locking mode during their initial row scan, which improves performance for contended workloads.

For more information about how `FOR UPDATE` locking works, see the documentation for [`SELECT FOR UPDATE`]({% link {{ page.version.version }}/select-for-update.md %}). | `on` | Yes | Yes | @@ -59,6 +60,7 @@ | `optimizer_use_improved_multi_column_selectivity_estimate` | If `on`, the optimizer uses an improved selectivity estimate for multi-column predicates. | `on` | Yes | Yes | | `optimizer_use_improved_zigzag_join_costing` | If `on`, the cost of [zigzag joins]({% link {{ page.version.version }}/cost-based-optimizer.md %}#zigzag-joins) is updated so they will be never be chosen over scans unless they produce fewer rows. To take effect, the [`enable_zigzag_join`](#enable-zigzag-join) setting must also be enabled. | `on` | Yes | Yes | | `optimizer_use_lock_op_for_serializable` | If `on`, the optimizer uses a `Lock` operator to construct query plans for `SELECT` statements using the [`FOR UPDATE` and `FOR SHARE`]({% link {{ page.version.version }}/select-for-update.md %}) clauses. This setting only affects `SERIALIZABLE` transactions. `READ COMMITTED` transactions are evaluated with the `Lock` operator regardless of the setting. | `off` | Yes | Yes | +| `optimizer_use_merged_partial_statistics` | If `on`, the optimizer uses [partial statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#partial-statistics) merged with existing full [table statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#table-statistics) to produce more accurate cardinality estimates. | `on` | Yes | Yes | | `optimizer_use_multicol_stats` | If `on`, the optimizer uses collected multi-column statistics for cardinality estimation. | `on` | No | Yes | | `optimizer_use_not_visible_indexes` | If `on`, the optimizer uses not visible indexes for planning. 
| `off` | No | Yes | | `optimizer_use_virtual_computed_column_stats` | If `on`, the optimizer uses table statistics on [virtual computed columns]({% link {{ page.version.version }}/computed-columns.md %}#virtual-computed-columns). | `on` | Yes | Yes | diff --git a/src/current/_includes/v25.2/misc/table-storage-parameters.md b/src/current/_includes/v25.2/misc/table-storage-parameters.md index 3ca7f601648..9e97fe83785 100644 --- a/src/current/_includes/v25.2/misc/table-storage-parameters.md +++ b/src/current/_includes/v25.2/misc/table-storage-parameters.md @@ -1,15 +1,20 @@ -| Parameter name | Description | Data type | Default value | -|------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------|-----------|---------------| -| `exclude_data_from_backup` | Exclude the data in this table from any future backups. | Boolean | `false` | -| `schema_locked` | Disallow [schema changes]({% link {{ page.version.version }}/online-schema-changes.md %}) on this table. Enabling `schema_locked` can help [improve performance of changefeeds]({% link {{ page.version.version }}/create-changefeed.md %}#disallow-schema-changes-on-tables-to-improve-changefeed-performance) running on this table. | Boolean | `false` | -| `sql_stats_automatic_collection_enabled` | Enable [automatic statistics collection]({% link {{ page.version.version }}/cost-based-optimizer.md %}#enable-and-disable-automatic-statistics-collection-for-tables) for this table. | Boolean | `true` | -| `sql_stats_automatic_collection_min_stale_rows` | Minimum number of stale rows in this table that will trigger a statistics refresh. | Integer | 500 | -| `sql_stats_automatic_collection_fraction_stale_rows` | Fraction of stale rows in this table that will trigger a statistics refresh. 
| Float | 0.2 | -| `sql_stats_forecasts_enabled` | Enable [forecasted statistics]({% link {{ page.version.version }}/show-statistics.md %}#display-forecasted-statistics) collection for this table. | Boolean | `true` | +| Parameter name | Description | Data type | Default value | +|----------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------|---------------| +| `exclude_data_from_backup` | Exclude the data in this table from any future backups. | Boolean | `false` | +| `schema_locked` | Indicates that a [schema change]({% link {{ page.version.version }}/online-schema-changes.md %}) is not currently ongoing on this table. CockroachDB automatically unsets this parameter before performing a schema change and reapplies it when done. Enabling `schema_locked` can help [improve performance of changefeeds]({% link {{ page.version.version }}/create-changefeed.md %}#disallow-schema-changes-on-tables-to-improve-changefeed-performance) running on this table. | Boolean | `false` | +| `sql_stats_automatic_collection_enabled` | Enable automatic collection of [full]({% link {{ page.version.version }}/cost-based-optimizer.md %}#full-statistics) and [partial]({% link {{ page.version.version }}/cost-based-optimizer.md %}#partial-statistics) statistics for this table. | Boolean | `true` | +| `sql_stats_automatic_collection_min_stale_rows` | Minimum number of stale rows in this table that will trigger a full statistics refresh. 
| Integer | 500 | +| `sql_stats_automatic_collection_fraction_stale_rows` | Fraction of stale rows in this table that will trigger a full statistics refresh. | Float | 0.2 | +| `sql_stats_automatic_full_collection_enabled` | {% include_cached new-in.html version="v25.2" %} Enable automatic collection of [full statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#full-statistics) for this table. | Boolean | `true` | +| `sql_stats_forecasts_enabled` | Enable [forecasted statistics]({% link {{ page.version.version }}/show-statistics.md %}#display-forecasted-statistics) collection for this table. | Boolean | `true` | +| `sql_stats_automatic_partial_collection_enabled` | Enable automatic collection of [partial statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#automatically-collect-partial-statistics) for this table. | Boolean | `true` | +| `sql_stats_automatic_partial_collection_min_stale_rows` | Minimum number of stale rows that triggers [partial statistics collection]({% link {{ page.version.version }}/cost-based-optimizer.md %}#automatically-collect-partial-statistics) for this table. | Integer | 100 | +| `sql_stats_automatic_partial_collection_fraction_stale_rows` | Target fraction of stale rows that triggers [partial statistics collection]({% link {{ page.version.version }}/cost-based-optimizer.md %}#automatically-collect-partial-statistics) for this table. | Float | 0.05 | +| `infer_rbr_region_col_using_constraint` | For [`REGIONAL BY ROW`]({% link {{ page.version.version }}/table-localities.md %}#regional-by-row-tables) tables, automatically populate the hidden `crdb_region` column on `INSERT`, `UPDATE`, and `UPSERT` by looking up the region of the referenced parent row. Set this parameter to the name of a [foreign key]({% link {{ page.version.version }}/foreign-key.md %}) constraint on the table that includes the `crdb_region` column. The foreign key cannot be dropped while the parameter is set. 
| String | `NULL` | The following parameters are included for PostgreSQL compatibility and do not affect how CockroachDB runs: - `autovacuum_enabled` - `fillfactor` -For the list of storage parameters that affect how [Row-Level TTL]({% link {{ page.version.version }}/row-level-ttl.md %}) works, see the list of [TTL storage parameters]({% link {{ page.version.version }}/row-level-ttl.md %}#ttl-storage-parameters). \ No newline at end of file +For the list of storage parameters that affect how [Row-Level TTL]({% link {{ page.version.version }}/row-level-ttl.md %}) works, see the list of [TTL storage parameters]({% link {{ page.version.version }}/row-level-ttl.md %}#ttl-storage-parameters). diff --git a/src/current/_includes/v25.3/misc/session-vars.md b/src/current/_includes/v25.3/misc/session-vars.md index 5d5d657ccd1..703d0e16804 100644 --- a/src/current/_includes/v25.3/misc/session-vars.md +++ b/src/current/_includes/v25.3/misc/session-vars.md @@ -26,6 +26,7 @@ | `distribute_sort_row_count_threshold` | Minimum number of rows that a sort operation must process in order to be [distributed]({% link {{ page.version.version }}/architecture/sql-layer.md %}#distsql). | `1000` | Yes | Yes | | `distsql` | The query distribution mode for the session. By default, CockroachDB determines which queries are faster to execute if distributed across multiple nodes. Distribution preferences for `GROUP BY`, scan, and sort operations are set with [`distribute_group_by_row_count_threshold`](#distribute-group-by-row-count-threshold), [`distribute_scan_row_count_threshold.`](#distribute-scan-row-count-threshold) and [`distribute_sort_row_count_threshold.`](#distribute-sort-row-count-threshold), respectively. All other queries are run through the gateway node. 
| `auto` | Yes | Yes | | `enable_auto_rehoming` | When enabled, the [home regions]({% link {{ page.version.version }}/alter-table.md %}#crdb_region) of rows in [`REGIONAL BY ROW`]({% link {{ page.version.version }}/alter-table.md %}#set-the-table-locality-to-regional-by-row) tables are automatically set to the region of the [gateway node]({% link {{ page.version.version }}/ui-sessions-page.md %}#session-details-gateway-node) from which any [`UPDATE`]({% link {{ page.version.version }}/update.md %}) or [`UPSERT`]({% link {{ page.version.version }}/upsert.md %}) statements that operate on those rows originate. | `off` | Yes | Yes | +| `enable_create_stats_using_extremes` | If `on`, allows manual creation of [partial statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#partial-statistics) using the [`CREATE STATISTICS ... USING EXTREMES`]({% link {{ page.version.version }}/create-statistics.md %}#create-partial-statistics-using-extremes) syntax. | `on` | Yes | Yes | | `enable_durable_locking_for_serializable` | Indicates whether CockroachDB replicates [`FOR UPDATE` and `FOR SHARE`]({% link {{ page.version.version }}/select-for-update.md %}#lock-strengths) locks via [Raft]({% link {{ page.version.version }}/architecture/replication-layer.md %}#raft), allowing locks to be preserved when leases are transferred. Note that replicating `FOR UPDATE` and `FOR SHARE` locks will add latency to those statements. This setting only affects `SERIALIZABLE` transactions and matches the default `READ COMMITTED` behavior when enabled. | `off` | Yes | Yes | | `enable_implicit_fk_locking_for_serializable` | Indicates whether CockroachDB uses [shared locks]({% link {{ page.version.version }}/select-for-update.md %}#lock-strengths) to perform [foreign key]({% link {{ page.version.version }}/foreign-key.md %}) checks. To take effect, the [`enable_shared_locking_for_serializable`](#enable-shared-locking-for-serializable) setting must also be enabled. 
This setting only affects `SERIALIZABLE` transactions and matches the default `READ COMMITTED` behavior when enabled. | `off` | Yes | Yes | | `enable_implicit_select_for_update` | Indicates whether [`UPDATE`]({% link {{ page.version.version }}/update.md %}), [`UPSERT`]({% link {{ page.version.version }}/upsert.md %}), and [`DELETE`]({% link {{ page.version.version }}/delete.md %}) statements acquire locks using the `FOR UPDATE` locking mode during their initial row scan, which improves performance for contended workloads.

For more information about how `FOR UPDATE` locking works, see the documentation for [`SELECT FOR UPDATE`]({% link {{ page.version.version }}/select-for-update.md %}). | `on` | Yes | Yes | @@ -60,6 +61,7 @@ | `optimizer_use_improved_multi_column_selectivity_estimate` | If `on`, the optimizer uses an improved selectivity estimate for multi-column predicates. | `on` | Yes | Yes | | `optimizer_use_improved_zigzag_join_costing` | If `on`, the cost of [zigzag joins]({% link {{ page.version.version }}/cost-based-optimizer.md %}#zigzag-joins) is updated so they will be never be chosen over scans unless they produce fewer rows. To take effect, the [`enable_zigzag_join`](#enable-zigzag-join) setting must also be enabled. | `on` | Yes | Yes | | `optimizer_use_lock_op_for_serializable` | If `on`, the optimizer uses a `Lock` operator to construct query plans for `SELECT` statements using the [`FOR UPDATE` and `FOR SHARE`]({% link {{ page.version.version }}/select-for-update.md %}) clauses. This setting only affects `SERIALIZABLE` transactions. `READ COMMITTED` transactions are evaluated with the `Lock` operator regardless of the setting. | `off` | Yes | Yes | +| `optimizer_use_merged_partial_statistics` | If `on`, the optimizer uses [partial statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#partial-statistics) merged with existing full [table statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#table-statistics) to produce more accurate cardinality estimates. | `on` | Yes | Yes | | `optimizer_use_multicol_stats` | If `on`, the optimizer uses collected multi-column statistics for cardinality estimation. | `on` | No | Yes | | `optimizer_use_not_visible_indexes` | If `on`, the optimizer uses not visible indexes for planning. 
| `off` | No | Yes | | `optimizer_use_virtual_computed_column_stats` | If `on`, the optimizer uses table statistics on [virtual computed columns]({% link {{ page.version.version }}/computed-columns.md %}#virtual-computed-columns). | `on` | Yes | Yes | diff --git a/src/current/_includes/v25.3/misc/table-storage-parameters.md b/src/current/_includes/v25.3/misc/table-storage-parameters.md index d8f99706496..673ef075a3c 100644 --- a/src/current/_includes/v25.3/misc/table-storage-parameters.md +++ b/src/current/_includes/v25.3/misc/table-storage-parameters.md @@ -1,11 +1,15 @@ | Parameter name | Description | Data type | Default value | |----------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------|---------------| | `exclude_data_from_backup` | Exclude the data in this table from any future backups. | Boolean | `false` | -| `schema_locked` | Indicates that a [schema change]({% link {{ page.version.version }}/online-schema-changes.md %}) is not currently ongoing on this table. CockroachDB automatically unsets this parameter before performing a schema change and reapplies it when done. Enabling `schema_locked` can help [improve performance of changefeeds]({% link {{ page.version.version }}/create-changefeed.md %}#disallow-schema-changes-on-tables-to-improve-changefeed-performance) running on this table. 
| Boolean | `false` | -| `sql_stats_automatic_collection_enabled` | Enable [automatic statistics collection]({% link {{ page.version.version }}/cost-based-optimizer.md %}#enable-and-disable-automatic-statistics-collection-for-tables) for this table. | Boolean | `true` | -| `sql_stats_automatic_collection_min_stale_rows` | Minimum number of stale rows in this table that will trigger a statistics refresh. | Integer | 500 | -| `sql_stats_automatic_collection_fraction_stale_rows` | Fraction of stale rows in this table that will trigger a statistics refresh. | Float | 0.2 | +| `schema_locked` | Indicates that a [schema change]({% link {{ page.version.version }}/online-schema-changes.md %}) is not currently ongoing on this table. CockroachDB automatically unsets this parameter before performing a schema change and reapplies it when done. Enabling `schema_locked` can help [improve performance of changefeeds]({% link {{ page.version.version }}/create-changefeed.md %}#disallow-schema-changes-on-tables-to-improve-changefeed-performance) running on this table. | Boolean | `false` | +| `sql_stats_automatic_collection_enabled` | Enable automatic collection of [full]({% link {{ page.version.version }}/cost-based-optimizer.md %}#full-statistics) and [partial]({% link {{ page.version.version }}/cost-based-optimizer.md %}#partial-statistics) statistics for this table. | Boolean | `true` | +| `sql_stats_automatic_collection_min_stale_rows` | Minimum number of stale rows in this table that will trigger a full statistics refresh. | Integer | 500 | +| `sql_stats_automatic_collection_fraction_stale_rows` | Fraction of stale rows in this table that will trigger a full statistics refresh. | Float | 0.2 | +| `sql_stats_automatic_full_collection_enabled` | Enable automatic collection of [full statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#full-statistics) for this table. 
| Boolean | `true` | | `sql_stats_forecasts_enabled` | Enable [forecasted statistics]({% link {{ page.version.version }}/show-statistics.md %}#display-forecasted-statistics) collection for this table. | Boolean | `true` | +| `sql_stats_automatic_partial_collection_enabled` | Enable automatic collection of [partial statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#automatically-collect-partial-statistics) for this table. | Boolean | `true` | +| `sql_stats_automatic_partial_collection_min_stale_rows` | Minimum number of stale rows that triggers [partial statistics collection]({% link {{ page.version.version }}/cost-based-optimizer.md %}#automatically-collect-partial-statistics) for this table. | Integer | 100 | +| `sql_stats_automatic_partial_collection_fraction_stale_rows` | Target fraction of stale rows that triggers [partial statistics collection]({% link {{ page.version.version }}/cost-based-optimizer.md %}#automatically-collect-partial-statistics) for this table. | Float | 0.05 | | `infer_rbr_region_col_using_constraint` | For [`REGIONAL BY ROW`]({% link {{ page.version.version }}/table-localities.md %}#regional-by-row-tables) tables, automatically populate the hidden `crdb_region` column on `INSERT`, `UPDATE`, and `UPSERT` by looking up the region of the referenced parent row. Set this parameter to the name of a [foreign key]({% link {{ page.version.version }}/foreign-key.md %}) constraint on the table that includes the `crdb_region` column. The foreign key cannot be dropped while the parameter is set. 
| String | `NULL` | The following parameters are included for PostgreSQL compatibility and do not affect how CockroachDB runs: diff --git a/src/current/_includes/v25.4/misc/session-vars.md b/src/current/_includes/v25.4/misc/session-vars.md index bc2485ec5b9..f34fd9ffd45 100644 --- a/src/current/_includes/v25.4/misc/session-vars.md +++ b/src/current/_includes/v25.4/misc/session-vars.md @@ -34,6 +34,7 @@ | `enable_insert_fast_path` | Indicates whether CockroachDB will use a specialized execution operator for inserting into a table. We recommend leaving this setting `on`. | `on` | Yes | Yes | | `enable_shared_locking_for_serializable` | Indicates whether [shared locks]({% link {{ page.version.version }}/select-for-update.md %}#lock-strengths) are enabled for `SERIALIZABLE` transactions. When `off`, `SELECT` statements using `FOR SHARE` are still permitted under `SERIALIZABLE` isolation, but silently do not lock. | `off` | Yes | Yes | | `enable_super_regions` | When enabled, you can define a super region: a set of [database regions]({% link {{ page.version.version }}/multiregion-overview.md %}#super-regions) on a multi-region cluster such that your [schema objects]({% link {{ page.version.version }}/schema-design-overview.md %}#database-schema-objects) will have all of their [replicas]({% link {{ page.version.version }}/architecture/overview.md %}#architecture-replica) stored _only_ in regions that are members of the super region. | `off` | Yes | Yes | +| `enable_create_stats_using_extremes` | If `on`, allows manual creation of [partial statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#partial-statistics) using the [`CREATE STATISTICS ... USING EXTREMES`]({% link {{ page.version.version }}/create-statistics.md %}#create-partial-statistics-using-extremes) syntax. 
| `on` | Yes | Yes | | `enable_zigzag_join` | Indicates whether the [cost-based optimizer]({% link {{ page.version.version }}/cost-based-optimizer.md %}) will plan certain queries using a [zigzag merge join algorithm]({% link {{ page.version.version }}/cost-based-optimizer.md %}#zigzag-joins), which searches for the desired intersection by jumping back and forth between the indexes based on the fact that after constraining indexes, they share an ordering. | `on` | Yes | Yes | | `enforce_home_region` | If set to `on`, queries return an error and in some cases a suggested resolution if they cannot run entirely in their home region. This can occur if a query has no home region (for example, if it reads from different home regions in a [regional by row table]({% link {{ page.version.version }}/table-localities.md %}#regional-by-row-tables)) or a query's home region differs from the [gateway]({% link {{ page.version.version }}/architecture/life-of-a-distributed-transaction.md %}#gateway) region. Note that only tables with `ZONE` [survivability]({% link {{ page.version.version }}/multiregion-survival-goals.md %}#when-to-use-zone-vs-region-survival-goals) can be scanned without error when this is enabled. For more information about home regions, see [Table localities]({% link {{ page.version.version }}/multiregion-overview.md %}#table-localities).

This feature is in preview. It is subject to change. | `off` | Yes | Yes | | `enforce_home_region_follower_reads_enabled` | If `on` while the [`enforce_home_region`]({% link {{ page.version.version }}/cost-based-optimizer.md %}#control-whether-queries-are-limited-to-a-single-region) setting is `on`, allows `enforce_home_region` to perform `AS OF SYSTEM TIME` [follower reads]({% link {{ page.version.version }}/follower-reads.md %}) to detect and report a query's [home region]({% link {{ page.version.version }}/multiregion-overview.md %}#table-localities), if any.

This feature is in preview. It is subject to change. | `off` | Yes | Yes | @@ -61,6 +62,7 @@ | `optimizer_use_improved_multi_column_selectivity_estimate` | If `on`, the optimizer uses an improved selectivity estimate for multi-column predicates. | `on` | Yes | Yes | | `optimizer_use_improved_zigzag_join_costing` | If `on`, the cost of [zigzag joins]({% link {{ page.version.version }}/cost-based-optimizer.md %}#zigzag-joins) is updated so they will be never be chosen over scans unless they produce fewer rows. To take effect, the [`enable_zigzag_join`](#enable-zigzag-join) setting must also be enabled. | `on` | Yes | Yes | | `optimizer_use_lock_op_for_serializable` | If `on`, the optimizer uses a `Lock` operator to construct query plans for `SELECT` statements using the [`FOR UPDATE` and `FOR SHARE`]({% link {{ page.version.version }}/select-for-update.md %}) clauses. This setting only affects `SERIALIZABLE` transactions. `READ COMMITTED` transactions are evaluated with the `Lock` operator regardless of the setting. | `off` | Yes | Yes | +| `optimizer_use_merged_partial_statistics` | If `on`, the optimizer uses [partial statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#partial-statistics) merged with existing full [table statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#table-statistics) to produce more accurate cardinality estimates. | `on` | Yes | Yes | | `optimizer_use_multicol_stats` | If `on`, the optimizer uses collected multi-column statistics for cardinality estimation. | `on` | No | Yes | | `optimizer_use_not_visible_indexes` | If `on`, the optimizer uses not visible indexes for planning. | `off` | No | Yes | | `optimizer_use_virtual_computed_column_stats` | If `on`, the optimizer uses table statistics on [virtual computed columns]({% link {{ page.version.version }}/computed-columns.md %}#virtual-computed-columns). 
| `on` | Yes | Yes | diff --git a/src/current/_includes/v25.4/misc/table-storage-parameters.md b/src/current/_includes/v25.4/misc/table-storage-parameters.md index d8f99706496..673ef075a3c 100644 --- a/src/current/_includes/v25.4/misc/table-storage-parameters.md +++ b/src/current/_includes/v25.4/misc/table-storage-parameters.md @@ -1,11 +1,15 @@ | Parameter name | Description | Data type | Default value | |----------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------|---------------| | `exclude_data_from_backup` | Exclude the data in this table from any future backups. | Boolean | `false` | -| `schema_locked` | Indicates that a [schema change]({% link {{ page.version.version }}/online-schema-changes.md %}) is not currently ongoing on this table. CockroachDB automatically unsets this parameter before performing a schema change and reapplies it when done. Enabling `schema_locked` can help [improve performance of changefeeds]({% link {{ page.version.version }}/create-changefeed.md %}#disallow-schema-changes-on-tables-to-improve-changefeed-performance) running on this table. | Boolean | `false` | -| `sql_stats_automatic_collection_enabled` | Enable [automatic statistics collection]({% link {{ page.version.version }}/cost-based-optimizer.md %}#enable-and-disable-automatic-statistics-collection-for-tables) for this table. 
| Boolean | `true` | -| `sql_stats_automatic_collection_min_stale_rows` | Minimum number of stale rows in this table that will trigger a statistics refresh. | Integer | 500 | -| `sql_stats_automatic_collection_fraction_stale_rows` | Fraction of stale rows in this table that will trigger a statistics refresh. | Float | 0.2 | +| `schema_locked` | Indicates that a [schema change]({% link {{ page.version.version }}/online-schema-changes.md %}) is not currently ongoing on this table. CockroachDB automatically unsets this parameter before performing a schema change and reapplies it when done. Enabling `schema_locked` can help [improve performance of changefeeds]({% link {{ page.version.version }}/create-changefeed.md %}#disallow-schema-changes-on-tables-to-improve-changefeed-performance) running on this table. | Boolean | `false` | +| `sql_stats_automatic_collection_enabled` | Enable automatic collection of [full]({% link {{ page.version.version }}/cost-based-optimizer.md %}#full-statistics) and [partial]({% link {{ page.version.version }}/cost-based-optimizer.md %}#partial-statistics) statistics for this table. | Boolean | `true` | +| `sql_stats_automatic_collection_min_stale_rows` | Minimum number of stale rows in this table that will trigger a full statistics refresh. | Integer | 500 | +| `sql_stats_automatic_collection_fraction_stale_rows` | Fraction of stale rows in this table that will trigger a full statistics refresh. | Float | 0.2 | +| `sql_stats_automatic_full_collection_enabled` | Enable automatic collection of [full statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#full-statistics) for this table. | Boolean | `true` | | `sql_stats_forecasts_enabled` | Enable [forecasted statistics]({% link {{ page.version.version }}/show-statistics.md %}#display-forecasted-statistics) collection for this table. 
| Boolean | `true` | +| `sql_stats_automatic_partial_collection_enabled` | Enable automatic collection of [partial statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#automatically-collect-partial-statistics) for this table. | Boolean | `true` | +| `sql_stats_automatic_partial_collection_min_stale_rows` | Minimum number of stale rows that triggers [partial statistics collection]({% link {{ page.version.version }}/cost-based-optimizer.md %}#automatically-collect-partial-statistics) for this table. | Integer | 100 | +| `sql_stats_automatic_partial_collection_fraction_stale_rows` | Target fraction of stale rows that triggers [partial statistics collection]({% link {{ page.version.version }}/cost-based-optimizer.md %}#automatically-collect-partial-statistics) for this table. | Float | 0.05 | | `infer_rbr_region_col_using_constraint` | For [`REGIONAL BY ROW`]({% link {{ page.version.version }}/table-localities.md %}#regional-by-row-tables) tables, automatically populate the hidden `crdb_region` column on `INSERT`, `UPDATE`, and `UPSERT` by looking up the region of the referenced parent row. Set this parameter to the name of a [foreign key]({% link {{ page.version.version }}/foreign-key.md %}) constraint on the table that includes the `crdb_region` column. The foreign key cannot be dropped while the parameter is set. | String | `NULL` | The following parameters are included for PostgreSQL compatibility and do not affect how CockroachDB runs: diff --git a/src/current/_includes/v26.1/misc/session-vars.md b/src/current/_includes/v26.1/misc/session-vars.md index 39c8e0e75e3..1e78e7b1cf5 100644 --- a/src/current/_includes/v26.1/misc/session-vars.md +++ b/src/current/_includes/v26.1/misc/session-vars.md @@ -34,6 +34,7 @@ | `enable_insert_fast_path` | Indicates whether CockroachDB will use a specialized execution operator for inserting into a table. We recommend leaving this setting `on`. 
| `on` | Yes | Yes | | `enable_shared_locking_for_serializable` | Indicates whether [shared locks]({% link {{ page.version.version }}/select-for-update.md %}#lock-strengths) are enabled for `SERIALIZABLE` transactions. When `off`, `SELECT` statements using `FOR SHARE` are still permitted under `SERIALIZABLE` isolation, but silently do not lock. | `off` | Yes | Yes | | `enable_super_regions` | When enabled, you can define a super region: a set of [database regions]({% link {{ page.version.version }}/multiregion-overview.md %}#super-regions) on a multi-region cluster such that your [schema objects]({% link {{ page.version.version }}/schema-design-overview.md %}#database-schema-objects) will have all of their [replicas]({% link {{ page.version.version }}/architecture/overview.md %}#architecture-replica) stored _only_ in regions that are members of the super region. | `off` | Yes | Yes | +| `enable_create_stats_using_extremes` | If `on`, allows manual creation of [partial statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#partial-statistics) using the [`CREATE STATISTICS ... USING EXTREMES`]({% link {{ page.version.version }}/create-statistics.md %}#create-partial-statistics-using-extremes) syntax. | `on` | Yes | Yes | | `enable_zigzag_join` | Indicates whether the [cost-based optimizer]({% link {{ page.version.version }}/cost-based-optimizer.md %}) will plan certain queries using a [zigzag merge join algorithm]({% link {{ page.version.version }}/cost-based-optimizer.md %}#zigzag-joins), which searches for the desired intersection by jumping back and forth between the indexes based on the fact that after constraining indexes, they share an ordering. | `on` | Yes | Yes | | `enforce_home_region` | If set to `on`, queries return an error and in some cases a suggested resolution if they cannot run entirely in their home region. 
This can occur if a query has no home region (for example, if it reads from different home regions in a [regional by row table]({% link {{ page.version.version }}/table-localities.md %}#regional-by-row-tables)) or a query's home region differs from the [gateway]({% link {{ page.version.version }}/architecture/life-of-a-distributed-transaction.md %}#gateway) region. Note that only tables with `ZONE` [survivability]({% link {{ page.version.version }}/multiregion-survival-goals.md %}#when-to-use-zone-vs-region-survival-goals) can be scanned without error when this is enabled. For more information about home regions, see [Table localities]({% link {{ page.version.version }}/multiregion-overview.md %}#table-localities).

This feature is in preview. It is subject to change. | `off` | Yes | Yes | | `enforce_home_region_follower_reads_enabled` | If `on` while the [`enforce_home_region`]({% link {{ page.version.version }}/cost-based-optimizer.md %}#control-whether-queries-are-limited-to-a-single-region) setting is `on`, allows `enforce_home_region` to perform `AS OF SYSTEM TIME` [follower reads]({% link {{ page.version.version }}/follower-reads.md %}) to detect and report a query's [home region]({% link {{ page.version.version }}/multiregion-overview.md %}#table-localities), if any.

This feature is in preview. It is subject to change. | `off` | Yes | Yes | @@ -61,6 +62,7 @@ | `optimizer_use_improved_multi_column_selectivity_estimate` | If `on`, the optimizer uses an improved selectivity estimate for multi-column predicates. | `on` | Yes | Yes | | `optimizer_use_improved_zigzag_join_costing` | If `on`, the cost of [zigzag joins]({% link {{ page.version.version }}/cost-based-optimizer.md %}#zigzag-joins) is updated so they will be never be chosen over scans unless they produce fewer rows. To take effect, the [`enable_zigzag_join`](#enable-zigzag-join) setting must also be enabled. | `on` | Yes | Yes | | `optimizer_use_lock_op_for_serializable` | If `on`, the optimizer uses a `Lock` operator to construct query plans for `SELECT` statements using the [`FOR UPDATE` and `FOR SHARE`]({% link {{ page.version.version }}/select-for-update.md %}) clauses. This setting only affects `SERIALIZABLE` transactions. `READ COMMITTED` transactions are evaluated with the `Lock` operator regardless of the setting. | `off` | Yes | Yes | +| `optimizer_use_merged_partial_statistics` | If `on`, the optimizer uses [partial statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#partial-statistics) merged with existing full [table statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#table-statistics) to produce more accurate cardinality estimates. | `on` | Yes | Yes | | `optimizer_use_multicol_stats` | If `on`, the optimizer uses collected multi-column statistics for cardinality estimation. | `on` | No | Yes | | `optimizer_use_not_visible_indexes` | If `on`, the optimizer uses not visible indexes for planning. | `off` | No | Yes | | `optimizer_use_virtual_computed_column_stats` | If `on`, the optimizer uses table statistics on [virtual computed columns]({% link {{ page.version.version }}/computed-columns.md %}#virtual-computed-columns). 
| `on` | Yes | Yes | diff --git a/src/current/_includes/v26.1/misc/table-storage-parameters.md b/src/current/_includes/v26.1/misc/table-storage-parameters.md index d8f99706496..ffc13dd7cf1 100644 --- a/src/current/_includes/v26.1/misc/table-storage-parameters.md +++ b/src/current/_includes/v26.1/misc/table-storage-parameters.md @@ -1,11 +1,15 @@ | Parameter name | Description | Data type | Default value | |----------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------|---------------| | `exclude_data_from_backup` | Exclude the data in this table from any future backups. | Boolean | `false` | -| `schema_locked` | Indicates that a [schema change]({% link {{ page.version.version }}/online-schema-changes.md %}) is not currently ongoing on this table. CockroachDB automatically unsets this parameter before performing a schema change and reapplies it when done. Enabling `schema_locked` can help [improve performance of changefeeds]({% link {{ page.version.version }}/create-changefeed.md %}#disallow-schema-changes-on-tables-to-improve-changefeed-performance) running on this table. | Boolean | `false` | -| `sql_stats_automatic_collection_enabled` | Enable [automatic statistics collection]({% link {{ page.version.version }}/cost-based-optimizer.md %}#enable-and-disable-automatic-statistics-collection-for-tables) for this table. 
| Boolean | `true` | -| `sql_stats_automatic_collection_min_stale_rows` | Minimum number of stale rows in this table that will trigger a statistics refresh. | Integer | 500 | -| `sql_stats_automatic_collection_fraction_stale_rows` | Fraction of stale rows in this table that will trigger a statistics refresh. | Float | 0.2 | +| `schema_locked` | Indicates that a [schema change]({% link {{ page.version.version }}/online-schema-changes.md %}) is not currently ongoing on this table. CockroachDB automatically unsets this parameter before performing a schema change and reapplies it when done. Enabling `schema_locked` can help [improve performance of changefeeds]({% link {{ page.version.version }}/create-changefeed.md %}#disallow-schema-changes-on-tables-to-improve-changefeed-performance) running on this table. | Boolean | `false` | +| `sql_stats_automatic_collection_enabled` | Enable automatic collection of [full]({% link {{ page.version.version }}/cost-based-optimizer.md %}#full-statistics) and [partial]({% link {{ page.version.version }}/cost-based-optimizer.md %}#partial-statistics) statistics for this table. | Boolean | `true` | +| `sql_stats_automatic_collection_min_stale_rows` | Minimum number of stale rows in this table that will trigger a full statistics refresh. | Integer | 500 | +| `sql_stats_automatic_collection_fraction_stale_rows` | Fraction of stale rows in this table that will trigger a full statistics refresh. | Float | 0.2 | +| `sql_stats_automatic_full_collection_enabled` | Enable automatic collection of [full statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#full-statistics) for this table. `sql_stats_automatic_collection_enabled` must be `true`. | Boolean | `true` | | `sql_stats_forecasts_enabled` | Enable [forecasted statistics]({% link {{ page.version.version }}/show-statistics.md %}#display-forecasted-statistics) collection for this table. 
| Boolean | `true` | +| `sql_stats_automatic_partial_collection_enabled` | Enable automatic collection of [partial statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#automatically-collect-partial-statistics) for this table. `sql_stats_automatic_collection_enabled` must be `true`. | Boolean | `true` | +| `sql_stats_automatic_partial_collection_min_stale_rows` | Minimum number of stale rows that triggers [partial statistics collection]({% link {{ page.version.version }}/cost-based-optimizer.md %}#automatically-collect-partial-statistics) for this table. | Integer | 100 | +| `sql_stats_automatic_partial_collection_fraction_stale_rows` | Target fraction of stale rows that triggers [partial statistics collection]({% link {{ page.version.version }}/cost-based-optimizer.md %}#automatically-collect-partial-statistics) for this table. | Float | 0.05 | | `infer_rbr_region_col_using_constraint` | For [`REGIONAL BY ROW`]({% link {{ page.version.version }}/table-localities.md %}#regional-by-row-tables) tables, automatically populate the hidden `crdb_region` column on `INSERT`, `UPDATE`, and `UPSERT` by looking up the region of the referenced parent row. Set this parameter to the name of a [foreign key]({% link {{ page.version.version }}/foreign-key.md %}) constraint on the table that includes the `crdb_region` column. The foreign key cannot be dropped while the parameter is set. 
| String | `NULL` | The following parameters are included for PostgreSQL compatibility and do not affect how CockroachDB runs: diff --git a/src/current/v23.2/cost-based-optimizer.md b/src/current/v23.2/cost-based-optimizer.md index 294d9e75f2a..1a6a200f378 100644 --- a/src/current/v23.2/cost-based-optimizer.md +++ b/src/current/v23.2/cost-based-optimizer.md @@ -23,22 +23,31 @@ The most important factor in determining the quality of a plan is cardinality (i The cost-based optimizer can often find more performant query plans if it has access to statistical data on the contents of your tables. This data needs to be generated from scratch for new tables, and [refreshed periodically](#control-statistics-refresh-rate) for existing tables. -By default, CockroachDB automatically generates table statistics when tables are [created]({% link {{ page.version.version }}/create-table.md %}), and as they are [updated]({% link {{ page.version.version }}/update.md %}). It does this using a [background job]({% link {{ page.version.version }}/create-statistics.md %}#view-statistics-jobs) that automatically determines which columns to get statistics on. Specifically, the optimizer chooses: +The optimizer can use two types of statistics to plan queries: -- Columns that are part of the primary key or an index (in other words, all indexed columns). -- Up to 100 non-indexed columns. +- [Full statistics](#full-statistics) +- [Forecasted statistics](#forecasted-statistics) -By default, CockroachDB also automatically collects [multi-column statistics]({% link {{ page.version.version }}/create-statistics.md %}#create-statistics-on-multiple-columns) on columns that prefix an index. +For best query performance, most users should leave automatic statistics enabled with the default settings. Advanced users can follow the steps provided in this section for performance tuning and troubleshooting. 
-{{site.data.alerts.callout_info}} -[Schema changes]({% link {{ page.version.version }}/online-schema-changes.md %}) trigger automatic statistics collection for the affected table(s). +### Full statistics + +By default, CockroachDB automatically generates full statistics when tables are [created]({% link {{ page.version.version }}/create-table.md %}) and after [schema changes]({% link {{ page.version.version }}/online-schema-changes.md %}). Full statistics for a table are automatically refreshed when approximately 20% of its rows are updated. + +{{site.data.alerts.callout_success}} +You can manually collect *partial statistics* on a subset of table data without scanning the full table. Refer to [Create partial statistics using extremes]({% link {{ page.version.version }}/create-statistics.md %}#create-partial-statistics-using-extremes). {{site.data.alerts.end}} -For best query performance, most users should leave automatic statistics enabled with the default settings. Advanced users can follow the steps provided in this section for performance tuning and troubleshooting. +A [background job]({% link {{ page.version.version }}/create-statistics.md %}#view-statistics-jobs) automatically determines which columns to get statistics on. Specifically, the optimizer chooses: + +- Columns that are part of the primary key or an index (in other words, all indexed columns). +- Up to 100 non-indexed columns. + +By default, CockroachDB also automatically collects [multi-column statistics]({% link {{ page.version.version }}/create-statistics.md %}#create-statistics-on-multiple-columns) on columns that prefix an index. -### Control statistics refresh rate +#### Control statistics refresh rate -Statistics are refreshed in the following cases: +Full statistics are refreshed in the following cases: - When there are no statistics. - When it has been a long time since the last refresh, where "long time" is based on a moving average of the time across the last several refreshes. 
@@ -55,7 +64,7 @@ Statistics are refreshed in the following cases: Because the formula for statistics refreshes is probabilistic, you will not see statistics update immediately after changing these settings, or immediately after exactly 500 rows have been updated. {{site.data.alerts.end}} -#### Small versus large table examples +##### Small versus large table examples Suppose the [clusters settings]({% link {{ page.version.version }}/cluster-settings.md %}) `sql.stats.automatic_collection.fraction_stale_rows` and `sql.stats.automatic_collection.min_stale_rows` have the default values .2 and 500 as shown in the preceding table. @@ -65,15 +74,9 @@ On the other hand, if a table has 1,500,000,000 rows, then 20% of that, or 300,0 In such cases, we recommend that you use the [`sql_stats_automatic_collection_enabled` storage parameter](#enable-and-disable-automatic-statistics-collection-for-tables), which lets you configure automatic statistics collection on a per-table basis. -#### Configure non-default statistics retention +### Toggle automatic statistics collection -By default, when CockroachDB refreshes statistics for a column, it deletes the previous statistics for the column (while leaving the most recent 4-5 historical statistics). When CockroachDB refreshes statistics, it also deletes the statistics for any "non-default" column sets, or columns for which statistics are not [collected by default](#table-statistics). - -Historical statistics on non-default column sets should not be retained indefinitely, because they will not be refreshed automatically and could cause the optimizer to choose a suboptimal plan if they become stale. Such non-default historical statistics may exist because columns were deleted or removed from an index, and are therefore no longer part of a multi-column statistic. 
- -CockroachDB deletes statistics on non-default columns according to the `sql.stats.non_default_columns.min_retention_period` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}), which defaults to a 24-hour retention period. - -### Enable and disable automatic statistics collection for clusters +#### Enable and disable automatic statistics collection for clusters Automatic statistics collection is enabled by default. To disable automatic statistics collection, follow these steps: @@ -97,11 +100,11 @@ Automatic statistics collection is enabled by default. To disable automatic stat To learn how to manually generate statistics, see the [`CREATE STATISTICS` examples]({% link {{ page.version.version }}/create-statistics.md %}#examples). -### Enable and disable automatic statistics collection for tables +#### Enable and disable automatic statistics collection for tables Statistics collection can be expensive for large tables, and you may prefer to defer collection until after data is finished loading or during off-peak hours. Tables that are frequently updated, including small tables, may trigger statistics collection more often, which can lead to unnecessary overhead and unpredictable query plan changes. -You can enable and disable automatic statistics collection for individual tables using the `sql_stats_automatic_collection_enabled` storage parameter. This table setting **takes precedence** over the `sql.stats.automatic_collection.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}) described in [Enable and disable automatic statistics collection for clusters](#enable-and-disable-automatic-statistics-collection-for-clusters). +You can enable and disable automatic statistics collection for individual tables using the `sql_stats_automatic_collection_enabled` [table storage parameter]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters). 
This table setting **takes precedence** over the `sql.stats.automatic_collection.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}) described in [Enable and disable automatic statistics collection for clusters](#enable-and-disable-automatic-statistics-collection-for-clusters). You can either configure this setting during table creation: @@ -138,28 +141,22 @@ The current table settings are shown in the `WITH` clause output of `SHOW CREATE `ALTER TABLE accounts RESET (sql_stats_automatic_collection_enabled)` removes the table setting, in which case the `sql.stats.automatic_collection.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}) is in effect for the table. -The "stale row" cluster settings discussed in [Control statistics refresh rate](#control-statistics-refresh-rate) have table -setting counterparts `sql_stats_automatic_collection_fraction_stale_rows` and `sql_stats_automatic_collection_min_stale_rows`. For example: +### Configure non-default statistics retention -~~~ sql -CREATE TABLE accounts ( - id INT PRIMARY KEY, - balance DECIMAL) -WITH (sql_stats_automatic_collection_enabled = true, -sql_stats_automatic_collection_min_stale_rows = 1000000, -sql_stats_automatic_collection_fraction_stale_rows= 0.05 -); +By default, when CockroachDB refreshes statistics for a column, it deletes the previous statistics for the column while retaining the most recent four to five historical statistics. When CockroachDB refreshes statistics, it also deletes the statistics for any columns whose statistics are not [collected by default](#table-statistics). -ALTER TABLE accounts -SET (sql_stats_automatic_collection_fraction_stale_rows = 0.1, -sql_stats_automatic_collection_min_stale_rows = 2000); -~~~ +Do not retain historical statistics on non-default column sets indefinitely, because they are not refreshed automatically and can cause the optimizer to choose a suboptimal plan if they become stale. 
These non-default historical statistics can exist when columns are deleted or removed from an index and are no longer part of a multi-column statistic. + +CockroachDB deletes statistics on non-default columns according to the `sql.stats.non_default_columns.min_retention_period` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}), which defaults to 24 hours. + +### Forecasted statistics -Automatic statistics rules are checked once per minute. While altered automatic statistics table settings take immediate effect for any subsequent DML statements on a table, running row mutations that started prior to modifying the table settings may still trigger statistics collection based on the settings that existed before you ran the `ALTER TABLE ... SET` statement. +*Forecasted statistics* use a simple regression model that predicts how the statistics have changed since they were last collected. CockroachDB generates forecasted statistics when the following conditions are met: -### Enable and disable forecasted statistics for tables +- There have been at least 3 historical statistics collections. +- The historical statistics closely fit a linear pattern. -You can enable and disable [forecasted statistics]({% link {{ page.version.version }}/show-statistics.md %}#display-forecasted-statistics) collection for individual tables using the `sql_stats_forecasts_enabled` table parameter. This table setting **takes precedence** over the `sql.stats.forecasts.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}). +You can enable and disable forecasted statistics collection for individual tables using the `sql_stats_forecasts_enabled` [table parameter]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters). This table setting **takes precedence** over the `sql.stats.forecasts.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}). 
You can either configure this setting during table creation: diff --git a/src/current/v23.2/create-statistics.md b/src/current/v23.2/create-statistics.md index c99f607bf6a..38bd6868988 100644 --- a/src/current/v23.2/create-statistics.md +++ b/src/current/v23.2/create-statistics.md @@ -5,7 +5,7 @@ toc: true docs_area: reference.sql --- -Use the `CREATE STATISTICS` [statement]({% link {{ page.version.version }}/sql-statements.md %}) to generate table statistics for the [cost-based optimizer]({% link {{ page.version.version }}/cost-based-optimizer.md %}) to use. +Use the `CREATE STATISTICS` [statement]({% link {{ page.version.version }}/sql-statements.md %}) to [generate table statistics for the cost-based optimizer]({% link {{ page.version.version }}/cost-based-optimizer.md %}#table-statistics) to use. Once you [create a table]({% link {{ page.version.version }}/create-table.md %}) and load data into it (e.g., [`INSERT`]({% link {{ page.version.version }}/insert.md %}), [`IMPORT`]({% link {{ page.version.version }}/import.md %})), table statistics can be generated. Table statistics help the cost-based optimizer determine the cardinality of the rows used in each query, which helps to predict more accurate costs. @@ -166,6 +166,29 @@ To create statistics as of a given time (in this example, 1 minute ago to avoid For more information about how the `AS OF SYSTEM TIME` clause works, including supported time formats, see [`AS OF SYSTEM TIME`]({% link {{ page.version.version }}/as-of-system-time.md %}). 
+### Create partial statistics using extremes + +To create partial statistics on the highest and lowest index values: + +{% include_cached copy-clipboard.html %} +~~~ sql +SET enable_create_stats_using_extremes = true; +~~~ + +{% include_cached copy-clipboard.html %} +~~~ sql +CREATE STATISTICS rides_extremes_stats FROM rides USING EXTREMES; +~~~ + +This creates partial statistics on all single-column prefixes of forward indexes in the `rides` table by scanning only the highest and lowest index values, rather than performing a full table scan. + +You can also create partial statistics using extremes on specific columns, provided there is an index with the specified column as the first key column: + +{% include_cached copy-clipboard.html %} +~~~ sql +CREATE STATISTICS revenue_extremes_stats ON revenue FROM rides USING EXTREMES; +~~~ + ### Delete statistics {% include {{ page.version.version }}/misc/delete-statistics.md %} diff --git a/src/current/v23.2/show-statistics.md b/src/current/v23.2/show-statistics.md index 4b50d855b6b..df2014abaac 100644 --- a/src/current/v23.2/show-statistics.md +++ b/src/current/v23.2/show-statistics.md @@ -76,18 +76,13 @@ Parameter | Description ### Display forecasted statistics -The `WITH FORECAST` option calculates and displays forecasted statistics along with the existing table statistics. The forecast is a simple regression model that predicts how the statistics have changed since they were last collected. Forecasts that closely match the historical statistics are used by the [cost-based optimizer]({% link {{ page.version.version }}/cost-based-optimizer.md %}). - -CockroachDB generates forecasted statistics when the following conditions are met: - -- There have been at least 3 historical statistics collections. -- The historical statistics closely fit a linear pattern.
+The `WITH FORECAST` option calculates and displays [forecasted statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#forecasted-statistics) along with the existing table statistics. The following example shows 3 historical statistics collections and the subsequent forecast: {% include_cached copy-clipboard.html %} ~~~ sql -> SHOW STATISTICS FOR TABLE rides WITH FORECAST; +SHOW STATISTICS FOR TABLE rides WITH FORECAST; ~~~ ~~~ diff --git a/src/current/v24.1/cost-based-optimizer.md b/src/current/v24.1/cost-based-optimizer.md index f680fdb8a80..c70cb177f1f 100644 --- a/src/current/v24.1/cost-based-optimizer.md +++ b/src/current/v24.1/cost-based-optimizer.md @@ -23,20 +23,29 @@ The most important factor in determining the quality of a plan is cardinality (i The cost-based optimizer can often find more performant query plans if it has access to statistical data on the contents of your tables. This data needs to be generated from scratch for new tables, and [refreshed periodically](#control-statistics-refresh-rate) for existing tables. -By default, CockroachDB automatically generates table statistics when tables are [created]({% link {{ page.version.version }}/create-table.md %}), and as they are [updated]({% link {{ page.version.version }}/update.md %}). It does this using a [background job]({% link {{ page.version.version }}/create-statistics.md %}#view-statistics-jobs) that automatically determines which columns to get statistics on. Specifically, the optimizer chooses: +The optimizer can use two types of statistics to plan queries: -- Columns that are part of the primary key or an index (in other words, all indexed columns). -- Up to 100 non-indexed columns. 
+- [Full statistics](#full-statistics) +- [Forecasted statistics](#forecasted-statistics) -By default, CockroachDB also automatically collects [multi-column statistics]({% link {{ page.version.version }}/create-statistics.md %}#create-statistics-on-multiple-columns) on columns that prefix an index. +For best query performance, most users should leave automatic statistics enabled with the default settings. Advanced users can follow the steps provided in this section for performance tuning and troubleshooting. -{{site.data.alerts.callout_info}} -[Schema changes]({% link {{ page.version.version }}/online-schema-changes.md %}) trigger automatic statistics collection for the affected table(s). +### Full statistics + +By default, CockroachDB automatically generates full statistics when tables are [created]({% link {{ page.version.version }}/create-table.md %}) and after [schema changes]({% link {{ page.version.version }}/online-schema-changes.md %}). Full statistics for a table are automatically refreshed when approximately 20% of its rows are updated. + +{{site.data.alerts.callout_success}} +You can manually collect *partial statistics* on a subset of table data without scanning the full table. Refer to [Create partial statistics using extremes]({% link {{ page.version.version }}/create-statistics.md %}#create-partial-statistics-using-extremes). {{site.data.alerts.end}} -For best query performance, most users should leave automatic statistics enabled with the default settings. Advanced users can follow the steps provided in this section for performance tuning and troubleshooting. +A [background job]({% link {{ page.version.version }}/create-statistics.md %}#view-statistics-jobs) automatically determines which columns to get statistics on. Specifically, the optimizer chooses: + +- Columns that are part of the primary key or an index (in other words, all indexed columns). +- Up to 100 non-indexed columns. 
+ +By default, CockroachDB also automatically collects [multi-column statistics]({% link {{ page.version.version }}/create-statistics.md %}#create-statistics-on-multiple-columns) on columns that prefix an index. -### Control statistics refresh rate +#### Control statistics refresh rate Statistics are refreshed in the following cases: @@ -55,7 +64,7 @@ Statistics are refreshed in the following cases: Because the formula for statistics refreshes is probabilistic, you will not see statistics update immediately after changing these settings, or immediately after exactly 500 rows have been updated. {{site.data.alerts.end}} -#### Small versus large table examples +##### Small versus large table examples Suppose the [clusters settings]({% link {{ page.version.version }}/cluster-settings.md %}) `sql.stats.automatic_collection.fraction_stale_rows` and `sql.stats.automatic_collection.min_stale_rows` have the default values .2 and 500 as shown in the preceding table. @@ -65,15 +74,9 @@ On the other hand, if a table has 1,500,000,000 rows, then 20% of that, or 300,0 In such cases, we recommend that you use the [`sql_stats_automatic_collection_enabled` storage parameter](#enable-and-disable-automatic-statistics-collection-for-tables), which lets you configure automatic statistics collection on a per-table basis. -#### Configure non-default statistics retention +### Toggle automatic statistics collection -By default, when CockroachDB refreshes statistics for a column, it deletes the previous statistics for the column (while leaving the most recent 4-5 historical statistics). When CockroachDB refreshes statistics, it also deletes the statistics for any "non-default" column sets, or columns for which statistics are not [collected by default](#table-statistics). - -Historical statistics on non-default column sets should not be retained indefinitely, because they will not be refreshed automatically and could cause the optimizer to choose a suboptimal plan if they become stale. 
Such non-default historical statistics may exist because columns were deleted or removed from an index, and are therefore no longer part of a multi-column statistic. - -CockroachDB deletes statistics on non-default columns according to the `sql.stats.non_default_columns.min_retention_period` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}), which defaults to a 24-hour retention period. - -### Enable and disable automatic statistics collection for clusters +#### Enable and disable automatic statistics collection for clusters Automatic statistics collection is enabled by default. To disable automatic statistics collection, follow these steps: @@ -97,11 +100,11 @@ Automatic statistics collection is enabled by default. To disable automatic stat To learn how to manually generate statistics, see the [`CREATE STATISTICS` examples]({% link {{ page.version.version }}/create-statistics.md %}#examples). -### Enable and disable automatic statistics collection for tables +#### Enable and disable automatic statistics collection for tables Statistics collection can be expensive for large tables, and you may prefer to defer collection until after data is finished loading or during off-peak hours. Tables that are frequently updated, including small tables, may trigger statistics collection more often, which can lead to unnecessary overhead and unpredictable query plan changes. -You can enable and disable automatic statistics collection for individual tables using the `sql_stats_automatic_collection_enabled` storage parameter. This table setting **takes precedence** over the `sql.stats.automatic_collection.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}) described in [Enable and disable automatic statistics collection for clusters](#enable-and-disable-automatic-statistics-collection-for-clusters). 
+You can enable and disable automatic statistics collection for individual tables using the `sql_stats_automatic_collection_enabled` [table storage parameter]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters). This table setting **takes precedence** over the `sql.stats.automatic_collection.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}) described in [Enable and disable automatic statistics collection for clusters](#enable-and-disable-automatic-statistics-collection-for-clusters). You can either configure this setting during table creation: @@ -138,28 +141,22 @@ The current table settings are shown in the `WITH` clause output of `SHOW CREATE `ALTER TABLE accounts RESET (sql_stats_automatic_collection_enabled)` removes the table setting, in which case the `sql.stats.automatic_collection.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}) is in effect for the table. -The "stale row" cluster settings discussed in [Control statistics refresh rate](#control-statistics-refresh-rate) have table -setting counterparts `sql_stats_automatic_collection_fraction_stale_rows` and `sql_stats_automatic_collection_min_stale_rows`. For example: +### Configure non-default statistics retention -~~~ sql -CREATE TABLE accounts ( - id INT PRIMARY KEY, - balance DECIMAL) -WITH (sql_stats_automatic_collection_enabled = true, -sql_stats_automatic_collection_min_stale_rows = 1000000, -sql_stats_automatic_collection_fraction_stale_rows= 0.05 -); +By default, when CockroachDB refreshes statistics for a column, it deletes the previous statistics for the column while retaining the most recent four to five historical statistics. When CockroachDB refreshes statistics, it also deletes the statistics for any columns whose statistics are not [collected by default](#table-statistics). 
-ALTER TABLE accounts -SET (sql_stats_automatic_collection_fraction_stale_rows = 0.1, -sql_stats_automatic_collection_min_stale_rows = 2000); -~~~ +Do not retain historical statistics on non-default column sets indefinitely, because they are not refreshed automatically and can cause the optimizer to choose a suboptimal plan if they become stale. These non-default historical statistics can exist when columns are deleted or removed from an index and are no longer part of a multi-column statistic. + +CockroachDB deletes statistics on non-default columns according to the `sql.stats.non_default_columns.min_retention_period` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}), which defaults to 24 hours. + +### Forecasted statistics -Automatic statistics rules are checked once per minute. While altered automatic statistics table settings take immediate effect for any subsequent DML statements on a table, running row mutations that started prior to modifying the table settings may still trigger statistics collection based on the settings that existed before you ran the `ALTER TABLE ... SET` statement. +*Forecasted statistics* use a simple regression model that predicts how the statistics have changed since they were last collected. CockroachDB generates forecasted statistics when the following conditions are met: -### Enable and disable forecasted statistics for tables +- There have been at least 3 historical statistics collections. +- The historical statistics closely fit a linear pattern. -You can enable and disable [forecasted statistics]({% link {{ page.version.version }}/show-statistics.md %}#display-forecasted-statistics) collection for individual tables using the `sql_stats_forecasts_enabled` table parameter. This table setting **takes precedence** over the `sql.stats.forecasts.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}). 
+You can enable and disable forecasted statistics collection for individual tables using the `sql_stats_forecasts_enabled` [table parameter]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters). This table setting **takes precedence** over the `sql.stats.forecasts.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}). You can either configure this setting during table creation: diff --git a/src/current/v24.1/create-statistics.md b/src/current/v24.1/create-statistics.md index 1dfa622c46b..a6d427e2c17 100644 --- a/src/current/v24.1/create-statistics.md +++ b/src/current/v24.1/create-statistics.md @@ -5,7 +5,7 @@ toc: true docs_area: reference.sql --- -Use the `CREATE STATISTICS` [statement]({% link {{ page.version.version }}/sql-statements.md %}) to generate table statistics for the [cost-based optimizer]({% link {{ page.version.version }}/cost-based-optimizer.md %}) to use. +Use the `CREATE STATISTICS` [statement]({% link {{ page.version.version }}/sql-statements.md %}) to [generate table statistics for the cost-based optimizer]({% link {{ page.version.version }}/cost-based-optimizer.md %}#table-statistics) to use. Once you [create a table]({% link {{ page.version.version }}/create-table.md %}) and load data into it (e.g., [`INSERT`]({% link {{ page.version.version }}/insert.md %}), [`IMPORT INTO`]({% link {{ page.version.version }}/import-into.md %})), table statistics can be generated. Table statistics help the cost-based optimizer determine the cardinality of the rows used in each query, which helps to predict more accurate costs. @@ -166,6 +166,29 @@ To create statistics as of a given time (in this example, 1 minute ago to avoid For more information about how the `AS OF SYSTEM TIME` clause works, including supported time formats, see [`AS OF SYSTEM TIME`]({% link {{ page.version.version }}/as-of-system-time.md %}). 
+### Create partial statistics using extremes + +To create partial statistics on the highest and lowest index values: + +{% include_cached copy-clipboard.html %} +~~~ sql +SET enable_create_stats_using_extremes = true; +~~~ + +{% include_cached copy-clipboard.html %} +~~~ sql +CREATE STATISTICS rides_extremes_stats FROM rides USING EXTREMES; +~~~ + +This creates partial statistics on all single-column prefixes of forward indexes in the `rides` table by scanning only the highest and lowest index values, rather than performing a full table scan. + +You can also create extremes statistics on specific columns, provided there is an index with the specified column as the first key column: + +{% include_cached copy-clipboard.html %} +~~~ sql +CREATE STATISTICS revenue_extremes_stats ON revenue FROM rides USING EXTREMES; +~~~ + ### Delete statistics {% include {{ page.version.version }}/misc/delete-statistics.md %} diff --git a/src/current/v24.1/show-statistics.md b/src/current/v24.1/show-statistics.md index 6d85f5b9594..59143af8946 100644 --- a/src/current/v24.1/show-statistics.md +++ b/src/current/v24.1/show-statistics.md @@ -76,18 +76,13 @@ Parameter | Description ### Display forecasted statistics -The `WITH FORECAST` option calculates and displays forecasted statistics along with the existing table statistics. The forecast is a simple regression model that predicts how the statistics have changed since they were last collected. Forecasts that closely match the historical statistics are used by the [cost-based optimizer]({% link {{ page.version.version }}/cost-based-optimizer.md %}). - -CockroachDB generates forecasted statistics when the following conditions are met: - -- There have been at least 3 historical statistics collections. -- The historical statistics closely fit a linear pattern.
+The `WITH FORECAST` option calculates and displays [forecasted statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#forecasted-statistics) along with the existing table statistics. The following example shows 3 historical statistics collections and the subsequent forecast: {% include_cached copy-clipboard.html %} ~~~ sql -> SHOW STATISTICS FOR TABLE rides WITH FORECAST; +SHOW STATISTICS FOR TABLE rides WITH FORECAST; ~~~ ~~~ diff --git a/src/current/v24.2/cost-based-optimizer.md b/src/current/v24.2/cost-based-optimizer.md index b7eef38e3e4..f388174dca2 100644 --- a/src/current/v24.2/cost-based-optimizer.md +++ b/src/current/v24.2/cost-based-optimizer.md @@ -23,20 +23,29 @@ The most important factor in determining the quality of a plan is cardinality (i The cost-based optimizer can often find more performant query plans if it has access to statistical data on the contents of your tables. This data needs to be generated from scratch for new tables, and [refreshed periodically](#control-statistics-refresh-rate) for existing tables. -By default, CockroachDB automatically generates table statistics when tables are [created]({% link {{ page.version.version }}/create-table.md %}), and as they are [updated]({% link {{ page.version.version }}/update.md %}). It does this using a [background job]({% link {{ page.version.version }}/create-statistics.md %}#view-statistics-jobs) that automatically determines which columns to get statistics on. Specifically, the optimizer chooses: +The optimizer can use two types of statistics to plan queries: -- Columns that are part of the primary key or an index (in other words, all indexed columns). -- Up to 100 non-indexed columns. 
+- [Full statistics](#full-statistics) +- [Forecasted statistics](#forecasted-statistics) -By default, CockroachDB also automatically collects [multi-column statistics]({% link {{ page.version.version }}/create-statistics.md %}#create-statistics-on-multiple-columns) on columns that prefix an index. +For best query performance, most users should leave automatic statistics enabled with the default settings. Advanced users can follow the steps provided in this section for performance tuning and troubleshooting. -{{site.data.alerts.callout_info}} -[Schema changes]({% link {{ page.version.version }}/online-schema-changes.md %}) trigger automatic statistics collection for the affected table(s). +### Full statistics + +By default, CockroachDB automatically generates full statistics when tables are [created]({% link {{ page.version.version }}/create-table.md %}) and after [schema changes]({% link {{ page.version.version }}/online-schema-changes.md %}). Full statistics for a table are automatically refreshed when approximately 20% of its rows are updated. + +{{site.data.alerts.callout_success}} +You can manually collect *partial statistics* on a subset of table data without scanning the full table. Refer to [Create partial statistics using extremes]({% link {{ page.version.version }}/create-statistics.md %}#create-partial-statistics-using-extremes). {{site.data.alerts.end}} -For best query performance, most users should leave automatic statistics enabled with the default settings. Advanced users can follow the steps provided in this section for performance tuning and troubleshooting. +A [background job]({% link {{ page.version.version }}/create-statistics.md %}#view-statistics-jobs) automatically determines which columns to get statistics on. Specifically, the optimizer chooses: + +- Columns that are part of the primary key or an index (in other words, all indexed columns). +- Up to 100 non-indexed columns. 
+ +By default, CockroachDB also automatically collects [multi-column statistics]({% link {{ page.version.version }}/create-statistics.md %}#create-statistics-on-multiple-columns) on columns that prefix an index. -### Control statistics refresh rate +#### Control statistics refresh rate Statistics are refreshed in the following cases: @@ -55,7 +64,7 @@ Statistics are refreshed in the following cases: Because the formula for statistics refreshes is probabilistic, you will not see statistics update immediately after changing these settings, or immediately after exactly 500 rows have been updated. {{site.data.alerts.end}} -#### Small versus large table examples +##### Small versus large table examples Suppose the [clusters settings]({% link {{ page.version.version }}/cluster-settings.md %}) `sql.stats.automatic_collection.fraction_stale_rows` and `sql.stats.automatic_collection.min_stale_rows` have the default values .2 and 500 as shown in the preceding table. @@ -65,15 +74,9 @@ On the other hand, if a table has 1,500,000,000 rows, then 20% of that, or 300,0 In such cases, we recommend that you use the [`sql_stats_automatic_collection_enabled` storage parameter](#enable-and-disable-automatic-statistics-collection-for-tables), which lets you configure automatic statistics collection on a per-table basis. -#### Configure non-default statistics retention +### Toggle automatic statistics collection -By default, when CockroachDB refreshes statistics for a column, it deletes the previous statistics for the column (while leaving the most recent 4-5 historical statistics). When CockroachDB refreshes statistics, it also deletes the statistics for any "non-default" column sets, or columns for which statistics are not [collected by default](#table-statistics). - -Historical statistics on non-default column sets should not be retained indefinitely, because they will not be refreshed automatically and could cause the optimizer to choose a suboptimal plan if they become stale. 
Such non-default historical statistics may exist because columns were deleted or removed from an index, and are therefore no longer part of a multi-column statistic. - -CockroachDB deletes statistics on non-default columns according to the `sql.stats.non_default_columns.min_retention_period` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}), which defaults to a 24-hour retention period. - -### Enable and disable automatic statistics collection for clusters +#### Enable and disable automatic statistics collection for clusters Automatic statistics collection is enabled by default. To disable automatic statistics collection, follow these steps: @@ -97,11 +100,11 @@ Automatic statistics collection is enabled by default. To disable automatic stat To learn how to manually generate statistics, see the [`CREATE STATISTICS` examples]({% link {{ page.version.version }}/create-statistics.md %}#examples). -### Enable and disable automatic statistics collection for tables +#### Enable and disable automatic statistics collection for tables Statistics collection can be expensive for large tables, and you may prefer to defer collection until after data is finished loading or during off-peak hours. Tables that are frequently updated, including small tables, may trigger statistics collection more often, which can lead to unnecessary overhead and unpredictable query plan changes. -You can enable and disable automatic statistics collection for individual tables using the `sql_stats_automatic_collection_enabled` storage parameter. This table setting **takes precedence** over the `sql.stats.automatic_collection.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}) described in [Enable and disable automatic statistics collection for clusters](#enable-and-disable-automatic-statistics-collection-for-clusters). 
+You can enable and disable automatic statistics collection for individual tables using the `sql_stats_automatic_collection_enabled` [table storage parameter]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters). This table setting **takes precedence** over the `sql.stats.automatic_collection.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}) described in [Enable and disable automatic statistics collection for clusters](#enable-and-disable-automatic-statistics-collection-for-clusters). You can either configure this setting during table creation: @@ -138,28 +141,22 @@ The current table settings are shown in the `WITH` clause output of `SHOW CREATE `ALTER TABLE accounts RESET (sql_stats_automatic_collection_enabled)` removes the table setting, in which case the `sql.stats.automatic_collection.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}) is in effect for the table. -The "stale row" cluster settings discussed in [Control statistics refresh rate](#control-statistics-refresh-rate) have table -setting counterparts `sql_stats_automatic_collection_fraction_stale_rows` and `sql_stats_automatic_collection_min_stale_rows`. For example: +### Configure non-default statistics retention -~~~ sql -CREATE TABLE accounts ( - id INT PRIMARY KEY, - balance DECIMAL) -WITH (sql_stats_automatic_collection_enabled = true, -sql_stats_automatic_collection_min_stale_rows = 1000000, -sql_stats_automatic_collection_fraction_stale_rows= 0.05 -); +By default, when CockroachDB refreshes statistics for a column, it deletes the previous statistics for the column while retaining the most recent four to five historical statistics. When CockroachDB refreshes statistics, it also deletes the statistics for any columns whose statistics are not [collected by default](#table-statistics). 
-ALTER TABLE accounts -SET (sql_stats_automatic_collection_fraction_stale_rows = 0.1, -sql_stats_automatic_collection_min_stale_rows = 2000); -~~~ +Do not retain historical statistics on non-default column sets indefinitely, because they are not refreshed automatically and can cause the optimizer to choose a suboptimal plan if they become stale. These non-default historical statistics can exist when columns are deleted or removed from an index and are no longer part of a multi-column statistic. + +CockroachDB deletes statistics on non-default columns according to the `sql.stats.non_default_columns.min_retention_period` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}), which defaults to 24 hours. + +### Forecasted statistics -Automatic statistics rules are checked once per minute. While altered automatic statistics table settings take immediate effect for any subsequent DML statements on a table, running row mutations that started prior to modifying the table settings may still trigger statistics collection based on the settings that existed before you ran the `ALTER TABLE ... SET` statement. +*Forecasted statistics* use a simple regression model that predicts how the statistics have changed since they were last collected. CockroachDB generates forecasted statistics when the following conditions are met: -### Enable and disable forecasted statistics for tables +- There have been at least 3 historical statistics collections. +- The historical statistics closely fit a linear pattern. -You can enable and disable [forecasted statistics]({% link {{ page.version.version }}/show-statistics.md %}#display-forecasted-statistics) collection for individual tables using the `sql_stats_forecasts_enabled` table parameter. This table setting **takes precedence** over the `sql.stats.forecasts.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}). 
+You can enable and disable forecasted statistics collection for individual tables using the `sql_stats_forecasts_enabled` [table parameter]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters). This table setting **takes precedence** over the `sql.stats.forecasts.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}). You can either configure this setting during table creation: diff --git a/src/current/v24.2/create-statistics.md b/src/current/v24.2/create-statistics.md index 1dfa622c46b..a6d427e2c17 100644 --- a/src/current/v24.2/create-statistics.md +++ b/src/current/v24.2/create-statistics.md @@ -5,7 +5,7 @@ toc: true docs_area: reference.sql --- -Use the `CREATE STATISTICS` [statement]({% link {{ page.version.version }}/sql-statements.md %}) to generate table statistics for the [cost-based optimizer]({% link {{ page.version.version }}/cost-based-optimizer.md %}) to use. +Use the `CREATE STATISTICS` [statement]({% link {{ page.version.version }}/sql-statements.md %}) to [generate table statistics for the cost-based optimizer]({% link {{ page.version.version }}/cost-based-optimizer.md %}#table-statistics) to use. Once you [create a table]({% link {{ page.version.version }}/create-table.md %}) and load data into it (e.g., [`INSERT`]({% link {{ page.version.version }}/insert.md %}), [`IMPORT INTO`]({% link {{ page.version.version }}/import-into.md %})), table statistics can be generated. Table statistics help the cost-based optimizer determine the cardinality of the rows used in each query, which helps to predict more accurate costs. @@ -166,6 +166,29 @@ To create statistics as of a given time (in this example, 1 minute ago to avoid For more information about how the `AS OF SYSTEM TIME` clause works, including supported time formats, see [`AS OF SYSTEM TIME`]({% link {{ page.version.version }}/as-of-system-time.md %}). 
+### Create partial statistics using extremes + +To create partial statistics on the highest and lowest index values: + +{% include_cached copy-clipboard.html %} +~~~ sql +SET enable_create_stats_using_extremes = true; +~~~ + +{% include_cached copy-clipboard.html %} +~~~ sql +CREATE STATISTICS rides_extremes_stats FROM rides USING EXTREMES; +~~~ + +This creates partial statistics on all single-column prefixes of forward indexes in the `rides` table by scanning only the highest and lowest index values, rather than performing a full table scan. + +You can also create extremes statistics on specific columns, provided there is an index with the specified column as the first key column: + +{% include_cached copy-clipboard.html %} +~~~ sql +CREATE STATISTICS revenue_extremes_stats ON revenue FROM rides USING EXTREMES; +~~~ + ### Delete statistics {% include {{ page.version.version }}/misc/delete-statistics.md %} diff --git a/src/current/v24.2/show-statistics.md b/src/current/v24.2/show-statistics.md index 6d85f5b9594..59143af8946 100644 --- a/src/current/v24.2/show-statistics.md +++ b/src/current/v24.2/show-statistics.md @@ -76,18 +76,13 @@ Parameter | Description ### Display forecasted statistics -The `WITH FORECAST` option calculates and displays forecasted statistics along with the existing table statistics. The forecast is a simple regression model that predicts how the statistics have changed since they were last collected. Forecasts that closely match the historical statistics are used by the [cost-based optimizer]({% link {{ page.version.version }}/cost-based-optimizer.md %}). - -CockroachDB generates forecasted statistics when the following conditions are met: - -- There have been at least 3 historical statistics collections. -- The historical statistics closely fit a linear pattern.
+The `WITH FORECAST` option calculates and displays [forecasted statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#forecasted-statistics) along with the existing table statistics. The following example shows 3 historical statistics collections and the subsequent forecast: {% include_cached copy-clipboard.html %} ~~~ sql -> SHOW STATISTICS FOR TABLE rides WITH FORECAST; +SHOW STATISTICS FOR TABLE rides WITH FORECAST; ~~~ ~~~ diff --git a/src/current/v24.3/cost-based-optimizer.md b/src/current/v24.3/cost-based-optimizer.md index 024b6d883af..42eb09337ff 100644 --- a/src/current/v24.3/cost-based-optimizer.md +++ b/src/current/v24.3/cost-based-optimizer.md @@ -23,20 +23,26 @@ The most important factor in determining the quality of a plan is cardinality (i The cost-based optimizer can often find more performant query plans if it has access to statistical data on the contents of your tables. This data needs to be generated from scratch for new tables, and [refreshed periodically](#control-statistics-refresh-rate) for existing tables. -By default, CockroachDB automatically generates table statistics when tables are [created]({% link {{ page.version.version }}/create-table.md %}), and as they are [updated]({% link {{ page.version.version }}/update.md %}). It does this using a [background job]({% link {{ page.version.version }}/create-statistics.md %}#view-statistics-jobs) that automatically determines which columns to get statistics on. Specifically, the optimizer chooses: +The optimizer can use three types of statistics to plan queries: + +- [Full statistics](#full-statistics) +- [Partial statistics](#partial-statistics) +- [Forecasted statistics](#forecasted-statistics) + +For best query performance, most users should leave automatic statistics enabled with the default settings. Advanced users can follow the steps provided in this section for performance tuning and troubleshooting. 
+ +### Full statistics + +By default, CockroachDB automatically generates full statistics when tables are [created]({% link {{ page.version.version }}/create-table.md %}) and after [schema changes]({% link {{ page.version.version }}/online-schema-changes.md %}). Full statistics for a table are automatically refreshed when approximately 20% of its rows are updated. + +A [background job]({% link {{ page.version.version }}/create-statistics.md %}#view-statistics-jobs) automatically determines which columns to get statistics on. Specifically, the optimizer chooses: - Columns that are part of the primary key or an index (in other words, all indexed columns). - Up to 100 non-indexed columns. By default, CockroachDB also automatically collects [multi-column statistics]({% link {{ page.version.version }}/create-statistics.md %}#create-statistics-on-multiple-columns) on columns that prefix an index. -{{site.data.alerts.callout_info}} -[Schema changes]({% link {{ page.version.version }}/online-schema-changes.md %}) trigger automatic statistics collection for the affected table(s). -{{site.data.alerts.end}} - -For best query performance, most users should leave automatic statistics enabled with the default settings. Advanced users can follow the steps provided in this section for performance tuning and troubleshooting. - -### Control statistics refresh rate +#### Control statistics refresh rate Statistics are refreshed in the following cases: @@ -55,7 +61,25 @@ Statistics are refreshed in the following cases: Because the formula for statistics refreshes is probabilistic, you will not see statistics update immediately after changing these settings, or immediately after exactly 500 rows have been updated. {{site.data.alerts.end}} -#### Small versus large table examples +The "stale row" cluster settings also have the table setting counterparts `sql_stats_automatic_collection_fraction_stale_rows` and `sql_stats_automatic_collection_min_stale_rows`. 
For example: + +~~~ sql +CREATE TABLE accounts ( + id INT PRIMARY KEY, + balance DECIMAL) +WITH (sql_stats_automatic_collection_enabled = true, +sql_stats_automatic_collection_min_stale_rows = 1000000, +sql_stats_automatic_collection_fraction_stale_rows = 0.05 +); + +ALTER TABLE accounts +SET (sql_stats_automatic_collection_fraction_stale_rows = 0.1, +sql_stats_automatic_collection_min_stale_rows = 2000); +~~~ + +Automatic statistics rules are checked once per minute. Altered automatic statistics table settings take immediate effect for subsequent DML statements on a table. However, row mutations that started before you modified the table settings can still trigger statistics collection based on the previous settings. + +##### Small versus large table examples Suppose the [clusters settings]({% link {{ page.version.version }}/cluster-settings.md %}) `sql.stats.automatic_collection.fraction_stale_rows` and `sql.stats.automatic_collection.min_stale_rows` have the default values .2 and 500 as shown in the preceding table. @@ -65,17 +89,25 @@ On the other hand, if a table has 1,500,000,000 rows, then 20% of that, or 300,0 In such cases, we recommend that you use the [`sql_stats_automatic_collection_enabled` storage parameter](#enable-and-disable-automatic-statistics-collection-for-tables), which lets you configure automatic statistics collection on a per-table basis. -#### Configure non-default statistics retention +### Partial statistics + +*Partial statistics* are collected on a subset of table data without scanning the full table. Partial statistics can improve query performance in large tables where only a portion is regularly updated or queried. -By default, when CockroachDB refreshes statistics for a column, it deletes the previous statistics for the column (while leaving the most recent 4-5 historical statistics).
When CockroachDB refreshes statistics, it also deletes the statistics for any "non-default" column sets, or columns for which statistics are not [collected by default](#table-statistics). +You can manually collect partial statistics on the highest and lowest index values using [`CREATE STATISTICS ... USING EXTREMES`]({% link {{ page.version.version }}/create-statistics.md %}#create-partial-statistics-using-extremes). -Historical statistics on non-default column sets should not be retained indefinitely, because they will not be refreshed automatically and could cause the optimizer to choose a suboptimal plan if they become stale. Such non-default historical statistics may exist because columns were deleted or removed from an index, and are therefore no longer part of a multi-column statistic. +Partial statistics have the following constraints: -CockroachDB deletes statistics on non-default columns according to the `sql.stats.non_default_columns.min_retention_period` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}), which defaults to a 24-hour retention period. +- Partial statistics can only be collected if [full statistics](#full-statistics) already exist for the table. +- Partial statistics created with `USING EXTREMES` and no `ON` clause are collected on all single-column prefixes of non-inverted indexes. Indexes that are [partial]({% link {{ page.version.version }}/partial-indexes.md %}), [hash-sharded]({% link {{ page.version.version }}/hash-sharded-indexes.md %}), or implicitly partitioned (such as in [`REGIONAL BY ROW` tables]({% link {{ page.version.version }}/regional-tables.md %}#regional-by-row-tables)) are excluded. +- For [manual collection with specific columns]({% link {{ page.version.version }}/create-statistics.md %}#create-partial-statistics-using-extremes), an index must exist with a prefix matching those columns. 
If no matching index exists or if full statistics were not previously collected on the specified column, the statement returns an error. -### Enable and disable automatic statistics collection for clusters +{% include_cached new-in.html version="v24.3" %} The optimizer uses partial statistics for query planning when the [`optimizer_use_merged_partial_statistics`]({% link {{ page.version.version }}/session-variables.md %}#optimizer-use-merged-partial-statistics) session variable is enabled. It merges partial statistics with existing full statistics to produce more accurate cardinality estimates. -Automatic statistics collection is enabled by default. To disable automatic statistics collection, follow these steps: +### Toggle automatic statistics collection + +#### Enable and disable automatic statistics collection for clusters + +Automatic statistics collection is enabled by default. To disable automatic [full](#full-statistics) and [partial](#partial-statistics) statistics collection, follow these steps: 1. Set the `sql.stats.automatic_collection.enabled` cluster setting to `false`: @@ -97,11 +129,11 @@ Automatic statistics collection is enabled by default. To disable automatic stat To learn how to manually generate statistics, see the [`CREATE STATISTICS` examples]({% link {{ page.version.version }}/create-statistics.md %}#examples). -### Enable and disable automatic statistics collection for tables +#### Enable and disable automatic statistics collection for tables Statistics collection can be expensive for large tables, and you may prefer to defer collection until after data is finished loading or during off-peak hours. Tables that are frequently updated, including small tables, may trigger statistics collection more often, which can lead to unnecessary overhead and unpredictable query plan changes. -You can enable and disable automatic statistics collection for individual tables using the `sql_stats_automatic_collection_enabled` storage parameter. 
This table setting **takes precedence** over the `sql.stats.automatic_collection.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}) described in [Enable and disable automatic statistics collection for clusters](#enable-and-disable-automatic-statistics-collection-for-clusters). +You can enable and disable automatic statistics collection for individual tables using the `sql_stats_automatic_collection_enabled` [table storage parameter]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters). This table setting **takes precedence** over the `sql.stats.automatic_collection.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}) described in [Enable and disable automatic statistics collection for clusters](#enable-and-disable-automatic-statistics-collection-for-clusters). You can either configure this setting during table creation: @@ -138,28 +170,22 @@ The current table settings are shown in the `WITH` clause output of `SHOW CREATE `ALTER TABLE accounts RESET (sql_stats_automatic_collection_enabled)` removes the table setting, in which case the `sql.stats.automatic_collection.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}) is in effect for the table. -The "stale row" cluster settings discussed in [Control statistics refresh rate](#control-statistics-refresh-rate) have table -setting counterparts `sql_stats_automatic_collection_fraction_stale_rows` and `sql_stats_automatic_collection_min_stale_rows`. 
For example: +### Configure non-default statistics retention -~~~ sql -CREATE TABLE accounts ( - id INT PRIMARY KEY, - balance DECIMAL) -WITH (sql_stats_automatic_collection_enabled = true, -sql_stats_automatic_collection_min_stale_rows = 1000000, -sql_stats_automatic_collection_fraction_stale_rows= 0.05 -); +By default, when CockroachDB refreshes statistics for a column, it deletes the previous statistics for the column while retaining the most recent four to five historical statistics. When CockroachDB refreshes statistics, it also deletes the statistics for any columns whose statistics are not [collected by default](#table-statistics). -ALTER TABLE accounts -SET (sql_stats_automatic_collection_fraction_stale_rows = 0.1, -sql_stats_automatic_collection_min_stale_rows = 2000); -~~~ +Do not retain historical statistics on non-default column sets indefinitely, because they are not refreshed automatically and can cause the optimizer to choose a suboptimal plan if they become stale. These non-default historical statistics can exist when columns are deleted or removed from an index and are no longer part of a multi-column statistic. + +CockroachDB deletes statistics on non-default columns according to the `sql.stats.non_default_columns.min_retention_period` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}), which defaults to 24 hours. + +### Forecasted statistics -Automatic statistics rules are checked once per minute. While altered automatic statistics table settings take immediate effect for any subsequent DML statements on a table, running row mutations that started prior to modifying the table settings may still trigger statistics collection based on the settings that existed before you ran the `ALTER TABLE ... SET` statement. +*Forecasted statistics* use a simple regression model that predicts how the statistics have changed since they were last collected. 
CockroachDB generates forecasted statistics when the following conditions are met: -### Enable and disable forecasted statistics for tables +- There have been at least 3 historical statistics collections. +- The historical statistics closely fit a linear pattern. -You can enable and disable [forecasted statistics]({% link {{ page.version.version }}/show-statistics.md %}#display-forecasted-statistics) collection for individual tables using the `sql_stats_forecasts_enabled` table parameter. This table setting **takes precedence** over the `sql.stats.forecasts.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}). +You can enable and disable forecasted statistics collection for individual tables using the `sql_stats_forecasts_enabled` [table parameter]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters). This table setting **takes precedence** over the `sql.stats.forecasts.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}). You can either configure this setting during table creation: diff --git a/src/current/v24.3/create-statistics.md b/src/current/v24.3/create-statistics.md index 1dfa622c46b..1633d10b6fd 100644 --- a/src/current/v24.3/create-statistics.md +++ b/src/current/v24.3/create-statistics.md @@ -5,7 +5,7 @@ toc: true docs_area: reference.sql --- -Use the `CREATE STATISTICS` [statement]({% link {{ page.version.version }}/sql-statements.md %}) to generate table statistics for the [cost-based optimizer]({% link {{ page.version.version }}/cost-based-optimizer.md %}) to use. +Use the `CREATE STATISTICS` [statement]({% link {{ page.version.version }}/sql-statements.md %}) to [generate table statistics for the cost-based optimizer]({% link {{ page.version.version }}/cost-based-optimizer.md %}#table-statistics) to use. 
Once you [create a table]({% link {{ page.version.version }}/create-table.md %}) and load data into it (e.g., [`INSERT`]({% link {{ page.version.version }}/insert.md %}), [`IMPORT INTO`]({% link {{ page.version.version }}/import-into.md %})), table statistics can be generated. Table statistics help the cost-based optimizer determine the cardinality of the rows used in each query, which helps to predict more accurate costs. @@ -166,6 +166,29 @@ To create statistics as of a given time (in this example, 1 minute ago to avoid For more information about how the `AS OF SYSTEM TIME` clause works, including supported time formats, see [`AS OF SYSTEM TIME`]({% link {{ page.version.version }}/as-of-system-time.md %}). +### Create partial statistics using extremes + +To create [partial statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#partial-statistics) that collect statistics on the highest and lowest index values: + +{% include_cached copy-clipboard.html %} +~~~ sql +SET enable_create_stats_using_extremes = true; +~~~ + +{% include_cached copy-clipboard.html %} +~~~ sql +CREATE STATISTICS rides_extremes_stats FROM rides USING EXTREMES; +~~~ + +This creates partial statistics on all single-column prefixes of forward indexes in the `rides` table by scanning only the highest and lowest index values, rather than performing a full table scan.
+ +You can also create partial statistics using extremes on specific columns, provided that an index exists with the specified column as its first key column: + +{% include_cached copy-clipboard.html %} +~~~ sql +CREATE STATISTICS revenue_extremes_stats ON revenue FROM rides USING EXTREMES; +~~~ + ### Delete statistics {% include {{ page.version.version }}/misc/delete-statistics.md %} diff --git a/src/current/v24.3/show-statistics.md b/src/current/v24.3/show-statistics.md index 6d85f5b9594..59143af8946 100644 --- a/src/current/v24.3/show-statistics.md +++ b/src/current/v24.3/show-statistics.md @@ -76,18 +76,13 @@ Parameter | Description ### Display forecasted statistics -The `WITH FORECAST` option calculates and displays forecasted statistics along with the existing table statistics. The forecast is a simple regression model that predicts how the statistics have changed since they were last collected. Forecasts that closely match the historical statistics are used by the [cost-based optimizer]({% link {{ page.version.version }}/cost-based-optimizer.md %}). - -CockroachDB generates forecasted statistics when the following conditions are met: - -- There have been at least 3 historical statistics collections. -- The historical statistics closely fit a linear pattern. +The `WITH FORECAST` option calculates and displays [forecasted statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#forecasted-statistics) along with the existing table statistics.
The following example shows 3 historical statistics collections and the subsequent forecast: {% include_cached copy-clipboard.html %} ~~~ sql -> SHOW STATISTICS FOR TABLE rides WITH FORECAST; +SHOW STATISTICS FOR TABLE rides WITH FORECAST; ~~~ ~~~ diff --git a/src/current/v25.1/cost-based-optimizer.md b/src/current/v25.1/cost-based-optimizer.md index 024b6d883af..45eaacd642f 100644 --- a/src/current/v25.1/cost-based-optimizer.md +++ b/src/current/v25.1/cost-based-optimizer.md @@ -23,22 +23,28 @@ The most important factor in determining the quality of a plan is cardinality (i The cost-based optimizer can often find more performant query plans if it has access to statistical data on the contents of your tables. This data needs to be generated from scratch for new tables, and [refreshed periodically](#control-statistics-refresh-rate) for existing tables. -By default, CockroachDB automatically generates table statistics when tables are [created]({% link {{ page.version.version }}/create-table.md %}), and as they are [updated]({% link {{ page.version.version }}/update.md %}). It does this using a [background job]({% link {{ page.version.version }}/create-statistics.md %}#view-statistics-jobs) that automatically determines which columns to get statistics on. Specifically, the optimizer chooses: +The optimizer can use three types of statistics to plan queries: + +- [Full statistics](#full-statistics) +- [Partial statistics](#partial-statistics) +- [Forecasted statistics](#forecasted-statistics) + +For best query performance, most users should leave automatic statistics enabled with the default settings. Advanced users can follow the steps provided in the following sections for performance tuning and troubleshooting. 
+ +### Full statistics + +By default, CockroachDB automatically generates full statistics when tables are [created]({% link {{ page.version.version }}/create-table.md %}) and after [schema changes]({% link {{ page.version.version }}/online-schema-changes.md %}). Full statistics for a table are automatically refreshed when approximately 20% of its rows are updated. + +A [background job]({% link {{ page.version.version }}/create-statistics.md %}#view-statistics-jobs) automatically determines which columns to get statistics on. Specifically, the optimizer chooses: - Columns that are part of the primary key or an index (in other words, all indexed columns). - Up to 100 non-indexed columns. By default, CockroachDB also automatically collects [multi-column statistics]({% link {{ page.version.version }}/create-statistics.md %}#create-statistics-on-multiple-columns) on columns that prefix an index. -{{site.data.alerts.callout_info}} -[Schema changes]({% link {{ page.version.version }}/online-schema-changes.md %}) trigger automatic statistics collection for the affected table(s). -{{site.data.alerts.end}} - -For best query performance, most users should leave automatic statistics enabled with the default settings. Advanced users can follow the steps provided in this section for performance tuning and troubleshooting. - -### Control statistics refresh rate +#### Control statistics refresh rate -Statistics are refreshed in the following cases: +Full statistics are refreshed in the following cases: - When there are no statistics. - When it has been a long time since the last refresh, where "long time" is based on a moving average of the time across the last several refreshes. @@ -55,9 +61,27 @@ Statistics are refreshed in the following cases: Because the formula for statistics refreshes is probabilistic, you will not see statistics update immediately after changing these settings, or immediately after exactly 500 rows have been updated. 
{{site.data.alerts.end}} -#### Small versus large table examples +The "stale row" cluster settings also have table setting counterparts `sql_stats_automatic_collection_fraction_stale_rows` and `sql_stats_automatic_collection_min_stale_rows`. For example: -Suppose the [clusters settings]({% link {{ page.version.version }}/cluster-settings.md %}) `sql.stats.automatic_collection.fraction_stale_rows` and `sql.stats.automatic_collection.min_stale_rows` have the default values .2 and 500 as shown in the preceding table. +~~~ sql +CREATE TABLE accounts ( + id INT PRIMARY KEY, + balance DECIMAL) +WITH (sql_stats_automatic_collection_enabled = true, +sql_stats_automatic_collection_min_stale_rows = 1000000, +sql_stats_automatic_collection_fraction_stale_rows = 0.05 +); + +ALTER TABLE accounts +SET (sql_stats_automatic_collection_fraction_stale_rows = 0.1, +sql_stats_automatic_collection_min_stale_rows = 2000); +~~~ + +Automatic statistics rules are checked once per minute. Altered automatic statistics table settings take immediate effect for subsequent DML statements on a table. However, row mutations that started before you modified the table settings can still trigger statistics collection based on the previous settings. + +##### Small versus large table examples + +Suppose the [cluster settings]({% link {{ page.version.version }}/cluster-settings.md %}) `sql.stats.automatic_collection.fraction_stale_rows` and `sql.stats.automatic_collection.min_stale_rows` have the default values 0.2 and 500 as shown in the preceding table. If a table has 100 rows and 20 became stale, a re-collection would not be triggered because, even though 20% of the rows are stale, they do not meet the 500-row minimum.
@@ -65,17 +89,54 @@ On the other hand, if a table has 1,500,000,000 rows, then 20% of that, or 300,0 In such cases, we recommend that you use the [`sql_stats_automatic_collection_enabled` storage parameter](#enable-and-disable-automatic-statistics-collection-for-tables), which lets you configure automatic statistics collection on a per-table basis. -#### Configure non-default statistics retention +### Partial statistics + +*Partial statistics* are collected on a subset of table data without scanning the full table. Partial statistics can improve query performance in large tables where only a portion of rows is regularly updated or queried. + +Whereas [full statistics](#full-statistics) refresh infrequently and can allow stale rows to accumulate, partial statistics automatically refresh at a [lower threshold](#automatically-collect-partial-statistics) of stale rows. Partial statistics are automatically collected on extreme index values, which is particularly valuable for timestamp indexes where workloads commonly access the most recent data. They can also be [collected manually](#manually-collect-partial-statistics). -By default, when CockroachDB refreshes statistics for a column, it deletes the previous statistics for the column (while leaving the most recent 4-5 historical statistics). When CockroachDB refreshes statistics, it also deletes the statistics for any "non-default" column sets, or columns for which statistics are not [collected by default](#table-statistics). +Partial statistics have the following constraints: -Historical statistics on non-default column sets should not be retained indefinitely, because they will not be refreshed automatically and could cause the optimizer to choose a suboptimal plan if they become stale. Such non-default historical statistics may exist because columns were deleted or removed from an index, and are therefore no longer part of a multi-column statistic.
+- Partial statistics can only be collected if [full statistics](#full-statistics) already exist for the table. +- Partial statistics created with `USING EXTREMES` and no `ON` clause are collected on all single-column prefixes of non-inverted indexes. Indexes that are [partial]({% link {{ page.version.version }}/partial-indexes.md %}), [hash-sharded]({% link {{ page.version.version }}/hash-sharded-indexes.md %}), or implicitly partitioned (such as in [`REGIONAL BY ROW` tables]({% link {{ page.version.version }}/regional-tables.md %}#regional-by-row-tables)) are excluded. +- For [manual collection](#manually-collect-partial-statistics) with specific columns, an index must exist with a prefix matching those columns. If no matching index exists or if full statistics were not previously collected on the specified column, the statement returns an error. -CockroachDB deletes statistics on non-default columns according to the `sql.stats.non_default_columns.min_retention_period` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}), which defaults to a 24-hour retention period. +The optimizer uses partial statistics for query planning when the [`optimizer_use_merged_partial_statistics`]({% link {{ page.version.version }}/session-variables.md %}#optimizer-use-merged-partial-statistics) session variable is enabled. It merges partial statistics with existing full statistics to produce more accurate cardinality estimates. -### Enable and disable automatic statistics collection for clusters +#### Automatically collect partial statistics -Automatic statistics collection is enabled by default. To disable automatic statistics collection, follow these steps: +{% include_cached new-in.html version="v25.1" %} Partial statistics are automatically collected on the highest and lowest index values when: + +- Automatic collection is enabled. +- The number of stale rows in a table reaches a specified threshold. 
+ +This is particularly beneficial for large tables where only a portion is regularly updated or queried, such as tables with timestamp columns where recent data is frequently accessed. + +To control automatic collection of partial statistics, use the following [cluster settings]({% link {{ page.version.version }}/cluster-settings.md %}) to configure behavior across all tables in the cluster: + +| Cluster setting | Description | +|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| [`sql.stats.automatic_partial_collection.enabled`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-sql-stats-automatic-partial-collection-enabled) | Enable automatic collection of partial table statistics. | +| [`sql.stats.automatic_partial_collection.min_stale_rows`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-sql-stats-automatic-partial-collection-min-stale-rows) | Minimum number of stale rows that triggers partial statistics collection. | +| [`sql.stats.automatic_partial_collection.fraction_stale_rows`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-sql-stats-automatic-partial-collection-fraction-stale-rows) | Target fraction of stale rows that triggers partial statistics collection. If lower than the `0.2` threshold for full statistics, partial statistics refresh more frequently than full statistics. 
| + +Override cluster settings for specific tables using the following [table storage parameters]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters): + +| Table storage parameter | Description | +|--------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| [`sql_stats_automatic_partial_collection_enabled`]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters) | Enable automatic collection of partial statistics on the table. | +| [`sql_stats_automatic_partial_collection_min_stale_rows`]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters) | Minimum number of stale rows on the table that triggers partial statistics collection. | +| [`sql_stats_automatic_partial_collection_fraction_stale_rows`]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters) | Target fraction of stale rows on the table that triggers partial statistics collection. If lower than the `0.2` threshold for full statistics, partial statistics refresh more frequently than full statistics. | + +#### Manually collect partial statistics + +You can manually create partial statistics on the highest and lowest index values, when [`enable_create_stats_using_extremes`]({% link {{ page.version.version }}/session-variables.md %}#enable-create-stats-using-extremes) session variable is enabled, using the `USING EXTREMES` clause: [`CREATE STATISTICS stats FROM table USING EXTREMES`]({% link {{ page.version.version }}/create-statistics.md %}#create-partial-statistics-using-extremes). 
+ +### Toggle automatic statistics collection + +#### Enable and disable automatic statistics collection for clusters + +Automatic statistics collection is enabled by default. To disable automatic [full](#full-statistics) and [partial](#partial-statistics) statistics collection, follow these steps: 1. Set the `sql.stats.automatic_collection.enabled` cluster setting to `false`: @@ -97,11 +158,11 @@ Automatic statistics collection is enabled by default. To disable automatic stat To learn how to manually generate statistics, see the [`CREATE STATISTICS` examples]({% link {{ page.version.version }}/create-statistics.md %}#examples). -### Enable and disable automatic statistics collection for tables +#### Enable and disable automatic statistics collection for tables -Statistics collection can be expensive for large tables, and you may prefer to defer collection until after data is finished loading or during off-peak hours. Tables that are frequently updated, including small tables, may trigger statistics collection more often, which can lead to unnecessary overhead and unpredictable query plan changes. +Automatic statistics collection can be expensive for large tables, and you may prefer to defer collection until after data is finished loading or during off-peak hours. Tables that are frequently updated, including small tables, may trigger statistics collection more often, which can lead to unnecessary overhead and unpredictable query plan changes. -You can enable and disable automatic statistics collection for individual tables using the `sql_stats_automatic_collection_enabled` storage parameter. This table setting **takes precedence** over the `sql.stats.automatic_collection.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}) described in [Enable and disable automatic statistics collection for clusters](#enable-and-disable-automatic-statistics-collection-for-clusters). 
+You can enable and disable automatic [full](#full-statistics) and [partial](#partial-statistics) statistics collection for individual tables using the `sql_stats_automatic_collection_enabled` [storage parameter]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters). This table setting **takes precedence** over the `sql.stats.automatic_collection.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}) described in [Enable and disable automatic statistics collection for clusters](#enable-and-disable-automatic-statistics-collection-for-clusters). You can either configure this setting during table creation: @@ -138,28 +199,22 @@ The current table settings are shown in the `WITH` clause output of `SHOW CREATE `ALTER TABLE accounts RESET (sql_stats_automatic_collection_enabled)` removes the table setting, in which case the `sql.stats.automatic_collection.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}) is in effect for the table. -The "stale row" cluster settings discussed in [Control statistics refresh rate](#control-statistics-refresh-rate) have table -setting counterparts `sql_stats_automatic_collection_fraction_stale_rows` and `sql_stats_automatic_collection_min_stale_rows`. For example: +### Configure non-default statistics retention -~~~ sql -CREATE TABLE accounts ( - id INT PRIMARY KEY, - balance DECIMAL) -WITH (sql_stats_automatic_collection_enabled = true, -sql_stats_automatic_collection_min_stale_rows = 1000000, -sql_stats_automatic_collection_fraction_stale_rows= 0.05 -); +By default, when CockroachDB refreshes statistics for a column, it deletes the previous statistics for the column while retaining the most recent four to five historical statistics. When CockroachDB refreshes statistics, it also deletes the statistics for any columns whose statistics are not [collected by default](#table-statistics). 
-ALTER TABLE accounts -SET (sql_stats_automatic_collection_fraction_stale_rows = 0.1, -sql_stats_automatic_collection_min_stale_rows = 2000); -~~~ +Do not retain historical statistics on non-default column sets indefinitely, because they are not refreshed automatically and can cause the optimizer to choose a suboptimal plan if they become stale. These non-default historical statistics can exist when columns are deleted or removed from an index and are no longer part of a multi-column statistic. + +CockroachDB deletes statistics on non-default columns according to the `sql.stats.non_default_columns.min_retention_period` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}), which defaults to 24 hours. -Automatic statistics rules are checked once per minute. While altered automatic statistics table settings take immediate effect for any subsequent DML statements on a table, running row mutations that started prior to modifying the table settings may still trigger statistics collection based on the settings that existed before you ran the `ALTER TABLE ... SET` statement. +### Forecasted statistics -### Enable and disable forecasted statistics for tables +*Forecasted statistics* use a simple regression model that predicts how the statistics have changed since they were last collected. CockroachDB generates forecasted statistics when the following conditions are met: -You can enable and disable [forecasted statistics]({% link {{ page.version.version }}/show-statistics.md %}#display-forecasted-statistics) collection for individual tables using the `sql_stats_forecasts_enabled` table parameter. This table setting **takes precedence** over the `sql.stats.forecasts.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}). +- There have been at least 3 historical statistics collections. +- The historical statistics closely fit a linear pattern. 
+ +You can enable and disable forecasted statistics collection for individual tables using the `sql_stats_forecasts_enabled` [table parameter]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters). This table setting **takes precedence** over the `sql.stats.forecasts.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}). You can either configure this setting during table creation: @@ -196,8 +251,6 @@ The current table settings are shown in the `WITH` clause output of `SHOW CREATE `ALTER TABLE accounts RESET (sql_stats_forecasts_enabled)` removes the table setting, in which case the `sql.stats.forecasts.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}) is in effect for the table. -For details on forecasted statistics, see [Display forecasted statistics]({% link {{ page.version.version }}/show-statistics.md %}#display-forecasted-statistics). - ### Control histogram collection By default, the optimizer collects histograms for all index columns (specifically the first column in each index) during automatic statistics collection. If a single column statistic is explicitly requested using manual invocation of [`CREATE STATISTICS`]({% link {{ page.version.version }}/create-statistics.md %}), a histogram will be collected, regardless of whether or not the column is part of an index. @@ -312,8 +365,6 @@ Two types of plans can be cached: Generic plans are **not** included in the plan cache, but are cached per session. This means that they must still be re-optimized each time a session prepares a statement using a generic plan. To reuse generic query plans for maximum performance, a prepared statement should be executed multiple times instead of prepared and executed once. - This feature is in [preview]({% link {{ page.version.version }}/cockroachdb-feature-availability.md %}) and is subject to change. 
- {{site.data.alerts.callout_success}} Generic query plans will only benefit workloads that use prepared statements, which are issued via explicit `PREPARE` statements or by client libraries using the [PostgreSQL extended wire protocol](https://www.postgresql.org/docs/current/protocol-flow.html#PROTOCOL-FLOW-EXT-QUERY). Generic query plans are most beneficial for queries with high planning times, such as queries with many [joins]({% link {{ page.version.version }}/joins.md %}). For more information on reducing planning time for such queries, refer to [Reduce planning time for queries with many joins](#reduce-planning-time-for-queries-with-many-joins). {{site.data.alerts.end}} @@ -322,15 +373,15 @@ To change the type of plan that is cached, use the [`plan_cache_mode`]({% link { The following modes can be set: -- `force_custom_plan` (default): Force the use of custom plans. +- `auto` (default): Automatically determine whether to use custom or generic query plans for prepared statements. Custom plans are used for the first five statement executions. Subsequent executions use a generic plan if its estimated cost is not significantly higher than the average cost of the preceding custom plans. +- `force_custom_plan`: Force the use of custom plans. - `force_generic_plan`: Force the use of generic plans. -- `auto`: Automatically determine whether to use custom or generic query plans for prepared statements. Custom plans are used for the first five statement executions. Subsequent executions use a generic plan if its estimated cost is not significantly higher than the average cost of the preceding custom plans. {{site.data.alerts.callout_info}} Generic plans are always used for non-prepared statements that do not contain placeholders or [stable functions]({% link {{ page.version.version }}/functions-and-operators.md %}#function-volatility), regardless of the `plan_cache_mode` setting. 
{{site.data.alerts.end}} -In some cases, generic query plans are less efficient than custom plans. For this reason, Cockroach Labs recommends setting `plan_cache_mode` to `auto` instead of `force_generic_plan`. Under the `auto` setting, the optimizer avoids bad generic plans by falling back to custom plans. For example: +In some cases, generic query plans are less efficient than custom plans. For this reason, Cockroach Labs recommends setting `plan_cache_mode` to `auto` (the default mode) instead of `force_generic_plan`. Under the `auto` setting, the optimizer avoids bad generic plans by falling back to custom plans. For example: Set `plan_cache_mode` to `auto` at the session level: diff --git a/src/current/v25.1/create-statistics.md b/src/current/v25.1/create-statistics.md index 1dfa622c46b..fd2a7d50cde 100644 --- a/src/current/v25.1/create-statistics.md +++ b/src/current/v25.1/create-statistics.md @@ -5,7 +5,7 @@ toc: true docs_area: reference.sql --- -Use the `CREATE STATISTICS` [statement]({% link {{ page.version.version }}/sql-statements.md %}) to generate table statistics for the [cost-based optimizer]({% link {{ page.version.version }}/cost-based-optimizer.md %}) to use. +Use the `CREATE STATISTICS` [statement]({% link {{ page.version.version }}/sql-statements.md %}) to [generate table statistics for the cost-based optimizer]({% link {{ page.version.version }}/cost-based-optimizer.md %}#table-statistics) to use. Once you [create a table]({% link {{ page.version.version }}/create-table.md %}) and load data into it (e.g., [`INSERT`]({% link {{ page.version.version }}/insert.md %}), [`IMPORT INTO`]({% link {{ page.version.version }}/import-into.md %})), table statistics can be generated. Table statistics help the cost-based optimizer determine the cardinality of the rows used in each query, which helps to predict more accurate costs. 
@@ -166,6 +166,26 @@ To create statistics as of a given time (in this example, 1 minute ago to avoid For more information about how the `AS OF SYSTEM TIME` clause works, including supported time formats, see [`AS OF SYSTEM TIME`]({% link {{ page.version.version }}/as-of-system-time.md %}). +### Create partial statistics using extremes + +CockroachDB supports [partial statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#partial-statistics), which collect statistics on a subset of table data to provide more up-to-date information without scanning the entire table. + +To create [partial statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#partial-statistics) that collect statistics on the highest and lowest index values: + +{% include_cached copy-clipboard.html %} +~~~ sql +CREATE STATISTICS rides_extremes_stats FROM rides USING EXTREMES; +~~~ + +This creates partial statistics on all single-column prefixes of non-inverted indexes in the `rides` table by scanning only the highest and lowest index values, rather than performing a full table scan. + +You can also create extremes statistics on specific columns, provided there is an index with the specified column as the first key column: + +{% include_cached copy-clipboard.html %} +~~~ sql +CREATE STATISTICS city_extremes_stats ON city FROM rides USING EXTREMES; +~~~ + ### Delete statistics {% include {{ page.version.version }}/misc/delete-statistics.md %} diff --git a/src/current/v25.1/show-statistics.md b/src/current/v25.1/show-statistics.md index 6d85f5b9594..59143af8946 100644 --- a/src/current/v25.1/show-statistics.md +++ b/src/current/v25.1/show-statistics.md @@ -76,18 +76,13 @@ Parameter | Description ### Display forecasted statistics -The `WITH FORECAST` option calculates and displays forecasted statistics along with the existing table statistics. The forecast is a simple regression model that predicts how the statistics have changed since they were last collected. 
Forecasts that closely match the historical statistics are used by the [cost-based optimizer]({% link {{ page.version.version }}/cost-based-optimizer.md %}). - -CockroachDB generates forecasted statistics when the following conditions are met: - -- There have been at least 3 historical statistics collections. -- The historical statistics closely fit a linear pattern. +The `WITH FORECAST` option calculates and displays [forecasted statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#forecasted-statistics) along with the existing table statistics. The following example shows 3 historical statistics collections and the subsequent forecast: {% include_cached copy-clipboard.html %} ~~~ sql -> SHOW STATISTICS FOR TABLE rides WITH FORECAST; +SHOW STATISTICS FOR TABLE rides WITH FORECAST; ~~~ ~~~ diff --git a/src/current/v25.2/cost-based-optimizer.md b/src/current/v25.2/cost-based-optimizer.md index f17e44bf8cb..b98823992ba 100644 --- a/src/current/v25.2/cost-based-optimizer.md +++ b/src/current/v25.2/cost-based-optimizer.md @@ -23,22 +23,33 @@ The most important factor in determining the quality of a plan is cardinality (i The cost-based optimizer can often find more performant query plans if it has access to statistical data on the contents of your tables. This data needs to be generated from scratch for new tables, and [refreshed periodically](#control-statistics-refresh-rate) for existing tables. -By default, CockroachDB automatically generates table statistics when tables are [created]({% link {{ page.version.version }}/create-table.md %}), and as they are [updated]({% link {{ page.version.version }}/update.md %}). It does this using a [background job]({% link {{ page.version.version }}/create-statistics.md %}#view-statistics-jobs) that automatically determines which columns to get statistics on. 
Specifically, the optimizer chooses: +The optimizer can use three types of statistics to plan queries: + +- [Full statistics](#full-statistics) +- [Partial statistics](#partial-statistics) +- [Forecasted statistics](#forecasted-statistics) + +For best query performance, most users should leave automatic statistics enabled with the default settings. Advanced users can follow the steps provided in the following sections for performance tuning and troubleshooting. + +### Full statistics + +By default, CockroachDB automatically generates full statistics when tables are [created]({% link {{ page.version.version }}/create-table.md %}) and after [schema changes]({% link {{ page.version.version }}/online-schema-changes.md %}). Full statistics for a table are automatically refreshed when approximately 20% of its rows are updated. + +A [background job]({% link {{ page.version.version }}/create-statistics.md %}#view-statistics-jobs) automatically determines which columns to get statistics on. Specifically, the optimizer chooses: - Columns that are part of the primary key or an index (in other words, all indexed columns). - Up to 100 non-indexed columns. By default, CockroachDB also automatically collects [multi-column statistics]({% link {{ page.version.version }}/create-statistics.md %}#create-statistics-on-multiple-columns) on columns that prefix an index. -{{site.data.alerts.callout_info}} -[Schema changes]({% link {{ page.version.version }}/online-schema-changes.md %}) trigger automatic statistics collection for the affected table(s). -{{site.data.alerts.end}} +{% include_cached new-in.html version="v25.2" %} To control automatic collection of full statistics, use the following settings: -For best query performance, most users should leave automatic statistics enabled with the default settings. Advanced users can follow the steps provided in this section for performance tuning and troubleshooting. 
+- [`sql.stats.automatic_full_collection.enabled`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-sql-stats-automatic-full-collection-enabled): Cluster setting that enables automatic collection of full table statistics across all tables in the cluster. +- [`sql_stats_automatic_full_collection_enabled`]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters): Table storage parameter that overrides the cluster setting when applied to a specific table. -### Control statistics refresh rate +#### Control statistics refresh rate -Statistics are refreshed in the following cases: +Full statistics are refreshed in the following cases: - When there are no statistics. - When it has been a long time since the last refresh, where "long time" is based on a moving average of the time across the last several refreshes. @@ -55,9 +66,27 @@ Statistics are refreshed in the following cases: Because the formula for statistics refreshes is probabilistic, you will not see statistics update immediately after changing these settings, or immediately after exactly 500 rows have been updated. {{site.data.alerts.end}} -#### Small versus large table examples +The "stale row" cluster settings also have table setting counterparts: `sql_stats_automatic_collection_fraction_stale_rows` and `sql_stats_automatic_collection_min_stale_rows`. For example: + +~~~ sql +CREATE TABLE accounts ( + id INT PRIMARY KEY, + balance DECIMAL) +WITH (sql_stats_automatic_collection_enabled = true, +sql_stats_automatic_collection_min_stale_rows = 1000000, +sql_stats_automatic_collection_fraction_stale_rows = 0.05 +); + +ALTER TABLE accounts +SET (sql_stats_automatic_collection_fraction_stale_rows = 0.1, +sql_stats_automatic_collection_min_stale_rows = 2000); +~~~ + +Automatic statistics rules are checked once per minute. Altered automatic statistics table settings take immediate effect for subsequent DML statements on a table.
However, row mutations that started before you modified the table settings can still trigger statistics collection based on the previous settings. + +##### Small versus large table examples -Suppose the [clusters settings]({% link {{ page.version.version }}/cluster-settings.md %}) `sql.stats.automatic_collection.fraction_stale_rows` and `sql.stats.automatic_collection.min_stale_rows` have the default values .2 and 500 as shown in the preceding table. +Suppose the [cluster settings]({% link {{ page.version.version }}/cluster-settings.md %}) `sql.stats.automatic_collection.fraction_stale_rows` and `sql.stats.automatic_collection.min_stale_rows` have the default values `0.2` and `500` as shown in the preceding table. If a table has 100 rows and 20 became stale, a re-collection would not be triggered because, even though 20% of the rows are stale, they do not meet the 500-row minimum. @@ -65,17 +94,54 @@ On the other hand, if a table has 1,500,000,000 rows, then 20% of that, or 300,0 In such cases, we recommend that you use the [`sql_stats_automatic_collection_enabled` storage parameter](#enable-and-disable-automatic-statistics-collection-for-tables), which lets you configure automatic statistics collection on a per-table basis. -#### Configure non-default statistics retention +### Partial statistics + +*Partial statistics* are collected on a subset of table data without scanning the full table. Partial statistics can improve query performance in large tables where only a portion of rows are regularly updated or queried. + +Whereas [full statistics](#full-statistics) refresh infrequently and can allow stale rows to accumulate, partial statistics automatically refresh at a [lower threshold](#automatically-collect-partial-statistics) of stale rows. Partial statistics are automatically collected on extreme index values, which is particularly valuable for timestamp indexes where workloads commonly access the most recent data.
They can also be [collected manually](#manually-collect-partial-statistics). + +Partial statistics have the following constraints: -By default, when CockroachDB refreshes statistics for a column, it deletes the previous statistics for the column (while leaving the most recent 4-5 historical statistics). When CockroachDB refreshes statistics, it also deletes the statistics for any "non-default" column sets, or columns for which statistics are not [collected by default](#table-statistics). +- Partial statistics can only be collected if [full statistics](#full-statistics) already exist for the table. +- Partial statistics created with `USING EXTREMES` and no `ON` clause are collected on all single-column prefixes of non-inverted indexes. Indexes that are [partial]({% link {{ page.version.version }}/partial-indexes.md %}), [hash-sharded]({% link {{ page.version.version }}/hash-sharded-indexes.md %}), or implicitly partitioned (such as in [`REGIONAL BY ROW` tables]({% link {{ page.version.version }}/regional-tables.md %}#regional-by-row-tables)) are excluded. +- For [manual collection](#manually-collect-partial-statistics) with specific columns, an index must exist with a prefix matching those columns. If no matching index exists or if full statistics were not previously collected on the specified column, the statement returns an error. -Historical statistics on non-default column sets should not be retained indefinitely, because they will not be refreshed automatically and could cause the optimizer to choose a suboptimal plan if they become stale. Such non-default historical statistics may exist because columns were deleted or removed from an index, and are therefore no longer part of a multi-column statistic. +By default, the optimizer uses partial statistics for query planning. It merges partial statistics with existing full statistics to produce more accurate cardinality estimates. 
-CockroachDB deletes statistics on non-default columns according to the `sql.stats.non_default_columns.min_retention_period` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}), which defaults to a 24-hour retention period. +#### Automatically collect partial statistics -### Enable and disable automatic statistics collection for clusters +Partial statistics are automatically collected on the highest and lowest index values when: -Automatic statistics collection is enabled by default. To disable automatic statistics collection, follow these steps: +- Automatic collection is enabled. +- The number of stale rows in a table reaches a specified threshold. + +This is particularly beneficial for large tables where only a portion is regularly updated or queried, such as tables with timestamp columns where recent data is frequently accessed. + +To control automatic collection of partial statistics, use the following [cluster settings]({% link {{ page.version.version }}/cluster-settings.md %}) to configure behavior across all tables in the cluster: + +| Cluster setting | Description | +|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| [`sql.stats.automatic_partial_collection.enabled`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-sql-stats-automatic-partial-collection-enabled) | Enable automatic collection of partial table statistics. | +| [`sql.stats.automatic_partial_collection.min_stale_rows`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-sql-stats-automatic-partial-collection-min-stale-rows) | Minimum number of stale rows that triggers partial statistics collection. 
| +| [`sql.stats.automatic_partial_collection.fraction_stale_rows`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-sql-stats-automatic-partial-collection-fraction-stale-rows) | Target fraction of stale rows that triggers partial statistics collection. If lower than the `0.2` threshold for full statistics, partial statistics refresh more frequently than full statistics. | + +Override cluster settings for specific tables using the following [table storage parameters]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters): + +| Table storage parameter | Description | +|--------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| [`sql_stats_automatic_partial_collection_enabled`]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters) | Enable automatic collection of partial statistics on the table. | +| [`sql_stats_automatic_partial_collection_min_stale_rows`]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters) | Minimum number of stale rows on the table that triggers partial statistics collection. | +| [`sql_stats_automatic_partial_collection_fraction_stale_rows`]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters) | Target fraction of stale rows on the table that triggers partial statistics collection. If lower than the `0.2` threshold for full statistics, partial statistics refresh more frequently than full statistics. 
| + +#### Manually collect partial statistics + +When the [`enable_create_stats_using_extremes`]({% link {{ page.version.version }}/session-variables.md %}#enable-create-stats-using-extremes) session variable is enabled, you can manually create partial statistics on the highest and lowest index values using the `USING EXTREMES` clause: [`CREATE STATISTICS stats FROM table USING EXTREMES`]({% link {{ page.version.version }}/create-statistics.md %}#create-partial-statistics-using-extremes). + +### Toggle automatic statistics collection + +#### Enable and disable automatic statistics collection for clusters + +Automatic statistics collection is enabled by default. To disable automatic [full](#full-statistics) and [partial](#partial-statistics) statistics collection, follow these steps: 1. Set the `sql.stats.automatic_collection.enabled` cluster setting to `false`: @@ -97,11 +163,11 @@ Automatic statistics collection is enabled by default. To disable automatic stat To learn how to manually generate statistics, see the [`CREATE STATISTICS` examples]({% link {{ page.version.version }}/create-statistics.md %}#examples). -### Enable and disable automatic statistics collection for tables +#### Enable and disable automatic statistics collection for tables -Statistics collection can be expensive for large tables, and you may prefer to defer collection until after data is finished loading or during off-peak hours. Tables that are frequently updated, including small tables, may trigger statistics collection more often, which can lead to unnecessary overhead and unpredictable query plan changes. +Automatic statistics collection can be expensive for large tables, and you may prefer to defer collection until after data is finished loading or during off-peak hours. Tables that are frequently updated, including small tables, may trigger statistics collection more often, which can lead to unnecessary overhead and unpredictable query plan changes.
-You can enable and disable automatic statistics collection for individual tables using the `sql_stats_automatic_collection_enabled` storage parameter. This table setting **takes precedence** over the `sql.stats.automatic_collection.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}) described in [Enable and disable automatic statistics collection for clusters](#enable-and-disable-automatic-statistics-collection-for-clusters). +You can enable and disable automatic [full](#full-statistics) and [partial](#partial-statistics) statistics collection for individual tables using the `sql_stats_automatic_collection_enabled` [storage parameter]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters). This table setting **takes precedence** over the `sql.stats.automatic_collection.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}) described in [Enable and disable automatic statistics collection for clusters](#enable-and-disable-automatic-statistics-collection-for-clusters). You can either configure this setting during table creation: @@ -138,28 +204,22 @@ The current table settings are shown in the `WITH` clause output of `SHOW CREATE `ALTER TABLE accounts RESET (sql_stats_automatic_collection_enabled)` removes the table setting, in which case the `sql.stats.automatic_collection.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}) is in effect for the table. -The "stale row" cluster settings discussed in [Control statistics refresh rate](#control-statistics-refresh-rate) have table -setting counterparts `sql_stats_automatic_collection_fraction_stale_rows` and `sql_stats_automatic_collection_min_stale_rows`. 
For example: +### Configure non-default statistics retention -~~~ sql -CREATE TABLE accounts ( - id INT PRIMARY KEY, - balance DECIMAL) -WITH (sql_stats_automatic_collection_enabled = true, -sql_stats_automatic_collection_min_stale_rows = 1000000, -sql_stats_automatic_collection_fraction_stale_rows= 0.05 -); +By default, when CockroachDB refreshes statistics for a column, it deletes the previous statistics for the column while retaining the most recent four to five historical statistics. When CockroachDB refreshes statistics, it also deletes the statistics for any columns whose statistics are not [collected by default](#table-statistics). -ALTER TABLE accounts -SET (sql_stats_automatic_collection_fraction_stale_rows = 0.1, -sql_stats_automatic_collection_min_stale_rows = 2000); -~~~ +CockroachDB does not retain historical statistics on non-default column sets indefinitely, because they are not refreshed automatically and can cause the optimizer to choose a suboptimal plan if they become stale. These non-default historical statistics can exist when columns are deleted or removed from an index and are no longer part of a multi-column statistic. -Automatic statistics rules are checked once per minute. While altered automatic statistics table settings take immediate effect for any subsequent DML statements on a table, running row mutations that started prior to modifying the table settings may still trigger statistics collection based on the settings that existed before you ran the `ALTER TABLE ... SET` statement. +CockroachDB deletes statistics on non-default columns according to the `sql.stats.non_default_columns.min_retention_period` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}), which defaults to 24 hours.
-### Enable and disable forecasted statistics for tables +### Forecasted statistics -You can enable and disable [forecasted statistics]({% link {{ page.version.version }}/show-statistics.md %}#display-forecasted-statistics) collection for individual tables using the `sql_stats_forecasts_enabled` table parameter. This table setting **takes precedence** over the `sql.stats.forecasts.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}). +*Forecasted statistics* use a simple regression model that predicts how the statistics have changed since they were last collected. CockroachDB generates forecasted statistics when the following conditions are met: + +- There have been at least 3 historical statistics collections. +- The historical statistics closely fit a linear pattern. + +You can enable and disable forecasted statistics collection for individual tables using the `sql_stats_forecasts_enabled` [table parameter]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters). This table setting **takes precedence** over the `sql.stats.forecasts.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}). You can either configure this setting during table creation: @@ -196,8 +256,6 @@ The current table settings are shown in the `WITH` clause output of `SHOW CREATE `ALTER TABLE accounts RESET (sql_stats_forecasts_enabled)` removes the table setting, in which case the `sql.stats.forecasts.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}) is in effect for the table. -For details on forecasted statistics, see [Display forecasted statistics]({% link {{ page.version.version }}/show-statistics.md %}#display-forecasted-statistics). - ### Control histogram collection By default, the optimizer collects histograms for all index columns (specifically the first column in each index) during automatic statistics collection. 
If a single column statistic is explicitly requested using manual invocation of [`CREATE STATISTICS`]({% link {{ page.version.version }}/create-statistics.md %}), a histogram will be collected, regardless of whether or not the column is part of an index. diff --git a/src/current/v25.2/create-statistics.md b/src/current/v25.2/create-statistics.md index 1dfa622c46b..fd2a7d50cde 100644 --- a/src/current/v25.2/create-statistics.md +++ b/src/current/v25.2/create-statistics.md @@ -5,7 +5,7 @@ toc: true docs_area: reference.sql --- -Use the `CREATE STATISTICS` [statement]({% link {{ page.version.version }}/sql-statements.md %}) to generate table statistics for the [cost-based optimizer]({% link {{ page.version.version }}/cost-based-optimizer.md %}) to use. +Use the `CREATE STATISTICS` [statement]({% link {{ page.version.version }}/sql-statements.md %}) to [generate table statistics for the cost-based optimizer]({% link {{ page.version.version }}/cost-based-optimizer.md %}#table-statistics) to use. Once you [create a table]({% link {{ page.version.version }}/create-table.md %}) and load data into it (e.g., [`INSERT`]({% link {{ page.version.version }}/insert.md %}), [`IMPORT INTO`]({% link {{ page.version.version }}/import-into.md %})), table statistics can be generated. Table statistics help the cost-based optimizer determine the cardinality of the rows used in each query, which helps to predict more accurate costs. @@ -166,6 +166,26 @@ To create statistics as of a given time (in this example, 1 minute ago to avoid For more information about how the `AS OF SYSTEM TIME` clause works, including supported time formats, see [`AS OF SYSTEM TIME`]({% link {{ page.version.version }}/as-of-system-time.md %}). 
+### Create partial statistics using extremes + +CockroachDB supports [partial statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#partial-statistics), which collect statistics on a subset of table data to provide more up-to-date information without scanning the entire table. + +To create [partial statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#partial-statistics) that collect statistics on the highest and lowest index values: + +{% include_cached copy-clipboard.html %} +~~~ sql +CREATE STATISTICS rides_extremes_stats FROM rides USING EXTREMES; +~~~ + +This creates partial statistics on all single-column prefixes of non-inverted indexes in the `rides` table by scanning only the highest and lowest index values, rather than performing a full table scan. + +You can also create extremes statistics on specific columns, provided there is an index with the specified column as the first key column: + +{% include_cached copy-clipboard.html %} +~~~ sql +CREATE STATISTICS city_extremes_stats ON city FROM rides USING EXTREMES; +~~~ + ### Delete statistics {% include {{ page.version.version }}/misc/delete-statistics.md %} diff --git a/src/current/v25.2/show-statistics.md b/src/current/v25.2/show-statistics.md index 6d85f5b9594..59143af8946 100644 --- a/src/current/v25.2/show-statistics.md +++ b/src/current/v25.2/show-statistics.md @@ -76,18 +76,13 @@ Parameter | Description ### Display forecasted statistics -The `WITH FORECAST` option calculates and displays forecasted statistics along with the existing table statistics. The forecast is a simple regression model that predicts how the statistics have changed since they were last collected. Forecasts that closely match the historical statistics are used by the [cost-based optimizer]({% link {{ page.version.version }}/cost-based-optimizer.md %}). 
- -CockroachDB generates forecasted statistics when the following conditions are met: - -- There have been at least 3 historical statistics collections. -- The historical statistics closely fit a linear pattern. +The `WITH FORECAST` option calculates and displays [forecasted statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#forecasted-statistics) along with the existing table statistics. The following example shows 3 historical statistics collections and the subsequent forecast: {% include_cached copy-clipboard.html %} ~~~ sql -> SHOW STATISTICS FOR TABLE rides WITH FORECAST; +SHOW STATISTICS FOR TABLE rides WITH FORECAST; ~~~ ~~~ diff --git a/src/current/v25.3/cost-based-optimizer.md b/src/current/v25.3/cost-based-optimizer.md index f17e44bf8cb..e343c84b199 100644 --- a/src/current/v25.3/cost-based-optimizer.md +++ b/src/current/v25.3/cost-based-optimizer.md @@ -23,22 +23,33 @@ The most important factor in determining the quality of a plan is cardinality (i The cost-based optimizer can often find more performant query plans if it has access to statistical data on the contents of your tables. This data needs to be generated from scratch for new tables, and [refreshed periodically](#control-statistics-refresh-rate) for existing tables. -By default, CockroachDB automatically generates table statistics when tables are [created]({% link {{ page.version.version }}/create-table.md %}), and as they are [updated]({% link {{ page.version.version }}/update.md %}). It does this using a [background job]({% link {{ page.version.version }}/create-statistics.md %}#view-statistics-jobs) that automatically determines which columns to get statistics on. 
Specifically, the optimizer chooses: +The optimizer can use three types of statistics to plan queries: + +- [Full statistics](#full-statistics) +- [Partial statistics](#partial-statistics) +- [Forecasted statistics](#forecasted-statistics) + +For best query performance, most users should leave automatic statistics enabled with the default settings. Advanced users can follow the steps provided in the following sections for performance tuning and troubleshooting. + +### Full statistics + +By default, CockroachDB automatically generates full statistics when tables are [created]({% link {{ page.version.version }}/create-table.md %}) and after [schema changes]({% link {{ page.version.version }}/online-schema-changes.md %}). Full statistics for a table are automatically refreshed when approximately 20% of its rows are updated. + +A [background job]({% link {{ page.version.version }}/create-statistics.md %}#view-statistics-jobs) automatically determines which columns to get statistics on. Specifically, the optimizer chooses: - Columns that are part of the primary key or an index (in other words, all indexed columns). - Up to 100 non-indexed columns. By default, CockroachDB also automatically collects [multi-column statistics]({% link {{ page.version.version }}/create-statistics.md %}#create-statistics-on-multiple-columns) on columns that prefix an index. -{{site.data.alerts.callout_info}} -[Schema changes]({% link {{ page.version.version }}/online-schema-changes.md %}) trigger automatic statistics collection for the affected table(s). -{{site.data.alerts.end}} +To control automatic collection of full statistics, use the following settings: -For best query performance, most users should leave automatic statistics enabled with the default settings. Advanced users can follow the steps provided in this section for performance tuning and troubleshooting. 
+- [`sql.stats.automatic_full_collection.enabled`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-sql-stats-automatic-full-collection-enabled): Cluster setting that enables automatic collection of full table statistics across all tables in the cluster.
+- [`sql_stats_automatic_full_collection_enabled`]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters): Table storage parameter that overrides the cluster setting when applied to a specific table.
 
-### Control statistics refresh rate
+#### Control statistics refresh rate
 
-Statistics are refreshed in the following cases:
+Full statistics are refreshed in the following cases:
 
 - When there are no statistics.
 - When it has been a long time since the last refresh, where "long time" is based on a moving average of the time across the last several refreshes.
@@ -55,9 +66,27 @@ Statistics are refreshed in the following cases:
 Because the formula for statistics refreshes is probabilistic, you will not see statistics update immediately after changing these settings, or immediately after exactly 500 rows have been updated.
 {{site.data.alerts.end}}
 
-#### Small versus large table examples
+The "stale row" cluster settings also have table storage parameter counterparts: `sql_stats_automatic_collection_fraction_stale_rows` and `sql_stats_automatic_collection_min_stale_rows`. For example:
+
+~~~ sql
+CREATE TABLE accounts (
+    id INT PRIMARY KEY,
+    balance DECIMAL)
+WITH (sql_stats_automatic_collection_enabled = true,
+sql_stats_automatic_collection_min_stale_rows = 1000000,
+sql_stats_automatic_collection_fraction_stale_rows = 0.05
+);
+
+ALTER TABLE accounts
+SET (sql_stats_automatic_collection_fraction_stale_rows = 0.1,
+sql_stats_automatic_collection_min_stale_rows = 2000);
+~~~
+
+Automatic statistics rules are checked once per minute. Altered automatic statistics table settings take immediate effect for subsequent DML statements on a table. However, row mutations that started before you modified the table settings can still trigger statistics collection based on the previous settings.
+
+##### Small versus large table examples
 
-Suppose the [clusters settings]({% link {{ page.version.version }}/cluster-settings.md %}) `sql.stats.automatic_collection.fraction_stale_rows` and `sql.stats.automatic_collection.min_stale_rows` have the default values .2 and 500 as shown in the preceding table.
+Suppose the [cluster settings]({% link {{ page.version.version }}/cluster-settings.md %}) `sql.stats.automatic_collection.fraction_stale_rows` and `sql.stats.automatic_collection.min_stale_rows` have the default values 0.2 and 500, as shown in the preceding table.
 
 If a table has 100 rows and 20 became stale, a re-collection would not be triggered because, even though 20% of the rows are stale, they do not meet the 500-row minimum.
 
@@ -65,17 +94,54 @@ On the other hand, if a table has 1,500,000,000 rows, then 20% of that, or 300,0
 
 In such cases, we recommend that you use the [`sql_stats_automatic_collection_enabled` storage parameter](#enable-and-disable-automatic-statistics-collection-for-tables), which lets you configure automatic statistics collection on a per-table basis.
 
-#### Configure non-default statistics retention
+### Partial statistics
+
+*Partial statistics* are collected on a subset of table data without scanning the full table. Partial statistics can improve query performance in large tables where only a portion of rows are regularly updated or queried.
+
+Whereas [full statistics](#full-statistics) refresh infrequently and can allow stale rows to accumulate, partial statistics automatically refresh at a [lower threshold](#automatically-collect-partial-statistics) of stale rows. Partial statistics are automatically collected on extreme index values (the highest and lowest values in an index), which is particularly valuable for timestamp indexes, where workloads commonly access the most recent data. They can also be [collected manually](#manually-collect-partial-statistics).
+
+Partial statistics have the following constraints:
 
-By default, when CockroachDB refreshes statistics for a column, it deletes the previous statistics for the column (while leaving the most recent 4-5 historical statistics). When CockroachDB refreshes statistics, it also deletes the statistics for any "non-default" column sets, or columns for which statistics are not [collected by default](#table-statistics).
+- Partial statistics can only be collected if [full statistics](#full-statistics) already exist for the table.
+- Partial statistics created with `USING EXTREMES` and no `ON` clause are collected on all single-column prefixes of non-inverted indexes. Indexes that are [partial]({% link {{ page.version.version }}/partial-indexes.md %}), [hash-sharded]({% link {{ page.version.version }}/hash-sharded-indexes.md %}), or implicitly partitioned (such as in [`REGIONAL BY ROW` tables]({% link {{ page.version.version }}/regional-tables.md %}#regional-by-row-tables)) are excluded.
+- For [manual collection](#manually-collect-partial-statistics) with specific columns, an index must exist with a prefix matching those columns. If no matching index exists or if full statistics were not previously collected on the specified column, the statement returns an error.
 
-Historical statistics on non-default column sets should not be retained indefinitely, because they will not be refreshed automatically and could cause the optimizer to choose a suboptimal plan if they become stale. Such non-default historical statistics may exist because columns were deleted or removed from an index, and are therefore no longer part of a multi-column statistic.
+By default, the optimizer uses partial statistics for query planning. It merges partial statistics with existing full statistics to produce more accurate cardinality estimates.
 
-CockroachDB deletes statistics on non-default columns according to the `sql.stats.non_default_columns.min_retention_period` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}), which defaults to a 24-hour retention period.
+#### Automatically collect partial statistics
 
-### Enable and disable automatic statistics collection for clusters
+Partial statistics are automatically collected on the highest and lowest index values when:
 
-Automatic statistics collection is enabled by default. To disable automatic statistics collection, follow these steps:
+- Automatic collection is enabled.
+- The number of stale rows in a table reaches a specified threshold.
+
+This is particularly beneficial for large tables where only a portion is regularly updated or queried, such as tables with timestamp columns where recent data is frequently accessed.
+
+To control automatic collection of partial statistics, use the following [cluster settings]({% link {{ page.version.version }}/cluster-settings.md %}) to configure behavior across all tables in the cluster:
+
+| Cluster setting | Description |
+|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| [`sql.stats.automatic_partial_collection.enabled`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-sql-stats-automatic-partial-collection-enabled) | Enable automatic collection of partial table statistics. |
+| [`sql.stats.automatic_partial_collection.min_stale_rows`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-sql-stats-automatic-partial-collection-min-stale-rows) | Minimum number of stale rows that triggers partial statistics collection. |
+| [`sql.stats.automatic_partial_collection.fraction_stale_rows`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-sql-stats-automatic-partial-collection-fraction-stale-rows) | Target fraction of stale rows that triggers partial statistics collection. If lower than the `0.2` threshold for full statistics, partial statistics refresh more frequently than full statistics. |
+
+Override cluster settings for specific tables using the following [table storage parameters]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters):
+
+| Table storage parameter | Description |
+|--------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| [`sql_stats_automatic_partial_collection_enabled`]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters) | Enable automatic collection of partial statistics on the table. |
+| [`sql_stats_automatic_partial_collection_min_stale_rows`]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters) | Minimum number of stale rows on the table that triggers partial statistics collection. |
+| [`sql_stats_automatic_partial_collection_fraction_stale_rows`]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters) | Target fraction of stale rows on the table that triggers partial statistics collection. If lower than the `0.2` threshold for full statistics, partial statistics refresh more frequently than full statistics. |
+
+#### Manually collect partial statistics
+
+When the [`enable_create_stats_using_extremes`]({% link {{ page.version.version }}/session-variables.md %}#enable-create-stats-using-extremes) session variable is enabled, you can manually create partial statistics on the highest and lowest index values using the `USING EXTREMES` clause: [`CREATE STATISTICS stats FROM table USING EXTREMES`]({% link {{ page.version.version }}/create-statistics.md %}#create-partial-statistics-using-extremes).
+
+### Toggle automatic statistics collection
+
+#### Enable and disable automatic statistics collection for clusters
+
+Automatic statistics collection is enabled by default. To disable automatic [full](#full-statistics) and [partial](#partial-statistics) statistics collection, follow these steps:
 
 1. Set the `sql.stats.automatic_collection.enabled` cluster setting to `false`:
 
@@ -97,11 +163,11 @@ Automatic statistics collection is enabled by default. To disable automatic stat
 
 To learn how to manually generate statistics, see the [`CREATE STATISTICS` examples]({% link {{ page.version.version }}/create-statistics.md %}#examples).
 
-### Enable and disable automatic statistics collection for tables
+#### Enable and disable automatic statistics collection for tables
 
-Statistics collection can be expensive for large tables, and you may prefer to defer collection until after data is finished loading or during off-peak hours. Tables that are frequently updated, including small tables, may trigger statistics collection more often, which can lead to unnecessary overhead and unpredictable query plan changes.
+Automatic statistics collection can be expensive for large tables, and you may prefer to defer collection until after data is finished loading or during off-peak hours. Tables that are frequently updated, including small tables, may trigger statistics collection more often, which can lead to unnecessary overhead and unpredictable query plan changes.
 
-You can enable and disable automatic statistics collection for individual tables using the `sql_stats_automatic_collection_enabled` storage parameter. This table setting **takes precedence** over the `sql.stats.automatic_collection.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}) described in [Enable and disable automatic statistics collection for clusters](#enable-and-disable-automatic-statistics-collection-for-clusters).
+You can enable and disable automatic [full](#full-statistics) and [partial](#partial-statistics) statistics collection for individual tables using the `sql_stats_automatic_collection_enabled` [storage parameter]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters). This table setting **takes precedence** over the `sql.stats.automatic_collection.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}) described in [Enable and disable automatic statistics collection for clusters](#enable-and-disable-automatic-statistics-collection-for-clusters).
 
 You can either configure this setting during table creation:
 
@@ -138,28 +204,22 @@ The current table settings are shown in the `WITH` clause output of `SHOW CREATE
 
 `ALTER TABLE accounts RESET (sql_stats_automatic_collection_enabled)` removes the table setting, in which case the `sql.stats.automatic_collection.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}) is in effect for the table.
 
-The "stale row" cluster settings discussed in [Control statistics refresh rate](#control-statistics-refresh-rate) have table
-setting counterparts `sql_stats_automatic_collection_fraction_stale_rows` and `sql_stats_automatic_collection_min_stale_rows`. For example:
+### Configure non-default statistics retention
 
-~~~ sql
-CREATE TABLE accounts (
-    id INT PRIMARY KEY,
-    balance DECIMAL)
-WITH (sql_stats_automatic_collection_enabled = true,
-sql_stats_automatic_collection_min_stale_rows = 1000000,
-sql_stats_automatic_collection_fraction_stale_rows= 0.05
-);
+By default, when CockroachDB refreshes statistics for a column, it deletes the previous statistics for the column while retaining the most recent four to five historical statistics. When CockroachDB refreshes statistics, it also deletes the statistics for any columns whose statistics are not [collected by default](#table-statistics).
 
-ALTER TABLE accounts
-SET (sql_stats_automatic_collection_fraction_stale_rows = 0.1,
-sql_stats_automatic_collection_min_stale_rows = 2000);
-~~~
+CockroachDB does not retain historical statistics on non-default column sets indefinitely, because they are not refreshed automatically and can cause the optimizer to choose a suboptimal plan if they become stale. These non-default historical statistics can exist when columns are deleted or removed from an index and are no longer part of a multi-column statistic.
 
-Automatic statistics rules are checked once per minute. While altered automatic statistics table settings take immediate effect for any subsequent DML statements on a table, running row mutations that started prior to modifying the table settings may still trigger statistics collection based on the settings that existed before you ran the `ALTER TABLE ... SET` statement.
+CockroachDB deletes statistics on non-default columns according to the `sql.stats.non_default_columns.min_retention_period` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}), which defaults to 24 hours.
 
-### Enable and disable forecasted statistics for tables
+### Forecasted statistics
 
-You can enable and disable [forecasted statistics]({% link {{ page.version.version }}/show-statistics.md %}#display-forecasted-statistics) collection for individual tables using the `sql_stats_forecasts_enabled` table parameter. This table setting **takes precedence** over the `sql.stats.forecasts.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}).
+*Forecasted statistics* use a simple regression model that predicts how the statistics have changed since they were last collected. CockroachDB generates forecasted statistics when the following conditions are met:
+
+- There have been at least 3 historical statistics collections.
+- The historical statistics closely fit a linear pattern.
+
+You can enable and disable forecasted statistics for individual tables using the `sql_stats_forecasts_enabled` [table parameter]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters). This table setting **takes precedence** over the `sql.stats.forecasts.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}).
 
 You can either configure this setting during table creation:
 
@@ -196,8 +256,6 @@ The current table settings are shown in the `WITH` clause output of `SHOW CREATE
 
 `ALTER TABLE accounts RESET (sql_stats_forecasts_enabled)` removes the table setting, in which case the `sql.stats.forecasts.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}) is in effect for the table.
 
-For details on forecasted statistics, see [Display forecasted statistics]({% link {{ page.version.version }}/show-statistics.md %}#display-forecasted-statistics).
-
 ### Control histogram collection
 
 By default, the optimizer collects histograms for all index columns (specifically the first column in each index) during automatic statistics collection. If a single column statistic is explicitly requested using manual invocation of [`CREATE STATISTICS`]({% link {{ page.version.version }}/create-statistics.md %}), a histogram will be collected, regardless of whether or not the column is part of an index.
diff --git a/src/current/v25.3/create-statistics.md b/src/current/v25.3/create-statistics.md
index 1dfa622c46b..fd2a7d50cde 100644
--- a/src/current/v25.3/create-statistics.md
+++ b/src/current/v25.3/create-statistics.md
@@ -5,7 +5,7 @@ toc: true
 docs_area: reference.sql
 ---
 
-Use the `CREATE STATISTICS` [statement]({% link {{ page.version.version }}/sql-statements.md %}) to generate table statistics for the [cost-based optimizer]({% link {{ page.version.version }}/cost-based-optimizer.md %}) to use.
+Use the `CREATE STATISTICS` [statement]({% link {{ page.version.version }}/sql-statements.md %}) to [generate table statistics for the cost-based optimizer]({% link {{ page.version.version }}/cost-based-optimizer.md %}#table-statistics) to use.
 
 Once you [create a table]({% link {{ page.version.version }}/create-table.md %}) and load data into it (e.g., [`INSERT`]({% link {{ page.version.version }}/insert.md %}), [`IMPORT INTO`]({% link {{ page.version.version }}/import-into.md %})), table statistics can be generated. Table statistics help the cost-based optimizer determine the cardinality of the rows used in each query, which helps to predict more accurate costs.
 
@@ -166,6 +166,26 @@ To create statistics as of a given time (in this example, 1 minute ago to avoid
 
 For more information about how the `AS OF SYSTEM TIME` clause works, including supported time formats, see [`AS OF SYSTEM TIME`]({% link {{ page.version.version }}/as-of-system-time.md %}).
 
+### Create partial statistics using extremes
+
+CockroachDB supports [partial statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#partial-statistics), which collect statistics on a subset of table data to provide more up-to-date information without scanning the entire table.
+
+To create [partial statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#partial-statistics) that collect statistics on the highest and lowest index values:
+
+{% include_cached copy-clipboard.html %}
+~~~ sql
+CREATE STATISTICS rides_extremes_stats FROM rides USING EXTREMES;
+~~~
+
+This creates partial statistics on all single-column prefixes of non-inverted indexes in the `rides` table by scanning only the highest and lowest index values, rather than performing a full table scan.
+
+You can also create extremes statistics on specific columns, provided there is an index with the specified column as the first key column:
+
+{% include_cached copy-clipboard.html %}
+~~~ sql
+CREATE STATISTICS city_extremes_stats ON city FROM rides USING EXTREMES;
+~~~
+
 ### Delete statistics
 
 {% include {{ page.version.version }}/misc/delete-statistics.md %}
diff --git a/src/current/v25.3/show-statistics.md b/src/current/v25.3/show-statistics.md
index 6d85f5b9594..59143af8946 100644
--- a/src/current/v25.3/show-statistics.md
+++ b/src/current/v25.3/show-statistics.md
@@ -76,18 +76,13 @@ Parameter | Description
 
 ### Display forecasted statistics
 
-The `WITH FORECAST` option calculates and displays forecasted statistics along with the existing table statistics. The forecast is a simple regression model that predicts how the statistics have changed since they were last collected. Forecasts that closely match the historical statistics are used by the [cost-based optimizer]({% link {{ page.version.version }}/cost-based-optimizer.md %}).
-
-CockroachDB generates forecasted statistics when the following conditions are met:
-
-- There have been at least 3 historical statistics collections.
-- The historical statistics closely fit a linear pattern.
+The `WITH FORECAST` option calculates and displays [forecasted statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#forecasted-statistics) along with the existing table statistics.
 
 The following example shows 3 historical statistics collections and the subsequent forecast:
 
 {% include_cached copy-clipboard.html %}
 ~~~ sql
-> SHOW STATISTICS FOR TABLE rides WITH FORECAST;
+SHOW STATISTICS FOR TABLE rides WITH FORECAST;
 ~~~
 
 ~~~
diff --git a/src/current/v25.4/cost-based-optimizer.md b/src/current/v25.4/cost-based-optimizer.md
index f17e44bf8cb..4ba6fdc4e60 100644
--- a/src/current/v25.4/cost-based-optimizer.md
+++ b/src/current/v25.4/cost-based-optimizer.md
@@ -23,22 +23,33 @@ The most important factor in determining the quality of a plan is cardinality (i
 
 The cost-based optimizer can often find more performant query plans if it has access to statistical data on the contents of your tables. This data needs to be generated from scratch for new tables, and [refreshed periodically](#control-statistics-refresh-rate) for existing tables.
 
-By default, CockroachDB automatically generates table statistics when tables are [created]({% link {{ page.version.version }}/create-table.md %}), and as they are [updated]({% link {{ page.version.version }}/update.md %}). It does this using a [background job]({% link {{ page.version.version }}/create-statistics.md %}#view-statistics-jobs) that automatically determines which columns to get statistics on. Specifically, the optimizer chooses:
+The optimizer can use three types of statistics to plan queries:
+
+- [Full statistics](#full-statistics)
+- [Partial statistics](#partial-statistics)
+- [Forecasted statistics](#forecasted-statistics)
+
+For best query performance, most users should leave automatic statistics enabled with the default settings. Advanced users can follow the steps provided in the following sections for performance tuning and troubleshooting.
+
+### Full statistics
+
+By default, CockroachDB automatically generates full statistics when tables are [created]({% link {{ page.version.version }}/create-table.md %}) and after [schema changes]({% link {{ page.version.version }}/online-schema-changes.md %}). Full statistics for a table are automatically refreshed when approximately 20% of its rows are updated.
+
+A [background job]({% link {{ page.version.version }}/create-statistics.md %}#view-statistics-jobs) automatically determines which columns to get statistics on. Specifically, the optimizer chooses:
 
 - Columns that are part of the primary key or an index (in other words, all indexed columns).
 - Up to 100 non-indexed columns.
 
 By default, CockroachDB also automatically collects [multi-column statistics]({% link {{ page.version.version }}/create-statistics.md %}#create-statistics-on-multiple-columns) on columns that prefix an index.
 
-{{site.data.alerts.callout_info}}
-[Schema changes]({% link {{ page.version.version }}/online-schema-changes.md %}) trigger automatic statistics collection for the affected table(s).
-{{site.data.alerts.end}}
+To control automatic collection of full statistics, use the following settings:
 
-For best query performance, most users should leave automatic statistics enabled with the default settings. Advanced users can follow the steps provided in this section for performance tuning and troubleshooting.
+- [`sql.stats.automatic_full_collection.enabled`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-sql-stats-automatic-full-collection-enabled): Cluster setting that enables automatic collection of full table statistics across all tables in the cluster. +- [`sql_stats_automatic_full_collection_enabled`]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters): Table storage parameter that overrides the cluster setting when applied to a specific table. -### Control statistics refresh rate +#### Control statistics refresh rate -Statistics are refreshed in the following cases: +Full statistics are refreshed in the following cases: - When there are no statistics. - When it has been a long time since the last refresh, where "long time" is based on a moving average of the time across the last several refreshes. @@ -55,9 +66,27 @@ Statistics are refreshed in the following cases: Because the formula for statistics refreshes is probabilistic, you will not see statistics update immediately after changing these settings, or immediately after exactly 500 rows have been updated. {{site.data.alerts.end}} -#### Small versus large table examples +The "stale row" cluster settings also have the table setting counterparts `sql_stats_automatic_collection_fraction_stale_rows` and `sql_stats_automatic_collection_min_stale_rows`. For example: + +~~~ sql +CREATE TABLE accounts ( + id INT PRIMARY KEY, + balance DECIMAL) +WITH (sql_stats_automatic_collection_enabled = true, +sql_stats_automatic_collection_min_stale_rows = 1000000, +sql_stats_automatic_collection_fraction_stale_rows= 0.05 +); + +ALTER TABLE accounts +SET (sql_stats_automatic_collection_fraction_stale_rows = 0.1, +sql_stats_automatic_collection_min_stale_rows = 2000); +~~~ + +Automatic statistics rules are checked once per minute. Altered automatic statistics table settings take immediate effect for subsequent DML statements on a table. 
However, row mutations that started before you modified the table settings can still trigger statistics collection based on the previous settings. + +##### Small versus large table examples -Suppose the [clusters settings]({% link {{ page.version.version }}/cluster-settings.md %}) `sql.stats.automatic_collection.fraction_stale_rows` and `sql.stats.automatic_collection.min_stale_rows` have the default values .2 and 500 as shown in the preceding table. +Suppose the [cluster settings]({% link {{ page.version.version }}/cluster-settings.md %}) `sql.stats.automatic_collection.fraction_stale_rows` and `sql.stats.automatic_collection.min_stale_rows` have the default values .2 and 500 as shown in the preceding table. If a table has 100 rows and 20 became stale, a re-collection would not be triggered because, even though 20% of the rows are stale, they do not meet the 500-row minimum. @@ -65,17 +94,57 @@ On the other hand, if a table has 1,500,000,000 rows, then 20% of that, or 300,0 In such cases, we recommend that you use the [`sql_stats_automatic_collection_enabled` storage parameter](#enable-and-disable-automatic-statistics-collection-for-tables), which lets you configure automatic statistics collection on a per-table basis. -#### Configure non-default statistics retention +### Partial statistics + +*Partial statistics* are collected on a subset of table data without scanning the full table. Partial statistics can improve query performance in large tables where only a portion of rows are regularly updated or queried. + +Whereas [full statistics](#full-statistics) refresh infrequently and can allow stale rows to accumulate, partial statistics automatically refresh at a [lower threshold](#automatically-collect-partial-statistics) of stale rows. Partial statistics automatically collect on extreme index values, which is particularly valuable for timestamp indexes where workloads commonly access the most recent data. 
They can also be [collected manually](#manually-collect-partial-statistics). + +Partial statistics have the following constraints: -By default, when CockroachDB refreshes statistics for a column, it deletes the previous statistics for the column (while leaving the most recent 4-5 historical statistics). When CockroachDB refreshes statistics, it also deletes the statistics for any "non-default" column sets, or columns for which statistics are not [collected by default](#table-statistics). +- Partial statistics can only be collected if [full statistics](#full-statistics) already exist for the table. +- Partial statistics created with `USING EXTREMES` and no `ON` clause are collected on all single-column prefixes of non-inverted indexes. Indexes that are [partial]({% link {{ page.version.version }}/partial-indexes.md %}), [hash-sharded]({% link {{ page.version.version }}/hash-sharded-indexes.md %}), or implicitly partitioned (such as in [`REGIONAL BY ROW` tables]({% link {{ page.version.version }}/regional-tables.md %}#regional-by-row-tables)) are excluded. +- For [manual collection](#manually-collect-partial-statistics) with specific columns, each specified column must be the first key column of a non-inverted index. When using the `WHERE` clause, the predicate must also filter on the index column. If no matching index exists or if full statistics were not previously collected on the specified column, the statement returns an error. -Historical statistics on non-default column sets should not be retained indefinitely, because they will not be refreshed automatically and could cause the optimizer to choose a suboptimal plan if they become stale. Such non-default historical statistics may exist because columns were deleted or removed from an index, and are therefore no longer part of a multi-column statistic. +By default, the optimizer uses partial statistics for query planning. 
It merges partial statistics with existing full statistics to produce more accurate cardinality estimates. -CockroachDB deletes statistics on non-default columns according to the `sql.stats.non_default_columns.min_retention_period` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}), which defaults to a 24-hour retention period. +#### Automatically collect partial statistics -### Enable and disable automatic statistics collection for clusters +Partial statistics are automatically collected on the highest and lowest index values when: -Automatic statistics collection is enabled by default. To disable automatic statistics collection, follow these steps: +- Automatic collection is enabled. +- The number of stale rows in a table reaches a specified threshold. + +This is particularly beneficial for large tables where only a portion is regularly updated or queried, such as tables with timestamp columns where recent data is frequently accessed. + +To control automatic collection of partial statistics, use the following [cluster settings]({% link {{ page.version.version }}/cluster-settings.md %}) to configure behavior across all tables in the cluster: + +| Cluster setting | Description | +|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| [`sql.stats.automatic_partial_collection.enabled`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-sql-stats-automatic-partial-collection-enabled) | Enable automatic collection of partial table statistics. 
| +| [`sql.stats.automatic_partial_collection.min_stale_rows`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-sql-stats-automatic-partial-collection-min-stale-rows) | Minimum number of stale rows that triggers partial statistics collection. | +| [`sql.stats.automatic_partial_collection.fraction_stale_rows`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-sql-stats-automatic-partial-collection-fraction-stale-rows) | Target fraction of stale rows that triggers partial statistics collection. If lower than the `0.2` threshold for full statistics, partial statistics refresh more frequently than full statistics. | + +Override cluster settings for specific tables using the following [table storage parameters]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters): + +| Table storage parameter | Description | +|--------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| [`sql_stats_automatic_partial_collection_enabled`]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters) | Enable automatic collection of partial statistics on the table. | +| [`sql_stats_automatic_partial_collection_min_stale_rows`]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters) | Minimum number of stale rows on the table that triggers partial statistics collection. | +| [`sql_stats_automatic_partial_collection_fraction_stale_rows`]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters) | Target fraction of stale rows on the table that triggers partial statistics collection. 
If lower than the `0.2` threshold for full statistics, partial statistics refresh more frequently than full statistics. | + +#### Manually collect partial statistics + +You can manually create partial statistics on: + +- The highest and lowest index values, when the [`enable_create_stats_using_extremes`]({% link {{ page.version.version }}/session-variables.md %}#enable-create-stats-using-extremes) session variable is enabled, using the `USING EXTREMES` clause: [`CREATE STATISTICS stats FROM table USING EXTREMES`]({% link {{ page.version.version }}/create-statistics.md %}#create-partial-statistics-using-extremes) +- {% include_cached new-in.html version="v25.4" %} Specific columns and values, using the `WHERE` clause: [`CREATE STATISTICS stats ON column FROM table WHERE condition`]({% link {{ page.version.version }}/create-statistics.md %}#create-partial-statistics-on-specific-data) + +### Toggle automatic statistics collection + +#### Enable and disable automatic statistics collection for clusters + +Automatic statistics collection is enabled by default. To disable automatic [full](#full-statistics) and [partial](#partial-statistics) statistics collection, follow these steps: 1. Set the `sql.stats.automatic_collection.enabled` cluster setting to `false`: @@ -97,11 +166,11 @@ Automatic statistics collection is enabled by default. To disable automatic stat To learn how to manually generate statistics, see the [`CREATE STATISTICS` examples]({% link {{ page.version.version }}/create-statistics.md %}#examples). -### Enable and disable automatic statistics collection for tables +#### Enable and disable automatic statistics collection for tables -Statistics collection can be expensive for large tables, and you may prefer to defer collection until after data is finished loading or during off-peak hours.
Tables that are frequently updated, including small tables, may trigger statistics collection more often, which can lead to unnecessary overhead and unpredictable query plan changes. +Automatic statistics collection can be expensive for large tables, and you may prefer to defer collection until after data is finished loading or during off-peak hours. Tables that are frequently updated, including small tables, may trigger statistics collection more often, which can lead to unnecessary overhead and unpredictable query plan changes. -You can enable and disable automatic statistics collection for individual tables using the `sql_stats_automatic_collection_enabled` storage parameter. This table setting **takes precedence** over the `sql.stats.automatic_collection.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}) described in [Enable and disable automatic statistics collection for clusters](#enable-and-disable-automatic-statistics-collection-for-clusters). +You can enable and disable automatic [full](#full-statistics) and [partial](#partial-statistics) statistics collection for individual tables using the `sql_stats_automatic_collection_enabled` [storage parameter]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters). This table setting **takes precedence** over the `sql.stats.automatic_collection.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}) described in [Enable and disable automatic statistics collection for clusters](#enable-and-disable-automatic-statistics-collection-for-clusters). 
You can either configure this setting during table creation: @@ -138,28 +207,22 @@ The current table settings are shown in the `WITH` clause output of `SHOW CREATE `ALTER TABLE accounts RESET (sql_stats_automatic_collection_enabled)` removes the table setting, in which case the `sql.stats.automatic_collection.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}) is in effect for the table. -The "stale row" cluster settings discussed in [Control statistics refresh rate](#control-statistics-refresh-rate) have table -setting counterparts `sql_stats_automatic_collection_fraction_stale_rows` and `sql_stats_automatic_collection_min_stale_rows`. For example: +### Configure non-default statistics retention -~~~ sql -CREATE TABLE accounts ( - id INT PRIMARY KEY, - balance DECIMAL) -WITH (sql_stats_automatic_collection_enabled = true, -sql_stats_automatic_collection_min_stale_rows = 1000000, -sql_stats_automatic_collection_fraction_stale_rows= 0.05 -); +By default, when CockroachDB refreshes statistics for a column, it deletes the previous statistics for the column while retaining the most recent four to five historical statistics. When CockroachDB refreshes statistics, it also deletes the statistics for any columns whose statistics are not [collected by default](#table-statistics). -ALTER TABLE accounts -SET (sql_stats_automatic_collection_fraction_stale_rows = 0.1, -sql_stats_automatic_collection_min_stale_rows = 2000); -~~~ +Do not retain historical statistics on non-default column sets indefinitely, because they are not refreshed automatically and can cause the optimizer to choose a suboptimal plan if they become stale. These non-default historical statistics can exist when columns are deleted or removed from an index and are no longer part of a multi-column statistic. -Automatic statistics rules are checked once per minute. 
While altered automatic statistics table settings take immediate effect for any subsequent DML statements on a table, running row mutations that started prior to modifying the table settings may still trigger statistics collection based on the settings that existed before you ran the `ALTER TABLE ... SET` statement. +CockroachDB deletes statistics on non-default columns according to the `sql.stats.non_default_columns.min_retention_period` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}), which defaults to 24 hours. -### Enable and disable forecasted statistics for tables +### Forecasted statistics -You can enable and disable [forecasted statistics]({% link {{ page.version.version }}/show-statistics.md %}#display-forecasted-statistics) collection for individual tables using the `sql_stats_forecasts_enabled` table parameter. This table setting **takes precedence** over the `sql.stats.forecasts.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}). +*Forecasted statistics* use a simple regression model that predicts how the statistics have changed since they were last collected. CockroachDB generates forecasted statistics when the following conditions are met: + +- There have been at least 3 historical statistics collections. +- The historical statistics closely fit a linear pattern. + +You can enable and disable forecasted statistics collection for individual tables using the `sql_stats_forecasts_enabled` [table parameter]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters). This table setting **takes precedence** over the `sql.stats.forecasts.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}). 
You can either configure this setting during table creation: @@ -196,8 +259,6 @@ The current table settings are shown in the `WITH` clause output of `SHOW CREATE `ALTER TABLE accounts RESET (sql_stats_forecasts_enabled)` removes the table setting, in which case the `sql.stats.forecasts.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}) is in effect for the table. -For details on forecasted statistics, see [Display forecasted statistics]({% link {{ page.version.version }}/show-statistics.md %}#display-forecasted-statistics). - ### Control histogram collection By default, the optimizer collects histograms for all index columns (specifically the first column in each index) during automatic statistics collection. If a single column statistic is explicitly requested using manual invocation of [`CREATE STATISTICS`]({% link {{ page.version.version }}/create-statistics.md %}), a histogram will be collected, regardless of whether or not the column is part of an index. diff --git a/src/current/v25.4/create-statistics.md b/src/current/v25.4/create-statistics.md index 1dfa622c46b..423208dea38 100644 --- a/src/current/v25.4/create-statistics.md +++ b/src/current/v25.4/create-statistics.md @@ -5,7 +5,7 @@ toc: true docs_area: reference.sql --- -Use the `CREATE STATISTICS` [statement]({% link {{ page.version.version }}/sql-statements.md %}) to generate table statistics for the [cost-based optimizer]({% link {{ page.version.version }}/cost-based-optimizer.md %}) to use. +Use the `CREATE STATISTICS` [statement]({% link {{ page.version.version }}/sql-statements.md %}) to [generate table statistics for the cost-based optimizer]({% link {{ page.version.version }}/cost-based-optimizer.md %}#table-statistics) to use. 
Once you [create a table]({% link {{ page.version.version }}/create-table.md %}) and load data into it (e.g., [`INSERT`]({% link {{ page.version.version }}/insert.md %}), [`IMPORT INTO`]({% link {{ page.version.version }}/import-into.md %})), table statistics can be generated. Table statistics help the cost-based optimizer determine the cardinality of the rows used in each query, which helps to predict more accurate costs. @@ -166,6 +166,44 @@ To create statistics as of a given time (in this example, 1 minute ago to avoid For more information about how the `AS OF SYSTEM TIME` clause works, including supported time formats, see [`AS OF SYSTEM TIME`]({% link {{ page.version.version }}/as-of-system-time.md %}). +### Create partial statistics using extremes + +CockroachDB supports [partial statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#partial-statistics), which collect statistics on a subset of table data to provide more up-to-date information without scanning the entire table. + +To create [partial statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#partial-statistics) that collect statistics on the highest and lowest index values: + +{% include_cached copy-clipboard.html %} +~~~ sql +CREATE STATISTICS rides_extremes_stats FROM rides USING EXTREMES; +~~~ + +This creates partial statistics on all single-column prefixes of non-inverted indexes in the `rides` table by scanning only the highest and lowest index values, rather than performing a full table scan. 
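+
+The `USING EXTREMES` clause requires the [`enable_create_stats_using_extremes`]({% link {{ page.version.version }}/session-variables.md %}#enable-create-stats-using-extremes) session variable. The following is a minimal sketch of the full sequence. It assumes that the `rides` table already has full statistics and that the session variable is not yet enabled in your session:
+
+{% include_cached copy-clipboard.html %}
+~~~ sql
+-- Allow CREATE STATISTICS ... USING EXTREMES in this session.
+SET enable_create_stats_using_extremes = on;
+
+-- Collect partial statistics on the highest and lowest index values.
+CREATE STATISTICS rides_extremes_stats FROM rides USING EXTREMES;
+
+-- Review the statistics collected for the table.
+SHOW STATISTICS FOR TABLE rides;
+~~~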
+ +You can also create partial statistics using extremes on specific columns, provided there is an index with the specified column as the first key column: + +{% include_cached copy-clipboard.html %} +~~~ sql +CREATE STATISTICS city_extremes_stats ON city FROM rides USING EXTREMES; +~~~ + +### Create partial statistics on specific data + +{% include_cached new-in.html version="v25.4" %} To create [partial statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#partial-statistics) on a specific column and values matching specific conditions, the column must be the first key column of an index. If no such index exists, create one: + +{% include_cached copy-clipboard.html %} +~~~ sql +CREATE INDEX ON rides (revenue); +~~~ + +Partial statistics can target any subset of data matching specific conditions. For example, to create statistics on high-value rides: + +{% include_cached copy-clipboard.html %} +~~~ sql +CREATE STATISTICS high_value_rides_stats ON revenue FROM rides WHERE revenue > 50; +~~~ + +This creates partial statistics covering only the rides where `revenue > 50`. ### Delete statistics {% include {{ page.version.version }}/misc/delete-statistics.md %} diff --git a/src/current/v25.4/show-statistics.md b/src/current/v25.4/show-statistics.md index 6d85f5b9594..59143af8946 100644 --- a/src/current/v25.4/show-statistics.md +++ b/src/current/v25.4/show-statistics.md @@ -76,18 +76,13 @@ Parameter | Description ### Display forecasted statistics -The `WITH FORECAST` option calculates and displays forecasted statistics along with the existing table statistics. The forecast is a simple regression model that predicts how the statistics have changed since they were last collected. Forecasts that closely match the historical statistics are used by the [cost-based optimizer]({% link {{ page.version.version }}/cost-based-optimizer.md %}). - -CockroachDB generates forecasted statistics when the following conditions are met: - -- There have been at least 3 historical statistics collections.
-- The historical statistics closely fit a linear pattern. +The `WITH FORECAST` option calculates and displays [forecasted statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#forecasted-statistics) along with the existing table statistics. The following example shows 3 historical statistics collections and the subsequent forecast: {% include_cached copy-clipboard.html %} ~~~ sql -> SHOW STATISTICS FOR TABLE rides WITH FORECAST; +SHOW STATISTICS FOR TABLE rides WITH FORECAST; ~~~ ~~~ diff --git a/src/current/v26.1/cost-based-optimizer.md b/src/current/v26.1/cost-based-optimizer.md index b2b78ce8aae..98bf14d8ea2 100644 --- a/src/current/v26.1/cost-based-optimizer.md +++ b/src/current/v26.1/cost-based-optimizer.md @@ -23,22 +23,33 @@ The most important factor in determining the quality of a plan is cardinality (i The cost-based optimizer can often find more performant query plans if it has access to statistical data on the contents of your tables. This data needs to be generated from scratch for new tables, and [refreshed periodically](#control-statistics-refresh-rate) for existing tables. -By default, CockroachDB automatically generates table statistics when tables are [created]({% link {{ page.version.version }}/create-table.md %}), and as they are [updated]({% link {{ page.version.version }}/update.md %}). It does this using a [background job]({% link {{ page.version.version }}/create-statistics.md %}#view-statistics-jobs) that automatically determines which columns to get statistics on. Specifically, the optimizer chooses: +The optimizer can use three types of statistics to plan queries: + +- [Full statistics](#full-statistics) +- [Partial statistics](#partial-statistics) +- [Forecasted statistics](#forecasted-statistics) + +For best query performance, most users should leave automatic statistics enabled with the default settings. 
Advanced users can follow the steps provided in the following sections for performance tuning and troubleshooting. + +### Full statistics + +By default, CockroachDB automatically generates full statistics when tables are [created]({% link {{ page.version.version }}/create-table.md %}) and after [schema changes]({% link {{ page.version.version }}/online-schema-changes.md %}). Full statistics for a table are automatically refreshed when approximately 20% of its rows are updated. + +A [background job]({% link {{ page.version.version }}/create-statistics.md %}#view-statistics-jobs) automatically determines which columns to get statistics on. Specifically, the optimizer chooses: - Columns that are part of the primary key or an index (in other words, all indexed columns). - Up to 100 non-indexed columns. By default, CockroachDB also automatically collects [multi-column statistics]({% link {{ page.version.version }}/create-statistics.md %}#create-statistics-on-multiple-columns) on columns that prefix an index. -{{site.data.alerts.callout_info}} -[Schema changes]({% link {{ page.version.version }}/online-schema-changes.md %}) trigger automatic statistics collection for the affected table(s). -{{site.data.alerts.end}} +To control automatic collection of full statistics, use the following settings: -For best query performance, most users should leave automatic statistics enabled with the default settings. Advanced users can follow the steps provided in this section for performance tuning and troubleshooting. +- [`sql.stats.automatic_full_collection.enabled`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-sql-stats-automatic-full-collection-enabled): Cluster setting that enables automatic collection of full table statistics across all tables in the cluster. 
+- [`sql_stats_automatic_full_collection_enabled`]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters): Table storage parameter that overrides the cluster setting when applied to a specific table. -### Control statistics refresh rate +#### Control statistics refresh rate -Statistics are refreshed in the following cases: +Full statistics are refreshed in the following cases: - When there are no statistics. - When it has been a long time since the last refresh, where "long time" is based on a moving average of the time across the last several refreshes. @@ -55,9 +66,27 @@ Statistics are refreshed in the following cases: Because the formula for statistics refreshes is probabilistic, you will not see statistics update immediately after changing these settings, or immediately after exactly 500 rows have been updated. {{site.data.alerts.end}} -#### Small versus large table examples +The "stale row" cluster settings also have the table setting counterparts `sql_stats_automatic_collection_fraction_stale_rows` and `sql_stats_automatic_collection_min_stale_rows`. For example: + +~~~ sql +CREATE TABLE accounts ( + id INT PRIMARY KEY, + balance DECIMAL) +WITH (sql_stats_automatic_collection_enabled = true, +sql_stats_automatic_collection_min_stale_rows = 1000000, +sql_stats_automatic_collection_fraction_stale_rows = 0.05 +); + +ALTER TABLE accounts +SET (sql_stats_automatic_collection_fraction_stale_rows = 0.1, +sql_stats_automatic_collection_min_stale_rows = 2000); +~~~ + +Automatic statistics rules are checked once per minute. While altered automatic statistics table settings take immediate effect for any subsequent DML statements on a table, running row mutations that started prior to modifying the table settings may still trigger statistics collection based on the settings that existed before you ran the `ALTER TABLE ... SET` statement.
-Suppose the [clusters settings]({% link {{ page.version.version }}/cluster-settings.md %}) `sql.stats.automatic_collection.fraction_stale_rows` and `sql.stats.automatic_collection.min_stale_rows` have the default values .2 and 500 as shown in the preceding table. +##### Small versus large table examples + +Suppose the [cluster settings]({% link {{ page.version.version }}/cluster-settings.md %}) `sql.stats.automatic_collection.fraction_stale_rows` and `sql.stats.automatic_collection.min_stale_rows` have the default values 0.2 and 500 as shown in the preceding table. If a table has 100 rows and 20 became stale, a re-collection would not be triggered because, even though 20% of the rows are stale, they do not meet the 500-row minimum. @@ -65,17 +94,57 @@ On the other hand, if a table has 1,500,000,000 rows, then 20% of that, or 300,0 In such cases, we recommend that you use the [`sql_stats_automatic_collection_enabled` storage parameter](#enable-and-disable-automatic-statistics-collection-for-tables), which lets you configure automatic statistics collection on a per-table basis. -#### Configure non-default statistics retention +### Partial statistics -By default, when CockroachDB refreshes statistics for a column, it deletes the previous statistics for the column (while leaving the most recent 4-5 historical statistics). When CockroachDB refreshes statistics, it also deletes the statistics for any "non-default" column sets, or columns for which statistics are not [collected by default](#table-statistics). +*Partial statistics* are collected on a subset of table data without scanning the full table. Partial statistics can improve query performance in large tables where only a portion of rows are regularly updated or queried. -Historical statistics on non-default column sets should not be retained indefinitely, because they will not be refreshed automatically and could cause the optimizer to choose a suboptimal plan if they become stale.
Such non-default historical statistics may exist because columns were deleted or removed from an index, and are therefore no longer part of a multi-column statistic. +Whereas [full statistics](#full-statistics) refresh infrequently and can allow stale rows to accumulate, partial statistics automatically refresh at a [lower threshold](#automatically-collect-partial-statistics) of stale rows. Partial statistics automatically collect on extreme index values, which is particularly valuable for timestamp indexes where workloads commonly access the most recent data. They can also be [collected manually](#manually-collect-partial-statistics). -CockroachDB deletes statistics on non-default columns according to the `sql.stats.non_default_columns.min_retention_period` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}), which defaults to a 24-hour retention period. +Partial statistics have the following constraints: + +- Partial statistics can only be collected if [full statistics](#full-statistics) already exist for the table. +- Partial statistics [collected automatically](#automatically-collect-partial-statistics), or with `USING EXTREMES` and no `ON` clause, are collected on all single-column prefixes of non-inverted indexes. Indexes that are [partial]({% link {{ page.version.version }}/partial-indexes.md %}), [hash-sharded]({% link {{ page.version.version }}/hash-sharded-indexes.md %}), or implicitly partitioned (such as in [`REGIONAL BY ROW` tables]({% link {{ page.version.version }}/regional-tables.md %}#regional-by-row-tables)) are excluded. +- For [manual collection](#manually-collect-partial-statistics) with specific columns, each specified column must be the first key column of a non-inverted index. When using the `WHERE` clause, the predicate must also filter on the index column. If no matching index exists or if full statistics were not previously collected on the specified column, the statement returns an error. 
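+
+The following is a minimal sketch of the first and third constraints. It assumes an illustrative `rides` table whose primary index has `city` as its first key column. Full statistics are collected before partial statistics are requested on that column:
+
+~~~ sql
+-- Partial statistics require existing full statistics, so collect those first.
+CREATE STATISTICS rides_full_stats FROM rides;
+
+-- Allow USING EXTREMES in this session, then collect partial statistics on the
+-- highest and lowest values of the index whose first key column is city.
+SET enable_create_stats_using_extremes = on;
+CREATE STATISTICS city_extremes_stats ON city FROM rides USING EXTREMES;
+~~~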
+ +By default, the optimizer uses partial statistics for query planning. It merges partial statistics with existing full statistics to produce more accurate cardinality estimates. + +#### Automatically collect partial statistics -### Enable and disable automatic statistics collection for clusters +Partial statistics are automatically collected on the highest and lowest index values when: -Automatic statistics collection is enabled by default. To disable automatic statistics collection, follow these steps: +- Automatic collection is enabled. +- The number of stale rows in a table reaches a specified threshold. + +This is particularly beneficial for large tables where only a portion is regularly updated or queried, such as tables with timestamp columns where recent data is frequently accessed. + +To control automatic collection of partial statistics, use the following [cluster settings]({% link {{ page.version.version }}/cluster-settings.md %}) to configure behavior across all tables in the cluster: + +| Cluster setting | Description | +|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| [`sql.stats.automatic_partial_collection.enabled`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-sql-stats-automatic-partial-collection-enabled) | Enable automatic collection of partial table statistics. | +| [`sql.stats.automatic_partial_collection.min_stale_rows`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-sql-stats-automatic-partial-collection-min-stale-rows) | Minimum number of stale rows that triggers partial statistics collection. 
| +| [`sql.stats.automatic_partial_collection.fraction_stale_rows`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-sql-stats-automatic-partial-collection-fraction-stale-rows) | Target fraction of stale rows that triggers partial statistics collection. If lower than the `0.2` threshold for full statistics, partial statistics refresh more frequently than full statistics. | + +Override cluster settings for specific tables using the following [table storage parameters]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters): + +| Table storage parameter | Description | +|--------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| [`sql_stats_automatic_partial_collection_enabled`]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters) | Enable automatic collection of partial statistics on the table. | +| [`sql_stats_automatic_partial_collection_min_stale_rows`]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters) | Minimum number of stale rows on the table that triggers partial statistics collection. | +| [`sql_stats_automatic_partial_collection_fraction_stale_rows`]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters) | Target fraction of stale rows on the table that triggers partial statistics collection. If lower than the `0.2` threshold for full statistics, partial statistics refresh more frequently than full statistics. 
| + +#### Manually collect partial statistics + +You can manually create partial statistics on: + +- The highest and lowest index values, when the [`enable_create_stats_using_extremes`]({% link {{ page.version.version }}/session-variables.md %}#enable-create-stats-using-extremes) session variable is enabled, using the `USING EXTREMES` clause: [`CREATE STATISTICS stats FROM table USING EXTREMES`]({% link {{ page.version.version }}/create-statistics.md %}#create-partial-statistics-using-extremes) +- Specific columns and values, using the `WHERE` clause: [`CREATE STATISTICS stats ON column FROM table WHERE condition`]({% link {{ page.version.version }}/create-statistics.md %}#create-partial-statistics-on-specific-data) + +### Toggle automatic statistics collection + +#### Enable and disable automatic statistics collection for clusters + +Automatic statistics collection is enabled by default. To disable automatic [full](#full-statistics) and [partial](#partial-statistics) statistics collection, follow these steps: 1. Set the `sql.stats.automatic_collection.enabled` cluster setting to `false`: @@ -97,11 +166,11 @@ Automatic statistics collection is enabled by default. To disable automatic stat To learn how to manually generate statistics, see the [`CREATE STATISTICS` examples]({% link {{ page.version.version }}/create-statistics.md %}#examples). -### Enable and disable automatic statistics collection for tables +#### Enable and disable automatic statistics collection for tables -Statistics collection can be expensive for large tables, and you may prefer to defer collection until after data is finished loading or during off-peak hours. Tables that are frequently updated, including small tables, may trigger statistics collection more often, which can lead to unnecessary overhead and unpredictable query plan changes.
+Automatic statistics collection can be expensive for large tables, and you may prefer to defer collection until after data is finished loading or during off-peak hours. Tables that are frequently updated, including small tables, may trigger statistics collection more often, which can lead to unnecessary overhead and unpredictable query plan changes. -You can enable and disable automatic statistics collection for individual tables using the `sql_stats_automatic_collection_enabled` storage parameter. This table setting **takes precedence** over the `sql.stats.automatic_collection.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}) described in [Enable and disable automatic statistics collection for clusters](#enable-and-disable-automatic-statistics-collection-for-clusters). +You can enable and disable automatic [full](#full-statistics) and [partial](#partial-statistics) statistics collection for individual tables using the `sql_stats_automatic_collection_enabled` [storage parameter]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters). This table setting **takes precedence** over the `sql.stats.automatic_collection.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}) described in [Enable and disable automatic statistics collection for clusters](#enable-and-disable-automatic-statistics-collection-for-clusters). You can either configure this setting during table creation: @@ -138,28 +207,22 @@ The current table settings are shown in the `WITH` clause output of `SHOW CREATE `ALTER TABLE accounts RESET (sql_stats_automatic_collection_enabled)` removes the table setting, in which case the `sql.stats.automatic_collection.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}) is in effect for the table. 
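+
+Because `sql_stats_automatic_collection_enabled` applies to both full and partial statistics, you can use the partial-specific [table storage parameter](#automatically-collect-partial-statistics) to stop only automatic partial statistics on a table. The following is a minimal sketch. It assumes the `accounts` table from the preceding examples and assumes that automatic full statistics remain governed by their own settings:
+
+~~~ sql
+-- Turn off automatic partial statistics for this table only.
+ALTER TABLE accounts SET (sql_stats_automatic_partial_collection_enabled = false);
+
+-- The table settings appear in the WITH clause of the output.
+SHOW CREATE TABLE accounts;
+~~~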
-The "stale row" cluster settings discussed in [Control statistics refresh rate](#control-statistics-refresh-rate) have table -setting counterparts `sql_stats_automatic_collection_fraction_stale_rows` and `sql_stats_automatic_collection_min_stale_rows`. For example: +### Configure non-default statistics retention -~~~ sql -CREATE TABLE accounts ( - id INT PRIMARY KEY, - balance DECIMAL) -WITH (sql_stats_automatic_collection_enabled = true, -sql_stats_automatic_collection_min_stale_rows = 1000000, -sql_stats_automatic_collection_fraction_stale_rows= 0.05 -); +By default, when CockroachDB refreshes statistics for a column, it deletes the previous statistics for the column (while leaving the most recent 4-5 historical statistics). When CockroachDB refreshes statistics, it also deletes the statistics for any "non-default" column sets, or columns for which statistics are not [collected by default](#table-statistics). -ALTER TABLE accounts -SET (sql_stats_automatic_collection_fraction_stale_rows = 0.1, -sql_stats_automatic_collection_min_stale_rows = 2000); -~~~ +Historical statistics on non-default column sets should not be retained indefinitely, because they will not be refreshed automatically and could cause the optimizer to choose a suboptimal plan if they become stale. Such non-default historical statistics may exist because columns were deleted or removed from an index, and are therefore no longer part of a multi-column statistic. -Automatic statistics rules are checked once per minute. While altered automatic statistics table settings take immediate effect for any subsequent DML statements on a table, running row mutations that started prior to modifying the table settings may still trigger statistics collection based on the settings that existed before you ran the `ALTER TABLE ... SET` statement. 
+CockroachDB deletes statistics on non-default columns according to the `sql.stats.non_default_columns.min_retention_period` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}), which defaults to a 24-hour retention period. -### Enable and disable forecasted statistics for tables +### Forecasted statistics -You can enable and disable [forecasted statistics]({% link {{ page.version.version }}/show-statistics.md %}#display-forecasted-statistics) collection for individual tables using the `sql_stats_forecasts_enabled` table parameter. This table setting **takes precedence** over the `sql.stats.forecasts.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}). +*Forecasted statistics* use a simple regression model that predicts how the statistics have changed since they were last collected. CockroachDB generates forecasted statistics when the following conditions are met: + +- There have been at least 3 historical statistics collections. +- The historical statistics closely fit a linear pattern. + +You can enable and disable forecasted statistics collection for individual tables using the `sql_stats_forecasts_enabled` [table parameter]({% link {{ page.version.version }}/with-storage-parameter.md %}#table-parameters). This table setting **takes precedence** over the `sql.stats.forecasts.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}). You can either configure this setting during table creation: @@ -196,8 +259,6 @@ The current table settings are shown in the `WITH` clause output of `SHOW CREATE `ALTER TABLE accounts RESET (sql_stats_forecasts_enabled)` removes the table setting, in which case the `sql.stats.forecasts.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}) is in effect for the table. 
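+
+The following is a minimal sketch. It assumes the `accounts` table from the preceding examples. It first displays any forecast alongside the existing statistics, then turns off forecasted statistics by default across the cluster. Tables that set `sql_stats_forecasts_enabled` keep their table setting:
+
+~~~ sql
+-- Display the existing statistics along with any forecast.
+SHOW STATISTICS FOR TABLE accounts WITH FORECAST;
+
+-- Cluster-wide default for tables without a sql_stats_forecasts_enabled setting.
+SET CLUSTER SETTING sql.stats.forecasts.enabled = false;
+~~~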
-For details on forecasted statistics, see [Display forecasted statistics]({% link {{ page.version.version }}/show-statistics.md %}#display-forecasted-statistics). - ### Control histogram collection By default, the optimizer collects histograms for all index columns (specifically the first column in each index) during automatic statistics collection. If a single column statistic is explicitly requested using manual invocation of [`CREATE STATISTICS`]({% link {{ page.version.version }}/create-statistics.md %}), a histogram will be collected, regardless of whether or not the column is part of an index. diff --git a/src/current/v26.1/create-statistics.md b/src/current/v26.1/create-statistics.md index 1dfa622c46b..815bdd5d231 100644 --- a/src/current/v26.1/create-statistics.md +++ b/src/current/v26.1/create-statistics.md @@ -5,7 +5,7 @@ toc: true docs_area: reference.sql --- -Use the `CREATE STATISTICS` [statement]({% link {{ page.version.version }}/sql-statements.md %}) to generate table statistics for the [cost-based optimizer]({% link {{ page.version.version }}/cost-based-optimizer.md %}) to use. +Use the `CREATE STATISTICS` [statement]({% link {{ page.version.version }}/sql-statements.md %}) to [generate table statistics for the cost-based optimizer]({% link {{ page.version.version }}/cost-based-optimizer.md %}#table-statistics) to use. Once you [create a table]({% link {{ page.version.version }}/create-table.md %}) and load data into it (e.g., [`INSERT`]({% link {{ page.version.version }}/insert.md %}), [`IMPORT INTO`]({% link {{ page.version.version }}/import-into.md %})), table statistics can be generated. Table statistics help the cost-based optimizer determine the cardinality of the rows used in each query, which helps to predict more accurate costs. 
@@ -166,6 +166,44 @@ To create statistics as of a given time (in this example, 1 minute ago to avoid For more information about how the `AS OF SYSTEM TIME` clause works, including supported time formats, see [`AS OF SYSTEM TIME`]({% link {{ page.version.version }}/as-of-system-time.md %}). +### Create partial statistics using extremes + +CockroachDB supports [partial statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#partial-statistics), which collect statistics on a subset of table data to provide more up-to-date information without scanning the entire table. + +To create [partial statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#partial-statistics) that collect statistics on the highest and lowest index values: + +{% include_cached copy-clipboard.html %} +~~~ sql +CREATE STATISTICS rides_extremes_stats FROM rides USING EXTREMES; +~~~ + +This creates partial statistics on all single-column prefixes of non-inverted indexes in the `rides` table by scanning only the highest and lowest index values, rather than performing a full table scan. + +You can also create partial statistics using extremes on a specific column, provided that the column is the first key column of an index: + +{% include_cached copy-clipboard.html %} +~~~ sql +CREATE STATISTICS city_extremes_stats ON city FROM rides USING EXTREMES; +~~~ + +### Create partial statistics on specific data + +To create [partial statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#partial-statistics) on only the values of a column that match a condition, first ensure that the column is indexed: + +{% include_cached copy-clipboard.html %} +~~~ sql +CREATE INDEX ON rides (revenue); +~~~ + +Partial statistics can target any subset of data matching specific conditions.
For example, to create statistics on high-value rides: + +{% include_cached copy-clipboard.html %} +~~~ sql +CREATE STATISTICS high_value_rides_stats ON revenue FROM rides WHERE revenue > 50; +~~~ + +This creates partial statistics on the `revenue` column covering only rides where `revenue` is greater than 50. + ### Delete statistics {% include {{ page.version.version }}/misc/delete-statistics.md %} diff --git a/src/current/v26.1/show-statistics.md b/src/current/v26.1/show-statistics.md index 6d85f5b9594..59143af8946 100644 --- a/src/current/v26.1/show-statistics.md +++ b/src/current/v26.1/show-statistics.md @@ -76,18 +76,13 @@ Parameter | Description ### Display forecasted statistics -The `WITH FORECAST` option calculates and displays forecasted statistics along with the existing table statistics. The forecast is a simple regression model that predicts how the statistics have changed since they were last collected. Forecasts that closely match the historical statistics are used by the [cost-based optimizer]({% link {{ page.version.version }}/cost-based-optimizer.md %}). - -CockroachDB generates forecasted statistics when the following conditions are met: - -- There have been at least 3 historical statistics collections. -- The historical statistics closely fit a linear pattern. +The `WITH FORECAST` option calculates and displays [forecasted statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#forecasted-statistics) along with the existing table statistics. The following example shows 3 historical statistics collections and the subsequent forecast: {% include_cached copy-clipboard.html %} ~~~ sql -> SHOW STATISTICS FOR TABLE rides WITH FORECAST; +SHOW STATISTICS FOR TABLE rides WITH FORECAST; ~~~ ~~~