Azure SQL Database tips
This wiki page describes get-sqldb-tips.sql, a T-SQL script that produces tips to improve database design, health, and performance in Azure SQL Database. Tips are based on well-established best practices for the MSSQL database engine, with particular focus on Azure SQL Database specifics.
The script works on all Azure SQL Database service tiers, including Basic, Standard, Premium, General Purpose, Business Critical, and Hyperscale. Both single databases and databases in elastic pools are supported.
The script checks over 40 conditions to produce tips. New tips are added based on our customer engagements and community feedback.
After reading this wiki page, see FAQ for answers to common questions.
Download the latest release of the script from releases, under Assets. Unzip it, and open the get-sqldb-tips.sql file in a query window in SQL Server Management Studio (SSMS), Azure Data Studio (ADS), or an MSSQL client tool of your choice. Connect the query window to your target database and execute the script.
The result set will have the following columns:
- tip_id: Unique numeric identifier for a tip.
- description: Short description of a tip.
- confidence_percent: Our degree of confidence that following the tip is worthwhile and will result in a net benefit. Tips with lower confidence may require additional analysis and testing in the context of a specific application and workload. In the right circumstances, even tips with relatively low confidence can produce substantial benefits. This confidence value is constant for a tip, i.e. it applies to the tip definition and does not change from one execution of the script to the next.
- additional_info_url: Pasting this URL into a browser window opens a detailed tip description on this wiki page. Alternatively, you can search this page for the numeric value in the tip_id column. Descriptions may include recommended actions, sample commands and queries, and an explanation of the data in the details column. There are also links to additional resources (documentation, blogs, etc.) that provide background information on the topic.
- details: For most tips, this provides additional data to help you act on a tip. For example, this can be the list of top queries, or the list of tables/indexes that are data compression candidates. In SSMS and ADS, clicking on this column displays the data in a new window for easier reading.
To return tips in a single JSON document instead of a relational rowset, set the @JSONOutput variable to 1. This is useful if you want to send script results to others in a portable format.
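For example, the output format can be switched by editing the variable near the top of the script (a sketch; the exact declaration syntax in the script may differ):

```sql
-- In get-sqldb-tips.sql, change the output format variable
DECLARE @JSONOutput bit = 1; -- 0: relational rowset (default), 1: single JSON document
```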
We recommend executing the script while a representative workload is running. This could mean running the script on a production database, or a representative copy of a production database. Without a representative workload running, or with a smaller (e.g. non-production) database, relevant tips may be missed, or less accurate tips may be generated. If workload type and intensity change over time, consider executing the script at different points in time to get all relevant tips. You can also use the script as a troubleshooting tool, executing it when a performance or stability problem is occurring.
Some tips are based on system metadata accumulated since database engine startup. In those cases, the details column includes engine startup time. If the time passed since engine startup is short, consider rerunning the script when more time has passed, to use more detailed and representative system metadata.
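Engine startup time can also be checked directly:

```sql
-- Returns the time the database engine last started
SELECT sqlserver_start_time
FROM sys.dm_os_sys_info;
```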
The script can be executed on either the primary or a readable secondary replica of a database, including read scale-out replicas and geo-replicas. Some tips are only produced on the primary replica, and some only on a secondary replica.
In this example, seven tips are returned by the script. For most tips, there is additional information in the details column.
The security principal executing the script must be either:
- A server administrator using SQL authentication or Azure Active Directory (AAD) authentication;
- A server login with an associated database user in the target database. The server login must be a member of the ##MS_ServerStateReader## server role.
Sample code to grant a login named tips sufficient access to execute the script:
-- In the master database on an Azure SQL Database logical server
CREATE LOGIN tips WITH PASSWORD = 'strong-password-here';
ALTER SERVER ROLE [##MS_ServerStateReader##] ADD MEMBER tips;
-- In the user database where tips are to be produced, on the same logical server
CREATE USER tips FOR LOGIN tips;
This code grants permissions equivalent to VIEW SERVER STATE, which is required to query the system metadata views used in the script. The ##MS_ServerStateReader## server role does not grant any additional permissions; specifically, it grants no permissions to read or modify database schema or data.
The above permissions are sufficient to execute the script on any service tier. This is the recommended way to run the script because it ensures that all potential tips are evaluated.
On single databases in the Standard (S2 and higher), General Purpose, Premium, Business Critical, and Hyperscale service tiers, for nearly all tips it is sufficient to hold only the VIEW DATABASE STATE permission on the database. That can be granted as follows:
-- In the master database on an Azure SQL Database logical server
CREATE LOGIN tips WITH PASSWORD = 'strong-password-here';
-- In the user database where tips are to be produced, on the same logical server
CREATE USER tips FOR LOGIN tips;
GRANT VIEW DATABASE STATE TO tips;
Alternatively, using database-scoped authentication, i.e. without creating a server login in the master database:
-- In the user database where tips are to be produced
-- Using SQL authentication
CREATE USER tips WITH PASSWORD = 'strong-password-here';
GRANT VIEW DATABASE STATE TO tips;
-- Or, using Azure Active Directory (AAD) authentication
CREATE USER [[email protected]] FROM EXTERNAL PROVIDER;
GRANT VIEW DATABASE STATE TO [[email protected]];
For Basic, S0, and S1 single databases, and for any database in an elastic pool, VIEW DATABASE STATE is sufficient to produce only a few tips. In that case, the script will output a warning listing the potential tips that were skipped because of insufficient permissions.
Tips are produced by analyzing database engine metadata in system views. For many tips, heuristics (based on configurable thresholds) are used, thus the output is produced with a finite degree of confidence. For each tip, our goal was to make it useful for a significant portion of our customers, not necessarily for every customer, which would be impossible given the broad variety of designs and workloads across millions of databases in Azure SQL Database.
The script is lightweight, does not query any user data, and usually executes within seconds. To avoid impacting workloads, the script checks current CPU utilization, and aborts before executing any other queries if CPU utilization is very high. The script also aborts if it is blocked for more than a few seconds waiting on a lock, to avoid contributing to blocking chains.
The database where you run the script must use compatibility level 110 or later, which is the case for the vast majority of databases in Azure SQL Database. For databases using compatibility level 100, we provide a different script, get-sqldb-tips-100.sql. When using this script, data in the details column is not sorted in any particular order, because the WITHIN GROUP clause of the STRING_AGG() function is not supported in compatibility level 100.
We welcome and encourage contributions to this project. You can open issues to report bugs and suggest improvements or new tips, or send pull requests to contribute code. We request that you keep the following guidelines in mind:
- A new tip or a change to an existing tip must be implementable in T-SQL using the current surface area of Azure SQL Database;
- A new tip or a change to an existing tip must be useful to a significant portion of Azure SQL Database customers;
- The implementation must be lightweight, i.e. it should not cause high resource utilization or contention that may impact running workloads, and should not cause the script to run for an excessively long time.
The last two guidelines are clearly subjective; we will use community input and our best judgement when considering contributions.
This section includes detailed descriptions for tips that may be produced by the script. Tips below are not in any particular order, and should be considered as collateral to script output, not as a set of general recommendations.
Maximum degree of parallelism (MAXDOP) for primary and secondary replicas is not in the recommended range of 1 to 8. Depending on workload, this may cause unnecessary resource utilization, and in extreme cases, errors and timeouts.
-- Set MAXDOP to 8 for both primary and secondary replicas
ALTER DATABASE SCOPED CONFIGURATION SET MAXDOP = 8;
ALTER DATABASE SCOPED CONFIGURATION FOR SECONDARY SET MAXDOP = PRIMARY;
Changing default MAXDOP in Azure SQL Database
ALTER DATABASE SCOPED CONFIGURATION
Maximum degree of parallelism (MAXDOP) for the primary replica is not in the recommended range of 1 to 8. Depending on workload, this may cause unnecessary resource utilization, and in extreme cases, errors and timeouts.
-- Set MAXDOP to 8 for the primary replica
ALTER DATABASE SCOPED CONFIGURATION SET MAXDOP = 8;
Changing default MAXDOP in Azure SQL Database
ALTER DATABASE SCOPED CONFIGURATION
Maximum degree of parallelism (MAXDOP) for secondary replicas is not in the recommended range of 1 to 8. Depending on workload, this may cause unnecessary resource utilization, and in extreme cases, errors and timeouts.
-- Set MAXDOP to 8 for the secondary replicas
ALTER DATABASE SCOPED CONFIGURATION FOR SECONDARY SET MAXDOP = 8;
Changing default MAXDOP in Azure SQL Database
ALTER DATABASE SCOPED CONFIGURATION
Database compatibility level is not current. Recently added database engine features and improvements may not be available in this database. Consider using latest supported compatibility level, but note that changing compatibility level may require testing.
ALTER DATABASE CURRENT SET COMPATIBILITY_LEVEL = 150; -- check the first link for the latest compatibility level in Azure SQL Database
Alter database compatibility level
Auto-create statistics is disabled. This may cause poor query performance due to suboptimal query plans. Enable auto-create statistics.
ALTER DATABASE CURRENT SET AUTO_CREATE_STATISTICS ON;
Auto-update statistics is disabled. This may cause poor query performance due to suboptimal query plans. Enable auto-update statistics.
ALTER DATABASE CURRENT SET AUTO_UPDATE_STATISTICS ON;
Read Committed Snapshot Isolation (RCSI) is disabled. This may cause unnecessary lock blocking for DML queries.
ALTER DATABASE CURRENT SET READ_COMMITTED_SNAPSHOT ON;
Snapshot isolation in SQL Server
SET TRANSACTION ISOLATION LEVEL
Query Store is disabled. This complicates performance tuning and troubleshooting, and disables certain Intelligent Query Processing features that could otherwise improve query performance and reduce resource consumption. Enable Query Store.
ALTER DATABASE CURRENT SET QUERY_STORE = ON;
Monitoring performance by using the Query Store
Best practices with Query Store
Query Store blog articles by Erin Stellato
Query Store operation mode is READ_ONLY. This complicates performance tuning and troubleshooting, and impacts certain Intelligent Query Processing features that could otherwise improve query performance and reduce resource consumption.
The details column provides the reason for the read-only state. Change Query Store operation mode to READ_WRITE.
ALTER DATABASE CURRENT SET QUERY_STORE (OPERATION_MODE = READ_WRITE);
Monitoring performance by using the Query Store
Best practices with Query Store
Query Store blog articles by Erin Stellato
Query Store capture mode is NONE. This complicates performance tuning and troubleshooting, and impacts certain Intelligent Query Processing features that could otherwise improve query performance and reduce resource consumption.
Change Query Store query capture mode to AUTO or ALL depending on requirements, or use a custom query capture policy that captures all queries relevant to your workload.
ALTER DATABASE CURRENT SET QUERY_STORE (QUERY_CAPTURE_MODE = AUTO);
ALTER DATABASE CURRENT SET QUERY_STORE (QUERY_CAPTURE_MODE = ALL);
Monitoring performance by using the Query Store
Best practices with Query Store
Query Store blog articles by Erin Stellato
AUTO_SHRINK is enabled. While shrinking a database may be needed in response to a one-time data deletion event and/or to reduce excessive allocated space in data files, shrink should not be executed continuously by enabling auto-shrink. Auto-shrink causes persistent high resource utilization that will negatively impact workload performance.
ALTER DATABASE CURRENT SET AUTO_SHRINK OFF;
Recommendations and guidelines for setting the AUTO_SHRINK database option in SQL Server
Btree indexes have uniqueidentifier (GUID) leading columns. For larger tables, the details column contains a list of btree indexes with leading columns that use the uniqueidentifier data type. Such indexes are subject to low page density as data is modified. This leads to increased disk space and memory usage and negatively impacts performance.
Avoid this pattern in physical database design, particularly for clustered btree indexes. To increase page density and release space, rebuild indexes.
-- Determine average page density for an index.
-- Replace [object_id] and [index_id] placeholders with actual values from the details column, or with NULL/default to see all objects and indexes
SELECT OBJECT_SCHEMA_NAME(ips.object_id) AS schema_name,
OBJECT_NAME(ips.object_id) AS object_name,
i.name AS index_name,
i.type_desc AS index_type,
ips.avg_page_space_used_in_percent,
ips.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), [object_id], [index_id], default, 'SAMPLED') AS ips
INNER JOIN sys.indexes AS i
ON ips.object_id = i.object_id
AND ips.index_id = i.index_id
ORDER BY page_count DESC;
-- Consider rebuilding an index if its average page density is below 70%
-- Reduce FILLFACTOR from 100 to 80 if significant new data is being added, or if the values in the uniqueidentifier leading column change frequently
ALTER INDEX [index_name] ON [schema_name].[table_name] REBUILD WITH (ONLINE = ON, RESUMABLE = ON, MAXDOP = 8, FILLFACTOR = 100);
SQL Server Index Architecture and Design Guide
FORCE_LAST_GOOD_PLAN auto-tuning option is not enabled. Query plan regressions will not be fixed automatically via plan forcing. To avoid unexpected performance issues due to query plan regressions, enable this auto-tuning option.
ALTER DATABASE CURRENT SET AUTOMATIC_TUNING (FORCE_LAST_GOOD_PLAN = ON);
Automatic tuning in Azure SQL Database
Enable automatic tuning in the Azure portal
Used data size within the database is close to the maximum configured database size (MAXSIZE). To allow continued data growth, increase the maximum database size, scale up to a service tier or service objective that supports a larger maximum database size, implement data compression, or delete unneeded data.
ALTER DATABASE [database_name] MODIFY (MAXSIZE = 4096 GB); -- adjust database name and MAXSIZE according to requirements
Manage file space for databases in Azure SQL Database
What happens when database resource limits are reached
Storage size allocated for data files is close to maximum configured database size (MAXSIZE). If used size is not close to MAXSIZE and significant data growth is not expected, consider shrinking data files to reduce allocated size. This is particularly relevant for databases in elastic pools when the total allocated storage size for the pool is close to pool storage limit.
Note that data file shrink is a long-running and resource intensive process that should not be executed on a regular basis. Shrink will also increase index fragmentation, which may impact workloads using large readaheads in index scans.
Prior to shrinking, if page density is low (below 70%), rebuild indexes.
-- Replace [object_id] and [index_id] placeholders with actual values from the details column, or with NULL/default to see all objects and indexes
SELECT OBJECT_SCHEMA_NAME(ips.object_id) AS schema_name,
OBJECT_NAME(ips.object_id) AS object_name,
i.name AS index_name,
i.type_desc AS index_type,
ips.avg_page_space_used_in_percent,
ips.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), [object_id], [index_id], default, 'SAMPLED') AS ips
INNER JOIN sys.indexes AS i
ON ips.object_id = i.object_id
AND ips.index_id = i.index_id
ORDER BY page_count DESC;
-- Consider rebuilding an index if its average page density is below 70%
ALTER INDEX [index_name] ON [schema_name].[table_name] REBUILD WITH (ONLINE = ON, RESUMABLE = ON, MAXDOP = 8);
-- First, attempt to truncate to release any space at the end of the file
DBCC SHRINKFILE (1, TRUNCATEONLY); -- change file_id if needed
-- If allocated space is still too large, shrink the file. This takes longer.
DBCC SHRINKFILE (1, 1024); -- change file_id if needed, change target size from 1024 (1 GB) if needed
Manage file space for databases in Azure SQL Database
Storage size allocated for data files is much larger than used data size. If significant data growth is not expected, consider shrinking data files to reduce allocated size. This is particularly relevant for databases in elastic pools when the total allocated storage size for the pool is close to pool storage limit.
Note that data file shrink is a long-running and resource intensive process that should not be used on a regular basis. Shrink will also increase index fragmentation, which may impact workloads using large readaheads in index scans.
Prior to shrinking, if page density is low (below 70%), rebuild indexes.
-- Replace [object_id] and [index_id] placeholders with actual values from the details column, or with NULL/default to see all objects and indexes
SELECT OBJECT_SCHEMA_NAME(ips.object_id) AS schema_name,
OBJECT_NAME(ips.object_id) AS object_name,
i.name AS index_name,
i.type_desc AS index_type,
ips.avg_page_space_used_in_percent,
ips.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), [object_id], [index_id], default, 'SAMPLED') AS ips
INNER JOIN sys.indexes AS i
ON ips.object_id = i.object_id
AND ips.index_id = i.index_id
ORDER BY page_count DESC;
-- Consider rebuilding an index if its average page density is below 70%
ALTER INDEX [index_name] ON [schema_name].[table_name] REBUILD WITH (ONLINE = ON, RESUMABLE = ON, MAXDOP = 8);
-- First, attempt to truncate to release any space at the end of the file
DBCC SHRINKFILE (1, TRUNCATEONLY); -- change file_id if needed
-- If allocated space is still too large, shrink the file. This takes longer.
DBCC SHRINKFILE (1, 1024); -- change file_id if needed, change target size from 1024 (1 GB) if needed
Manage file space for databases in Azure SQL Database
Significant CPU throttling has recently occurred, as noted in the details column. If workload performance has been inadequate during that time, tune the query workload to consume less CPU, or scale up to a service objective with more CPU capacity, or both. To find queries that have consumed the most CPU in a time interval, review top queries ranked high by CPU time.
CPU throttling is common on lower service objectives, e.g. Basic, S0, S1, etc.
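Recent resource utilization relative to the service objective limits, including CPU, can be reviewed via sys.dm_db_resource_stats, which retains roughly one hour of data in 15-second snapshots:

```sql
-- Recent resource utilization, as a percentage of the service objective limit
SELECT end_time,
       avg_cpu_percent,
       avg_data_io_percent,
       avg_log_write_percent,
       avg_memory_usage_percent
FROM sys.dm_db_resource_stats
ORDER BY end_time DESC;
```

Sustained values near 100% in avg_cpu_percent indicate CPU throttling is likely.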
What happens when database resource limits are reached
Out of memory errors have recently occurred, as noted in the details column. Tune the query workload to consume less memory, or scale up to a service objective with more memory, or both. To find queries that have consumed the most memory in a time interval, review top queries ranked high by maximum query memory used.
What happens when database memory resource limits are reached
Memory grant waits and/or timeouts have recently occurred. The data in the details column notes the number of recent requests for memory grants that had to wait with the RESOURCE_SEMAPHORE wait type, and the number of such requests that timed out while waiting. To find queries that have consumed the most memory in a time interval, review top queries ranked high by maximum query memory used.
Tune query workload to request smaller memory grants, or reduce the number of queries running concurrently, or scale up to a service objective with more memory.
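Queries currently holding or waiting for memory grants can be examined via sys.dm_exec_query_memory_grants:

```sql
-- Current memory grants; grant_time is NULL for requests still waiting for a grant
SELECT session_id,
       request_time,
       grant_time,
       requested_memory_kb,
       granted_memory_kb,
       wait_time_ms
FROM sys.dm_exec_query_memory_grants
ORDER BY requested_memory_kb DESC;
```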
What happens when database memory resource limits are reached
Nonclustered indexes with low reads found. The details column contains a list of non-unique nonclustered indexes where the number of index read operations is much less than the number of index write (update) operations.
As data changes, indexes are updated, which takes time and resources. The resource overhead of updating indexes that have few reads may outweigh their benefit, if any.
If the data in details is for a sufficiently long period during which all critical workloads have run, consider dropping these indexes. Then, monitor performance to determine impact. If performance has regressed, recreate the dropped indexes.
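As a sketch, reads and writes per index since engine startup can be compared via sys.dm_db_index_usage_stats before dropping a candidate index (index and table names below are placeholders):

```sql
-- Compare reads (seeks, scans, lookups) to writes (updates) for indexes in this database
SELECT OBJECT_SCHEMA_NAME(ius.object_id) AS schema_name,
       OBJECT_NAME(ius.object_id) AS object_name,
       i.name AS index_name,
       ius.user_seeks + ius.user_scans + ius.user_lookups AS reads,
       ius.user_updates AS writes
FROM sys.dm_db_index_usage_stats AS ius
INNER JOIN sys.indexes AS i
ON ius.object_id = i.object_id
AND ius.index_id = i.index_id
WHERE ius.database_id = DB_ID()
ORDER BY writes DESC;

-- Drop a low-read index identified in the details column
DROP INDEX [index_name] ON [schema_name].[table_name];
```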
SQL Server Index Architecture and Design Guide
ROW or PAGE data compression opportunities found. The details column contains a list of objects, indexes, and partition ranges showing their current and suggested new data compression type, based on recent workload sampling and heuristics. To improve tip accuracy, obtain this result while a representative workload is running, or shortly thereafter.
In the following example of data in the details column, the suggested data compression is ROW for the first 5 partitions, and PAGE for the last 5 partitions of the orders table. The type of the index (clustered or nonclustered) and the total size of all partitions in the referenced range are included.
schema: [dbo], object: [orders], index: [ci_orders], index type: CLUSTERED, partition range: 6-10, partition range size (MB): 17,802.98, present compression type: NONE, suggested compression type: ROW
schema: [dbo], object: [orders], index: [ci_orders], index type: CLUSTERED, partition range: 1-5, partition range size (MB): 37,427.57, present compression type: NONE, suggested compression type: PAGE
ROW compression adds minimal CPU utilization while providing substantial storage and memory savings. For most workloads, ROW compression should be enabled by default for new tables and indexes. PAGE compression adds higher CPU utilization, but also provides higher storage and memory savings.
Additional analysis and testing based on information in provided links may be required to select data compression types optimally. Any data compression requires additional CPU processing, thus testing is particularly important when low query latency must be maintained in transactional workloads.
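A suggested compression type can be applied by rebuilding the index for the relevant partition range. The example below uses the objects and partition ranges from the sample details data above; adjust names and partition numbers to match your own results:

```sql
-- Estimate space savings before enabling compression
EXEC sys.sp_estimate_data_compression_savings
    @schema_name = 'dbo',
    @object_name = 'orders',
    @index_id = NULL,
    @partition_number = NULL,
    @data_compression = 'ROW';

-- Rebuild partitions 6-10 of the clustered index with ROW compression
ALTER INDEX [ci_orders] ON [dbo].[orders]
REBUILD PARTITION = ALL
WITH (DATA_COMPRESSION = ROW ON PARTITIONS (6 TO 10), ONLINE = ON);
```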
Data Compression: Strategy, Capacity Planning and Best Practices
There are recent occurrences of transaction log IO (aka log rate) approaching the limit of the service objective, as noted in the details column. To improve performance of bulk data modifications, including data loading, consider tuning the workload to reduce log IO, or scale up to a service objective with a higher log IO limit.
Additional solutions and workarounds include implementing data compression (ROW, PAGE, columnstore) to reduce effective log IO, and/or loading transient data to be further transformed/aggregated into the tempdb database (e.g. using global temporary tables), or into non-durable memory-optimized tables.
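To check whether log IO is a dominant constraint, top waits since engine startup can be examined; LOG_RATE_GOVERNOR and WRITELOG waits near the top suggest the workload is constrained by log IO:

```sql
-- Top waits for this database since engine startup
SELECT TOP (10) wait_type,
       wait_time_ms,
       waiting_tasks_count
FROM sys.dm_db_wait_stats
ORDER BY wait_time_ms DESC;
```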
Resource limits for Azure SQL Database
A significant part of the plan cache is occupied by single-use plans, which have never been reused by query executions and therefore use memory unnecessarily. To avoid caching single-use plans for a database and reduce memory consumption, enable the OPTIMIZE_FOR_AD_HOC_WORKLOADS database-scoped configuration for each database where single-use plans occupy a large amount of memory, as noted in the details column.
ALTER DATABASE SCOPED CONFIGURATION SET OPTIMIZE_FOR_AD_HOC_WORKLOADS = ON;
ALTER DATABASE SCOPED CONFIGURATION
Queries could benefit from indexes that do not currently exist. The details column describes potential new index definitions, based on metadata collected since database engine startup. For more accurate indexing recommendations based on longer-term historical data, including CREATE INDEX and DROP INDEX statements, see Performance recommendations for this database in the Azure portal, or use the Get-AzSqlDatabaseRecommendedAction PowerShell cmdlet.
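As a sketch, the raw missing index metadata behind this tip can be viewed by joining the missing index DMVs:

```sql
-- Missing index suggestions since engine startup, highest estimated impact first
SELECT mid.statement AS table_name,
       mid.equality_columns,
       mid.inequality_columns,
       mid.included_columns,
       migs.user_seeks,
       migs.avg_total_user_cost,
       migs.avg_user_impact
FROM sys.dm_db_missing_index_details AS mid
INNER JOIN sys.dm_db_missing_index_groups AS mig
ON mid.index_handle = mig.index_handle
INNER JOIN sys.dm_db_missing_index_group_stats AS migs
ON mig.index_group_handle = migs.group_handle
ORDER BY migs.avg_total_user_cost * migs.avg_user_impact * migs.user_seeks DESC;
```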
Performance recommendations for Azure SQL Database
Get-AzSqlDatabaseRecommendedAction
sys.dm_db_missing_index_details
Redo queue on a secondary readable replica is large. Queries running on this replica may experience significant data latency relative to the primary replica. A large redo queue may also cause longer failovers if the replica is used as the failover target.
Common causes of a large redo queue are CPU and/or IO resource contention on the secondary replica, or blocking of the redo thread due to long-running queries on the secondary replica and concurrent DDL operations on the primary replica.
Tune/reduce the workload on the secondary replica to use fewer resources, and/or increase the database service objective. Avoid DDL operations such as schema changes on the primary replica while long-running queries are executing on the secondary replica.
This tip is produced only on readable secondary replicas, including non-Hyperscale read scale-out replicas and geo-replicas.
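As a sketch, redo queue size and redo rate can be observed on the secondary via sys.dm_database_replica_states (column availability can vary by service tier and replica type):

```sql
-- Run on the readable secondary replica; redo_queue_size is in KB
SELECT database_id,
       redo_queue_size,
       redo_rate
FROM sys.dm_database_replica_states;
```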
Monitoring and troubleshooting read-only replicas
Data IOPS are close to the workload group limit. Recent data IO statistics are available in the details column.
Workload group defines resource governance limits for a single database, or for a database in an elastic pool.
If workload performance for this database is insufficient and PAGEIOLATCH* waits are a top wait type, either tune the workload to generate less data IO, or scale to a service objective with a higher data IOPS limit.
Resource limits for single databases using the vCore purchasing model
Resource limits for single databases using the DTU purchasing model
Compare the DTU-based service tiers
Workload group IO governance impact is significant. Recent data IO statistics are available in the details column.
Workload group defines resource governance limits for a single database, or for a database in an elastic pool.
If workload performance for this database is insufficient and PAGEIOLATCH* waits are a top wait type, either tune the workload to generate less data IO, or scale to a service objective with a higher data IOPS limit.
Resource limits for single databases using the vCore purchasing model
Resource limits for single databases using the DTU purchasing model
Compare the DTU-based service tiers
Data IOPS are close to the resource pool limit. Recent data IO statistics are available in the details column.
Resource pool defines resource governance limits for an elastic pool.
If workload performance for one or more databases in this elastic pool is insufficient and PAGEIOLATCH* waits are a top wait type, either tune the workload to generate less data IO, scale the elastic pool to a service objective with a higher data IOPS limit, or rebalance databases across multiple elastic pools to reduce cumulative data IO in this elastic pool.
Resource limits for elastic pools using the vCore purchasing model
Resource limits for elastic pools using the DTU purchasing model
Compare the DTU-based service tiers
Resource pool IO governance impact is significant. Recent data IO statistics are available in the details column.
Resource pool defines resource governance limits for an elastic pool.
If workload performance for one or more databases in this elastic pool is insufficient and PAGEIOLATCH* waits are a top wait type, either tune the workload to generate less data IO, scale the elastic pool to a service objective with a higher data IOPS limit, or rebalance databases across multiple elastic pools to reduce cumulative data IO in this elastic pool.
Resource limits for elastic pools using the vCore purchasing model
Resource limits for elastic pools using the DTU purchasing model
Compare the DTU-based service tiers
Persistent Version Store (PVS) size is large. PVS size and other statistics are available in the details column.
PVS is a part of the user database, and is used to store row versions when Accelerated Database Recovery (ADR), Read-Committed Snapshot Isolation (RCSI), or Snapshot Isolation (SI) are enabled for the database. Large PVS size is usually caused by long-running active transactions. It may also occur due to a large number of aborted transactions that haven't yet been cleaned up by the PVS cleaner background thread.
Review application patterns to avoid long-running transactions, and make sure to explicitly roll back open transactions after query timeouts and other errors.
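As a sketch, current PVS size and aborted transaction counts can be checked via the sys.dm_tran_persistent_version_store_stats DMV (assuming its size and transaction columns are available on your service tier):

```sql
-- PVS size and cleanup-related statistics for this database
SELECT DB_NAME(database_id) AS database_name,
       persistent_version_store_size_kb,
       current_aborted_transaction_count,
       oldest_active_transaction_id
FROM sys.dm_tran_persistent_version_store_stats;
```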
Paused resumable index maintenance operations found. Paused resumable index operations will be aborted automatically after a period of time specified by the PAUSED_RESUMABLE_INDEX_ABORT_DURATION_MINUTES database-scoped configuration.
Either resume these index operations, or, if not planning to resume, abort them explicitly to avoid unnecessary performance impact on the DML operations changing these indexes between now and the time when auto-abort occurs.
ALTER INDEX [index_name] ON [schema_name].[object_name] RESUME; -- change schema, object, and index names
ALTER INDEX [index_name] ON [schema_name].[object_name] ABORT; -- change schema, object, and index names
sys.index_resumable_operations
A clustered columnstore index (CCI) can be created on one or more tables to improve query and data load performance and drastically reduce storage space and memory consumption by the table.
CCI indexes are most useful in analytical/reporting workloads where queries use large scans. CCI indexes are not recommended in OLTP workloads, or when queries predominantly use index seeks, lookups, and small range scans. They are also not recommended when rows in the table are frequently updated, or when less frequent but large updates occur.
The details column provides a list of CCI candidate tables. Index usage statistics for the recent query workload on each table are included, to help determine if a CCI is a good fit for the table.
To improve tip accuracy, obtain this result while a representative workload is running, or shortly thereafter.
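Creating a CCI on a candidate table is a single DDL statement; the table and index names below are placeholders:

```sql
-- On a heap, create a clustered columnstore index
CREATE CLUSTERED COLUMNSTORE INDEX [cci_table_name] ON [schema_name].[table_name];

-- On a table with an existing rowstore clustered index, convert it by reusing the index name
CREATE CLUSTERED COLUMNSTORE INDEX [existing_clustered_index_name]
ON [schema_name].[table_name]
WITH (DROP_EXISTING = ON, ONLINE = ON);
```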
Columnstore indexes - Design guidance
Columnstore indexes - Query performance
Columnstore indexes - Data loading guidance
Some geo-replication links may be unhealthy. The details column describes geo-replication link health, including data propagation lag (available on the primary geo-replica only) and replica state.
If a geo-replication link becomes unhealthy during high write load on the primary geo-replica, then the service objective of the secondary geo-replica may need to be increased to sustain data propagation rate.
For a recently created geo-replica, the SEEDING replication state is expected. In all other cases, the geo-replica replication state should be CATCH_UP.
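Link state and propagation lag can be checked on the primary geo-replica via sys.dm_geo_replication_link_status:

```sql
-- Run on the primary geo-replica; replication_lag_sec is the data propagation lag
SELECT partner_server,
       partner_database,
       replication_state_desc,
       replication_lag_sec,
       last_replication
FROM sys.dm_geo_replication_link_status;
```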
sys.dm_geo_replication_link_status
Last partitions are not empty for some partitioned tables. If the very last partition is not empty, and needs to be split to create a new partition for new data, the split operation will be slow and resource-intensive.
If new partitions are periodically added to accommodate new data, make sure to keep a buffer of several empty partitions at the end of the partition list.
The details column provides the list of the last several partitions for each table, if at least some of them are not empty. If the very last partition is empty, split it to create at least one empty partition at the end of the partition list before any data is loaded into it. Otherwise, plan and prepare for a time-consuming process of splitting a non-empty partition, to avoid all new data being added to the last partition.
```sql
ALTER PARTITION FUNCTION [function_name] SPLIT RANGE ([new boundary value]);
```
Partitioned Tables and Indexes
Blog articles by Kendra Little:
How to Fix Lopsided Partitioned Tables
Sliding Window Table Partitioning: What to Decide Before You Automate
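Maintaining a buffer of empty partitions can be sketched as follows; the partition function, scheme, filegroup, and boundary value are all hypothetical:

```sql
-- Designate the filegroup that will hold the new partition, then split the empty last partition
ALTER PARTITION SCHEME [ps_orders] NEXT USED [PRIMARY];
ALTER PARTITION FUNCTION [pf_orders]() SPLIT RANGE ('2022-01-01');
```

Because the last partition is empty, no data movement occurs and the split completes quickly.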
To help in performance investigations, top queries for a time interval are summarized in the `details` column. This tip is informational in nature, and is produced when any queries have executed in the time interval. On idle databases, top queries may include the queries executed by the `get-sqldb-tips.sql` script itself.
For each query, identified by its `query_hash` value, the `query_id` and `plan_id` values, the number of query executions by execution type (regular, aborted, exception), and weighted wait categories are included in `details`.
Additionally, each query is ranked by its consumption of the following resources:
- cpu time
- duration
- execution count
- logical IO reads
- physical IO reads
- maximum query memory used
- log bytes used
- tempdb space used
- parallelism
Lower rank numbers mean higher rank; for example, the highest CPU-consuming query has `cpu time` rank 1.
In the following example, the query with hash `0x5F9043DBAAEBB1FF` is the top consumer of `tempdb` space (`tempdb used rank: 1`), has two associated `query_id` values (`66` and one more), and three associated `plan_id` values (`36` and two more). It successfully executed `31615` times in the time interval. Most waits (~77.4%) were on CPU.

```
query hash: 0x5F9043DBAAEBB1FF, query_id: 66 (+1), plan_id: 36 (+2), executions: (regular: 31615, aborted: 0, exception: 0), CPU time rank: 4, duration rank: 32, executions rank: 50, logical IO reads rank: 6, physical IO reads rank: 5, max used memory rank: 79, log bytes used rank: 48, tempdb used rank: 1, parallelism rank: 50, weighted wait categories: CPU (0.774) | Latch (0.137) | Memory (0.057) | Unknown (0.028) | Preemptive (0.002) | Buffer IO (0.002)
```
To change the number of top ranked queries included for each resource type, change the `@QueryStoreTopQueryCount` variable.
By default, the time interval used is the last hour. You can change the `@QueryStoreIntervalMinutes` variable to look at shorter or longer intervals. To look at a specific time interval in the past, set the `@QueryStoreCustomTimeStart` and `@QueryStoreCustomTimeEnd` variables. Note that this tip may not appear if the specified interval is shorter than the configured Query Store resource statistics interval length.
Depending on the nature of the performance issue, focus on the queries ranked high for the relevant resource type. For example, if troubleshooting high `tempdb` space consumption, look at the `tempdb used` rank.
For further investigation, use Query Performance Insight in the Azure portal, Query Store reports in SSMS, or query the `sys.query_store_*` views directly. When there are multiple `query_id` values for a given `query_hash` value, use the query below to see all `query_id` and `plan_id` values, and the SQL text for each query.
```sql
-- Return query/plan runtime statistics in the specified interval for a given query_hash value from the details column for further investigation
DECLARE @QueryHash binary(8) = 0x113AAAF159577E8C; -- replace query hash with the actual value from details
DECLARE @QueryStoreIntervalMinutes int = 60;
DECLARE @QueryStoreCustomTimeStart datetimeoffset -- = '2021-01-01 00:01 +00:00';
DECLARE @QueryStoreCustomTimeEnd datetimeoffset -- = '2021-12-31 23:59 +00:00';

SELECT q.query_id,
       q.context_settings_id,
       qt.query_sql_text,
       OBJECT_SCHEMA_NAME(q.object_id) AS schema_name,
       OBJECT_NAME(q.object_id) AS object_name,
       p.plan_id,
       TRY_CAST(p.query_plan AS xml) AS query_plan,
       SUM(IIF(rs.execution_type_desc = 'Regular', rs.count_executions, 0)) AS count_regular_executions,
       SUM(IIF(rs.execution_type_desc = 'Aborted', rs.count_executions, 0)) AS count_aborted_executions,
       SUM(IIF(rs.execution_type_desc = 'Exception', rs.count_executions, 0)) AS count_exception_executions,
       SUM(rs.avg_cpu_time * rs.count_executions) / SUM(rs.count_executions) AS avg_cpu_time,
       SUM(rs.avg_duration * rs.count_executions) / SUM(rs.count_executions) AS avg_duration,
       SUM(rs.avg_logical_io_reads * rs.count_executions) / SUM(rs.count_executions) AS avg_logical_io_reads,
       SUM(rs.avg_physical_io_reads * rs.count_executions) / SUM(rs.count_executions) AS avg_physical_io_reads,
       SUM(rs.avg_query_max_used_memory * rs.count_executions) / SUM(rs.count_executions) AS avg_query_max_used_memory,
       SUM(rs.avg_log_bytes_used * rs.count_executions) / SUM(rs.count_executions) AS avg_log_bytes_used,
       SUM(rs.avg_tempdb_space_used * rs.count_executions) / SUM(rs.count_executions) AS avg_tempdb_space_used,
       SUM(rs.avg_dop * rs.count_executions) / SUM(rs.count_executions) AS avg_dop
FROM sys.query_store_query AS q
INNER JOIN sys.query_store_query_text AS qt
ON q.query_text_id = qt.query_text_id
INNER JOIN sys.query_store_plan AS p
ON q.query_id = p.query_id
INNER JOIN sys.query_store_runtime_stats AS rs
ON p.plan_id = rs.plan_id
INNER JOIN sys.query_store_runtime_stats_interval AS rsi
ON rs.runtime_stats_interval_id = rsi.runtime_stats_interval_id
WHERE q.query_hash = @QueryHash
      AND
      rsi.start_time >= IIF(
                           (@QueryStoreCustomTimeStart IS NULL OR @QueryStoreCustomTimeEnd IS NULL) AND @QueryStoreIntervalMinutes IS NOT NULL,
                           DATEADD(minute, -@QueryStoreIntervalMinutes, SYSDATETIMEOFFSET()),
                           @QueryStoreCustomTimeStart
                           )
      AND
      rsi.start_time <= IIF(
                           (@QueryStoreCustomTimeStart IS NULL OR @QueryStoreCustomTimeEnd IS NULL) AND @QueryStoreIntervalMinutes IS NOT NULL,
                           SYSDATETIMEOFFSET(),
                           @QueryStoreCustomTimeEnd
                           )
GROUP BY q.query_hash,
         q.query_id,
         q.context_settings_id,
         q.object_id,
         qt.query_sql_text,
         p.plan_id,
         p.query_plan
ORDER BY query_id, plan_id
OPTION (RECOMPILE);
```
Query Performance Insight for Azure SQL Database
Monitoring performance by using the Query Store
Best practices with Query Store
Query Store blog articles by Erin Stellato
Storage space allocated for `tempdb` data files is close to the maximum `tempdb` data size. The `details` column provides `tempdb` allocated and used data sizes. If additional `tempdb` space is needed, consider scaling up to a service objective that provides a larger `tempdb`.
In the Premium and Business Critical service tiers, `tempdb` shares local SSD space with user databases. Large `tempdb` allocated space can prevent user database growth and cause out-of-space errors.
As an emergency measure, you can reset `tempdb` allocated space to its initial small value by failing over the database or elastic pool using the PowerShell cmdlets linked below. A failover will abort any running queries, terminate connections, and cause a short period of unavailability for the database or elastic pool.
Resource limits for single databases using the vCore purchasing model
Resource limits for elastic pools using the vCore purchasing model
tempdb database in the DTU purchasing model
Invoke-AzSqlElasticPoolFailover
Sample queries to monitor tempdb usage
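Allocated and used `tempdb` data space can also be checked directly. A minimal sketch using `sys.dm_db_file_space_usage` queried in `tempdb`; compare the results to the maximum `tempdb` data size for your service objective:

```sql
-- Allocated vs. used space in tempdb data files, in MB (page size is 8 KB)
SELECT SUM(total_page_count) * 8 / 1024. AS tempdb_allocated_data_mb,
       SUM(allocated_extent_page_count) * 8 / 1024. AS tempdb_used_data_mb
FROM tempdb.sys.dm_db_file_space_usage;
```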
Storage space used in `tempdb` data files is close to the maximum `tempdb` data size. The `details` column provides `tempdb` allocated and used data sizes. Continuing growth of used space may cause `tempdb` out-of-space errors. If additional `tempdb` space is needed, consider scaling up to a service objective that provides a larger `tempdb`.
As an emergency measure, you can reduce `tempdb` used size to its initial small value by failing over the database or elastic pool using the PowerShell cmdlets linked below. A failover will abort any running queries, terminate connections, and cause a short period of unavailability for the database or elastic pool.
Resource limits for single databases using the vCore purchasing model
Resource limits for elastic pools using the vCore purchasing model
tempdb database in the DTU purchasing model
Invoke-AzSqlElasticPoolFailover
Sample queries to monitor tempdb usage
Storage space allocated for the `tempdb` log file is close to the maximum `tempdb` log size. The `details` column provides `tempdb` allocated and maximum transaction log sizes. This may be caused by a long-running transaction modifying data in a temporary table, or using `tempdb` for internal storage.
As an emergency measure, you can reset `tempdb` allocated log size to its initial small value by failing over the database or elastic pool using the PowerShell cmdlets linked below. A failover will abort any running queries, terminate connections, and cause a short period of unavailability for the database or elastic pool. Alternatively, you can kill the sessions with long-running transactions in `tempdb`.
```sql
-- Transactions in tempdb sorted by duration
SELECT st.session_id,
       dt.database_transaction_begin_time,
       DATEDIFF(second, dt.database_transaction_begin_time, CURRENT_TIMESTAMP) AS transaction_duration_seconds,
       dt.database_transaction_log_bytes_used,
       dt.database_transaction_log_bytes_reserved,
       st.is_user_transaction,
       st.open_transaction_count,
       ib.event_type,
       ib.parameters,
       ib.event_info
FROM sys.dm_tran_database_transactions AS dt
INNER JOIN sys.dm_tran_session_transactions AS st
ON dt.transaction_id = st.transaction_id
OUTER APPLY sys.dm_exec_input_buffer(st.session_id, default) AS ib
WHERE dt.database_id = 2
ORDER BY transaction_duration_seconds DESC;
```
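If a session returned by the query above holds a long-running `tempdb` transaction and can be safely terminated, kill it by its `session_id`; the session id below is hypothetical:

```sql
KILL 71; -- replace 71 with the session_id of the offending session
```

Killing a session rolls back its open transactions, which can itself take time for large transactions.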
Invoke-AzSqlElasticPoolFailover
Sample queries to monitor tempdb usage
There are recent occurrences of worker utilization approaching the workload group limit of the service objective, as noted in the `details` column. A workload group defines resource governance limits for a single database, or for a database in an elastic pool.
If the limit is reached, queries and new connection attempts will fail. Common reasons for high worker utilization include:
- excessive parallelism
- large blocking chains
- the number of concurrently executing queries or concurrent connection attempts exceeding the capacity of the service objective
Tune query workload and reduce/stagger connection attempts, and/or scale up to a service objective with a higher worker limit. To find queries that contribute the most to worker utilization in a time interval, review top queries ranked high by parallelism and/or execution count.
Resource limits for single databases using the vCore purchasing model
Resource limits for single databases using the DTU purchasing model
What happens when worker resource limit is reached
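Recent worker utilization relative to the workload group limit can be reviewed with `sys.dm_db_resource_stats`, which retains roughly one hour of history at 15-second intervals:

```sql
-- Worker utilization as a percent of the workload group limit, most recent first
SELECT end_time,
       max_worker_percent
FROM sys.dm_db_resource_stats
ORDER BY end_time DESC;
```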
There are recent occurrences of worker utilization approaching the resource pool limit of the service objective, as noted in the `details` column. A resource pool defines resource governance limits for an elastic pool.
If the limit is reached, queries and new connection attempts will fail for some or all databases in the elastic pool. Common reasons for high worker utilization include:
- excessive parallelism
- large blocking chains
- the number of concurrently executing queries or concurrent connection attempts exceeding the capacity of the service objective
Tune query workloads and reduce/stagger connection attempts, and/or scale up to a service objective with a higher worker limit. To find queries that contribute the most to worker utilization in a time interval in a database, review top queries for the database, ranked high by parallelism and/or execution count.
Resource limits for elastic pools using the vCore purchasing model
Resource limits for elastic pools using the DTU purchasing model
What happens when worker resource limit is reached
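Pool-level worker utilization history is also available in the `sys.elastic_pool_resource_stats` view when connected to the `master` database of the logical server; the pool name below is hypothetical:

```sql
-- Run in the master database; pool-level worker utilization, most recent first
SELECT end_time,
       max_worker_percent
FROM sys.elastic_pool_resource_stats
WHERE elastic_pool_name = 'pool1' -- replace with your elastic pool name
ORDER BY end_time DESC;
```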
Notable network connectivity events have recently occurred in this database, or other databases in the same elastic pool. This includes network errors, slow login attempts, and abnormally closed connections. Retry logic implemented by an application or a MSSQL network driver can prevent these events from impacting applications.
This tip is informational in nature. The presence of network connectivity events does not necessarily indicate a problem. Some events, while reported by this tip, may be expected in Azure SQL Database. For example, idle sessions are killed after a period of inactivity, by design.
The `details` column provides a query to view detailed connectivity event data obtained in the latest execution of the script.
To reduce the likelihood of workload-impacting network events, configure the logical server to use the Redirect connection policy.
High CPU utilization by the database engine instance has been sustained for a significant period of time, as noted in the `details` column. If the database is in an elastic pool, this refers to cumulative CPU utilization by all databases in the same pool.
If workload performance has been inadequate during that time, tune the query workload to consume less CPU, scale up to a service objective with more CPU capacity, or both. To find queries that have consumed the most CPU in a time interval, review top queries for each database ranked high by `cpu time`.
What happens when database resource limits are reached
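Recent CPU utilization can be inspected directly via `sys.dm_db_resource_stats`, which distinguishes the CPU used by this database's workload from the CPU used by the database engine instance as a whole:

```sql
-- CPU utilization over the retained history (about one hour), most recent first
SELECT end_time,
       avg_cpu_percent,          -- CPU used by this database's workload
       avg_instance_cpu_percent  -- CPU used by the database engine instance overall
FROM sys.dm_db_resource_stats
ORDER BY end_time DESC;
```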
Some statistics may be out of date. This may cause poor query performance due to suboptimal query plans. The list of potentially out-of-date statistics on user objects and some system objects is included in the `details` column.
Update statistics by executing `sp_updatestats`, `UPDATE STATISTICS`, or a database maintenance solution, and consider establishing a periodic maintenance procedure to keep statistics up to date.
```sql
EXEC sys.sp_updatestats;
```
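To update statistics on a single table, or to inspect how stale a specific statistic is, a sketch using `UPDATE STATISTICS` and `sys.dm_db_stats_properties`; the table name is hypothetical:

```sql
-- Update all statistics on one table with a full scan
UPDATE STATISTICS dbo.SalesOrder WITH FULLSCAN;

-- Inspect last update time and modifications since then for each statistic on the table
SELECT s.name AS stats_name,
       sp.last_updated,
       sp.rows,
       sp.modification_counter
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE s.object_id = OBJECT_ID('dbo.SalesOrder');
```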
Many tables do not have any indexes. This tip appears when a substantial percentage of all tables in the database (excluding very small tables) do not have any indexes. This often causes poor query performance due to suboptimal data retrieval. The list of such tables is included in the `details` column.
Review the linked Index Architecture and Design Guide to learn how to create efficient indexes. If the "Queries could benefit from indexes that do not currently exist" tip also appears, derive new indexes from index definitions in the `details` column of that tip.
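As a cross-check of the list in `details`, tables without any indexes (heaps with no nonclustered indexes) can be found as follows:

```sql
-- User tables that are heaps and have no nonclustered indexes
SELECT SCHEMA_NAME(t.schema_id) AS schema_name,
       t.name AS table_name
FROM sys.tables AS t
WHERE t.is_ms_shipped = 0
      AND NOT EXISTS (
          SELECT 1
          FROM sys.indexes AS i
          WHERE i.object_id = t.object_id
                AND i.index_id > 0 -- index_id 0 is the heap itself
      );
```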