Listing 2 is a script that shows the size of all databases on the Amazon RDS instance. This script leverages the sp_msforeachdb stored procedure which can loop through all databases on a SQL Server instance. For each the database, the script specified in the @command1 parameter is executed for each database. The overall script is built such that the data generated by the core script is written to a table named [DB Size].
Figure 6 Shows the output of this query. Because this is a new instance, rdsadmin happens to be the largest instance followed by msdb.
Figure 8: Database Size in Amazon RDS Instance Expressed in MB
Check Database Activity Based on Index Usage Listing 2 is a script that shows the number and timing of user seeks, scans and lookups of indexes in each database. These numbers are indicative of how busy a database is. Panjak Patel gives a good summary of what index seeks, index scans and table scans mean in this short article .
Gail Shaw , a well-known SQL Server MVP and MCM differentiates User Seeks and Scans as follows:
“User_seeks will be the number of times a user query could use a seek on the index (execution plan shows an index seek operation), user scans is the number of times that a user query had to scan the index to get the necessary results. Any select without a where clause will be a scan.”
In his characteristic deep-dive fashion, Brent Ozar another MCM-cum-MVP illustrates index seeks and scans further in this article .
I tend to think of an index seek like looking for something in an ordered list, example a list of numbers 1 to 10. If you are asked to find the number 7, you can easily split the list in two since it is order and quickly determine that 7 will be in the second half. Index seeks are generally fast. For an index scan operation, the engine must look through the entire physical structure of the index. A table scan means the engine looks through the entire table (typically the case for heaps ). An index lookup simply means a non-clustered index is used to fulfil part of the result set required by a query.
The discussion on indexes might be a bit of a digression but the point is user interactions with indexes in the database are an indication of how busy the database is.
Listing 3: Check Database Usage -- Listing 3: Check Database Usage --
PRINT 'Checking Database Usage'
SELECT db_name(database_id) [Database]
, SUM(user_seeks) [Total User Seeks]
, SUM(user_scans) [Total User Scans]
, SUM(user_lookups) [Total User Lookups]
, SUM(user_updates) [Total User Updates]
, MAX(last_user_seek) [Last User Seek]
, MAX(last_user_scan) [Last User Scan]
, MAX(last_user_lookup)[Last User Lookup]
, MAX(last_user_update) [Last User Update]
FROM sys.dm_db_index_usage_stats
GROUP BY database_id
ORDER BY [Database]; Figure 9 gives us a sneak peek into the statistics in our specific case. Based on the data, rdsadmin is the busiest database.
Figure 9: Output for Query on Busiest Databases
Check Database Backups on Amazon RDS Instance Listing 4 checks the last backup of each database and displays the backup size, backup time and other details. The script checks for the past thirty days. It can also be revised to check for databases that have not been backed up in the last 30 days (or 7 days preferably). We can accomplish this with the Common Table Expression (CTE) shown in Listing 5.
Listing 4: Check Database Backup Sets -- Listing 4: Check Database Backup Sets --
SELECT
bus.database_name
,bus.name
,bus.description
,bus.first_lsn
,bus.last_lsn
,round(bus.backup_size/1024,0) as backup_size_MB
,bus.server_name
,bus.backup_start_date
,bus.backup_finish_date
,bus.expiration_date
FROM msdb..backupset bus
JOIN msdb..backupmediafamily bmf on bus.media_set_id=bmf.media_set_id
WHERE bus.backup_start_date >= (getdate() - 30); The result of Listings 4 and 5 are showed in Figures 10 and 11.
Figure 10: Output of Listing 4 - Database BackupListing 5: Check Databases Not Backed Up -- Listing 5: Check Databases Not Backed Up --
WITH backupcheck AS (
SELECT
bus.database_name
,bus.name
,bus.description
,bus.first_lsn
,bus.last_lsn
,round(bus.backup_size/1024,0) as backup_size_MB
,bus.server_name
,bus.backup_start_date
,bus.backup_finish_date
,bus.expiration_date
FROM msdb..backupset bus
JOIN msdb..backupmediafamily bmf on bus.media_set_id=bmf.media_set_id
WHERE bus.backup_start_date >= (getdate() - 30)
)
SELECT * from sys.databases
WHERE name NOT IN (select database_name from backupcheck); Figure 11: Output of Listing 5 - Databases Not Backed UpCheck Largest Tables and their Sizes Listing 6 gives us the largest tables in a specific database. The script leverages three joins on four system tables to achieve this result – sys.tables , sys.indexes , sys.partitions , and sys.allocation_units . As earlier mentioned, the column list from these tables can be return by invoking the system stored procedure sp_columns. Extensive documentation of the tables and their columns is also available in SQL Server documentation.
One might be interested in knowing the largest tables in a database to determine where to focus performance tuning effort or how to handle data archiving (by using table partitioning for example). Figure 10 shows us the output of this query in our example.
Listing 6: Check Size of Largest Tables -- Listing 6: Check Size of Largest Tables
USE rdsadmin
GO
-- Get the space used by table TableName
SELECT
TABL.name AS table_name,
INDX.name AS index_name,
SUM(PART.rows) AS rows_count,
SUM(ALOC.total_pages) AS total_pages,
SUM(ALOC.used_pages) AS used_pages,
SUM(ALOC.data_pages) AS data_pages,
(SUM(ALOC.total_pages)*8/1024) AS total_space_MB,
(SUM(ALOC.used_pages)*8/1024) AS used_space_MB,
(SUM(ALOC.data_pages)*8/1024) AS data_space_MB
FROM sys.tables AS TABL
INNER JOIN sys.indexes AS INDX
ON TABL.object_id = INDX.object_id
INNER JOIN sys.partitions AS PART
ON INDX.object_id = PART.object_id
AND INDX.index_id = PART.index_id
INNER JOIN sys.allocation_units AS ALOC
ON PART.partition_id = ALOC.container_id
WHERE INDX.object_id > 255
AND INDX.index_id <= 1
GROUP BY TABL.name,
INDX.object_id,
INDX.index_id,
INDX.name
HAVING (SUM(ALOC.used_pages)*8/1024) > 1
ORDER BY Object_Name(INDX.object_id),
(SUM(ALOC.total_pages)*8/1024) DESC
GO Figure 12: Output of Listing 6 – Largest Tables in a DatabaseConclusion In this article, we have seen just a few checks a data engineer might want to check when presented with a new environment. Just for fun, we have done these checks using raw T-SQL queries. The checks demonstrated in this article are just a small subset of things a Data Engineer might want to know about a new database environment. In addition, there is a plethora of tools in the market that you can used to achieve even more sophisticated results. Obviously, these tools implement similar queries in the background. Here is a short list:
Microsoft’s Utility Control Point Redgate’s SQL Monitor SQL Tools In exact terms, SQL Tools offers interesting capabilities relevant for exploring a new database. These include:
Get a full record of database changes and find out who committed them Generate lifelike meaningful test data for load testing and integration Run T-SQL unit tests as a part of continuous integration with DevOps And many more