I have been incognito for the last couple of weeks and nearly missed TSQL Tuesday this month. If it weren't for somebody pinging me to request the link to the list of upcoming hosts, I would have missed it entirely. The topic meshes well with the work I have been doing for the past few weeks. In fact, DBAs automate so many things on a regular basis that there should be plenty of items to cherry-pick for a topic. Unfortunately, that doesn't make the topic any easier for me. My decision process is based partially on thinking through what might be unique, without having read what others may have written. (While thinking about reading blogs, it would be nice if there were an automated method to ingest all of the blogs into my head without having to iterate through them one at a time – manually.)
This month the party is hosted by Pat Wright (blog | twitter). Pat has asked us to describe some of the things we automate – or some of the automation we have implemented. Often, we hear about DBAs automating everything under the sun. Why? It simplifies the job and creates time to work on other projects. With all of this automation, I wonder if DBAs are related to Hephaestus in some fashion. If we had our way, it seems that our databases would be…Automatons.
A Lesser Automation
Now that I have rambled for a good bit, I guess it is time to get to the meat of the topic. First, we need to understand automation. So, what is automation?
- The automatic operation or control of equipment, a process, or a system.
- The techniques and equipment used to achieve automatic operation or control.
A word closely related to automation is:
- Computerization – the control of processes by computer
In other words, for a DBA, automation is the implementation of a process or control so that the computer operates without the DBA doing the work. Typically this is something repetitive, menial, tedious, or frequently performed – but not always. It can also be something that is fun and/or done only once or twice a year.
I have one of those cases where I may use the automation even less often than once a year – or maybe more frequently. It all depends on how often I need to design and test new databases.
This process helps in properly sizing a database before it is finally released to production. With good project requirements, you may have a good idea of what the fields should be and how large they need to be. In some cases, you will be getting data from an external source in one fashion or another, and that data does not always come with storage requirements or data-size parameters. You can make guesses by eyeballing the data – but sometimes something more is required.
It is when more is required that this script comes in handy for me. The script is ugly, but it does the deed. The basic idea is to retrieve the data length for a sample of data from each of the tables (after import into a staging database) so that I can run statistical analysis on the data later. And cringe now, because I use a nested cursor to get at what I want.
So here is the script. It loads a sample of data from every pertinent column of every table in every database and records the length of each value (I have excluded obvious columns such as numeric types and certain LOB types).
-- The results table lives in the Admin database (which is excluded from the loop below),
-- so make sure we are working from there.
USE Admin;

IF EXISTS (SELECT name FROM sys.objects WHERE name = 'DataLenStats_Alt')
BEGIN
	DROP TABLE DataLenStats_Alt;
END

CREATE TABLE DataLenStats_Alt (
	DatabaseName SYSNAME,
	TableName SYSNAME,
	ColumnName SYSNAME,
	ColLens INT
	);

-- Outer cursor: every user database except Admin
DECLARE dbfetch CURSOR STATIC FOR
	SELECT name
	FROM sys.databases
	WHERE database_id > 4
		AND name <> 'Admin'
	ORDER BY name;

OPEN dbfetch;

DECLARE @sql1 VARCHAR(MAX), @dbname VARCHAR(128);

FETCH NEXT FROM dbfetch
INTO @dbname;

WHILE @@FETCH_STATUS = 0
BEGIN
	SET @sql1 = '
USE [' + @dbname + '];

/* Inner cursor: every eligible column in every non-empty, non-system table.
   Numeric columns and the xml, uniqueidentifier, and image types are excluded. */
DECLARE datalens_cursor CURSOR STATIC FOR
SELECT IST.TABLE_NAME, IST.COLUMN_NAME
FROM INFORMATION_SCHEMA.COLUMNS IST
INNER JOIN sys.dm_db_partition_stats AS st
	ON OBJECT_ID(IST.TABLE_NAME) = st.object_id
WHERE OBJECTPROPERTY(OBJECT_ID(IST.TABLE_NAME), ''IsMSShipped'') = 0
	AND IST.TABLE_NAME NOT LIKE ''sysdi%''
	AND IST.DATA_TYPE NOT IN (''xml'', ''uniqueidentifier'', ''image'')
	AND ISNULL(IST.NUMERIC_PRECISION, 0) = 0
GROUP BY IST.TABLE_NAME, IST.COLUMN_NAME
HAVING SUM(st.row_count) > 0
ORDER BY IST.TABLE_NAME;

OPEN datalens_cursor;

DECLARE @sql VARCHAR(MAX), @table SYSNAME, @column SYSNAME;

FETCH NEXT FROM datalens_cursor
INTO @table, @column;

WHILE @@FETCH_STATUS = 0
BEGIN
	-- Build a statement that samples two percent of the rows and measures the data length
	SET @sql = '''';

	SELECT @sql = @sql + ''select top 2 Percent db_name(),''''''+@table+'''''',''''''+@column+'''''', len(isnull(convert(varchar(max),''+@column+''),''+''0''+'')) from [''+@table+''] ''
	FROM INFORMATION_SCHEMA.COLUMNS
	WHERE TABLE_NAME = ''''+@table+''''
		AND COLUMN_NAME = ''''+@column+'''';

	BEGIN TRAN;
	--print @sql
	INSERT INTO Admin.dbo.DataLenStats_Alt
	EXEC (@sql);
	COMMIT TRAN;

	FETCH NEXT FROM datalens_cursor
	INTO @table, @column;
END

CLOSE datalens_cursor;
DEALLOCATE datalens_cursor;

-- Flush tempdb before moving on to the next database
USE tempdb;
CHECKPOINT;
';

	EXEC (@sql1);

	FETCH NEXT FROM dbfetch
	INTO @dbname;

	PRINT @sql1;	-- echo the batch that just ran, for review
END

CLOSE dbfetch;
DEALLOCATE dbfetch;
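Once it completes, a quick sanity check against the results table (just a minimal sketch) confirms that samples actually landed for each database and table:

-- Count the sampled rows collected per database and table
SELECT DatabaseName, TableName, COUNT(*) AS SampleRows
FROM Admin.dbo.DataLenStats_Alt
GROUP BY DatabaseName, TableName
ORDER BY DatabaseName, TableName;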
And yeah, I rushed through this to get it finished up quickly; I hope to have more time to delve into it later. At least with this script, I can load a table with adequate data to generate histograms on the data length/size distribution and then make appropriate sizing decisions based on the statistics. Be warned: though automated, it is slow and should not be run on a production server.
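For the curious, the histogram step looks roughly like the following. This is only a sketch against the DataLenStats_Alt table loaded above; the bucket width of ten characters is an arbitrary choice on my part, so adjust it to suit the data:

-- Bucket the sampled lengths into ranges of ten characters per column
SELECT DatabaseName,
	TableName,
	ColumnName,
	(ColLens / 10) * 10 AS BucketStart, -- lower bound of the length bucket
	COUNT(*) AS Occurrences,
	MAX(ColLens) AS MaxLen
FROM Admin.dbo.DataLenStats_Alt
GROUP BY DatabaseName, TableName, ColumnName, (ColLens / 10) * 10
ORDER BY DatabaseName, TableName, ColumnName, BucketStart;

From there, it is a short hop to picking a column size that covers the bulk of the distribution with a comfortable margin.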