As usually, most of my articles started with a specific task for my company. I am not looking for the article but the articles are finding me. Ones upon a time (starts like all those magic stories) I placed a question to myself – “How many columns in the corporate database are not used at all?”. Not used means the value for the column is NULL for every row and wasn’t ever inserted/updated for all rows in a table. For example, column special_note in a table CLIENT never was used and probably never will be used because database has the table notes where all notes are stored.
If such statistical information will be obtained it allows one to make an analysis and may help with limiting the number of columns in each table, find bugs in an application if the application is using the column(s), and help with the future database design.
Let’s try to develop a code for the simple analysis of NULL columns, not NULL columns, and number of records in a table. This generic query shows the number of records in a table:
Select count(*) from tablenm
Another generic query shows how many records have NULL value for the column:
Select count(*) from tablenm where colnm is NULL
Basically, this is all we need to know to be able gathering statistical data for the database. Of course, this is must be dynamic code to run it in any database with any unknown number of tables and columns. Let’s check the next code for one database.
It makes sense to gather information only for the columns that allows NULL (column isnullable = 1 in table syscolumns). And it may worth to limit end result with some other restrictions. For example, our databases include few tables with ERR prefix which are not used as a data storage but used by the error handler. Such tables could be eliminated from the analysis.
First, create a temporary permanent table in pubs.
Create table pubs..tmp_count (dbnm varchar(50), tbl varchar(255), col varchar(255), nulls int, tot int) go
It is possible to create the analysis table in the database we going to analyze, but generic table in pubs or another database will help with the analysis for all databases on a server and will combine statistical information into one table. Script below shows how to gather statistical data for one database.
---------------------------------------------------------------------------------- -- begin batch ---------------------------------------------------------------------------------- BEGIN declare @sql Nvarchar(1000), @minid int, @maxid int, @table_n varchar(255), @col varchar(255), @dbnm varchar(50) declare @tmp1 table (id int identity(1,1), tbl varchar(255), col varchar(255)) set @dbnm = db_name() truncate table pubs..tmp_count insert into @tmp1(tbl,col) select so.name, sc.name from syscolumns sc inner join sysobjects so on so.id = sc.id where so.name not like 'ERR_%' and so.type = 'U' and sc.isnullable = 1 select @minid = 1, @maxid = max(id) from @tmp1 while (@minid <=@maxid) begin select @table_n = tbl,@col = col from @tmp1 where id = @minid select @sql = ' insert into pubs..tmp_count(dbnm, tbl, col , nulls , tot)' select @sql = @sql + ' select ''' + @dbnm + ''', ''' + @table_n + ''', ''' + @col + ''', (select count(*) from [' + @table_n + '] where ['+ @col + '] is null) ' select @sql = @sql + ' , (select count(*) from [' + @table_n +'])' exec ( @sql ) set @minid = @minid + 1 end END ----------------------------------------------------------------------------------- -- end batch -----------------------------------------------------------------------------------
Let’s check the result.
select left(dbnm,10) as dbnm, left(tbl,20) as tbl, left(col,20) as col,tot, nulls, tot - nulls as NotNulls from pubs..tmp_count order by tot – nulls dbnm tbl col tot nulls NotNulls ---------- -------------------- ------------- ------- ----------- ----------- pubs discounts stor_id 3 2 1 pubs discounts lowqty 3 2 1 pubs discounts highqty 3 2 1 pubs publishers state 8 2 6 pubs stores stor_name 6 0 6 pubs stores stor_address 6 0 6 pubs stores city 6 0 6 pubs stores state 6 0 6 pubs stores zip 6 0 6 pubs publishers country 8 0 8 pubs pub_info logo 8 0 8 pubs pub_info pr_info 8 0 8
Editor's Note: Results abbreviated
What is the next step to modify the script? Of course, to get a data points from all user databases. Production server I am managing has 200+ databases and even physically I can’t analyze the metadata by running the analysis script on every database.
There are multiple ways to get it done. The simplest way is to use undocumented Microsoft stored procedure sp_msforeachdb which perform the same actions for all databases.
----------------------------------------------------------------- -- begin batch ----------------------------------------------------------------- declare @sqlscrt Nvarchar(3000) truncate table pubs..tmp_count set @sqlscrt = ' BEGIN declare @sql Nvarchar(1000), @minid int, @maxid int, @table_n varchar(255), @col varchar(255), @dbnm varchar(50) declare @tmp1 table (id int identity(1,1), tbl varchar(255), col varchar(255)) set @dbnm = db_name() insert into @tmp1(tbl,col) select so.name, sc.name from syscolumns sc inner join sysobjects so on so.id = sc.id where so.name not like ''ERR_%'' and so.type = ''U'' and sc.isnullable = 1 select @minid = 1, @maxid = max(id) from @tmp1 while (@minid <=@maxid) begin select @table_n = tbl,@col = col from @tmp1 where id = @minid select @sql = '' insert into pubs..tmp_count(dbnm, tbl, col , nulls , tot)'' select @sql = @sql + '' select '''''' + @dbnm + '''''', '''''' + @table_n + '''''', '''''' + @col + '''''', (select count(*) from ['' + @table_n + ''] where [''+ @col + ''] is null) '' select @sql = @sql + '' , (select count(*) from ['' + @table_n +''])'' exec ( @sql ) set @minid = @minid + 1 end END ' set @sqlscrt = N'use[?] ' + @sqlscrt exec sp_msforeachdb @command1 = @sqlscrt ----------------------------------------------------------- -- end batch -----------------------------------------------------------
Let’s check the result.
select left(dbnm,25) as dbnm, left(tbl,20) as tbl, left(col,20) as col,tot, nulls, tot - nulls as NotNulls from pubs..tmp_count order by dbnm, tot – nulls
I do not show the result of the query because it has a lot of rows. Query run time was close to 25 minutes in the server with about 150 databases. In many cases, I don’t need to analyze all databases but only some of them. So I did modify the code above to satisfy the criteria to be able to run the script for the specified databases or exclude few databases (for example, system databases).
--------------------------------------------------------------- -- begin batch --------------------------------------------------------------- BEGIN declare @sql Nvarchar(1000), @minid int, @maxid int , @table_n varchar(255), @col varchar(255) declare @minid1 int, @maxid1 int, @dbnm varchar(50) , @cmd varchar(8000) truncate table pubs..tmp_count declare @tmpdb table (tid int identity(1,1), dbnm varchar(50) ) create table #tmp1 (id int identity(1,1) , dbnm varchar(50) null, tbl varchar(255), col varchar(255)) insert into @tmpdb(dbnm) select name from master..sysdatabases where name in ('pubs', 'admin') --add all databases for analysis or exclude those you don’t need select @minid1 = min(tid), @maxid1 = max(tid) from @tmpdb while (@minid1 <= @maxid1) BEGIN select @dbnm = dbnm from @tmpdb where tid = @minid1 truncate table #tmp1 select @cmd = 'select so.name, sc.name from ' + @dbnm + '..syscolumns sc inner join ' + @dbnm + '..sysobjects so on so.id = sc.id where so.type = ''U'' and sc.isnullable = 1' insert into #tmp1(tbl,col) exec(@cmd) select @minid = 1, @maxid = max(id) from #tmp1 while (@minid <=@maxid) begin select @table_n = tbl,@col = col from #tmp1 where id = @minid select @sql = ' insert into pubs..tmp_count(dbnm,tbl, col , nulls , tot)' select @sql = @sql + ' select ''' + @dbnm + ''',''' + @table_n + ''', ''' + @col + ''', (select count(*) from ' + @dbnm + '..' + @table_n + ' where '+ @col + ' is null) ' select @sql = @sql + ' , (select count(*) from ' + + @dbnm + '..' +@table_n +')' exec ( @sql ) select @minid = @minid + 1 end select @minid1 = @minid1 + 1 end drop table #tmp1 END ---------------------------------------------------------------------------------- -- end batch ----------------------------------------------------------------------------------
Another obstacle that if a table is owned by specific user and permissions are not granted the script will not recognize the fail for such tables. But in production environment most (if not all) tables are owned by dbo and it should not make any problems.
If you decide, the stored procedure with various parameters can be created. It is possible to use the same batch for the various types of analysis. For example, you can set query to gather statistical information to know the specific column(s) setting or values.
Conclusion
This analysis is not something you need for your daily job. But it is very helpful for the database architect and/or database performance review. I was able to prove that about 20% of columns are not used in years, can be eliminated in old databases and we don’t need to use them for the new database design. It made tables smaller, design simpler, and, potentially, database performance better.