January 24, 2013 at 11:15 pm
Comments posted to this topic are about the item Table-Column Data Profiler
January 25, 2013 at 5:27 am
February 5, 2013 at 1:51 pm
Glad someone found it helpful. Here is an update that includes (Distinct) Value Counts:
DECLARE @Cat_Sch_TableName varchar(128) = 'HRIS.NTC.D20130201'
-- Initialize fully qualified object name; script can be used like a stored procedure
DECLARE @sqlstring varchar(8000), @ColumnName varchar(128), @ColumnID smallint,
@SequenceNumber smallint
-- cleanup temp tables
IF OBJECT_ID('Tempdb..#RawColumns') IS NOT NULL DROP TABLE #RawColumns
IF OBJECT_ID('Tempdb..#ProfiledColumns') IS NOT NULL DROP TABLE #ProfiledColumns
-- Populate working list of columns
SELECT ColumnID = IDENTITY(int, 1,1),
COLUMN_NAME AS ColumnName,
ORDINAL_POSITION AS SequenceNumber
INTO #RawColumns
FROM information_schema.COLUMNS
WHERE TABLE_CATALOG+'.'+TABLE_SCHEMA+'.'+TABLE_NAME=@Cat_Sch_TableName
-- Initialize output table
CREATE TABLE #ProfiledColumns(
ColumnName varchar(128) NOT NULL,
SequenceNumber varchar(10),
NullCount int ,
ModalValue varchar(255),
ModalCount int,
ValueCount int,
MinValue varchar(255),
MaxValue varchar(255),
MaxLength smallint )
-- Loop through columns
WHILE (EXISTS(
SELECT *
FROM #RawColumns
WHERE ColumnName IS NOT NULL ))
BEGIN
SELECT @ColumnID = ColumnID,
@ColumnName = ColumnName,
@SequenceNumber = SequenceNumber
FROM #RawColumns -- assign column data to variables
SET @sqlstring=
'INSERT INTO #ProfiledColumns ' +
'SELECT '''+@ColumnName+''',' +
CAST(@SequenceNumber AS varchar(10)) + ' , ' +
'SUM(CASE WHEN ['+@ColumnName+'] IS NULL THEN 1 ELSE 0 END) ,'+
'(SELECT TOP 1 ['+@ColumnName+'] FROM '+@Cat_Sch_TableName+' GROUP BY ['+@ColumnName+'] ORDER BY COUNT(*) DESC),' +
'(SELECT TOP 1 COUNT(*) FROM '+@Cat_Sch_TableName+' GROUP BY ['+@ColumnName+'] ORDER BY COUNT(*) DESC),' +
'COUNT(DISTINCT ['+@ColumnName+']),' +
'CAST(MIN(['+@ColumnName+']) AS VARCHAR(255)),'+
'CAST(MAX(['+@ColumnName+']) AS VARCHAR(255)),'+
'MAX(LEN(CAST(['+@ColumnName+'] AS VARCHAR(255)))) '+
'FROM ' + @Cat_Sch_TableName -- build query string
DELETE FROM #RawColumns WHERE ColumnID=@ColumnID -- one column down, next...
EXECUTE (@sqlstring) -- execute built query
END -- End loop block
-- Display data profile with condensed data type
SELECT
ColumnName,
DATA_TYPE +
CASE WHEN DATA_TYPE LIKE '%char' THEN '('+LTRIM(CHARACTER_MAXIMUM_LENGTH)+')'
WHEN COALESCE(NUMERIC_SCALE,0) <> 0 THEN '('+LTRIM(NUMERIC_PRECISION)+','+LTRIM(NUMERIC_SCALE)+')' ELSE ''
END AS DataTypeName,
NullCount,
ModalValue,
ModalCount,
ValueCount,
MinValue,
MaxValue,
MaxLength
FROM #ProfiledColumns
JOIN information_schema.COLUMNS
ON SequenceNumber=ORDINAL_POSITION
WHERE TABLE_CATALOG+'.'+TABLE_SCHEMA+'.'+TABLE_NAME=@Cat_Sch_TableName
ORDER BY CAST(SequenceNumber AS SMALLINT)
February 5, 2013 at 1:59 pm
Handy, but it throws an error on a bit field because you cannot perform a min or max on them
Michael L John
If you assassinate a DBA, would you pull a trigger?
To properly post on a forum:
http://www.sqlservercentral.com/articles/61537/
February 6, 2013 at 9:28 am
Thanks forgot about bit. This should work on bit (other data types?) as well...
DECLARE @Cat_Sch_TableName varchar(128) = 'HRIS.dbo.E20130201'
-- Initialize fully qualified object name; script can be used like a stored procedure
DECLARE @sqlstring varchar(8000), @ColumnName varchar(128), @ColumnID smallint,
@SequenceNumber smallint
-- cleanup temp tables
IF OBJECT_ID('Tempdb..#RawColumns') IS NOT NULL DROP TABLE #RawColumns
IF OBJECT_ID('Tempdb..#ProfiledColumns') IS NOT NULL DROP TABLE #ProfiledColumns
-- Populate working list of columns
SELECT ColumnID = IDENTITY(int, 1,1),
COLUMN_NAME AS ColumnName,
ORDINAL_POSITION AS SequenceNumber
INTO #RawColumns
FROM information_schema.COLUMNS
WHERE TABLE_CATALOG+'.'+TABLE_SCHEMA+'.'+TABLE_NAME=@Cat_Sch_TableName
-- Initialize output table
CREATE TABLE #ProfiledColumns(
ColumnName varchar(128) NOT NULL,
SequenceNumber varchar(10),
NullCount int ,
ModalValue varchar(255),
ModalCount int,
ValueCount int,
MinValue varchar(255),
MaxValue varchar(255),
MaxLength smallint )
-- Loop through columns
WHILE (EXISTS(
SELECT *
FROM #RawColumns
WHERE ColumnName IS NOT NULL ))
BEGIN
SELECT @ColumnID = ColumnID,
@ColumnName = ColumnName,
@SequenceNumber = SequenceNumber
FROM #RawColumns -- assign column data to variables
SET @sqlstring=
'INSERT INTO #ProfiledColumns ' +
'SELECT '''+@ColumnName+''',' +
CAST(@SequenceNumber AS varchar(10)) + ' , ' +
'SUM(CASE WHEN ['+@ColumnName+'] IS NULL THEN 1 ELSE 0 END) ,'+
'(SELECT TOP 1 ['+@ColumnName+'] FROM '+@Cat_Sch_TableName+' GROUP BY ['+@ColumnName+'] ORDER BY COUNT(*) DESC),' +
'(SELECT TOP 1 COUNT(*) FROM '+@Cat_Sch_TableName+' GROUP BY ['+@ColumnName+'] ORDER BY COUNT(*) DESC),' +
'COUNT(DISTINCT ['+@ColumnName+']),' +
'CAST(MIN(CAST(['+@ColumnName+'] AS sql_variant)) AS VARCHAR(8000)),'+
'CAST(MAX(CAST(['+@ColumnName+'] AS sql_variant)) AS VARCHAR(8000)),'+
'MAX(LEN(CAST(['+@ColumnName+'] AS VARCHAR(255)))) '+
'FROM ' + @Cat_Sch_TableName -- build query string
DELETE FROM #RawColumns WHERE ColumnID=@ColumnID -- one column down, next...
EXECUTE (@sqlstring) -- execute built query
END -- End loop block
-- Display data profile with condensed data type
SELECT
ColumnName,
DATA_TYPE +
CASE WHEN DATA_TYPE LIKE '%char' THEN '('+LTRIM(CHARACTER_MAXIMUM_LENGTH)+')'
WHEN COALESCE(NUMERIC_SCALE,0) <> 0 THEN '('+LTRIM(NUMERIC_PRECISION)+','+LTRIM(NUMERIC_SCALE)+')' ELSE ''
END AS DataTypeName,
NullCount,
ModalValue,
ModalCount,
ValueCount,
MinValue,
MaxValue,
MaxLength
FROM #ProfiledColumns
JOIN information_schema.COLUMNS
ON SequenceNumber=ORDINAL_POSITION
WHERE TABLE_CATALOG+'.'+TABLE_SCHEMA+'.'+TABLE_NAME=@Cat_Sch_TableName
ORDER BY CAST(SequenceNumber AS SMALLINT)
February 17, 2013 at 3:18 pm
Handy script, Thanks!! Many for the DB's I run into have nvarchar(max) columns. Any feel for how to address these?
Steve Pirazzi
ONEWARE, Inc.
http://www.ONEWARE.com
February 20, 2013 at 9:03 am
Hi Todd,
I just tried your script and I like it!
A couple of questions:
1. Can it be modified to loop through all tables in a database automatically and show the results?
2. For the datatypes that it can't handle (uniqueidentifier, image, text) it generates an exception and stops processing. Is it possible that it could simply skip the columns with these datatypes and produce an output of the remaining columns?
I ask because I'm not a strong coder and not sure how to get at this.
Thanks again for an excellent script Todd
Steve
February 20, 2013 at 10:36 am
Thanks Steve
I am not guaranteeing anything because I never allow use of varchar(max) and have no way to test it, BUT
for Varchar(max) types to work, change this statement:
CREATE TABLE #ProfiledColumns(
ColumnName varchar(128) NOT NULL,
SequenceNumber varchar(10),
NullCount int ,
ModalValue varchar(max),
ModalCount int,
ValueCount int,
MinValue varchar(max),
MaxValue varchar(max),
MaxLength smallint
And this statement:
SET @sqlstring=
'INSERT INTO #ProfiledColumns ' +
'SELECT '''+@ColumnName+''',' +
CAST(@SequenceNumber AS varchar(10)) + ' , ' +
'SUM(CASE WHEN ['+@ColumnName+'] IS NULL THEN 1 ELSE 0 END) ,'+
'(SELECT TOP 1 ['+@ColumnName+'] FROM '+@Cat_Sch_TableName+' GROUP BY ['+@ColumnName+'] ORDER BY COUNT(*) DESC),' +
'(SELECT TOP 1 COUNT(*) FROM '+@Cat_Sch_TableName+' GROUP BY ['+@ColumnName+'] ORDER BY COUNT(*) DESC),' +
'COUNT(DISTINCT ['+@ColumnName+']),' +
'CAST(MIN(CAST(['+@ColumnName+'] AS sql_variant)) AS VARCHAR(max)),'+
'CAST(MAX(CAST(['+@ColumnName+'] AS sql_variant)) AS VARCHAR(max)),'+
'MAX(LEN(CAST(['+@ColumnName+'] AS VARCHAR(255)))) '+
'FROM ' + @Cat_Sch_TableName -- build query string
February 20, 2013 at 10:49 am
SteveBell (2/20/2013)
Hi Todd,I just tried your script and I like it!
Thanks!
SteveBell (2/20/2013)
A couple of questions:
Uhoh..
SteveBell (2/20/2013)
1. Can it be modified to loop through all tables in a database automatically and show the results?
Yes but not by me yet; you will need to set up an outer loop using a Cursor or a WHILE statement or somesuch. Maybe later...
SteveBell (2/20/2013)
2. For the datatypes that it can't handle (uniqueidentifier, image, text) it generates an exception and stops processing. Is it possible that it could simply skip the columns with these datatypes and produce an output of the remaining columns?
Try adding this bolded line:
WHERE TABLE_CATALOG+'.'+TABLE_SCHEMA+'.'+TABLE_NAME=@Cat_Sch_TableName
AND DATA_TYPE NOT IN ('uniqueidentifier','image','text') -- excluded types
February 20, 2013 at 10:52 am
Thanks again Todd,
This is very helpful and much appreciated.
February 22, 2013 at 12:19 pm
Todd,
Thanks for the suggested changes. I’m going to try to get the unsupported data type columns to show up in the output, but not analyze the data in the columns. Hopefully this will allow someone without explicit knowledge of the table to be aware of their presence. I’ll post back if I get it working.
Steve Pirazzi
ONEWARE, Inc.
http://www.ONEWARE.com
February 22, 2013 at 2:01 pm
To suppress Min, Max and Modal Values, a CASE statement or an OUTER JOIN on data_types is needed in the SQL string area. If I had time I would do it myself.
Viewing 12 posts - 1 through 11 (of 11 total)
You must be logged in to reply to this topic. Login to reply