January 17, 2009 at 12:33 pm
Comments posted to this topic are about the item Spring Clean Your Database Schema
January 19, 2009 at 6:08 am
Since this only works in SQL 2005 (due to the "OVER" clause on the "COUNT(*)" statement), you could replace the temporary table with a CTE.
Alternatively, for SQL 2000, you could use a nested query, and order by "COUNT(COLUMN_NAME) DESC" instead.
January 19, 2009 at 6:37 am
Half the problem here is that SQL isn't very good at relational 'domains' i.e. a standard definition of a 'type' of column, with predefinied rules (constraints), datatype, valid values etc.
Also missing is the ability to abstract types i.e. when defining parameters in t/sql you can't define them based on a domain, or even an existing column (in ORACLE this is done with the '%' operator e.g. title book.title%TYPE
)
Finally, the lack of a 'record' datatype also makes life difficult, as you can't easily define a full set of local variables that map the types of a row in a table, resulting ineven MORE duplicate definitions e.g:
--define a PL/SQL table containing entries of type book_rec:
Type book_rec_tab IS TABLE OF book_rec%TYPE
INDEX BY BINARY_INTEGER;
Personally it's about time this stuff arrived! It would do more for consistency in typing than any amount of tools.
However, this is a useful stopgap tool - many thanks to the author!
January 19, 2009 at 7:09 am
To make this work in SQL 2000
Change the first Query to:
[Code]
-- Calculate prevalence of column name
SELECT COLUMN_NAME, CONVERT(DECIMAL(12,2), Count(*)* 100.0/t.total) as [%]
INTO #Prevalence
FROM INFORMATION_SCHEMA.COLUMNS
CROSS JOIN
(SELECT Count(*) as total FROM INFORMATION_SCHEMA.COLUMNS) t
GROUP BY COLUMN_NAME, t.total
-- Do the columns differ on datatype across the schemas and tables?
[/code]
--
JimFive
January 19, 2009 at 8:06 am
Ha! Just last Friday, I discovered I could not define a foreign key because the datatype was different between two tables. The first thing on my To-Do list for today was to check all the tables to see what other tables might be using the incorrect datatype.
Thanks for writing the query I need for me!
January 19, 2009 at 8:46 am
I decided I wanted the Prevalence to be the percentage of all tables not of all columns, so I used James Goodwin's code and modified it to get the total table count.
SELECT COLUMN_NAME, CONVERT(DECIMAL(12,2), Count(*)* 100.0/t.totalTables) as [%]
INTO #Prevalence
FROM INFORMATION_SCHEMA.COLUMNS
CROSS JOIN
(SELECT Count(*) as totalTables FROM INFORMATION_SCHEMA.TABLES ) t
GROUP BY COLUMN_NAME, t.totalTables
Select * from #Prevalence gives me useful information, all by itself.
January 19, 2009 at 3:00 pm
In SQL Server 2008 in the first statement (SET TRANSACTION)
ISOLATIONLEVEL should be
ISOLATION LEVEL
Herman
January 28, 2009 at 12:47 am
Ian,
I only wanted to see the column names that appear more than once, so that I exclude all the columns that are ok.
So I re-wrote your query to:
-- From http://www.sqlservercentral.com/articles/Admin/65138/
/*----------------------------------------------------------------------
Purpose: Identify columns having different datatypes, for the same column name.
Sorted by the prevalence of the mismatched column.
------------------------------------------------------------------------
Revision History:
06/01/2008 Ian_Stirk@yahoo.com Initial version.
2009-01-28 hsp@stovi.com; only show the column names that appear more than once
-----------------------------------------------------------------------*/
-- Do not lock anything, and do not get held up by any locks.
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
BEGIN TRY
DROP TABLE #Prevalence
END TRY
BEGIN CATCH
END CATCH
BEGIN TRY
DROP TABLE #FieldList
END TRY
BEGIN CATCH
END CATCH
-- Calculate prevalence of column name
SELECT
COLUMN_NAME
,[%] = CONVERT(DECIMAL(12,2),COUNT(COLUMN_NAME)* 100.0 / COUNT(*)OVER())
INTO #Prevalence
FROM INFORMATION_SCHEMA.COLUMNS
GROUP BY COLUMN_NAME
-- Do the columns differ on datatype across the schemas and tables?
SELECT DISTINCT
C1.COLUMN_NAME
, C1.TABLE_SCHEMA + '.' + C1.TABLE_NAME AS TableName
, C1.DATA_TYPE
, C1.CHARACTER_MAXIMUM_LENGTH
, C1.NUMERIC_PRECISION
, C1.NUMERIC_SCALE
, [%]
INTO #FieldList
FROM INFORMATION_SCHEMA.COLUMNS C1
INNER JOIN INFORMATION_SCHEMA.COLUMNS C2 ON C1.COLUMN_NAME = C2.COLUMN_NAME
INNER JOIN #Prevalence p ON p.COLUMN_NAME = C1.COLUMN_NAME
WHERE ((C1.DATA_TYPE != C2.DATA_TYPE)
OR (C1.CHARACTER_MAXIMUM_LENGTH != C2.CHARACTER_MAXIMUM_LENGTH)
OR (C1.NUMERIC_PRECISION != C2.NUMERIC_PRECISION)
OR (C1.NUMERIC_SCALE != C2.NUMERIC_SCALE))
ORDER BY [%] DESC, C1.COLUMN_NAME, C1.TABLE_SCHEMA + '.' + C1.TABLE_NAME
-- Tidy up.
DROP TABLE #Prevalence
--SELECT * FROM #fieldList ORDER BY column_name
;
WITH cte (column_name, Data_type, Character_maximum_length, Numeric_precision, Numeric_Scale, UsedInTables, rownum) as
(
SELECT column_name
, MAX(data_type) AS Data_type, MAX(Character_maximum_length) AS Character_maximum_length, MAX(Numeric_precision) AS Numeric_precision
, MAX(Numeric_Scale) AS Numeric_Scale
, (SELECT TableName + ', ' FROM #fieldList F1
WHERE f1.Column_name= F.Column_name
AND (f1.Data_type = F.Data_Type OR F.Data_Type IS NULL)
AND (f1.Character_maximum_length = f.Character_maximum_length OR f.Character_maximum_length IS NULL)
AND (f1.Numeric_precision = f.Numeric_precision OR f.Numeric_precision IS NULL)
AND (f1.Numeric_Scale = f.Numeric_Scale OR f.Numeric_Scale IS NULL)
for xml path('')) AS UsedInTables
, ROW_NUMBER() OVER ( PARTITION BY column_name ORDER BY data_type, Character_maximum_length) AS rownum
FROM #fieldList F
WHERE
f.tablename NOT LIKE 'tmp%'
GROUP BY column_name, Data_type, Character_maximum_length, Numeric_precision, Numeric_Scale
)
SELECT c1.* FROM cte c1
INNER JOIN cte c2 ON c1.column_name = c2.Column_name
WHERE c2.rownum=2
Thank you very much for your initial script.
Best regards,
Henrik Staun Poulsen
Stovi Software
March 11, 2014 at 7:11 am
A bit late but I made the following updates which were useful to me -- flag table versus view and identify if a field is an identity field. Maybe someone else will as well...
/*----------------------------------------------------------------------
Purpose: Identify columns having different datatypes, for the same column name.
Sorted by the prevalence of the mismatched column.
------------------------------------------------------------------------
Revision History:
06/01/2008 Ian_Stirk@yahoo.com Initial version.
03/11/2014Martin McDonald -- Modified to add is_table and is_identity information
-----------------------------------------------------------------------*/
-- Do not lock anything, and do not get held up by any locks.
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
-- Calculate prevalence of column name
SELECT
COLUMN_NAME
,[%] = CONVERT(DECIMAL(12,2),COUNT(COLUMN_NAME)* 100.0 / COUNT(*)OVER())
INTO #Prevalence
FROM INFORMATION_SCHEMA.COLUMNS
GROUP BY COLUMN_NAME
-- Do the columns differ on datatype across the schemas and tables?
SELECT DISTINCT
C1.COLUMN_NAME
, C1.TABLE_SCHEMA
, C1.TABLE_NAME
, case
when t.name is null then 'View'
else 'Table'
end as is_table
, c.is_identity
, C1.DATA_TYPE
, C1.CHARACTER_MAXIMUM_LENGTH
, C1.NUMERIC_PRECISION
, C1.NUMERIC_SCALE
, [%]
FROM INFORMATION_SCHEMA.COLUMNS C1
INNER JOIN INFORMATION_SCHEMA.COLUMNS C2 ON C1.COLUMN_NAME = C2.COLUMN_NAME
INNER JOIN #Prevalence p ON p.COLUMN_NAME = C1.COLUMN_NAME
left outer join sys.tables t on c1.TABLE_NAME = t.name
left outer join sys.columns c on t.object_id = c.object_id
and c1.COLUMN_NAME = c.name
WHERE ((C1.DATA_TYPE != C2.DATA_TYPE)
OR (C1.CHARACTER_MAXIMUM_LENGTH != C2.CHARACTER_MAXIMUM_LENGTH)
OR (C1.NUMERIC_PRECISION != C2.NUMERIC_PRECISION)
OR (C1.NUMERIC_SCALE != C2.NUMERIC_SCALE))
ORDER BY [%] DESC, C1.COLUMN_NAME, C1.TABLE_SCHEMA, C1.TABLE_NAME
-- Tidy up.
DROP TABLE #Prevalence
Viewing 9 posts - 1 through 8 (of 8 total)
You must be logged in to reply to this topic. Login to reply