Standards and best practices are like flu shots you take before you’re infected; Database best practices protect your databases from bad things. But, we all make mistakes. It could be because we’re on a time crunch, or we’re lazy (which I’m guilty of by the way), or maybe it’s part of being a developer.
Common mistakes include: tables without a primary key, column name problems, missing foreign keys, etc., This is where I love LTD’s very own SQLCop. I can quickly go on with my database development and rely on SQLCop to detect the issues. It saves time and ensures that database standards are met.
However, there are some issues explicit to data warehouses that SQLCop doesn’t look for. I list those issues below and provide scripts to detect them. I use these scripts in conjunction with SQLCop.
Detect tables in a data warehouse that aren’t prefixed with either Dim or Fact:
Tables in a warehouse are generally prefixed with Dim and Fact for dimensions and fact respectively, to easily distinguish them.
SELECT [schema_name] = s.name , table_name = t.name FROM sys.tables t INNER JOIN sys.schemas s ON s.schema_id = t.schema_id WHERE t.name NOT LIKE 'Dim%' AND t.name NOT LIKE 'Fact%' AND t.TYPE = 'U';
Find tables in a data warehouse that don’t have a primary key:
Like in OLTP databases, all tables in a data warehouse also should have a primary key defined.
SELECT schema_name = SCHEMA_NAME(schema_id) , table_name = name FROM sys.tables WHERE OBJECTPROPERTY(OBJECT_ID, 'TableHasPrimaryKey') = 0 ORDER BY SCHEMA_NAME(schema_id) , name;
Detect dimension tables with a composite primary key:
A composite primary key on a dimension table causes degraded performance. It is best to create a single column primary key.
SELECT c.TABLE_NAME , COUNT(*) FROM INFORMATION_SCHEMA.TABLE_CONSTRAINTS pk , INFORMATION_SCHEMA.KEY_COLUMN_USAGE c WHERE CONSTRAINT_TYPE = 'PRIMARY KEY' AND c.TABLE_NAME = pk.TABLE_NAME AND c.CONSTRAINT_NAME = pk.CONSTRAINT_NAME AND c.TABLE_NAME LIKE 'Dim%' GROUP BY c.TABLE_NAME HAVING COUNT(*) > 1
Detect dimension tables that don’t have Identity column as a primary key:
Usually, surrogate key is made the primary key of the dimension table. Surrogate key is an auto generated Identity value.
SELECT dim_table = t.name , primary_key = c.name , c.is_identity FROM sys.tables t INNER JOIN sys.key_constraints kc ON t.OBJECT_ID = kc.parent_object_id INNER JOIN sys.indexes i ON i.OBJECT_ID = kc.parent_object_id AND i.type_desc = 'CLUSTERED' INNER JOIN sys.index_columns ic ON ic.OBJECT_ID = kc.parent_object_id AND ic.index_id = 1 INNER JOIN sys.columns c ON c.OBJECT_ID = t.OBJECT_ID AND c.column_id = ic.column_id WHERE t.TYPE = 'U' AND t.name LIKE 'Dim%' AND kc.type_desc = 'PRIMARY_KEY_CONSTRAINT' AND c.is_identity = 0
Detect primary keys that don’t follow the naming convention:
SELECT dim_table = t.name , primary_key = c.name FROM sys.tables t INNER JOIN sys.key_constraints kc ON t.OBJECT_ID = kc.parent_object_id INNER JOIN sys.indexes i ON i.OBJECT_ID = kc.parent_object_id AND i.type_desc = 'CLUSTERED' INNER JOIN sys.index_columns ic ON ic.OBJECT_ID = kc.parent_object_id AND ic.index_id = 1 INNER JOIN sys.columns c ON c.OBJECT_ID = t.OBJECT_ID AND c.column_id = ic.column_id WHERE t.TYPE = 'U' AND t.name LIKE 'Dim%' AND kc.type_desc = 'PRIMARY_KEY_CONSTRAINT' AND c.name <> REPLACE(t.name, 'Dim', '') + 'Key'
Detect fact tables that have no foreign keys:
Without a foreign key, a fact table isn’t really a fact table.
SELECT table_name = t.name , fk_count = COUNT(*) FROM sys.tables t INNER JOIN sys.foreign_keys fk ON t.OBJECT_ID = fk.parent_object_id WHERE t.name LIKE 'fact%' GROUP BY t.name HAVING COUNT(*) < 1
Detect fact tables that have foreign key(s) to another fact table:
It’s unlikely to have a fact table related to another fact table.
SELECT foreign_key = fk.name , child_table = t.name , parent_name = rt.name FROM sys.foreign_keys fk INNER JOIN sys.tables rt ON rt.object_id = fk.referenced_object_id INNER JOIN sys.tables t ON t.object_id = fk.parent_object_id WHERE rt.name LIKE 'Fact%'
Detect missing foreign key(s) in fact tables – Columns suffixed with Key, but don’t have foreign key constraint:
I stole the following query from here posted by George Mastros, and replaced ID with Key to use it for data warehouse scenario.
SELECT C.TABLE_SCHEMA,C.TABLE_NAME,C.COLUMN_NAME FROM INFORMATION_SCHEMA.COLUMNS C INNER Join INFORMATION_SCHEMA.TABLES T ON C.TABLE_NAME = T.TABLE_NAME And T.TABLE_TYPE = 'Base Table' AND T.TABLE_SCHEMA = C.TABLE_SCHEMA LEFT Join INFORMATION_SCHEMA.CONSTRAINT_COLUMN_USAGE U ON C.TABLE_NAME = U.TABLE_NAME And C.COLUMN_NAME = U.COLUMN_NAME And U.TABLE_SCHEMA = C.TABLE_SCHEMA WHERE U.COLUMN_NAME IS Null And C.COLUMN_NAME Like '%Key' ORDER BY C.TABLE_SCHEMA, C.TABLE_NAME, C.COLUMN_NAME
Results of above queries aren’t always issues. They are just rare, you’ve to look at them closely and make sure there is a reason for each choice. Also, you may use different naming conventions that make these queries void. In that case, I hope you’re able to alter them to your needs.
Follow me on Twitter! @SamuelVanga