Standards and best practices are like flu shots you take before you're infected; Database best practices protect your databases from bad things. But, we all make mistakes. It could be because we're on a time crunch, or we're lazy (which I'm guilty of by the way), or maybe it's part of being a developer.
Common mistakes include: tables without a primary key, column name problems, missing foreign keys, etc., This is where I love LTD's very own SQLCop. I can quickly go on with my database development and rely on SQLCop to detect the issues. It saves time and ensures that database standards are met.
However, there are some issues explicit to data warehouses that SQLCop doesn't look for. I list those issues below and provide scripts to detect them. I use these scripts in conjunction with SQLCop.
Detect tables in a data warehouse that aren't prefixed with either Dim or Fact:
Tables in a warehouse are generally prefixed with Dim and Fact for dimensions and fact respectively, to easily distinguish them.
- SELECT [schema_name] = s.name ,
- table_name = t.name
- FROM sys.tables t
- INNER JOIN sys.schemas s ON s.schema_id = t.schema_id
- WHERE t.name NOT LIKE 'Dim%'
- AND t.name NOT LIKE 'Fact%'
- AND t.TYPE = 'U';
Find tables in a data warehouse that don't have a primary key:
Like in OLTP databases, all tables in a data warehouse also should have a primary key defined.
- SELECT schema_name = SCHEMA_NAME(schema_id) ,
- table_name = name
- FROM sys.tables
- WHERE OBJECTPROPERTY(OBJECT_ID, 'TableHasPrimaryKey') = 0
- ORDER BY SCHEMA_NAME(schema_id) ,
- name;
Detect dimension tables with a composite primary key:
A composite primary key on a dimension table causes degraded performance. It is best to create a single column primary key.
- SELECT c.TABLE_NAME ,
- COUNT(*)
- FROM INFORMATION_SCHEMA.TABLE_CONSTRAINTS pk ,
- INFORMATION_SCHEMA.KEY_COLUMN_USAGE c
- WHERE CONSTRAINT_TYPE = 'PRIMARY KEY'
- AND c.TABLE_NAME = pk.TABLE_NAME
- AND c.CONSTRAINT_NAME = pk.CONSTRAINT_NAME
- AND c.TABLE_NAME LIKE 'Dim%'
- GROUP BY c.TABLE_NAME
- HAVING COUNT(*) > 1
Detect dimension tables that don't have Identity column as a primary key:
Usually, surrogate key is made the primary key of the dimension table. Surrogate key is an auto generated Identity value.
- SELECT dim_table = t.name ,
- primary_key = c.name ,
- c.is_identity
- FROM sys.tables t
- INNER JOIN sys.key_constraints kc ON t.OBJECT_ID = kc.parent_object_id
- INNER JOIN sys.indexes i ON i.OBJECT_ID = kc.parent_object_id
- AND i.type_desc = 'CLUSTERED'
- INNER JOIN sys.index_columns ic ON ic.OBJECT_ID = kc.parent_object_id
- AND ic.index_id = 1
- INNER JOIN sys.columns c ON c.OBJECT_ID = t.OBJECT_ID
- AND c.column_id = ic.column_id
- WHERE t.TYPE = 'U'
- AND t.name LIKE 'Dim%'
- AND kc.type_desc = 'PRIMARY_KEY_CONSTRAINT'
- AND c.is_identity = 0
Detect primary keys that don't follow the naming convention:
- SELECT dim_table = t.name ,
- primary_key = c.name
- FROM sys.tables t
- INNER JOIN sys.key_constraints kc ON t.OBJECT_ID = kc.parent_object_id
- INNER JOIN sys.indexes i ON i.OBJECT_ID = kc.parent_object_id
- AND i.type_desc = 'CLUSTERED'
- INNER JOIN sys.index_columns ic ON ic.OBJECT_ID = kc.parent_object_id
- AND ic.index_id = 1
- INNER JOIN sys.columns c ON c.OBJECT_ID = t.OBJECT_ID
- AND c.column_id = ic.column_id
- WHERE t.TYPE = 'U'
- AND t.name LIKE 'Dim%'
- AND kc.type_desc = 'PRIMARY_KEY_CONSTRAINT'
- AND c.name <> REPLACE(t.name, 'Dim', '') + 'Key'
Detect fact tables that have no foreign keys:
Without a foreign key, a fact table isn't really a fact table.
- SELECT table_name = t.name
- , fk_count = COUNT(*)
- FROM sys.tables t
- INNER JOIN
- sys.foreign_keys fk ON t.OBJECT_ID = fk.parent_object_id
- WHERE t.name LIKE 'fact%'
- GROUP BY t.name
- HAVING COUNT(*) < 1
Detect fact tables that have foreign key(s) to another fact table:
It's unlikely to have a fact table related to another fact table.
- SELECT foreign_key = fk.name ,
- child_table = t.name ,
- parent_name = rt.name
- FROM sys.foreign_keys fk
- INNER JOIN sys.tables rt ON rt.object_id = fk.referenced_object_id
- INNER JOIN sys.tables t ON t.object_id = fk.parent_object_id
- WHERE rt.name LIKE 'Fact%'
Detect missing foreign key(s) in fact tables - Columns suffixed with Key, but don't have foreign key constraint:
I stole the following query from here posted by George Mastros, and replaced ID with Key to use it for data warehouse scenario.
- SELECT C.TABLE_SCHEMA,C.TABLE_NAME,C.COLUMN_NAME
- FROM INFORMATION_SCHEMA.COLUMNS C
- INNER Join INFORMATION_SCHEMA.TABLES T
- ON C.TABLE_NAME = T.TABLE_NAME
- And T.TABLE_TYPE = 'Base Table'
- AND T.TABLE_SCHEMA = C.TABLE_SCHEMA
- LEFT Join INFORMATION_SCHEMA.CONSTRAINT_COLUMN_USAGE U
- ON C.TABLE_NAME = U.TABLE_NAME
- And C.COLUMN_NAME = U.COLUMN_NAME
- And U.TABLE_SCHEMA = C.TABLE_SCHEMA
- WHERE U.COLUMN_NAME IS Null
- And C.COLUMN_NAME Like '%Key'
- ORDER BY C.TABLE_SCHEMA, C.TABLE_NAME, C.COLUMN_NAME
Results of above queries aren't always issues. They are just rare, you've to look at them closely and make sure there is a reason for each choice. Also, you may use different naming conventions that make these queries void. In that case, I hope you're able to alter them to your needs.
Follow me on Twitter! @SamuelVanga