One of the developers that I work with asked me to write a “brief” (really brief) guide on database design. What I have written below is a summary in bullet point form of best practices when creating databases, tables, indexes, and then querying the data within. It is designed to be short and sweet so the reader can check each section quickly when creating a new database. References are provided so that the reader can use if they wish to explore a certain topic in more detail.
Let me know what you think! Is there anything glaringly obvious that I’ve missed? My email is dbafromthecold@gmail.com
Database Design Guidelines
Database Design
- PRIMARY filegroup should be reserved for system objects
- Secondary DATA filegroup should be set to default.
- Minimum of two physical files mapped to each secondary filegroups
- Data and Log files presized for a minimum of one year’s data inserts
- Data and Log files should reside on separate drives
- Denormalisation should only be considered in special circumstances. A properly normalised OLTP database will outperform a denormalised one
- Recommended collation is Latin1_General_CI_AS
Table Design
- Each table created within the database should have a Primary Key and a Clustered Index. These need not be the same column
- Deletion from Heaps may not release space taken by table http://support.microsoft.com/kb/913399
- GUID fields should not be used for clustered indexes even if used as table’s Primary Key
- Article on why GUIDs should not be used for clustered indexes http://www.sqlskills.com/blogs/paul/clustered-or-nonclustered-index-on-a-random-guid/
- NEWSEQUENTIALID() should be used to generate GUID fields
- Composite keys should be avoided. Surrogate INT IDENTITY keys can be used as a replacement
- Columns set to the smallest size possible – avoiding NVARCHAR(MAX), TEXT etc datatypes
- Avoid Nullable columns. Nullable columns require extra space within the database to designate the fact that they are nullable
- Use referential integrity – Foreign keys and unique constraints should be applied
MSDN CREATE TABLE Documentation – http://technet.microsoft.com/en-us/library/ms174979.aspx
Querying Data
- Stored procedures should be used as much as possible, to allow query execution plans to be re-used
- Do not use SELECT * – SELECT only the columns needed to keep the query’s memory usage as low as possible
- As a standard, cursors should be avoided. Removal of RBAR (row-by-agonising-row) processing
- SET NOCOUNT ON should be at the start of each SQL batch to reduce network traffic
- MSDN SET NOCOUNT ON Documentation – http://technet.microsoft.com/en-us/library/ms189837.aspx
- Dynamic SQL should be executed using sp_executesql.
- This stored procedure generates execution plans that are more likely to be reused.
- MSDN EXEC sp_executesql Documentation http://technet.microsoft.com/en-us/library/ms175170(v=SQL.105).aspx
- Do not repeatedly call functions within stored procedures, functions, batches and triggers.
- Sargability (Search ARGument Able) – Confirm queries executed are able to seek on indexes in database
- Avoid wildcard characters at the beginning of a word while searching using the LIKE keyword
- Avoid searching using not equals operators (<> and NOT)
- Avoid functions in WHERE clause on table columns
- Avoid implicit conversions in WHERE clause (use CAST/CONVERT if necessary)
BAD: SELECT ... WHERE Year(StartDate) = 2014 FIXED: SELECT ... WHERE StartDate >= '01-01-2014' AND StartDate < '01-01-2015' BAD: SELECT ... WHERE isNull(Surname,'Williams') = 'Williams' FIXED: SELECT ... WHERE ((Surname = 'Williams') OR (Surname IS NULL)) BAD: SELECT ... WHERE LEFT(FirstName,1) = 'P' FIXED: SELECT ... WHERE FirstName LIKE 'P%' BAD: SELECT ... WHERE DateDiff(mm,OrderDate,GetDate()) >= 30 FIXED: SELECT ... WHERE OrderDate < DateAdd(mm,-30,GetDate())
Sargability References
Stack Overflow – What makes a SQL Statement sargable? http://stackoverflow.com/questions/799584/what-makes-a-sql-statement-sargable
Rob Farley – Sargable Functions http://msmvps.com/blogs/robfarley/archive/2010/01/22/sargable-functions-in-sql-server.aspx
Brent Ozar – Why %string% is slow http://www.brentozar.com/archive/2010/06/sargable-why-string-is-slow/
Indexes
- Naming convention – Prefix_TableName_ColumnName
- Prefix – PK for Primary Keys, IX for non-clustered indexes, UC for Unique Indexes
- Make sure the option ONLINE = ON is in place
- Correct nonclustered indexes created for incoming queries – avoid RID and KEY lookups on tables
- No Indexes on columns like “Gender” as this would only have a maximum of three possible entries (Male, Female & NULL) making an index ineffective. Nonclustered indexes must be as selective as possible