September 21, 2012 at 3:15 pm
Hi All,
Let me start by saying that I know that I'm opening up a can of worms here. My company is trying to come up with a reasonable list of best practices related to SQL development in a multi-tier environment. I know there are probably hundreds, possibly thousands, of things that could be in this list, but we're not looking for a Magnum Opus of SQL Best practices. This is going to be given to .Net Software Engineers who are also responsible for varying degrees of SQL development, not DBAs. With that in mind, I have a tentative list and I would appreciate some input on things that are missing from the list or things that you think should be taken off the list.
Each of the items below is just a heading that will be fleshed out with more details before it is handed out. I also know that many of the items have exceptions, some of which will also be covered in the details. The idea is to get these principles in peoples heads so that they think about it and have a good reason before breaking them.
With that said, here's the list
[Edited based on comments]
General
1.Be set-minded, not row-minded, when designing anything in SQL
2.Keep SQL transactions as short as possible
3.Use the right isolation level, and know why you're using it.
4.Avoid the use of dynamic SQL where possible
5.Use sp_executesql instead EXEC('string') whenever writing dynamic SQL
6.When commenting SQL code, use block comments instead of line comments
7.Write SQL keywords, commands, and functions in upper case
8.USE TRY...CATCH instead of @@ERROR for error handling
9.Make error reporting, containment, and recovery part of your design
Table Design
1.Use the smallest data type appropriate to the data you're storing
2.Keep tables narrow
3.Do not blindly normalize everything Those who know why don't need this, those that don't are likely to abuse it.
4.Use VARCHAR(MAX) and NVARCHAR(MAX) instead of TEXT or NTEXT to store large textual data
Indexes
1.Add a clustered index to each table
2.Keep indexes narrow, especially the clustered index
3.Make sure that the columns in the clustered index aren't updated
4.Make clustered indexes monotonically increasing whenever possible
5.Design the column order of indexes around how it will be used
6.Make and use covering indexes where possible
7.Make clustered indexes unique
8.Do not make new or expand existing indexes to speed up a single query without considering the performance impact on the system as a whole
Query Design
1.Explicitly define the columns you want from a SELECT statement instead of using SELECT *.
2.Explicitly define column names in your insert statements
3.Make sure that your JOIN and WHERE clauses are SARGABLE
4.Prefer JOINS and derived tables to correlated sub-queries
5.Avoid the use of negative logic wherever possible (NOT EXISTS, NOT IN ...)
6.Avoid the use of LIKE in a JOIN clause
7.Use Query hints sparingly
7.Do not use Query hints without prior approval
8.Do not use scalar valued functions SELECT statements
9.Do not prefix search arguments in a LIKE clause with wildcards
9.Avoid prefixing search arguments in a LIKE clause with wildcards where possible
10.Do not use DISTINCT to clean up poorly written queries
11.Know the SQL windowing functions and how to use them
Procedure Design
1.Do not use both DML and DDL on temp tables within a stored procedure
1.Do not mix DDL within the DML of a stored procedure
2.Use SET NOCOUNT ON at the beginning of procedures
3.Use table variable for small data sets and temp tables for larger ones
3.Prefer temp tables over table variable unless there is a clear performance or functional advantage
4.Avoid the use of cursors and procedural code
September 21, 2012 at 3:58 pm
1. Add a clustered index to each table
2. Keep indexes narrow, especially the clustered index
3. Make sure that the columns in the clustered index aren't updated
4. Make clustered indexes monotonically increasing whenever possible
5. Design the column order of indexes around how it will be used
6. Make and use covering indexes where possible
>4. Make clustered indexes monotonically increasing whenever possible
Thought I would double check, I'm assuming this clustered index will be used by the application to query data in DB?
Or is the clustered index column added in part to make a record have a unique key(per recommended practices)?
If it's the second one (auto-increment) I believe we could improve utilization of clustered index by:
Picking a unique column (e.g. SSN though not 100% unique)
Picking potentially even a DATE column, if this table might be used to query a lot of date ranges. This could save some IO.
Though as always "it depends" :hehe:
--------------------------------------------------
...0.05 points per day since registration... slowly crawl up to 1 pt per day hopefully 😀
September 21, 2012 at 9:55 pm
Recurs1on (9/21/2012)
Query Design ...8.Do not use scalar valued functions SELECT statements
I don't think I understand this. Can you expand?
Recurs1on (9/21/2012)
Query Design ...9.Do not prefix search arguments in a LIKE clause with wildcards
This is not always avoidable. Probably should be phrased like "Try not to..."
Recurs1on (9/21/2012)
Procedure Design1.Do not use both DML and DDL on temp tables within a stored procedure
Really? I can't do this:
CREATE TABLE #Temp(...)
INSERT INTO #Temp ...
--Vadim R.
September 21, 2012 at 10:50 pm
I'd remove the following because anytime you can get a Developer to comment code, you're well ahead of the game.
6. When commenting SQL code, use block comments instead of line comments
The following isn't a bad suggestion but, unless you ccome up with a reassonable standard snippet of code for use in the CATCH block, you're going to have an awful lot of errors that no one can determine the origin of nor even what the error actually is.
8 USE TRY...CATCH instead of @@ERROR for error handling
This is a great idea but has somme nasty caveates depending on which ORM you may be using. For example, if you create a VARCHAR(10) column and pass filters to queries through Linq2Sql, Linq2Sql will convert the filter to NVARCHAR(10) in the code it executes. Since NVARCHAR has a high data precedence than VARCHAR, the column being filtered will first convert to NVARCHAR which destroys any hope of getting an INDEX SEEK out of the code. Depending on how they might be used, I also have a personal dislike for BIT columns. For example, it's sometime much more expedient to SUM a flag column to get a count of rows where the flag is set and that's just not possible with the BIT datatype. I've also see certain types of columns that really should be TINYINT for future expansion (think "bullet proofing" code) that some might not consider at design time.
1. Use the smallest data type appropriate to the data you're storing
The following two suggestions seem to form a paradox. I'll also just about guarantee that Item 3 will be severely abused.
2. Keep tables narrow
3. Do not blindly normalize everything
They fixed it in 2012 but I recommend that you recommend that MAX datatypes should be avoided unless they are absolutely necessary. I know that goes along with using the smallest datatype possible but using MAX in anything less than 2012 has some real index maintenance issues with rebuilding indexes in an ONLINE fashion.
I'd specifically specify that you should make the clustered index UNIQUE along with the following because it has some very real performance implications.
4. Make clustered indexes monotonically increasing whenever possible
You might want to specify that you shouldn't blindly make indexes because they can really bog down inserts, updates, and deletes.
The following should probably be changed to "DO NOt USE QUERY OR TABLE HINTS WITHOUT PRIOR APPROVAL FROM THE DBA FOR EACH USE!".
7. Use Query hints sparingly
The following is a pretty bad misconception. It truly depends on what is being done for performance purposes. Both temp Tables and Table Variables start out in memory and both overflow to disk if they get too big. Table Variables are also evaluated as having just one row unless you do a recompile after they're loaded and they do not use statistics at all. Further, sectional troubleshooting where non-persistent data tables exist is a real bitch in SSMS. Consdier rewording this to say something like "Use Temp Tables in favor of Table Variables except for when the use of Table Variables provides a substantial increase in performance or the code doesn't work because of the Temp Tables when being called from external sources."
3. Use table variable for small data sets and temp tables for larger ones
The following is a bit confusing. I believe what you mean to say is to not intermingle DDL within the DML of a stored procedure.
1. Do not use both DML and DDL on temp tables within a stored procedure
--Jeff Moden
Change is inevitable... Change for the better is not.
September 21, 2012 at 10:51 pm
Just my 2 cents on this...
Jeff Moden (9/21/2012)
6. When commenting SQL code, use block comments instead of line comments
The following isn't a bad suggestion but, unless you ccome up with a reassonable standard snippet of code for use in the CATCH block, you're going to have an awful lot of errors that no one can determine the origin of nor even what the error actually is.
8 USE TRY...CATCH instead of @@ERROR for error handling
This is a great idea but has somme nasty caveates depending on which ORM you may be using. For example, if you create a VARCHAR(10) column and pass filters to queries through Linq2Sql, Linq2Sql will convert the filter to NVARCHAR(10) in the code it executes. Since NVARCHAR has a high data precedence than VARCHAR, the column being filtered will first convert to NVARCHAR which destroys any hope of getting an INDEX SEEK out of the code. Depending on how they might be used, I also have a personal dislike for BIT columns. For example, it's sometime much more expedient to SUM a flag column to get a count of rows where the flag is set and that's just not possible with the BIT datatype. I've also see certain types of columns that really should be TINYINT for future expansion (think "bullet proofing" code) that some might not consider at design time.
1. Use the smallest data type appropriate to the data you're storing
The following two suggestions seem to form a paradox. I'll also just about guarantee that Item 3 will be severely abused.
2. Keep tables narrow
3. Do not blindly normalize everything
They fixed it in 2012 but I recommend that you recommend that MAX datatypes should be avoided unless they are absolutely necessary. I know that goes along with using the smallest datatype possible but using MAX in anything less than 2012 has some real index maintenance issues with rebuilding indexes in an ONLINE fashion.
I'd specifically specify that you should make the clustered index UNIQUE along with the following because it has some very real performance implications.
4. Make clustered indexes monotonically increasing whenever possible
You might want to specify that you shouldn't blindly make indexes because they can really bog down inserts, updates, and deletes.
The following should probably be changed to "DO NOt USE QUERY OR TABLE HINTS WITHOUT PRIOR APPROVAL FROM THE DBA FOR EACH USE!".
7. Use Query hints sparingly
The following is a pretty bad misconception. It truly depends on what is being done for performance purposes. Both temp Tables and Table Variables start out in memory and both overflow to disk if they get too big. Table Variables are also evaluated as having just one row unless you do a recompile after they're loaded and they do not use statistics at all. Further, sectional troubleshooting where non-persistent data tables exist is a real bitch in SSMS. Consdier rewording this to say something like "Use Temp Tables in favor of Table Variables except for when the use of Table Variables provides a substantial increase in performance or the code doesn't work because of the Temp Tables when being called from external sources."
3. Use table variable for small data sets and temp tables for larger ones
The following is a bit confusing. I believe what you mean to say is to not intermingle DDL within the DML of a stored procedure.
1. Do not use both DML and DDL on temp tables within a stored procedure
--Jeff Moden
Change is inevitable... Change for the better is not.
September 22, 2012 at 12:23 am
5.Avoid the use of negative logic wherever possible (NOT EXISTS, NOT IN ...)
Why? What would you use instead when you need this form negative logic? When an inexperienced SQL developer sees this line he/she could end up using a LEFT JOIN negative_table with WHERE negative_table .column IS NULL combined with a GROUP BY or DISTINCT which will break one of your other best practices and would usually perform worse.
I would rather explain the difference between NOT EXISTS and NOT IN when the column in NOT IN is nullable.
Gail Shaw has some great articles (as usual) on this subject.
http://sqlinthewild.co.za/index.php/2010/03/23/left-outer-join-vs-not-exists/[/url]
http://sqlinthewild.co.za/index.php/2010/02/18/not-exists-vs-not-in/[/url]
September 22, 2012 at 9:27 am
Thanks for the comments
rVadim (9/21/2012)
Recurs1on (9/21/2012)
Query Design ...8.Do not use scalar valued functions SELECT statements
I don't think I understand this. Can you expand?
This should probably be restated. What we're after here is keeping people from using scalar valued functions in queries where possible because of the performance hit it takes. We've had a lot of code written that just throws them in there without considering the huge performance hit you can take with the RBAR that is often associated with them.
Recurs1on (9/21/2012)
Procedure Design1.Do not use both DML and DDL on temp tables within a stored procedure
Really? I can't do this:
CREATE TABLE #Temp(...)
INSERT INTO #Temp ...
As someone said later on, this needs to be restated as "Don't intermingle DDL within your DML" so that we don't get recompiles all the time
September 22, 2012 at 9:56 am
Jeff Moden (9/21/2012)
The following isn't a bad suggestion but, unless you ccome up with a reassonable standard snippet of code for use in the CATCH block, you're going to have an awful lot of errors that no one can determine the origin of nor even what the error actually is.
8 USE TRY...CATCH instead of @@ERROR for error handling
We do have one, and we're pretty good about using it. The other thing I forgot to mention is that this document will also be required reading for all new hires, so it's mostly in there for that purpose.
This is a great idea but has somme nasty caveates depending on which ORM you may be using. For example, if you create a VARCHAR(10) column and pass filters to queries through Linq2Sql, Linq2Sql will convert the filter to NVARCHAR(10) in the code it executes. Since NVARCHAR has a high data precedence than VARCHAR, the column being filtered will first convert to NVARCHAR which destroys any hope of getting an INDEX SEEK out of the code. Depending on how they might be used, I also have a personal dislike for BIT columns. For example, it's sometime much more expedient to SUM a flag column to get a count of rows where the flag is set and that's just not possible with the BIT datatype. I've also see certain types of columns that really should be TINYINT for future expansion (think "bullet proofing" code) that some might not consider at design time.
1. Use the smallest data type appropriate to the data you're storing
I didn't know that. We don't use an ORM, but that's some good advice to keep in mind if we ever decide to start. One of our biggest problems right now is the business has mandated we keep data "forever". We all know that eventually, that's not going to work, but right now, the difference between a tinyint and a smallint on some tables is 100's of MBS.
The following two suggestions seem to form a paradox. I'll also just about guarantee that Item 3 will be severely abused.
2. Keep tables narrow
3. Do not blindly normalize everything
They are kind of a paradox. There's often tension between normalizing data and performance. What I'm trying to get at here is that both need to be considered. Can you suggest a better way to express these competing goals?
They fixed it in 2012 but I recommend that you recommend that MAX datatypes should be avoided unless they are absolutely necessary. I know that goes along with using the smallest datatype possible but using MAX in anything less than 2012 has some real index maintenance issues with rebuilding indexes in an ONLINE fashion.
We can't rebuild indexes online because we don't have enterprise SQL. Do you think this is still a problem?
I'd specifically specify that you should make the clustered index UNIQUE along with the following because it has some very real performance implications.
4. Make clustered indexes monotonically increasing whenever possible
Good point.
You might want to specify that you shouldn't blindly make indexes because they can really bog down inserts, updates, and deletes.
Also a good point. We have had problems with this on a few tables before.
The following should probably be changed to "DO NOt USE QUERY OR TABLE HINTS WITHOUT PRIOR APPROVAL FROM THE DBA FOR EACH USE!".
7. Use Query hints sparingly
I like yours better 🙂
The following is a pretty bad misconception. It truly depends on what is being done for performance purposes. Both temp Tables and Table Variables start out in memory and both overflow to disk if they get too big. Table Variables are also evaluated as having just one row unless you do a recompile after they're loaded and they do not use statistics at all. Further, sectional troubleshooting where non-persistent data tables exist is a real bitch in SSMS. Consdier rewording this to say something like "Use Temp Tables in favor of Table Variables except for when the use of Table Variables provides a substantial increase in performance or the code doesn't work because of the Temp Tables when being called from external sources."
3. Use table variable for small data sets and temp tables for larger ones
I've read your's and others posts on table variable vs. temp table and that's exactly I was going after. What about
"Prefer temp tables over table variables unless there is a clear performance advantage."
and we'll flesh out the details in the body of the text that will go with each item?
The following is a bit confusing. I believe what you mean to say is to not intermingle DDL within the DML of a stored procedure.
1. Do not use both DML and DDL on temp tables within a stored procedure
Yup, that's what I was after.
September 22, 2012 at 10:01 am
Recurs1on (9/22/2012)We've had a lot of code written that just throws them in there without considering the huge performance hit you can take with the RBAR that is often associated with them.
Instead of just saying that scalar functions usually should be avoided, show them why. A screenshot from Profiler with thousands of call to the scalar function for the query it is used in is worth a thousand words.
September 22, 2012 at 10:06 am
Nils Gustav Stråbø (9/22/2012)
5.Avoid the use of negative logic wherever possible (NOT EXISTS, NOT IN ...)
Why? What would you use instead when you need this form negative logic? When an inexperienced SQL developer sees this line he/she could end up using a LEFT JOIN negative_table with WHERE negative_table .column IS NULL combined with a GROUP BY or DISTINCT which will break one of your other best practices and would usually perform worse.
I would rather explain the difference between NOT EXISTS and NOT IN when the column in NOT IN is nullable.
Thanks for the articles. I hadn't even thought of that one, but I will add an item for this. The reason this item was here originally was because unless I'm mistaken, negative logic forces a table scan. Am I wrong on that one?
September 22, 2012 at 10:27 am
Nils Gustav Stråbø (9/22/2012)
Instead of just saying that scalar functions usually should be avoided, show them why. A screenshot from Profiler with thousands of call to the scalar function for the query it is used in is worth a thousand words.
We will. These are going to be the headings in a document. Each item will have a paragraph or two of supporting text and examples where appropriate.
September 22, 2012 at 10:38 am
Definately wrong on the part that a NOT EXISTS will force the optimizer to use scans over seeks.
As always it depends on having the correct indexes and cardinality of the operators in the plan.
September 22, 2012 at 10:59 am
Recurs1on (9/22/2012)
The reason this item was here originally was because unless I'm mistaken, negative logic forces a table scan. Am I wrong on that one?
Yes.
'Negative' logic can mean a table scan, so can 'positive' logic. These two are equivalent (assuming no nulls) and are evaluated the same way
WHERE SomeColumn != 137
WHERE SomeColumn > 137 OR SomeColumn < 137
In case I tested, it's an index scan, well, I'm asking for the entire table bar one value, so makes sense to do it that way. If you look at the exec plans, the two queries resolve to exactly the same plan.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
September 22, 2012 at 11:19 am
GilaMonster (9/22/2012)
In case I tested, it's an index scan, well, I'm asking for the entire table bar one value, so makes sense to do it that way. If you look at the exec plans, the two queries resolve to exactly the same plan.
Is there another way to state this that would get the point across more clearly or do you think it will only muddy the waters? The original intent was to avoid unnecessary table scans because of poorly written queries that use NOT EXISTS, NOT IN or != in cases where it is possible to re-write them using EXISTS, IN, or =.
September 22, 2012 at 11:45 am
I still don't understand why you worry about NOT EXISTS.
Consider the following two scenarios:
1. Give me a list of all our customer that have not placed an order in 2012.
2. Give me a list of all our customers in the Vatican City that have not placed an order in 2012.
The following two queries using NOT EXIST can be used
1.
select c.* from Customers c
where not exists(select * from Orders o where o.CustomerId=c.CustomerId and o.OrderDate>='2012-01-01')
2.
select c.* from Customers c
where not exists(select * from Orders o where o.CustomerId=c.CustomerId and o.OrderDate>='2012-01-01')
and c.CountryCode='VAT'
The first one will most likely use a scan and hash join (anti semi join) because all rows in Customer have to be checked.
The second one will most likely use a nested loop (anti semi join) because of the low density of rows with CountryCode='VAT'.
Gail will correct me if I'm wrong here.
Viewing 15 posts - 1 through 15 (of 26 total)
You must be logged in to reply to this topic. Login to reply