December 2, 2007 at 9:25 am
Hi Gurus,
I have been given a task of improving performance on a relatively busy production server.
Does anyone have any good literature/articles on this topic that are freely available?
I have run DTA against the DB, which did not return any improvements, so I will assume, for now, that the index strategy is suitable.
Basically, what I want to try to do is improve performance by reducing the number of reads a query uses through rewriting the T-SQL. Hopefully this will help, but I wanted some guidelines on coding best practice: what types of clauses are efficient for WHERE, etc.
Some of the 'poor' queries at the moment can have over 100,000 reads. I am hoping that re-writing the query will reduce this number (perhaps not), but I want to be able to identify code that looks 'bad' and try to improve it.
Any suggestions welcome.
Gethyn Ellis
www.gethynellis.com
December 2, 2007 at 12:42 pm
What I'd first recommend is that you get a list of queries that are run often and then search through those to see if you can improve their performance. You can run profiler, grab the list of queries and group them to see which is called most often.
It doesn't make sense to improve by 1000% the performance of a query that's run once if you can instead improve by 20% another that's run 1000 times a day.
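If the trace is saved to a table, a grouping like this can show which statements run most often (a sketch; the trace table name is hypothetical, and TextData is ntext in trace tables, so it must be converted before grouping):
select convert(nvarchar(4000), t.TextData) as stmt,
    count(*) as calls
from dbo.MyProfilerTrace t
group by convert(nvarchar(4000), t.TextData)
order by calls desc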
From there, search for links, maybe check the one below, and post more questions here:
December 2, 2007 at 7:29 pm
Hi,
I will do the following:
1- Set up a testing environment.
2- Use Profiler/DMVs (sys.dm_exec_query_stats, sys.dm_exec_sql_text) to find queries that take a long time to execute (see the sketch after this list).
3- Check the execution plans to detect where queries cost a lot to run.
4- Use the Database Engine Tuning Advisor to get recommendations about creating/dropping indexes.
5- Avoid recompilation in stored procedures.
6- Avoid naming stored procedures with the prefix sp_.
7- Set AUTO_UPDATE_STATISTICS to ON.
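For step 2, a minimal sketch along these lines can surface the heaviest statements by average elapsed time (SQL Server 2005 and later; the TOP count is arbitrary):
select top 10
    qs.execution_count,
    qs.total_logical_reads,
    qs.total_elapsed_time / qs.execution_count as avg_elapsed_time,
    substring( st.text, (qs.statement_start_offset / 2) + 1,
        ((case qs.statement_end_offset
            when -1 then datalength(st.text)
            else qs.statement_end_offset
        end - qs.statement_start_offset) / 2) + 1) as statement_text
from sys.dm_exec_query_stats qs
cross apply sys.dm_exec_sql_text(qs.sql_handle) st
order by avg_elapsed_time desc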
Check the following links for more info:
SQL Server Query Execution Plan Analysis
http://www.sql-server-performance.com/tips/query_execution_plan_analysis_p1.aspx
SQL Server Performance: Query Tuning vs. Process Tuning
http://www.sql-server-performance.com/articles/per/query_process_tuning_p1.aspx
Slow running queries - Blocking and Index problems
http://sqlserver-qa.net/blogs/perftune/archive/2007/09/24/2183.aspx
SQL Server UPDATE STATISTICS
http://www.sql-server-performance.com/tips/update_statistics_p1.aspx
Stored Proc recompile
http://sqlserver-qa.net/blogs/perftune/archive/2007/10/25/2549.aspx
http://www.sql-server-performance.com/articles/per/optimizing_sp_recompiles_p3.aspx
TSQL to detect long running queries against the database
http://sqlserver-qa.net/blogs/t-sql/archive/2007/08/29/1780.aspx
Troubleshooting Performance Problems in SQL Server 2005
http://www.microsoft.com/technet/prodtechnol/sql/2005/tsprfprb.mspx
Identifying top 20 most expensive queries in terms of read I/O
http://sql-server-performance.com/Community/blogs/satya/archive/2007/11/19/136522.aspx
SQL Server 2005 Performance Tuning
http://furrukhbaig.wordpress.com/?s=stored+proc&searchbutton=go%21
Regards,
Ahmed
December 3, 2007 at 9:33 am
The best thing you can do is to hire a professional to give your server/applications/databases a performance review. Amazing return on investment - not only in the performance of your systems but also in your knowledge of how to keep things running efficiently going forward.
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
December 4, 2007 at 4:46 pm
In solving performance problems, I always go look at the code in the query or stored proc. I haven't found one yet where the cause wasn't just a poorly written query. For example, one was "where column like value" and the column contained strings 300 characters long. In another, the query joined on a generic view that joined 15 tables of several hundred thousand rows apiece, when the author only needed one or two columns from the view and could have gone after them directly. Look, and you WILL often find silly stuff like this.
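To illustrate the LIKE problem (table and column names are hypothetical): a leading wildcard defeats any index on the column, while a prefix-only pattern can still support an index seek.
select o.OrderID
from dbo.Orders o
where o.Notes like '%urgent%' -- leading wildcard: non-SARGable, forces a scan
select o.OrderID
from dbo.Orders o
where o.Notes like 'urgent%' -- prefix search: an index on Notes can be seeked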
December 4, 2007 at 5:44 pm
Heh...
Step 1... search SysComments for the word "Cursor" (a sketch of this search follows the list).
Step 2... search SysComments for the word "WHILE".
Step 3... search SysComments for the occurrence of any view.
Step 4... review all User Defined Functions.
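A minimal sketch of the kind of search Jeff describes, assuming SQL Server 2000-style syscomments (on 2005, sys.sql_modules holds the same definitions):
select distinct object_name(c.id) as ObjectName
from syscomments c
where c.text like '%cursor%' -- repeat for 'while', view names, etc.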
--Jeff Moden
Change is inevitable... Change for the better is not.
December 4, 2007 at 7:42 pm
Oh, one thing I forgot to mention. Just because DTA doesn't suggest any improvements, do NOT assume "that the index strategy is suitable". I personally have very little faith in DTA because I have seen horrible things come out of it.
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
December 11, 2007 at 7:14 am
I fully agree that you should first identify the most called/heaviest queries in your database and concentrate mostly on optimising those. Your objectives can be any combination of less CPU load, less IO or improved concurrency; it just depends on where the bottleneck is for your application(s). Each of these objectives may need its own adjustments. But in any case, you should look at the T-SQL only after you've identified your poor performers.
Of course it's best to always write your code with performance as one of your objectives. For this purpose I've compiled a non-exhaustive list of hints on how to get well-performing T-SQL. The list was originally written with MS SQL 2000 in mind, though I'm sure most of it still holds for 2005. I think this list will help you improve the performance of your poor performers too. I would like to invite more people to share their tips and/or comment on mine.
[h1]The list[/h1]
[h3]Reduce the number of columns returned to a minimum[/h3]
The less data needs to be retrieved, the faster the queries will be. Even better, try to reduce the columns fetched (not only the columns in the select clause count here; the columns in the where and join criteria must be included too) to just the columns of exactly one index. If a (partial) query can be answered by reading just an index, that saves the additional IO of reading the entire records from the table itself.
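For example (hypothetical table and index): if an index exists on dbo.Orders(CustomerID, OrderDate), this query can be answered from the index alone, without touching the table's data pages:
select o.OrderDate
from dbo.Orders o
where o.CustomerID = 42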
[h3]Avoid OR like the plague[/h3]
Very often OR results in index or even table scans. Scans are bad for performance. Different approaches can be used to avoid OR, ranging from 'union all' to the use of nullif, isnull and the like in join criteria.
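A sketch of the 'union all' approach (hypothetical table and columns); note the second branch must exclude the rows already returned by the first to keep the semantics of the original OR:
-- original: where c.City = 'London' or c.Region = 'South'
select c.CustomerID
from dbo.Customers c
where c.City = 'London'
union all
select c.CustomerID
from dbo.Customers c
where c.Region = 'South'
and (c.City <> 'London' or c.City is null)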
[h3]Specify table qualifiers[/h3]
Specifying dbo. for all tables in your queries simplifies the job for the query interpreter, thereby speeding up the creation of the query plan. More importantly: in future versions of MS SQL, specifying the qualifier will be mandatory.
[h3]How to compare a mandatory column with an optional column[/h3]
IMPORTANT NOTE: the original hint -shown below in italics (I can't find strike-through)- does NOT improve performance; in fact it even hurts performance, as Scott pointed out. The where clause as I originally presented it hides the fact that an 'or' is in the conditions, and makes it even worse by applying nullif: the query will result in a table scan because the arguments get obscured for the optimizer by the nullif operator. Here is a next attempt at an example to compare a mandatory column with an optional column; please -anyone- correct me if I'm wrong again 😉
select 1
from dbo.Table t
where t.mandatorycol = t.optionalcol
DO NOT FOLLOW THIS ORIGINAL HINT:
Optional columns may contain null values, and comparing null with any value may give unexpected results. To properly compare optional columns you would need an OR operator to check for either of the operands being null, plus a = (or <>) operator to check for equal (or non-equal) values. By using the nullif operator this sort of comparison can be implemented more easily. The nullif operator returns null if both operands have the same value, and it returns the value of the first operand if they are not the same. If either of the columns to compare is mandatory, a single nullif operator is sufficient to compare the values.
For example, to check for equal values in a mandatory column and an optional column:
select 1
from dbo.Table t
where nullif( t.mandatorycol, t.optionalcol) is null
[h3]How to compare 2 optional columns[/h3]
If both columns are optional, the nullif operator will return null if the values are the same, but it will also return null if the values are not the same and the first operand is null. You can combine the isnull (or coalesce) operator with the nullif operator to check 2 optional columns for (in)equality. Note that by reversing the operands in the second nullif, either of the two nullifs will return a non-null value if exactly one of the two columns is null.
select 1
from dbo.Table t
where isnull( nullif( t.optionalcol1, t.optionalcol2), nullif( t.optionalcol2, t.optionalcol1)) is null
[h3]Reduce the amount of rows to work on a.s.a.p.[/h3]
This is a very generic rule that can be applied in many ways; examples follow (sub queries, joins on sub queries, selection criteria as join criteria, etc.). Reducing the number of rows to work on helps in many ways: less IO, fewer locks and less memory.
[h3]Check for existence[/h3]
select 1 outperforms select * in many situations, and select top 1 1 often performs even better because it allows the query optimiser to reduce the amount of data as soon as possible.
Example:
if exists (
select top 1 1
from aTable
)
begin
...
end
[h3]Check for existence (2)[/h3]
If you need to find rows in table a for which at least 1 row exists in table b, you can use an inner join combined with a sub select that groups table b by the joining key. This of course works best if an index exists on aTableB.id.
Example:
select a.col1
from aTableA a
inner join (
select
b.id
from aTableB b
group by
b.id
) b on (b.id = a.id)
note: Using inner join to find matching rows performs best if there is a high percentage of matches. If your data in table a is likely to have only a small number of matches in table b it is better to use "exists":
select a.col1
from aTableA a
where exists (
select top 1 *
from aTableB b
where b.id = a.id
)
[h3]Check for non-existence[/h3]
If you need to find rows in table a for which no row exists in table b, you can use a left outer join combined with a where clause checking that b's joined column is null. Of course this only works for mandatory columns in table b. If the join key is a combined key, you only need to check one of its columns for null; always look at the query plan and check the first column of the underlying index for null. This of course works best if an index exists on aTableB.id.
Example:
select a.col1
from aTableA a
left outer join aTableB b on (b.id = a.id)
where b.id is null -- not exists
note: Using left outer join to find non-matching rows performs best if there is a high percentage of matches. If your data in table a is likely to have only a small number of matches in table b it is better to use "not exists":
select a.col1
from aTableA a
where not exists (
select top 1 *
from aTableB b
where b.id = a.id
)
[h3]Check for duplicates[/h3]
It is sufficient to check for the existence of just one duplicate instead of looking for all duplicates; i.e. checks for duplicates can often be implemented as a check for the existence of 2 different rows with identical properties. Never use select count(1) having count(1) > 1,
instead use something like:
if exists (
select top 1 1
from aTable a
inner join aTable b on (b.someproperty = a.someproperty and b.keyproperty <> a.keyproperty)
)
begin
...
end
[h3]Group by in sub query[/h3]
One example of reducing the intermediate amount of data is when grouping data from joined tables. If possible, try to group the data from one table first in a sub query and then join the 2nd table onto the sub query's result. The performance is gained mostly because the query optimiser can more easily identify the correct indexes to use. Another important aspect is that both the query plan for the sub query and its intermediate results can be re-used (cached) across more queries/connections, whereas for more monolithic queries the chances of getting a cache hit for either are much smaller.
select fls.flsJFHillebrandRef, cnt.NumContainers
from dbo.Files fls
left outer join (
select
cnt.flsGUID,
count(1) as NumContainers,
cnt.cntDeleteDate
from dbo.Containers cnt
group by cnt.flsGUID, cnt.cntDeleteDate
) cnt on (cnt.flsGUID = fls.flsGUID and nullif( cnt.cntDeleteDate, fls.flsDeleteDate) is null)
[h3]Use left joins for 'find' queries[/h3]
When a query is potentially used with many different where clauses (typically this sort of query is used in view definitions, but it may apply to others too) and the search conditions may include columns from joined-in tables, it can be advantageous to specify left joins for most of the join conditions while selecting the value from the left-joined table in the select clause. Performance is gained because only if the value is used as a selection criterion does the left join reduce into an inner join and the left-joined table need to be searched for the specified value. If in a call the column is not specified as a selection criterion, the left join need not be considered when selecting the rows to search through; i.e. the intermediate results are significantly smaller. Of course care must be taken not to turn inner joins that functionally filter unmatched rows into left joins.
select cnt.cntJFHillebrandRef, eqmt.eqmtID, eqmt.eqmtDescription
from dbo.Containers cnt
left outer join dbo.EquipmentTypes eqmt on (eqmt.eqmtID = cnt.eqmtID)
[h3]Aggregated value is only valid if another field has the same values[/h3]
In some queries you need an aggregated value (sum, avg, count, etc) which is only to be shown if another field in the same rows has exactly 1 value. For example, order details are to be summed only if those details all have the same currency; in all other cases no order total can be calculated. This is easy to implement in one query:
select
odtl.ordGUID,
odtl.odtlDeleteDate,
case
when isnull( nullif( max(odtl.crncCode), min(odtl.crncCode)),
nullif(min(odtl.crncCode), max(odtl.crncCode))) is null
then max(odtl.crncCode)
else null
end as crncCode,
case
when isnull( nullif( max(odtl.crncCode), min(odtl.crncCode)),
nullif(min(odtl.crncCode), max(odtl.crncCode))) is null
then sum(odtl.odtlProductValue)
else null
end as odtlProductValue
from dbo.OrderDetails odtl
group by odtl.ordGUID, odtl.odtlDeleteDate
[h3]No type conversions[/h3]
If a query requires 2 values to be compared, try to keep both values of the same data type. If a conversion is required before the values can be matched, this will usually result in an index or even a table scan. If a type conversion cannot be avoided, put an explicit type conversion on the value that is expected to deliver the fewest rows.
For example: LookupValues.luvValue is of type varchar, EquipmentTypes.eqmtGUID is of type uniqueidentifier. LookupValues has many more rows than EquipmentTypes, i.e. we should convert eqmt.eqmtGUID:
select 1
from dbo.LookupValues luv
left outer join dbo.EquipmentTypes eqmt on (convert(varchar(36), eqmt.eqmtGUID) = luv.luvValue)
where eqmt.eqmtGUID is null -- not exists in eqmt
and luv.ifgrCode = 'FLOW'
and luv.luenCode = 'eqmt'
[h3]Use a temporary configuration table[/h3]
Sometimes you need a specific set of decisions taken depending on the data found in a table. Such decisions can be implemented using if constructions in a procedure, but these if constructions usually lead to several queries being fired. If there is only a limited set of decisions to be taken, you can also opt to declare a memory table and fill it with the appropriate data, so that joining your data with that configuration table produces the correct decisions in one query.
An example from a report procedure where 4 copies were to be printed if the packaging type was pallet and 1 copy for boxes:
declare @tblConfig table (
pckvID T_KEYSEQUENCE not null,
nCopies int not null,
primary key (pckvID)
)
insert @tblConfig( pckvID, nCopies)
select 'PL', 4
union all
select 'BX', 1
select odtl.prdCode, t.nCopies
from dbo.OrderDetails odtl
inner join @tblConfig t on (t.pckvID = odtl.pckvID)
[h3]Checks before insert, delete and update[/h3]
Insert, delete and update are very 'expensive' operations compared to select statements. They usually involve a lot of IO, and in a multi-user environment they will lock down certain data for at best a short period of time. If in a procedure or trigger inserts, deletes or updates are more often skipped than actually fired, you should check beforehand whether the statement will have any effect by adding an if-exists statement around it. The existence check is much faster and does not put as many locks, or as high a level of locks, on the data as the statement itself.
if exists (
select top 1 1
from Inserted i
left outer join dbo.Table tbl on (tbl.key = i.key)
where tbl.key is null -- not exists in tbl
)
begin
insert dbo.Table( key, col2)
select i.key, i.col2
from Inserted i
left outer join dbo.Table tbl on (tbl.key = i.key)
where tbl.key is null -- not exists in tbl
end
if exists (
select top 1 1
from Inserted i
inner join dbo.Table tbl on (tbl.key = i.key)
where nullif( tbl.col2, i.col2) is not null
)
begin
update tbl
set
col2 = i.col2
from Inserted i
inner join dbo.Table tbl on (tbl.key = i.key)
where nullif( tbl.col2, i.col2) is not null
end
[h3]Optional joined table pairs[/h3]
Sometimes you need a certain row from table A only when there is a matching row in table B, and you need neither of them if there is no such match. In that case you need to specify that the inner join must be evaluated before the left outer join on the pair. Doing this is actually just a matter of putting additional parentheses '(' in your query, but syntactically this can be challenging, and so it is often implemented instead as 2 left joins plus an additional where clause testing that both are non-null. That however requires the query engine to first evaluate both tables without the additional join condition, only to filter out all unmatched rows later on; i.e. a lot of unneeded IO and intermediate storage is involved. Here's an example that shows the syntax and performs much better:
select top 10 fls.flsJFHillebrandRef, ctpr.ctprFullName as EHOContact
from dbo.Files fls
left outer join (dbo.Users usr
inner join dbo.ContactPersons ctpr on (ctpr.cmpnGUID = usr.cmpnGUID and ctpr.ctprGUID = usr.ctprGUID)
) on (usr.usrGUID = fls.usrGUIDEHO)
[h3]use multiple columns from the same table[/h3]
If you need to retrieve multiple columns from a single table, you can use cross apply to avoid having to query the same table twice and union the results into one table. Using cross apply, the table or its index will only be scanned once instead of twice, resulting in less IO and shorter lock periods.
select cmpnX.cmpnGUID
from dbo.Files fls
cross apply (
select fls.cmpnGUIDSupplier as cmpnGUID
union all select fls.cmpnGUIDCustomer
) cmpnX
Always use 'union all' for best performance; only use the default union if you need unique values returned. This method can be further tuned by adding where clauses to the cross applied select statements, as the variant below shows.
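For instance, a sketch building on the example above, filtering out null keys in each branch:
select cmpnX.cmpnGUID
from dbo.Files fls
cross apply (
    select fls.cmpnGUIDSupplier as cmpnGUID where fls.cmpnGUIDSupplier is not null
    union all select fls.cmpnGUIDCustomer where fls.cmpnGUIDCustomer is not null
) cmpnX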
[h3]Transactions must be kept short[/h3]
Transactions are important to avoid race conditions. However, while inside a transaction you are accumulating all locks along the way, thereby possibly blocking other users from accessing the objects you've touched until you've finished your transaction. I.e. other users needing to access that data (even just for reading, depending on the locks applied) must wait until your transaction is finished. You should take great care to optimise the actions inside any explicit transaction: collect as much data as possible before beginning the transaction and do only the actual updates/inserts/deletes inside it.
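A sketch of that pattern (table and column names are hypothetical): do the expensive reads before the transaction, keep only the modifications inside it.
declare @tblWork table (id int primary key, newvalue int)
-- expensive reads happen here, outside the transaction
insert @tblWork (id, newvalue)
select s.id, s.newvalue
from dbo.Staging s
begin transaction
update tbl
set col2 = w.newvalue
from dbo.SomeTable tbl
inner join @tblWork w on (w.id = tbl.id)
commit transaction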
[h3]isnull or coalesce[/h3]
Both can be used to avoid ORs when handling null values. coalesce can take any number of parameters; isnull takes exactly 2. Due to this, isnull is slightly faster than coalesce. So if you need more than 2 values compared, use coalesce; otherwise use isnull.
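A minimal sketch of both, reusing the hypothetical optional columns from earlier:
select isnull( t.optionalcol1, 0) as val1, -- exactly 2 arguments
    coalesce( t.optionalcol1, t.optionalcol2, 0) as val2 -- 2 or more arguments
from dbo.Table t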
[h3]with (nolock)[/h3]
Adding the with (nolock) hint can give a considerable performance increase. It makes the query's results unreliable though, so with (nolock) should be used with care. I deliberately do not give an example here, as I personally feel with (nolock) should be avoided.
December 11, 2007 at 7:38 am
Nice list! Want to write us an article on this?
Steve
December 11, 2007 at 10:42 am
In the example for comparing a mandatory and an optional column,
select 1
from dbo.Table t
where nullif( t.mandatorycol, t.optionalcol) is null
Am I missing something, or will "nullif( t.mandatorycol, t.optionalcol) is null" produce the same result as "t.mandatorycol = t.optionalcol"? Except for the fact that it is non-SARGable and will force a table scan?
If the arguments are reversed, "nullif( t.optionalcol, t.mandatorycol) is null" is equivalent to "t.mandatorycol = t.optionalcol or t.optionalcol is null". But the NULLIF will still force a table scan so other than being a little shorter to type I don't see where the performance benefit comes from.
December 11, 2007 at 1:27 pm
R.P.Rozema, thanks very much for the response; this is exactly what I was after. I kind of knew the gist of it, but having a "list" to refer to will definitely help me, so thank you for that.
You should take Steve up on his offer; it wouldn't take much to turn this post into an article, and then all subscribers to this site can benefit from your work.
Again, many thanks for the detailed response.
Gethyn Ellis
www.gethynellis.com
December 12, 2007 at 1:05 am
Scott Coleman (12/11/2007)
In the example for comparing a mandatory and an optional column,
select 1
from dbo.Table t
where nullif( t.mandatorycol, t.optionalcol) is null
Am I missing something, or will "nullif( t.mandatorycol, t.optionalcol) is null" produce the same result as "t.mandatorycol = t.optionalcol"? Except for the fact that it is non-SARGable and will force a table scan?
If the arguments are reversed, "nullif( t.optionalcol, t.mandatorycol) is null" is equivalent to "t.mandatorycol = t.optionalcol or t.optionalcol is null". But the NULLIF will still force a table scan so other than being a little shorter to type I don't see where the performance benefit comes from.
Hi Scott, you are right, this doesn't help much in performance, though I don't see it hurting performance either, as you suggested. Looking at it now, I probably should have removed this suggestion from the list, but this one slipped through. This construct was included in my original list more to help less experienced T-SQL programmers avoid the situation where the optional column is null and thus "t.mandatorycol = t.optionalcol" could give unexpected results for any value in the mandatory column. Reversing the arguments for nullif breaks this intended logic.
December 12, 2007 at 5:33 am
Hey .....................
Well, this link is really useful; it tells you about:
1) Worst performing queries
2) Whether the index is useful or not
Just check it out .... hoping it helps 🙂 🙂 ... All the best ....
December 12, 2007 at 5:42 am
Hi Megha,
Sorry can't see a link
Gethyn Elliswww.gethynellis.com
December 12, 2007 at 5:47 am
I am sorry ................. Missed it 🙁