April 20, 2010 at 12:59 am
Hi
I have a temporary table of 150 employees, and I'm passing a date range.
For each employee, for each date, I have to search for data in another table.
But this looping takes around 30 minutes if the date range exceeds 4 days.
So how do I improve performance? How can I tell which part of the stored procedure is consuming the most time?
Please post the answer.
Thank you all
April 20, 2010 at 3:28 am
Very simple: avoid looping. Almost every problem can be solved with set-based code. You provided very little information; I suggest you take a look at the article linked in my signature and come back with more info. I'm sure lots of people will be glad to help.
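In the meantime, here is a minimal sketch of what a set-based rewrite usually looks like: one query joining the employee list to the date range, instead of a nested loop. All table and column names here (#Employees, dbo.TimeEntries and their columns) are hypothetical stand-ins, since you haven't posted your schema:

DECLARE @StartDate datetime, @EndDate datetime;
SET @StartDate = '20100401';
SET @EndDate   = '20100404';

-- One pass over the data instead of 150 employees x N days of lookups.
SELECT e.EmployeeID,
       t.WorkDate,
       SUM(t.HoursWorked) AS TotalHours
FROM #Employees AS e
JOIN dbo.TimeEntries AS t
  ON t.EmployeeID = e.EmployeeID
 AND t.WorkDate BETWEEN @StartDate AND @EndDate
GROUP BY e.EmployeeID, t.WorkDate;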
-- Gianluca Sartori
April 20, 2010 at 6:24 am
The way to tell which parts of the procedure are taking more time is to look at the execution plan. You can run the query yourself in a query window in Management Studio. Hit CTRL+M before you execute it, or click the icon on the toolbar, and you'll get an execution plan showing the estimated costs for the various queries within the stored procedure, as well as the estimated cost of the operations within each query. With this information you can track down what's slowing things down.
You can also retrieve cached plans by querying Dynamic Management Views, such as sys.dm_exec_query_stats combined with sys.dm_exec_query_plan.
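For example, a query along these lines returns the most expensive cached statements together with their plans (a sketch; sort by whichever column matters to you):

SELECT TOP (10)
       st.text AS query_text,
       qs.execution_count,
       qs.total_elapsed_time,
       qp.query_plan
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp
ORDER BY qs.total_elapsed_time DESC;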
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
April 20, 2010 at 6:46 am
Consider some of the important points below.
If you have ever performed a SELECT COUNT(*) on a very large table, you know how long it can take.
For example, when I ran the following command on a large table I manage:
SELECT COUNT(*) from <table_name>
It took 1:09 to count 10,725,948 rows in the table. At the same time, SQL Server had to perform a
lot of logical and physical I/O in order to perform the count, chewing up important SQL Server
resources.
A much faster, and more efficient, way of counting rows in a table is to run the following query:
SELECT rows FROM sysindexes WHERE id = OBJECT_ID('<table_name>') AND indid < 2
When I ran the query against the same table, it took less than a second and gave me the same results. Not a bad improvement, and it took virtually no I/O, because the row count of your tables is stored in the sysindexes system table of your database. So instead of counting rows when you need to, just look up the row count in sysindexes.
There is one potential downside to using the sysindexes table: this system table is not updated in real time, so it may not match the number of rows you actually have. Assuming you have the database options "Auto Create Statistics" and "Auto Update Statistics" turned on, the value you get should be very close to correct, if not exact. If you can live with a very close estimate, then this is the best way to count rows in your tables.
Queries that return a single row are just as fast using a clustered index as a nonclustered index. If you will be creating an index to speed the retrieval of a single record, you may want to consider making it a nonclustered index and saving the clustered index (you can only have one) for queries that return a range of data.
The Query Optimizer will always perform a table scan or a clustered index scan on a table if the WHERE clause in the query contains an OR operator and any of the columns referenced in the OR clause are not indexed (or do not have a useful index). Because of this, if you use many queries with OR clauses, you will want to ensure that each column referenced in the WHERE clause has an index.
One of the best ways to boost JOIN performance is to limit how many rows need to be joined. This is especially beneficial for the outer table in a JOIN. Return only those rows that need to be joined, and no more.
If you regularly join two or more tables in your queries, performance will be optimized if each of the joined columns has its own index. This includes adding indexes to the columns in each table used to join the tables. Generally speaking, a clustered key is better than a nonclustered key for optimum JOIN performance.
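For instance, a join on a hypothetical Orders.CustomerID column could be supported like this (index and table names invented for illustration):

CREATE NONCLUSTERED INDEX IX_Orders_CustomerID
ON dbo.Orders (CustomerID);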
Avoid joining tables based on columns with few unique values. If columns used for joining aren’t
mostly unique, then the SQL Server optimizer may not be able to use an existing index in order to
speed up the join. Ideally, for best performance, joins should be done on columns that have unique
indexes.
If you have the choice of using a JOIN or a subquery to perform the same task, generally the JOIN
(often an OUTER JOIN) is faster. But this is not always the case. For example, if the returned data is
going to be small, or if there are no indexes on the joined columns, then a subquery may indeed be
faster.
• We had a query that contained two subselects with an aggregate function (SUM, COUNT, etc.) in the SELECT list. The query was performing sluggishly. We were able to isolate the problem down to the aggregate functions in the subselects.
To rectify the problem, we reorganized the query so that there was still an aggregate function in the SELECT list, but replaced the subselects with a series of JOINs. The query executed much faster.
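As a rough illustration of that rewrite (table and column names are made up):

-- Before: correlated aggregate subselect evaluated per row
SELECT c.CustomerID,
       (SELECT SUM(o.Amount)
        FROM dbo.Orders AS o
        WHERE o.CustomerID = c.CustomerID) AS TotalAmount
FROM dbo.Customers AS c;

-- After: JOIN plus GROUP BY, computed in a single pass
SELECT c.CustomerID,
       SUM(o.Amount) AS TotalAmount
FROM dbo.Customers AS c
LEFT JOIN dbo.Orders AS o
       ON o.CustomerID = c.CustomerID
GROUP BY c.CustomerID;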
By default, every time a stored procedure is executed, a message is sent from the server to the client
indicating the number of rows that were affected by the stored procedure. Rarely is this information
useful to the client. By turning off this default behavior, you can reduce network traffic between the
server and the client, helping to boost overall performance of your server and applications.
There are several ways to turn this feature off. You can do it with a server trace setting, but that is unnecessary, as there are easier options. To turn the feature off at the stored procedure level, include the statement:
SET NOCOUNT ON
at the beginning of every stored procedure you write.
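A minimal skeleton showing the placement (procedure and table names hypothetical):

CREATE PROCEDURE dbo.usp_GetEmployee
    @EmployeeID int
AS
BEGIN
    SET NOCOUNT ON;  -- suppress the "n rows affected" messages

    SELECT EmployeeID, FirstName, LastName
    FROM dbo.Employees
    WHERE EmployeeID = @EmployeeID;
END;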
When you need to execute a string of Transact-SQL, you should use the sp_executesql stored procedure instead of the EXECUTE statement. sp_executesql offers two major advantages over EXECUTE. First, it supports parameter substitution, which gives you more options when creating your code. Second, it creates query execution plans that are more likely to be reused by SQL Server, which in turn reduces overhead on the server, boosting performance.
sp_executesql executes a string of Transact-SQL in its own self-contained batch. When it is run, SQL Server compiles the code in the string into an execution plan that is separate from the batch that contained the sp_executesql call and its string.
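For example (the table and parameter are hypothetical):

DECLARE @sql nvarchar(200);
SET @sql = N'SELECT FirstName, LastName
             FROM dbo.Employees
             WHERE EmployeeID = @EmpID';

-- The parameterized plan can be reused for any @EmpID value.
EXEC sp_executesql @sql,
                   N'@EmpID int',
                   @EmpID = 42;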
SQL Server will automatically recompile a stored procedure if a large number of INSERTs, UPDATEs, or DELETEs are made to a table referenced by the stored procedure.
When calling a stored procedure from your application, it is important that you call it using its qualified
name. Such as:
exec dbo.myProcedure
instead of:
exec myProcedure
Why? There are a couple of reasons, one of which relates to performance. First, using fully qualified names helps to eliminate any potential confusion about which stored procedure you want to run, helping to prevent bugs and other potential problems. More importantly, doing so allows SQL Server to access the stored procedure's execution plan more directly, which in turn speeds up the performance of the stored procedure. Yes, the performance boost is very small, but if your server is running tens of thousands or more stored procedures every hour, these little time savings can add up.
If a stored procedure needs to return only a single value, and not a recordset, consider returning it through an OUTPUT parameter. While OUTPUT parameters are often used for error checking, they can be used for any purpose you like. Returning a single value as an OUTPUT parameter is faster than returning it as part of a recordset.
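A sketch of the pattern (procedure, table, and column names are hypothetical):

CREATE PROCEDURE dbo.usp_GetOrderCount
    @CustomerID int,
    @OrderCount int OUTPUT
AS
BEGIN
    SET NOCOUNT ON;
    SELECT @OrderCount = COUNT(*)
    FROM dbo.Orders
    WHERE CustomerID = @CustomerID;
END;

-- Caller:
DECLARE @cnt int;
EXEC dbo.usp_GetOrderCount @CustomerID = 7, @OrderCount = @cnt OUTPUT;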
In your queries, don't return column data you don't need. For example, you should not use SELECT * to
return all the columns from a table if you don't need all the data from each column. In addition, using
SELECT * may prevent the use of covered indexes, further potentially hurting query performance.
If you need to verify the existence of a record in a table, don't use SELECT COUNT(*) in your Transact-SQL code to identify it; that is very inefficient and wastes server resources. Instead, use IF EXISTS to determine if the record in question exists, which is much more efficient. For example:
Here's how you might use COUNT(*):
IF (SELECT COUNT(*) FROM table_name WHERE column_name = 'xxx') > 0
Here's a faster way, using IF EXISTS:
IF EXISTS (SELECT * FROM table_name WHERE column_name = 'xxx')
The reason IF EXISTS is faster than COUNT(*) is that the query can stop as soon as a matching row is found, while COUNT(*) must go through every matching record, whether there is only one or thousands, before the condition can be evaluated.
If possible, try to avoid using the SUBSTRING function in your WHERE clauses. Depending on how it is constructed, using the SUBSTRING function can force a table scan instead of allowing the optimizer to use an index (assuming there is one). If the substring you are searching for does not include the first character of the column you are searching, a table scan is performed.
If possible, you should avoid using the SUBSTRING function and use the LIKE condition instead, for
better performance.
Instead of doing this:
WHERE SUBSTRING(column_name,1,1) = 'b'
Try using this instead:
WHERE column_name LIKE 'b%'
If you decide to make this choice, keep in mind that you will want your LIKE condition to be
sargable, which means that you cannot place a wildcard in the first position.
If you currently have a query that uses NOT IN, which offers poor performance because the SQL Server optimizer has to use a nested table scan to perform this activity, try one of the following options instead, all of which offer better performance:
• Use EXISTS or NOT EXISTS.
• Use IN.
• Perform a LEFT OUTER JOIN and check for a NULL condition.
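For instance, the first and third alternatives look like this (dbo.Employees and dbo.Terminated are invented tables):

-- NOT EXISTS version:
SELECT e.EmployeeID
FROM dbo.Employees AS e
WHERE NOT EXISTS (SELECT 1 FROM dbo.Terminated AS t
                  WHERE t.EmployeeID = e.EmployeeID);

-- LEFT OUTER JOIN version, checking for NULL:
SELECT e.EmployeeID
FROM dbo.Employees AS e
LEFT OUTER JOIN dbo.Terminated AS t
             ON t.EmployeeID = e.EmployeeID
WHERE t.EmployeeID IS NULL;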
When you have a choice of using the IN or the EXISTS clause in your Transact-SQL, you will
generally want to use the EXISTS clause, as it is usually more efficient and performs faster.
Don't use ORDER BY in your SELECT statements unless you really need to, as it adds a lot of extra overhead. For example, it may be more efficient to sort the data at the client than at the server. In other cases, the client may not even need sorted data to achieve its goal. The key here is to remember that you shouldn't automatically sort data unless you know it is necessary.
If your WHERE clause includes an IN operator along with a list of values to be tested, order the list so that the most frequently found values are placed at the start of the list and the less frequently found values at the end.
April 20, 2010 at 6:56 am
Your post covers everything and it covers nothing. I suggest sticking to the OP's issue and waiting for more info. I'm sure he will appreciate the effort, but I don't think your answer will be particularly useful.
Let's wait and see.
-- Gianluca Sartori
April 20, 2010 at 7:03 am
hrushiatre (4/20/2010)
Consider some of the important points below. If you have ever performed a SELECT COUNT(*) on a very large table, you know how long it can take.
For example, when I ran the following command on a large table I manage:
SELECT COUNT(*) from <table_name>
It took 1:09 to count 10,725,948 rows in the table. At the same time, SQL Server had to perform a
lot of logical and physical I/O in order to perform the count, chewing up important SQL Server
resources.
A much faster, and more efficient, way of counting rows in a table is to run the following query:
SELECT rows FROM sysindexes WHERE id = OBJECT_ID('<table_name>') AND indid < 2
When I ran the query against the same table, it took less than a second and gave me the same results. Not a bad improvement, and it took virtually no I/O, because the row count of your tables is stored in the sysindexes system table of your database. So instead of counting rows when you need to, just look up the row count in sysindexes.
There is one potential downside to using the sysindexes table: this system table is not updated in real time, so it may not match the number of rows you actually have. Assuming you have the database options "Auto Create Statistics" and "Auto Update Statistics" turned on, the value you get should be very close to correct, if not exact. If you can live with a very close estimate, then this is the best way to count rows in your tables.
Queries that return a single row are just as fast using a clustered index as a nonclustered index. If you will be creating an index to speed the retrieval of a single record, you may want to consider making it a nonclustered index and saving the clustered index (you can only have one) for queries that return a range of data.
That's not entirely accurate. In fact, in some ways, it's inaccurate. It really depends on what is being indexed and how it is being queried before you can say either type of index is faster in any given situation. For example, you said they'd be the same for retrieving a single row. What if the nonclustered index is not covering, meaning it doesn't provide all the data necessary to answer the query? Then it needs to hit both the nonclustered index and perform a lookup either on the heap or the clustered index. Lookups are very costly.
The Query Optimizer will always perform a table scan or a clustered index scan on a table if the WHERE clause in the query contains an OR operator and any of the columns referenced in the OR clause are not indexed (or do not have a useful index). Because of this, if you use many queries with OR clauses, you will want to ensure that each column referenced in the WHERE clause has an index.
This is also an iffy statement. You need to be very careful about saying that every column referenced needs an index. You don't want lots of singleton indexes on a table.
One of the best ways to boost JOIN performance is to limit how many rows need to be joined. This is especially beneficial for the outer table in a JOIN. Return only those rows that need to be joined, and no more.
If you regularly join two or more tables in your queries, performance will be optimized if each of the joined columns has its own index. This includes adding indexes to the columns in each table used to join the tables. Generally speaking, a clustered key is better than a nonclustered key for optimum JOIN performance.
Again, dangerous information. First, each column having an index is probably not the way to go. If multiple columns are regularly used in a join, then a compound index in which all those columns take part is a better approach. Also, a cluster is not necessarily better than a nonclustered index for JOIN performance. What type of join operation is the execution plan using? Let's say it's a Nested Loop Operator. Let's also say that we can provide a covering nonclustered index. This means each probe operation from the top of the operator can take advantage of the covering nonclustered index. This can be much faster than a clustered index on the same key columns (but not necessarily, as usual, it depends).
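As a concrete illustration of the covering nonclustered index mentioned above (table, index, and column names are hypothetical), INCLUDE adds the extra columns to the leaf level so each probe can be answered without a lookup:

CREATE NONCLUSTERED INDEX IX_Orders_CustomerID_Covering
ON dbo.Orders (CustomerID)
INCLUDE (OrderDate, Amount);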
Avoid joining tables based on columns with few unique values. If columns used for joining aren’t
mostly unique, then the SQL Server optimizer may not be able to use an existing index in order to
speed up the join. Ideally, for best performance, joins should be done on columns that have unique
indexes.
If you have the choice of using a JOIN or a subquery to perform the same task, generally the JOIN
(often an OUTER JOIN) is faster. But this is not always the case. For example, if the returned data is
going to be small, or if there are no indexes on the joined columns, then a subquery may indeed be
faster.
You'll have to back this one up, because I can't support it at all.
[... remainder of the quoted tips snipped ...]
I don't have time to hit every point here, but a lot of this information is of dubious origin. I'd be very careful applying this advice to my production servers.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
April 20, 2010 at 7:07 am
@Grant, looks a lot like cut'n paste. Would be nice to know the source.
-- Gianluca Sartori
April 20, 2010 at 7:27 am
No, it is not cut and paste.
A year and a half ago I was having performance problems,
so I searched the net and created a document.
April 20, 2010 at 7:39 am
My statement "dubious origin" was probably badly phrased. I think "dubious benefit" would be more accurate.
But I would still be curious about the origin of some of the statements.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning