Stored procedure performance tuning

  • Hi

    I have a temporary table of 150 employees, and I'm passing a date range.

    For each employee, for each date, I have to search for data in another table.

    But this looping takes around 30 minutes if the date range exceeds 4 days.

    How can I improve performance? And how can I tell which part of the stored procedure is consuming the most time?

    Please post the answer.

    Thank you all

  • Very simple: avoid looping. Almost every problem can be solved with set-based code. You provided very little information; I suggest you take a look at the article linked in my signature and come back with more info. I'm sure lots of people will be glad to help.
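
    For example, if the per-employee, per-date lookup can be expressed as a join, one set-based query replaces the whole loop. A minimal sketch, with made-up table and column names since we haven't seen your schema:

    SELECT e.EmployeeID, a.WorkDate, a.HoursWorked
    FROM #Employees AS e          -- your temp table of 150 employees (hypothetical)
    JOIN dbo.Attendance AS a      -- the table you search per employee/date (hypothetical)
        ON a.EmployeeID = e.EmployeeID
    WHERE a.WorkDate BETWEEN @StartDate AND @EndDate;

    One pass like this lets the optimizer pick a join strategy instead of running 150 x N individual lookups.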

    -- Gianluca Sartori

  • The way to tell which parts of the procedure are taking more time is to look at the execution plan. You can run the query yourself in a query window in Management Studio. Hit Ctrl+M before you execute it, or click the corresponding icon on the toolbar, and you'll get an execution plan showing the estimated costs of the various queries within the stored procedure, as well as the estimated costs of the operations within each query. With this information you can track down what's slowing things down.

    You can also get an estimated plan by querying the Dynamic Management Views, combining sys.dm_exec_query_stats with sys.dm_exec_query_plan.
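
    For example, a query along these lines (a sketch; TOP 10 by average elapsed time is just one reasonable choice) pulls the most expensive cached statements together with their plans:

    SELECT TOP (10)
        qs.total_elapsed_time / qs.execution_count AS avg_elapsed_time,
        qs.execution_count,
        SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
            ((CASE qs.statement_end_offset
                  WHEN -1 THEN DATALENGTH(st.text)
                  ELSE qs.statement_end_offset
              END - qs.statement_start_offset) / 2) + 1) AS statement_text,
        qp.query_plan
    FROM sys.dm_exec_query_stats AS qs
    CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
    CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp
    ORDER BY avg_elapsed_time DESC;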

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Consider some of the important points below.

    If you have ever performed a SELECT COUNT(*) on a very large table, you know how long it can take. For example, when I ran the following command on a large table I manage:

    SELECT COUNT(*) FROM <table_name>

    it took 1:09 to count the 10,725,948 rows in the table. At the same time, SQL Server had to perform a lot of logical and physical I/O to perform the count, chewing up important SQL Server resources.

    A much faster and more efficient way of counting rows in a table is to run the following query:

    SELECT rows FROM sysindexes WHERE id = OBJECT_ID('<table_name>') AND indid < 2

    When I ran this query against the same table, it took less than a second and gave me the same result. Not a bad improvement, and it took virtually no I/O. This is because the row count of each table is stored in the sysindexes system table of your database. So instead of counting rows when you need to, just look up the row count in sysindexes.

    There is one potential downside to using the sysindexes table: this system table is not updated in real time, so it might underestimate the number of rows you actually have. Assuming you have the "Auto Create Statistics" and "Auto Update Statistics" database options turned on, the value you get should be very close to correct, if not exact. If you can live with a very close estimate, this is the best way to count rows in your tables.
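
    On SQL Server 2005 and later, sysindexes is kept only for backward compatibility; the same row-count metadata is exposed through sys.dm_db_partition_stats. A minimal sketch, using the same <table_name> placeholder:

    SELECT SUM(row_count) AS approx_rows
    FROM sys.dm_db_partition_stats
    WHERE object_id = OBJECT_ID('<table_name>')
      AND index_id IN (0, 1);  -- 0 = heap, 1 = clustered index

    The caveat is the same: this reads metadata rather than scanning the table, so treat the result as a close estimate, not a transactionally exact answer.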

    Queries that return a single row are just as fast using a clustered index as a non-clustered index. If you will be creating an index to speed the retrieval of a single record, you may want to consider making it a non-clustered index, saving the clustered index (you can only have one) for queries that return a range of data.

    The Query Optimizer will always perform a table scan or a clustered index scan on a table if the WHERE clause in the query contains an OR operator and any of the columns referenced in the OR clause are not indexed (or do not have a useful index). Because of this, if you use many queries with OR clauses, you will want to ensure that each column referenced in the WHERE clause has an index.

    One of the best ways to boost JOIN performance is to limit how many rows need to be joined. This is especially beneficial for the outer table in a JOIN. Return only those rows that need to be joined, and no more.

    If you perform regular joins between two or more tables in your queries, performance will be optimized if each of the joined columns has its own index. This includes adding indexes to the columns in each table used to join the tables. Generally speaking, a clustered key is better than a non-clustered key for optimum JOIN performance.

    Avoid joining tables based on columns with few unique values. If the columns used for joining aren't mostly unique, then the SQL Server optimizer may not be able to use an existing index to speed up the join. Ideally, for best performance, joins should be done on columns that have unique indexes.

    If you have the choice of using a JOIN or a subquery to perform the same task, the JOIN (often an OUTER JOIN) is generally faster. But this is not always the case: for example, if the returned data is going to be small, or if there are no indexes on the joined columns, then a subquery may indeed be faster.

    • We had a query that contained two subselects with an aggregate function (SUM, COUNT, etc.) in the SELECT list. The query was performing sluggishly, and we were able to isolate the problem down to the aggregate functions in the subselects. To rectify the problem, we reorganized the query so that there was still an aggregate function in the SELECT list, but replaced the subselects with a series of JOINs. The query executed much faster.
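
    As a sketch of that kind of rewrite (table and column names here are hypothetical), a correlated aggregate subselect:

    SELECT o.OrderID,
           (SELECT SUM(d.Amount)
            FROM dbo.OrderDetails AS d
            WHERE d.OrderID = o.OrderID) AS OrderTotal
    FROM dbo.Orders AS o;

    can be reorganized into a JOIN with a GROUP BY:

    SELECT o.OrderID, SUM(d.Amount) AS OrderTotal
    FROM dbo.Orders AS o
    LEFT JOIN dbo.OrderDetails AS d
        ON d.OrderID = o.OrderID
    GROUP BY o.OrderID;

    The LEFT JOIN keeps orders with no detail rows, matching the subselect version (which returns NULL for them).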

    By default, every time a stored procedure is executed, a message is sent from the server to the client indicating the number of rows affected by each statement in the stored procedure. This information is rarely useful to the client. By turning off this default behavior, you can reduce network traffic between the server and the client, helping to boost the overall performance of your server and applications.

    You can turn this feature off with a server trace setting, but that is unnecessary, as there is an easier way. To turn it off at the stored procedure level, include the statement:

    SET NOCOUNT ON

    at the beginning of each stored procedure you write. This statement should be included in every stored procedure you write.
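
    A minimal sketch (the procedure and table names are just illustrations):

    CREATE PROCEDURE dbo.usp_GetEmployeeList
    AS
    BEGIN
        SET NOCOUNT ON;  -- suppress the "(n row(s) affected)" message per statement

        SELECT EmployeeID, EmployeeName
        FROM dbo.Employees;  -- hypothetical table
    END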

    When you need to execute a string of Transact-SQL, you should use the sp_executesql stored procedure instead of the EXECUTE statement. sp_executesql offers two major advantages over EXECUTE. First, it supports parameter substitution, which gives you more options when creating your code. Second, it creates query execution plans that are more likely to be reused by SQL Server, which in turn reduces overhead on the server, boosting performance.

    sp_executesql executes a string of Transact-SQL in its own self-contained batch. When it is run, SQL Server compiles the code in the string into an execution plan that is separate from the batch that contained sp_executesql and its string.
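
    A minimal sketch of the parameterized form (the table and parameter are hypothetical):

    DECLARE @sql nvarchar(200);
    SET @sql = N'SELECT EmployeeName FROM dbo.Employees WHERE EmployeeID = @id';

    EXEC sp_executesql
        @sql,
        N'@id int',  -- parameter definition
        @id = 42;    -- parameter value

    Because 42 is passed as a parameter rather than concatenated into the string, the plan compiled for this statement can be reused for any other @id value.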

    SQL Server will automatically recompile a stored procedure if a large number of INSERTs, UPDATEs, or DELETEs are made to a table referenced by the stored procedure.

    When calling a stored procedure from your application, it is important to call it using its qualified name, such as:

    exec dbo.myProcedure

    instead of:

    exec myProcedure

    Why? There are a couple of reasons, one of which relates to performance. First, using fully qualified names helps to eliminate any potential confusion about which stored procedure you want to run, helping to prevent bugs and other potential problems. But more importantly, doing so allows SQL Server to access the stored procedure's execution plan more directly, which in turn speeds up the performance of the stored procedure. Yes, the performance boost is very small, but if your server is running tens of thousands of stored procedures every hour, these little time savings can add up.

    If a stored procedure needs to return only a single value, and not a recordset, consider returning the single value as an OUTPUT parameter. While OUTPUT parameters are generally used for error checking, they can actually be used for any purpose you like. Returning a single value as an OUTPUT parameter is faster than returning a single value as part of a recordset.
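
    A minimal sketch (procedure and table names are hypothetical):

    CREATE PROCEDURE dbo.usp_GetEmployeeCount
        @EmployeeCount int OUTPUT
    AS
    BEGIN
        SET NOCOUNT ON;
        SELECT @EmployeeCount = COUNT(*) FROM dbo.Employees;
    END

    -- Caller:
    DECLARE @cnt int;
    EXEC dbo.usp_GetEmployeeCount @EmployeeCount = @cnt OUTPUT;
    SELECT @cnt AS EmployeeCount;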

    In your queries, don't return column data you don't need. For example, you should not use SELECT * to return all the columns from a table if you don't need all the data from each column. In addition, using SELECT * may prevent the use of covering indexes, further potentially hurting query performance.

    If you need to verify the existence of a record in a table, don't use SELECT COUNT(*) in your Transact-SQL code to do it; that is very inefficient and wastes server resources. Instead, use the Transact-SQL IF EXISTS to determine whether the record in question exists, which is much more efficient. For example:

    Here's how you might use COUNT(*):

    IF (SELECT COUNT(*) FROM table_name WHERE column_name = 'xxx') > 0

    Here's a faster way, using IF EXISTS:

    IF EXISTS (SELECT * FROM table_name WHERE column_name = 'xxx')

    The reason IF EXISTS is faster than COUNT(*) is that the query can end as soon as the test is proven true, while COUNT(*) must go through every qualifying record, whether there is only one or thousands, before the condition can be evaluated.

    If possible, try to avoid using the SUBSTRING function in your WHERE clauses. Depending on how it is constructed, using the SUBSTRING function can force a table scan instead of allowing the optimizer to use an index (assuming there is one). If the substring you are searching for does not start at the first character of the column you are searching, then a table scan is performed.

    If possible, you should avoid the SUBSTRING function and use the LIKE condition instead, for better performance.

    Instead of doing this:

    WHERE SUBSTRING(column_name,1,1) = 'b'

    try this instead:

    WHERE column_name LIKE 'b%'

    If you make this choice, keep in mind that you will want your LIKE condition to be sargable, which means that you cannot place a wildcard in the first position.

    If you currently have a query that uses NOT IN, which offers poor performance because the SQL Server optimizer has to use a nested table scan to perform this activity, try one of the following options instead, all of which offer better performance (see the sketch after this list):

    • Use EXISTS or NOT EXISTS.

    • Use IN.

    • Perform a LEFT OUTER JOIN and check for a NULL condition.

    When you have a choice of using the IN or the EXISTS clause in your Transact-SQL, you will generally want to use the EXISTS clause, as it is usually more efficient and performs faster.
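
    As a sketch of those rewrites (tables and columns are hypothetical), here are three ways to find customers with no orders:

    -- NOT IN: prone to poor plans, and returns nothing if the subquery yields a NULL
    SELECT c.CustomerID
    FROM dbo.Customers AS c
    WHERE c.CustomerID NOT IN (SELECT o.CustomerID FROM dbo.Orders AS o);

    -- NOT EXISTS: usually the safest and fastest option
    SELECT c.CustomerID
    FROM dbo.Customers AS c
    WHERE NOT EXISTS (SELECT 1 FROM dbo.Orders AS o
                      WHERE o.CustomerID = c.CustomerID);

    -- LEFT OUTER JOIN with a NULL check
    SELECT c.CustomerID
    FROM dbo.Customers AS c
    LEFT OUTER JOIN dbo.Orders AS o
        ON o.CustomerID = c.CustomerID
    WHERE o.CustomerID IS NULL;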

    Don't use ORDER BY in your SELECT statements unless you really need to, as it adds a lot of extra overhead. For example, it may be more efficient to sort the data at the client than at the server. In other cases, the client may not even need sorted data to achieve its goal. The key here is to remember that you shouldn't automatically sort data unless you know it is necessary.

    If your WHERE clause includes an IN operator along with a list of values to be tested in the query, order the list of values so that the most frequently found values are placed at the start of the list and the less frequently found values at the end.

  • Your post covers everything and it covers nothing. I suggest sticking to the OP's issue and waiting for more info. I'm sure he will appreciate the effort, but I don't think your answer will be particularly useful.

    Let's wait and see.

    -- Gianluca Sartori

  • hrushiatre (4/20/2010)


    [...]

    Queries that return a single row are just as fast using a clustered index as a non-clustered index. If you will be creating an index to speed the retrieval of a single record, you may want to consider making it a non-clustered index, saving the clustered index (you can only have one) for queries that return a range of data.

    That's not entirely accurate. In fact, in some ways, it's inaccurate. It really depends on what is being indexed and how it is being queried before you can say either type of index is faster in any given situation. For example, you said they'd be the same for retrieving a single row. What if the nonclustered index is not covering, meaning it doesn't provide all the data necessary to answer the query? Then it needs to hit both the nonclustered index and perform a lookup either on the heap or the clustered index. Lookups are very costly.
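
    To make "covering" concrete, a sketch (index and column names are hypothetical): if the query is

    SELECT FirstName, HireDate FROM dbo.Employees WHERE LastName = 'Smith';

    then a nonclustered index like

    CREATE NONCLUSTERED INDEX IX_Employees_LastName
    ON dbo.Employees (LastName)
    INCLUDE (FirstName, HireDate);

    covers it: the index itself supplies every column the query touches, so no lookup against the heap or clustered index is needed.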

    The Query Optimizer will always perform a table scan or a clustered index scan on a table if the WHERE clause in the query contains an OR operator and any of the columns referenced in the OR clause are not indexed (or do not have a useful index). Because of this, if you use many queries with OR clauses, you will want to ensure that each column referenced in the WHERE clause has an index.

    This is also an iffy statement. You need to be very careful about saying that every column referenced needs an index. You don't want lots of singleton indexes on a table.

    One of the best ways to boost JOIN performance is to limit how many rows need to be joined. This is especially beneficial for the outer table in a JOIN. Return only those rows that need to be joined, and no more.

    If you perform regular joins between two or more tables in your queries, performance will be optimized if each of the joined columns has its own index. This includes adding indexes to the columns in each table used to join the tables. Generally speaking, a clustered key is better than a non-clustered key for optimum JOIN performance.

    Again, dangerous information. First, each column having an index is probably not the way to go. If multiple columns are regularly used in a join, then a compound index in which all those columns take part is a better approach. Also, a cluster is not necessarily better than a nonclustered index for JOIN performance. What type of join operation is the execution plan using? Let's say it's a Nested Loop Operator. Let's also say that we can provide a covering nonclustered index. This means each probe operation from the top of the operator can take advantage of the covering nonclustered index. This can be much faster than a clustered index on the same key columns (but not necessarily, as usual, it depends).
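
    As a sketch of the compound-index point (hypothetical names): if queries regularly join on both CustomerID and RegionID, one index that carries both columns usually serves the join better than two singleton indexes:

    CREATE NONCLUSTERED INDEX IX_Orders_CustomerID_RegionID
    ON dbo.Orders (CustomerID, RegionID);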

    Avoid joining tables based on columns with few unique values. If the columns used for joining aren't mostly unique, then the SQL Server optimizer may not be able to use an existing index to speed up the join. Ideally, for best performance, joins should be done on columns that have unique indexes.

    If you have the choice of using a JOIN or a subquery to perform the same task, the JOIN (often an OUTER JOIN) is generally faster. But this is not always the case: for example, if the returned data is going to be small, or if there are no indexes on the joined columns, then a subquery may indeed be faster.

    You'll have to back this one up, because I can't support it at all.

    [...]

    I don't have time to hit every point here, but a lot of this information is of dubious origin. I'd be very careful applying this advice to my production servers.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • @Grant, looks a lot like cut'n paste. Would be nice to know the source.

    -- Gianluca Sartori

  • No, it is not cut and paste.

    A year and a half ago I was having performance problems, so I searched the net and created a document.

  • My statement "dubious origin" was probably badly phrased. I think "dubious benefit" would be more accurate.

    But I would still be curious about the origin of some of the statements.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning
