March 22, 2010 at 6:09 pm
Attaching the text file to this post to make accessing it easier than going via rapidshare.
When you attach the graphical plan, be sure to upload it as an attachment to your post. The file you upload should be named *.sqlplan - you can obtain it by right-clicking on the graphical plan, and choosing Save As. Just being clear that an image of the plan would not be helpful!
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
March 23, 2010 at 4:55 am
I dont think the physical reads are the problem. The query in production runs for over 4 hours. During this time only about 30000 physical reads are performed. There is no way this amount of physical reads can keep the disk system busy for 4 hours. Even a very basic system could perform 30000 random reads in about 5 minutes.
The query is definitely not IO-bound - it must be CPU bound.
I dont know why the query runs so much slower in production than in dev, but you should focus on understanding what the CPU is doing. Are you sure that there are no other CPU-intensive tasks running at the same time ?
It is also possible to speed up the query a lot just by rewriting it.
User-defined functions cause lots of performance problems in SQL server. It is possible to rewrite the query without functions - like this:
UPDATE CustomerAdditionalFields
SET Store_Loyalty = (select top 1 StoreNumber
FROM dbo._df_proc_active_stores
WHERE StoreNumber <> '931' --exclude the internet
ORDER BY
POWER(SIN(( customer_latitude*0.017453 - GEO_LATITUDE*0.017453 ) / 2), 2) +
COS(customer_latitude*0.017453) * COS(GEO_LATITUDE*0.017453) *
POWER(SIN(( customer_longitude*0.017453 - GEO_LONGITUDE*0.017453 ) / 2), 2)
)
FROM CustomerAdditionalFields CAF
INNER JOIN CustomerLoyaltyStore CLS ON CAF.customer_key = CLS.customer_key
WHERE RollingMonth = 36
AND CLS.RollingMonthRank = 1
AND ISNULL(customer_latitude, 0) <> 0
AND ISNULL(customer_longitude, 0) <> 0
This rewrite speeds up the query about 5 times in my tests !
It is also possible to increase speed even more by using a CLR procedure, or an SSIS package with a script task.
I have actually tried rewriting it as a CLR procedure. It increased the speed another 5 times in my tests. A total improvement of 25 times !!
When performing the query as a CLR procedure it is difficult to make it multi-threaded. To exploit the many CPU:s in the production machine you might want to use an SSIS-package since it is very simple to make SSIS packages multithreaded.
Good luck
/SG
March 23, 2010 at 6:31 am
Attached are the graphical execution plans for both prod and dev
March 23, 2010 at 7:22 am
Check fragmentation on the tables. Read somewhere that fragmentation can affect the ability to read ahead.
March 23, 2010 at 8:06 am
Kevin,
For brevity, I am going to assume that you know all about the evils of T-SQL scalar functions, those that do data access in particular.
Nevertheless, I would join Stefan in encouraging you to write a set-based solution that can take advantage of SQL Server parallelism (no scalar functions!)
For the time being, I am also going to assume that you cannot make any immediate code changes whatsoever, and simply want production performance to match development performance, as far as possible.
To that end, my strong hunch is that you (or someone) either have a trace running on production, or the query is executed with SET STATISTICS IO/TIME ON. One particularly evil thing about T-SQL scalar functions is that they generate trace output for every row processed.
Turn off all traces, and ensure those SET options are OFF. Rejoice in the improved performance, then set about developing and testing a set-based solution, like Stefan's, perhaps taking advantage of the built-in RADIANS function too. I sometimes think I am the only one that uses it.
Remember, all T-SQL functions (except in-line) are inherently evil, and must be avoided at all costs.
edit: spelling, again
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
March 23, 2010 at 9:47 am
Kevin Bullen (3/23/2010)
Attached are the graphical execution plans for both prod and dev
Just curious: How many different customers do you have, and how many stores do you have ?
March 23, 2010 at 11:01 am
Here's another vote for turning this into a set-based query, eliminating the scalar function, and turning off those options.
Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw[/url]
Learn Extended Events
March 23, 2010 at 11:59 am
Just for fun I created an SSIS package that performs the processing using a C# script component. The package runs two separate threads to get some parallellism. It is easy to extend to more parallell threads if desired.
I have used SSIS 2008 so it is probably not directly usable on 2005, but it might give you some ideas.
The performance is very good. This package is 11 times faster than the set-oriented SQL code I posted before.
/SG
March 25, 2010 at 9:08 am
Thanks all for the input. I ripped out the UDF and called the logic inline, mostly hacking from Stefan_G's sql example. That took the runtime from 10 hours to 2.
I have sworn off scalar functions at this point.
Stefan_G - pretty cool SSIS package.
So with help from you all, I shaved 8 hours from a sql job.
I'm still going to poke around a bit and understand why it runs differently in dev and prod.
Thanks again.
Kevin
March 25, 2010 at 9:23 am
You're welcome.
If you are still interested in help, please post the new execution plans. There might be something more that we can see to help out.
Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw[/url]
Learn Extended Events
March 25, 2010 at 9:54 am
Kevin Bullen (3/25/2010)
I'm still going to poke around a bit and understand why it runs differently in dev and prod.
Trace.
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
March 25, 2010 at 10:02 am
Kevin Bullen (3/25/2010)
Thanks all for the input. I ripped out the UDF and called the logic inline, mostly hacking from Stefan_G's sql example. That took the runtime from 10 hours to 2.I have sworn off scalar functions at this point.
Stefan_G - pretty cool SSIS package.
So with help from you all, I shaved 8 hours from a sql job.
And if you decide to use the SSIS package you will probably gain another factor 10, so the time should go down from 2 hours to 12 minutes.
🙂
Viewing 12 posts - 16 through 26 (of 26 total)
You must be logged in to reply to this topic. Login to reply