September 20, 2010 at 11:53 am
Hello all,
I was "optimizing" a procedure that was taking a long time to run, and I am really confused about the results,
That procedures did theses things
part a)
select into #temptable from table 1 join table 2 join table 3 ( table 2 and 3 weren't really needed)
( about 50 million records)
create indexes
part b)
select into #temptable2 from ...
create indexes
part c)
select into #temptable3 from ...
create indexes
part d)
select into #temptable4 from #temptable1 join #temptable2 join #temptable3
(about 5 million records)
Resultant time 50 minutes
I changed it to something like this
Part a)
create table #temptable1
create clustered index
insert into #temptable1 select from table1 ( all needed data was here)
create non clustered indexes
Part b)
create table #temptable2
create clustered index
insert into #temptable2 select from ...
create non clustered indexes
Part c)
create table #temptable3
create clustered index
insert into #temptable3 select from ...
create non clustered indexes
Partd d)
select into #temptable4 from #temptable1 join #temptable2 join #temptable3
Resultant time 4 hours
Somebody can explain to me why such a thing is possible? where can I find the very best practices for tuning?
I believed that I was following the rules and look at what happened!
Thanks in advance
September 20, 2010 at 12:03 pm
Your question isn't entirely easy to answer. You're missing a few pieces. Like, were you creating clustered indexes in the first sequence, or non-clustered indexes? Secondly, did you change any of the other Select queries like you did the first one?
Things like that will change performance. None of us can give you a decent answer to your problem with the pseudo-code you've listed. However... I can tell you this. Clustered indexes suck for data insertion. They're great for reads, not so much for writes. And 50 million rows? That's a LOT of data. I'm betting your second attempt spent most of that 4 hours just on part a.
September 20, 2010 at 12:07 pm
Thanks for the quick response
I didn't change the indexes, only the order
I dind't change the other querys since they already use the correspondant indexes, I only changed the first one since it used 3 tables and all the data was on the fist one and reacheble with a clustered index
September 20, 2010 at 12:10 pm
Okay, the best way to figure this out is to try Part A of both methods and time them. I'm betting this is where the problem is. Because you're creating the clustered index prior to data load, it's forcing the data to get reordered as it's inserting.
But that's a guess. As I said, the best way to know for sure is to ignore the other parts, and just do Part A of both methods in two separate SSMS windows. Time it, have it Display Execution Plan, and look at the plan for both bits when it's done running.
September 20, 2010 at 12:11 pm
I think that best practice is to create indexes after the load.
Otherwise you are doing index maintenance for each insert operation which I would think would be substantially slower.
-Dan B
September 20, 2010 at 12:12 pm
Oops! Ditto what Brandie wrote...
September 20, 2010 at 12:20 pm
Already did that test
Part A, original sproc 14 minutes/New sproc 15 minutes
September 20, 2010 at 12:32 pm
Then keep doing the tests one at a time until you find out which part of your method changed so much that it's taking 4 hours to do.
September 20, 2010 at 6:34 pm
My first question is, what are you doing with the data that you need all the temp tables? That's probably a lot of spill to disk that might be avoided by avoiding temp tables all together.
Why can't you just do your final join on the base tables that are filling the temp tables? That "might" be faster than the loading temp tables with 50million rows to return 5 million.
Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question
September 20, 2010 at 9:54 pm
The second most important part of optimizing is to know how long [font="Arial Black"]each part [/font]took using the old way and how long it takes using the new way. Otherwise, you're tuning blindly and are likely going to make things very much worse. 🙂
--Jeff Moden
Change is inevitable... Change for the better is not.
September 20, 2010 at 10:09 pm
You really have given minimum infornation here. No join conditions, no where clauses or column names, no execution plans, no tables or table details. Sorry we can't help with some details.
I've just been through a simmilar proces of optimising a poorly written query that was reading 48 million rows into a temp table as part of a complex select. Turned out all it needed was to add the INCLUDE clause with 4 int columns to an existing index to improve the query plan. The execution plan went from a Table Scan to an index seek and the report went from 40 minutes to 3 seconds.:-D
If you give us enough info there are people on the forum who can help you.
Cheers
Leo
Leo
Nothing in life is ever so complicated that with a little work it can't be made more complicated.
September 21, 2010 at 2:17 am
Hi,
Gail Shaw has an excellent article on here covering how to post queries relating to performance issues in order to give people the information they need in order to assist.
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
Have a look at this and follow her advice and I'm sure you'll get an answer to your problem, and in all probability a method of improving your response time, and resource use, dramatically
September 21, 2010 at 6:14 am
Jeff Moden (9/20/2010)
The second most important part of optimizing is to know how long [font="Arial Black"]each part [/font]took using the old way and how long it takes using the new way. Otherwise, you're tuning blindly and are likely going to make things very much worse. 🙂
Exactly the point I was trying to make. Thanks for clarifying it, Jeff.
September 21, 2010 at 8:39 am
Actually I didn't post that info on purpose, I just wanted to show a case where the best practices aren't the bes, and the question at the final part of my post is "where can I find the very best practices for tuning?" sice I followed all of the ones I know, all my querys use index seek or clustered index seek and the original one use index scan in all the cases, so I want to know if someone has a set of rules that we should follow when we write or try to optimize something like this
Thanks
September 21, 2010 at 9:40 am
ricardo_chicas (9/21/2010)
Actually I didn't post that info on purpose, I just wanted to show a case where the best practices aren't the bes, and the question at the final part of my post is "where can I find the very best practices for tuning?" sice I followed all of the ones I know, all my querys use index seek or clustered index seek and the original one use index scan in all the cases, so I want to know if someone has a set of rules that we should follow when we write or try to optimize something like thisThanks
I think you missed one of the most important ones, which I alluded to in my original post. Only return the rows that you need to return. Is it really necessary to insert 50million rows into a temp table in order to return 5 million rows? The answer may be "Yes", but without details no one can suggest a better way. You could be experiencing a ton of contention in tempdb, especially if there are other processes that work the same way.
Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question
Viewing 15 posts - 1 through 15 (of 21 total)
You must be logged in to reply to this topic. Login to reply