Best practices are really the best (fastest) solution?

Question

Best practices are really the best (fastest) solution?

ricardo_chicas

SSCertifiable

Points: 5024
More actions
September 20, 2010 at 11:53 am

#238423

Hello all,
I was "optimizing" a procedure that was taking a long time to run, and I am really confused about the results,
That procedures did theses things
part a)
select into #temptable from table 1 join table 2 join table 3 ( table 2 and 3 weren't really needed)
( about 50 million records)
create indexes
part b)
select into #temptable2 from ...
create indexes
part c)
select into #temptable3 from ...
create indexes
part d)
select into #temptable4 from #temptable1 join #temptable2 join #temptable3
(about 5 million records)
Resultant time 50 minutes
I changed it to something like this
Part a)
create table #temptable1
create clustered index
insert into #temptable1 select from table1 ( all needed data was here)
create non clustered indexes
Part b)
create table #temptable2
create clustered index
insert into #temptable2 select from ...
create non clustered indexes
Part c)
create table #temptable3
create clustered index
insert into #temptable3 select from ...
create non clustered indexes
Partd d)
select into #temptable4 from #temptable1 join #temptable2 join #temptable3
Resultant time 4 hours
Somebody can explain to me why such a thing is possible? where can I find the very best practices for tuning?
I believed that I was following the rules and look at what happened!
Thanks in advance

Viewing 15 posts - 1 through 15 (of 21 total)

You must be logged in to reply to this topic. Login to reply

Brandie Tarvin SSC Guru Points: 173068 More actions · Answer 1

Your question isn't entirely easy to answer. You're missing a few pieces. Like, were you creating clustered indexes in the first sequence, or non-clustered indexes? Secondly, did you change any of the other Select queries like you did the first one?

Things like that will change performance. None of us can give you a decent answer to your problem with the pseudo-code you've listed. However... I can tell you this. Clustered indexes suck for data insertion. They're great for reads, not so much for writes. And 50 million rows? That's a LOT of data. I'm betting your second attempt spent most of that 4 hours just on part a.

Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

ricardo_chicas SSCertifiable Points: 5024 More actions · Answer 2

Thanks for the quick response

I didn't change the indexes, only the order

I dind't change the other querys since they already use the correspondant indexes, I only changed the first one since it used 3 tables and all the data was on the fist one and reacheble with a clustered index

Brandie Tarvin SSC Guru Points: 173068 More actions · Answer 3

Okay, the best way to figure this out is to try Part A of both methods and time them. I'm betting this is where the problem is. Because you're creating the clustered index prior to data load, it's forcing the data to get reordered as it's inserting.

But that's a guess. As I said, the best way to know for sure is to ignore the other parts, and just do Part A of both methods in two separate SSMS windows. Time it, have it Display Execution Plan, and look at the plan for both bits when it's done running.

Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

skrilla99 SSCertifiable Points: 5795 More actions · Answer 4

I think that best practice is to create indexes after the load.

Otherwise you are doing index maintenance for each insert operation which I would think would be substantially slower.

-Dan B

skrilla99 SSCertifiable Points: 5795 More actions · Answer 5

skrilla99

SSCertifiable

Points: 5795

September 20, 2010 at 12:12 pm

#1224274

Oops! Ditto what Brandie wrote...

ricardo_chicas SSCertifiable Points: 5024 More actions · Answer 6

Already did that test

Part A, original sproc 14 minutes/New sproc 15 minutes

Brandie Tarvin SSC Guru Points: 173068 More actions · Answer 7

Then keep doing the tests one at a time until you find out which part of your method changed so much that it's taking 4 hours to do.

Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

Jack Corbett SSC Guru Points: 184394 More actions · Answer 8

My first question is, what are you doing with the data that you need all the temp tables? That's probably a lot of spill to disk that might be avoided by avoiding temp tables all together.

Why can't you just do your final join on the base tables that are filling the temp tables? That "might" be faster than the loading temp tables with 50million rows to return 5 million.

Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question

Jeff Moden SSC Guru Points: 1004356 More actions · Answer 9

The second most important part of optimizing is to know how long [font="Arial Black"]each part [/font]took using the old way and how long it takes using the new way. Otherwise, you're tuning blindly and are likely going to make things very much worse. 🙂

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Leo.Miller SSChampion Points: 13051 More actions · Answer 10

You really have given minimum infornation here. No join conditions, no where clauses or column names, no execution plans, no tables or table details. Sorry we can't help with some details.

I've just been through a simmilar proces of optimising a poorly written query that was reading 48 million rows into a temp table as part of a complex select. Turned out all it needed was to add the INCLUDE clause with 4 int columns to an existing index to improve the query plan. The execution plan went from a Table Scan to an index seek and the report went from 40 minutes to 3 seconds.:-D

If you give us enough info there are people on the forum who can help you.

Cheers

Leo

Leo
Nothing in life is ever so complicated that with a little work it can't be made more complicated.

Andrew Gothard-467944 Hall of Fame Points: 3928 More actions · Answer 11

Hi,

Gail Shaw has an excellent article on here covering how to post queries relating to performance issues in order to give people the information they need in order to assist.

http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

Have a look at this and follow her advice and I'm sure you'll get an answer to your problem, and in all probability a method of improving your response time, and resource use, dramatically

Brandie Tarvin SSC Guru Points: 173068 More actions · Answer 12

Jeff Moden (9/20/2010)
The second most important part of optimizing is to know how long [font="Arial Black"]each part [/font]took using the old way and how long it takes using the new way. Otherwise, you're tuning blindly and are likely going to make things very much worse. 🙂

Exactly the point I was trying to make. Thanks for clarifying it, Jeff.

Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

ricardo_chicas SSCertifiable Points: 5024 More actions · Answer 13

Actually I didn't post that info on purpose, I just wanted to show a case where the best practices aren't the bes, and the question at the final part of my post is "where can I find the very best practices for tuning?" sice I followed all of the ones I know, all my querys use index seek or clustered index seek and the original one use index scan in all the cases, so I want to know if someone has a set of rules that we should follow when we write or try to optimize something like this

Thanks

Jack Corbett SSC Guru Points: 184394 More actions · Answer 14

ricardo_chicas (9/21/2010)
Actually I didn't post that info on purpose, I just wanted to show a case where the best practices aren't the bes, and the question at the final part of my post is "where can I find the very best practices for tuning?" sice I followed all of the ones I know, all my querys use index seek or clustered index seek and the original one use index scan in all the cases, so I want to know if someone has a set of rules that we should follow when we write or try to optimize something like this
Thanks

I think you missed one of the most important ones, which I alluded to in my original post. Only return the rows that you need to return. Is it really necessary to insert 50million rows into a temp table in order to return 5 million rows? The answer may be "Yes", but without details no one can suggest a better way. You could be experiencing a ton of contention in tempdb, especially if there are other processes that work the same way.

Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question