Performance required on > 4 million records

  • Yes, this is a non-Analysis Services data warehouse, so basically I am processing a "cube" to get aggregations at 6 different hierarchical levels/views. No matter what I do, aggregating 150+ million rows is going to take time.

    Jeff Moden (10/4/2007)


    Are you reporting on all 3 years at the same time?


    david hay

  • dhay, you are correct that at some point you are simply hardware bound due to the amount of data you have to chew through with your queries. Having said that, I will note that I have never gone into a client for a performance review where I haven't found SIGNIFICANT opportunities for improvement. Consider having a pro give your system, code, etc. a look-see.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • We did. A consultant wrote this code, that's kind of the issue. I'm playing with optimizer hints right now. On a test box, using OPTION (MAXDOP 1) over 2 million rows, I can get almost a 20% performance increase. I still don't totally understand why limiting it to 1 proc increases performance, but I'll take it. It's a bit counterintuitive. I'm benchmarking the various ways I go about it and I'll post my results. Thanks for the input!

    TheSQLGuru (10/5/2007)


    dhay, you are correct that at some point you are simply hardware bound due to the amount of data you have to chew through with your queries. Having said that, I will note that I have never gone into a client for a performance review where I haven't found SIGNIFICANT opportunities for improvement. Consider having a pro give your system, code, etc. a look-see.


    david hay

  • Clearly you need to find better consultants!! 😛 Did you accept the code from him/her with such poor performance, or did it slow down over time with data growth?

    Your statement about MAXDOP makes me wonder if you have hyperthreading enabled on the CPU(s). Most of the time that is a bad thing for sql performance.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • dhay1999 (10/5/2007)


    Yes, this is a non-Analysis Services data warehouse, so basically I am processing a "cube" to get aggregations at 6 different hierarchical levels/views. No matter what I do, aggregating 150+ million rows is going to take time.

    Did your consultants happen to know about "GROUP BY WITH ROLLUP" or "GROUP BY WITH CUBE"? Makes life real easy... Route the output of that to a temp table and then report off the temp table... Look into the "GROUPING" function in the BOL writeup for WITH CUBE... that can be real handy, as well.
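
    For anyone following along, here is a minimal sketch of that pattern; the table and column names (dbo.FactSales, Region, Product, SalesAmount) are hypothetical stand-ins, not the actual schema from this thread:

    -- One pass over the fact table, rolled up into a temp table.
    -- GROUPING() = 1 flags the subtotal/grand-total rows that WITH ROLLUP adds.
    SELECT  Region,
            Product,
            SUM(SalesAmount)  AS TotalSales,
            GROUPING(Region)  AS IsRegionRollup,
            GROUPING(Product) AS IsProductRollup
    INTO    #Aggregates
    FROM    dbo.FactSales
    GROUP BY Region, Product WITH ROLLUP;

    -- Report off the temp table instead of re-aggregating the fact table.
    SELECT Region, Product, TotalSales
    FROM   #Aggregates
    WHERE  IsProductRollup = 1;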

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I honestly don't know how well it operated when it was first installed. As for the consultants, well, they are gone and we'll leave it at that, and now it's mine! Wheeeeeeeeeee.

    This is a new position for me, and one of the tasks I've been assigned is to increase performance on this data load. I'll take a look at the hyperthreading and see if I can turn that "off" for the server.

    I'll also look at the CUBE and ROLLUP the other poster suggested. I don't know what that will entail, but the reporting piece cannot be changed readily, so I have to keep all the current SPs and data structures.

    TheSQLGuru (10/5/2007)


    Clearly you need to find better consultants!! 😛 Did you accept the code from him/her with such poor performance, or did it slow down over time with data growth?

    Your statement about MAXDOP makes me wonder if you have hyperthreading enabled on the CPU(s). Most of the time that is a bad thing for sql performance.


    david hay

  • FYI, I checked out a lot of HT threads on the net today. It really makes me wonder how anything can work at times. I found a link to download CPUCOUNT.EXE and ran it against both my servers. Neither one has HT enabled, so at least that is out of the equation. Right now my best performance is coming from having the following two options applied to the SP: OPTION (MAXDOP 1, FAST 1) takes just over 6 minutes to roll up 2 million rows on the test box. Without the options it takes nearly 8 minutes, so changing the options gives me nearly a 20% performance increase. I'm still at a loss as to why... Intuitively, you'd think having more processing power in use would mean better performance. I've searched and I cannot find a reason why, just that this solution works in my case.

    Link for CPUCOUNT.EXE

    http://gcc.gnu.org/ml/gcc-help/2002-07/msg00239.html
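
    For reference, the hints in question are simply appended to the aggregation query; dbo.FactSales and CustomerID below are hypothetical stand-ins for the real rollup statement:

    SELECT  CustomerID,
            SUM(SalesAmount) AS TotalSales
    FROM    dbo.FactSales
    GROUP BY CustomerID
    OPTION (MAXDOP 1, FAST 1);  -- serial plan; optimizer favors getting the first rows out quickly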

    TheSQLGuru (10/5/2007)


    Clearly you need to find better consultants!! 😛 Did you accept the code from him/her with such poor performance, or did it slow down over time with data growth?

    Your statement about MAXDOP makes me wonder if you have hyperthreading enabled on the CPU(s). Most of the time that is a bad thing for sql performance.


    david hay

  • Couldn't find it in the postings, but I bet you have only 2 CPUs - maybe 4? Parallelization takes a good bit of effort: you have to split the data into streams, allocate the threads and I/O to be processed, then merge the data back together. Synchronization is a big issue - perhaps one CPU (of 2?) is busy with antivirus work, network I/O, etc., and gets behind; now the faster CPU winds up waiting at the end for the slower one. You could also be saturating your I/O system, yada-yada. On systems with < 4 CPUs, I pretty much always set the server-level max degree of parallelism to 1. Most 4-CPU systems get the same treatment after testing.
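
    If you go the server-wide route rather than per-query hints, a minimal sketch of that setting (requires sysadmin; 'max degree of parallelism' is an advanced option, so expose it first):

    EXEC sp_configure 'show advanced options', 1;
    RECONFIGURE;

    -- 0 = let SQL Server use all CPUs (the default); 1 = no parallel plans.
    EXEC sp_configure 'max degree of parallelism', 1;
    RECONFIGURE;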

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • dhay1999 (10/8/2007)


    I'll also look at the CUBE and ROLLUP the other poster suggested. I don't know what that will entail, but the reporting piece cannot be changed readily, so I have to keep all the current SPs and data structures.

    Man, I feel for ya... "We need you to fix this car... it's leaking oil and gets really bad mileage... but you can't open the hood."

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • dhay1999 (10/5/2007)


    Yes, this is a non-Analysis Services data warehouse, so basically I am processing a "cube" to get aggregations at 6 different hierarchical levels/views. No matter what I do, aggregating 150+ million rows is going to take time.

    Jeff Moden (10/4/2007)


    Are you reporting on all 3 years at the same time?

    Can they modify data in the past?

    If not, why don't you create an "aggregate" table containing report data for all closed periods?

    When the next period is closed, you need to update that table once.

    You may use it with a UNION to the data from the current open period.

    Or, if they can, you may create "aggregate" indexed views for the periods in the past.

    It will slow down updates on historical data, but it should not be a problem.

    Then you just UNION all those views together and add the aggregate from the current period.

    Looks pretty nasty and requires manual servicing, but if you're not allowed to open the hood...
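
    A rough sketch of that approach, with hypothetical names (dbo.FactSales, dbo.ClosedPeriodAggregates, and the period keys are placeholders, not the actual schema):

    DECLARE @JustClosedPeriodID int, @CurrentPeriodID int;
    SET @JustClosedPeriodID = 200709;   -- placeholder period keys
    SET @CurrentPeriodID    = 200710;

    -- Run once when a period closes: persist its aggregates.
    INSERT INTO dbo.ClosedPeriodAggregates (PeriodID, Region, TotalSales)
    SELECT  PeriodID, Region, SUM(SalesAmount)
    FROM    dbo.FactSales
    WHERE   PeriodID = @JustClosedPeriodID
    GROUP BY PeriodID, Region;

    -- Reporting: cheap read of the stored history, UNIONed with a live aggregate of only the open period.
    SELECT PeriodID, Region, TotalSales
    FROM   dbo.ClosedPeriodAggregates
    UNION ALL
    SELECT  PeriodID, Region, SUM(SalesAmount)
    FROM    dbo.FactSales
    WHERE   PeriodID = @CurrentPeriodID
    GROUP BY PeriodID, Region;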

    _____________
    Code for TallyGenerator
