Help with performance - aggregating a lot of data

  • Jeff Moden (10/28/2011)


    Nope. Good to go here. My only other question is how many rows does your test data jig create? I started it this morning in TempDB and stopped it when TempDb had grown to 10GB.

    hmmm...had similar issues, had to crash out after 20mins....PC came to a standstill 🙂

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • J Livingston SQL (10/28/2011)


    Jeff Moden (10/28/2011)


    Nope. Good to go here. My only other question is how many rows does your test data jig create? I started it this morning in TempDB and stopped it when TempDb had grown to 10GB.

    hmmm...had similar issues, had to crash out after 20mins....PC came to a standstill 🙂

    Ok, now I'm going to drop test tables and recreate them, because it didn't take all that long to run... *curious*

    -Ki

  • I just dropped the tables and rebuilt them using that code, and it took just under 5 minutes to run. I was starting to wonder if I'd accidentally posted code that wouldn't work!

    I'm not wedded to that test data - I just wanted to come up with a way to generate *enough* test data that was short and could be posted here easily.

    -Ki

  • Hi Ki

    I will try again with your test rig....earlier on I thought it was my PC playing up...until I saw Jeff's post.

    think I will reboot and start over.

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • Ok...got the test data now (dev machine at work 🙂 )

    have you any performance figures...you say your query is "slow"...based on the test data, what response times are you looking for?

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • J Livingston SQL (10/28/2011)


    Hi Ki

    I will try again with your test rig....earlier on I thought it was my PC playing up...until I saw Jeff's post.

    think I will reboot and start over.

    You've probably got the standard 1 million row tally table already set up. Ki's script includes tally table creation (4,000 rows, IIRC). Restrict your own tally table to that amount (see the sketch at the end of this post).


    [font="Arial"]Low-hanging fruit picker and defender of the moggies[/font]

    For better assistance in answering your questions, please read this[/url].


    Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

    Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]
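
    A minimal sketch of the restriction described above - this is not Ki's actual script, and it assumes the usual dbo.Tally(N) sequential-number table:

    -- Cap an existing Tally table at 4,000 rows, matching Ki's test-data script.
    SELECT TOP (4000) t.N
    FROM dbo.Tally t
    ORDER BY t.N;

    -- If no Tally table exists, 4,000 sequential numbers can be produced on the fly.
    SELECT TOP (4000)
           N = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
    FROM sys.all_columns ac1
    CROSS JOIN sys.all_columns ac2;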

  • J Livingston SQL (10/28/2011)


    Ok...got the test data now (dev machine at work 🙂 )

    have you any performance figures...you say your query is "slow"...based on the test data, what response times are you looking for?

    I'm seeing over a minute in dev - which is way too long. Based on prior experience, I'd expect that to drop about 50% in production for this type of query. I don't have a specific target in mind, simply because I don't know what I should consider acceptable for this. My definition of "fast enough" tends to come from how users typically react when they run a report (which this query will end up feeding) and have to wait for it to generate, and from trying to head that problem off before it starts.

    And, of course, I want to avoid overly long locks on these tables, because this is being run against a fairly active OLTP database.

    I know that's not the world's best hard target number. My apologies.

    -Ki

  • J Livingston SQL (10/28/2011)


    Ok...got the test data now (dev machine at work 🙂 )

    have you any performance figures...you say your query is "slow"...based on the test data, what response times are you looking for?

    I'm using my 11001 row "standard" zero-based Tally Table. Besides, it looks like there are limits everywhere in the code in the WHERE clauses. I'll have to take a closer look when I get home.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (10/28/2011)


    J Livingston SQL (10/28/2011)


    Ok...got the test data now (dev machine at work 🙂 )

    have you any performance figures...you say your query is "slow"...based on the test data, what response times are you looking for?

    I'm using my 11001 row "standard" zero-based Tally Table. Besides, it looks like there are limits everywhere in the code in the WHERE clauses. I'll have to take a closer look when I get home.

    Nope - my bad there. Because I built this in my Sandbox db, I also built the tally table there rather than using my "big" one. So it's assuming 4,000 rows in the Tally table - I didn't limit that portion of the code.

    Edited - I changed this to restrict to using 4,000 rows, regardless of how big the Tally table is.

    -Ki

  • Kiara (10/28/2011)


    J Livingston SQL (10/28/2011)


    Ok...got the test data now (dev machine at work 🙂 )

    have you any performance figures...you say your query is "slow"...based on the test data, what response times are you looking for?

    I'm seeing over a minute in dev - which is way too long. Based on prior experience, I'd expect that to drop about 50% in production for this type of query. I don't have a specific target in mind, simply because I don't know what I should consider acceptable for this. My definition of "fast enough" tends to come from how users typically react when they run a report (which this query will end up feeding) and have to wait for it to generate, and from trying to head that problem off before it starts.

    And, of course, I want to avoid overly long locks on these tables, because this is being run against a fairly active OLTP database.

    I know that's not the world's best hard target number. My apologies.

    Ok...using your test rig of nearly 15M rows and your initial posted query

    SELECT  DailyHours.empid ,
            DailyHours.weekEnding ,
            RegHours = SUM(DailyHours.RegHrs) ,
            OT1Hours = SUM(DailyHours.OT1Hours) ,
            OT2Hours = SUM(DailyHours.OT2Hours)
    FROM    ( SELECT    empid ,
                        weekEnding ,
                        RegHrs ,
                        OT1Hours ,
                        OT2Hours
              FROM      ( SELECT    h.empid ,
                                    h.weekEnding ,
                                    d.day1_reg , d.day1_ot1 , d.day1_ot2 ,
                                    d.day2_reg , d.day2_ot1 , d.day2_ot2 ,
                                    d.day3_reg , d.day3_ot1 , d.day3_ot2 ,
                                    d.day4_reg , d.day4_ot1 , d.day4_ot2 ,
                                    d.day5_reg , d.day5_ot1 , d.day5_ot2 ,
                                    d.day6_reg , d.day6_ot1 , d.day6_ot2 ,
                                    d.day7_reg , d.day7_ot1 , d.day7_ot2
                          FROM      dbo.TimeDetail d
                                    INNER JOIN dbo.TimeHeader h ON d.docnbr = h.docnbr
                          WHERE     h.tc_status = 'P'
                                    AND h.weekEnding BETWEEN '2010-01-01' AND '2010-10-27'
                        ) hrs
                        CROSS APPLY ( SELECT day1_reg , day1_ot1 , day1_ot2
                                      UNION ALL
                                      SELECT day2_reg , day2_ot1 , day2_ot2
                                      UNION ALL
                                      SELECT day3_reg , day3_ot1 , day3_ot2
                                      UNION ALL
                                      SELECT day4_reg , day4_ot1 , day4_ot2
                                      UNION ALL
                                      SELECT day5_reg , day5_ot1 , day5_ot2
                                      UNION ALL
                                      SELECT day6_reg , day6_ot1 , day6_ot2
                                      UNION ALL
                                      SELECT day7_reg , day7_ot1 , day7_ot2
                                    ) h ( RegHrs, OT1Hours, OT2Hours )
            ) DailyHours
    GROUP BY DailyHours.empid ,
             DailyHours.weekEnding;

    I am getting 172,000 result rows back in SSMS in under 7 seconds.

    No changes to your code/test rig.

    If I use Chris's query I am getting slightly faster results, by around a second (an illustrative rewrite is sketched at the end of this post).....

    but my real question is: you say you need this in "real time"....but with 170k results, what is anyone going to do in real time that will change within a sub-10-second query refresh?

    ...maybe I have lost the plot :w00t:

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day
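
    Chris's faster query isn't quoted in this excerpt. Purely as an illustration, the UNION ALL unpivot above can also be written with a CROSS APPLY (VALUES ...) constructor (SQL Server 2008 and later), reusing the table and column names from the posted query:

    SELECT  h.empid ,
            h.weekEnding ,
            RegHours = SUM(v.RegHrs) ,
            OT1Hours = SUM(v.OT1Hrs) ,
            OT2Hours = SUM(v.OT2Hrs)
    FROM    dbo.TimeHeader h
            INNER JOIN dbo.TimeDetail d ON d.docnbr = h.docnbr
            CROSS APPLY ( VALUES ( d.day1_reg, d.day1_ot1, d.day1_ot2 ) ,
                                 ( d.day2_reg, d.day2_ot1, d.day2_ot2 ) ,
                                 ( d.day3_reg, d.day3_ot1, d.day3_ot2 ) ,
                                 ( d.day4_reg, d.day4_ot1, d.day4_ot2 ) ,
                                 ( d.day5_reg, d.day5_ot1, d.day5_ot2 ) ,
                                 ( d.day6_reg, d.day6_ot1, d.day6_ot2 ) ,
                                 ( d.day7_reg, d.day7_ot1, d.day7_ot2 )
                        ) v ( RegHrs, OT1Hrs, OT2Hrs )
    WHERE   h.tc_status = 'P'
            AND h.weekEnding BETWEEN '2010-01-01' AND '2010-10-27'
    GROUP BY h.empid ,
             h.weekEnding;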

  • J Livingston SQL (10/28/2011)


    but my real question is: you say you need this in "real time"....but with 170k results, what is anyone going to do in real time that will change within a sub-10-second query refresh?

    ...maybe I have lost the plot :w00t:

    Ah - "real time" as opposed to "data warehouse time", eg no time lag at all between committing data and being able to report on it.

    7 seconds, huh? Now I'm starting to wonder about something other than the query itself and data size being the reason I'm seeing the times I'm seeing.

    Thanks for the reality check. I have some digging to do.

    -Ki

  • Gah... Now that I've had time to look at it, I see that my problem with generating the test data was that not all usages of the Tally Table were constrained by a WHERE clause.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (10/28/2011)


    Gah... Now that I've had time to look at it, I see that my problem with generating the test data was that not all usages of the Tally Table were constrained by a WHERE clause.

    Yup. That's why I apologized and changed the test script.

    Sorry about that!

    -Ki

  • I would suggest that you do not run this kind of reporting against a busy OLTP database. Use replication to copy the relevant tables to a reporting database and do the report query there. Much cleaner, and you will get less contention for resources. To make it even faster you could then load the replicated database into a set of pre-aggregated tables, then union them with the most recent data as it comes in.
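
    A minimal sketch of the pre-aggregation idea; rpt.WeeklyHoursSummary, dbo.vWeeklyHours and @cutoff are hypothetical names, not objects from the thread, and the load logic (clearing or merging already-summarized weeks) is omitted:

    DECLARE @cutoff date = '2010-10-01';

    -- Periodically load closed weeks into a summary table on the reporting side.
    INSERT INTO rpt.WeeklyHoursSummary ( empid, weekEnding, RegHours, OT1Hours, OT2Hours )
    SELECT empid, weekEnding, RegHours, OT1Hours, OT2Hours
    FROM   dbo.vWeeklyHours            -- the aggregation query above, wrapped in a view
    WHERE  weekEnding < @cutoff;

    -- Reports then union the summary with a live aggregation of anything newer.
    SELECT empid, weekEnding, RegHours, OT1Hours, OT2Hours
    FROM   rpt.WeeklyHoursSummary
    UNION ALL
    SELECT empid, weekEnding, RegHours, OT1Hours, OT2Hours
    FROM   dbo.vWeeklyHours
    WHERE  weekEnding >= @cutoff;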

  • andersg98 (10/28/2011)


    I would suggest that you do not run this kind of reporting against a busy OLTP database. Use replication to copy the relevant tables to a reporting database and do the report query there. Much cleaner, and you will get less contention for resources. To make it even faster you could then load the replicated database into a set of pre-aggregated tables, then union them with the most recent data as it comes in.

    I know that - and I'd absolutely love to do things that way.

    However.

    I can't use traditional replication with the application that creates these tables.

    We're in the middle of an infrastructure overhaul, and until that's finished, I won't have the hardware to support a log shipping replica.

    So I'm one of those folks who gets to make what I have work for now. Trust me - I'll be changing the infrastructure as soon as I possibly can. In the meantime, I deal with the situation I've got in front of me.

    -Ki
