Help with performance - aggregating a lot of data

  • Jeff Moden (10/28/2011)


    Nope. Good to go here. My only other question is how many rows does your test data jig create? I started it this morning in TempDB and stopped it when TempDb had grown to 10GB.

    hmmm...had similar issues, had to crash out after 20mins....PC came to a standstill 🙂

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • J Livingston SQL (10/28/2011)


    Jeff Moden (10/28/2011)


    Nope. Good to go here. My only other question is how many rows does your test data jig create? I started it this morning in TempDB and stopped it when TempDb had grown to 10GB.

    hmmm...had similar issues, had to crash out after 20mins....PC came to a standstill 🙂

    Ok, now I'm going to drop test tables and recreate them, because it didn't take all that long to run... *curious*

    -Ki

  • I just dropped the tables and rebuilt them using that code, and it took just under 5 minutes to run. I was starting to wonder if I'd accidentally posted code that wouldn't work!

    I'm not wedded to that test data - I just wanted to come up with a way to generate *enough* test data that was short and could be posted here easily.

    -Ki

  • Hi Ki

    I will try again with your test rig....earlier on I thought it was my PC playing up...until I saw Jeff's post.

    think I will reboot and start over.

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • Ok...got the test data now (dev machine at work 🙂 )

    have you any performance figures...you say your query is "slow"...based on the test data, what response times are you looking for?

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • J Livingston SQL (10/28/2011)


    Hi Ki

    I will try again with your test rig....earlier on I thought it was my PC playing up...until I saw Jeff's post.

    think I will reboot and start over.

    You've probably got the standard 1 million row tally table already set up. Ki's script includes tally table creation (4,000 rows, IIRC). Restrict your own tally table to that amount (see the sketch at the end of this post).


    [font="Arial"]Low-hanging fruit picker and defender of the moggies[/font]

    For better assistance in answering your questions, please read this[/url].


    Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

    Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]
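
    A minimal sketch of the restriction described above - this is not Ki's actual script, and it assumes the usual dbo.Tally(N) sequential-number table:

    -- Cap an existing Tally table at 4,000 rows, matching Ki's test-data script.
    SELECT TOP (4000) t.N
    FROM dbo.Tally t
    ORDER BY t.N;

    -- If no Tally table exists, 4,000 sequential numbers can be produced on the fly.
    SELECT TOP (4000)
           N = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
    FROM sys.all_columns ac1
    CROSS JOIN sys.all_columns ac2;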

  • J Livingston SQL (10/28/2011)


    Ok...got the test data now (dev machine at work 🙂 )

    have you any performance figures...you say your query is "slow"...based on the test data, what response times are you looking for?

    I'm seeing over a minute in dev - which is way too long. Based on prior experience, I'd expect that to drop about 50% in production for this type of query. I don't have a specific target in mind, simply because I don't know what I should consider acceptable for this. My definition of "fast enough" tends to come from how users typically react when they run a report (which this query will end up feeding) and have to wait for it to generate, and from trying to head that problem off before it starts.

    And, of course, I want to avoid overly long locks on these tables, because this is being run against a fairly active OLTP database.

    I know that's not the world's best hard target number. My apologies.

    -Ki

  • J Livingston SQL (10/28/2011)


    Ok...got the test data now (dev machine at work 🙂 )

    have you any performance figures...you say your query is "slow"...based on the test data, what response times are you looking for?

    I'm using my 11001 row "standard" zero-based Tally Table. Besides, it looks like there are limits everywhere in the code in the WHERE clauses. I'll have to take a closer look when I get home.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (10/28/2011)


    J Livingston SQL (10/28/2011)


    Ok...got the test data now (dev machine at work 🙂 )

    have you any performance figures...you say your query is "slow"...based on the test data, what response times are you looking for?

    I'm using my 11001 row "standard" zero-based Tally Table. Besides, it looks like there are limits everywhere in the code in the WHERE clauses. I'll have to take a closer look when I get home.

    Nope - my bad there. Because I built this in my Sandbox db, I also built the tally table there rather than using my "big" one. So it's assuming 4,000 rows in the Tally table - I didn't limit that portion of the code.

    Edited - I changed this to restrict to using 4,000 rows, regardless of how big the Tally table is.

    -Ki

  • Kiara (10/28/2011)


    J Livingston SQL (10/28/2011)


    Ok...got the test data now (dev machine at work 🙂 )

    have you any performance figures...you say your query is "slow"...based on the test data, what response times are you looking for?

    I'm seeing over a minute in dev - which is way too long. Based on prior experience, I'd expect that to drop about 50% in production for this type of query. I don't have a specific target in mind, simply because I don't know what I should consider acceptable for this. My definition of "fast enough" tends to come from how users typically react when they run a report (which this query will end up feeding) and have to wait for it to generate, and from trying to head that problem off before it starts.

    And, of course, I want to avoid overly long locks on these tables, because this is being run against a fairly active OLTP database.

    I know that's not the world's best hard target number. My apologies.

    Ok...using your test rig of nearly 15M rows and your initial posted query

    SELECT  DailyHours.empid ,
            DailyHours.weekEnding ,
            RegHours = SUM(DailyHours.RegHrs) ,
            OT1Hours = SUM(DailyHours.OT1Hours) ,
            OT2Hours = SUM(DailyHours.OT2Hours)
    FROM    ( SELECT    empid ,
                        weekEnding ,
                        RegHrs ,
                        OT1Hours ,
                        OT2Hours
              FROM      ( SELECT    h.empid ,
                                    h.weekEnding ,
                                    d.day1_reg , d.day1_ot1 , d.day1_ot2 ,
                                    d.day2_reg , d.day2_ot1 , d.day2_ot2 ,
                                    d.day3_reg , d.day3_ot1 , d.day3_ot2 ,
                                    d.day4_reg , d.day4_ot1 , d.day4_ot2 ,
                                    d.day5_reg , d.day5_ot1 , d.day5_ot2 ,
                                    d.day6_reg , d.day6_ot1 , d.day6_ot2 ,
                                    d.day7_reg , d.day7_ot1 , d.day7_ot2
                          FROM      dbo.TimeDetail d
                                    INNER JOIN dbo.TimeHeader h ON d.docnbr = h.docnbr
                          WHERE     h.tc_status = 'P'
                                    AND h.weekEnding BETWEEN '2010-01-01' AND '2010-10-27'
                        ) hrs
                        CROSS APPLY ( SELECT day1_reg , day1_ot1 , day1_ot2
                                      UNION ALL
                                      SELECT day2_reg , day2_ot1 , day2_ot2
                                      UNION ALL
                                      SELECT day3_reg , day3_ot1 , day3_ot2
                                      UNION ALL
                                      SELECT day4_reg , day4_ot1 , day4_ot2
                                      UNION ALL
                                      SELECT day5_reg , day5_ot1 , day5_ot2
                                      UNION ALL
                                      SELECT day6_reg , day6_ot1 , day6_ot2
                                      UNION ALL
                                      SELECT day7_reg , day7_ot1 , day7_ot2
                                    ) h ( RegHrs, OT1Hours, OT2Hours )
            ) DailyHours
    GROUP BY DailyHours.empid ,
             DailyHours.weekEnding;

    I am getting 172,000 result rows back in SSMS in under 7 seconds.

    No changes to your code/test rig.

    If I use Chris's query I am getting slightly faster results, by around a second (an illustrative rewrite is sketched at the end of this post).....

    but my real question is: you say you need this in "real time"....but with 170k results, what is anyone going to do in real time that will change within a sub-10-second query refresh?

    ...maybe I have lost the plot :w00t:

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day
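
    Chris's faster query isn't quoted in this excerpt. Purely as an illustration, the UNION ALL unpivot above can also be written with a CROSS APPLY (VALUES ...) constructor (SQL Server 2008 and later), reusing the table and column names from the posted query:

    SELECT  h.empid ,
            h.weekEnding ,
            RegHours = SUM(v.RegHrs) ,
            OT1Hours = SUM(v.OT1Hrs) ,
            OT2Hours = SUM(v.OT2Hrs)
    FROM    dbo.TimeHeader h
            INNER JOIN dbo.TimeDetail d ON d.docnbr = h.docnbr
            CROSS APPLY ( VALUES ( d.day1_reg, d.day1_ot1, d.day1_ot2 ) ,
                                 ( d.day2_reg, d.day2_ot1, d.day2_ot2 ) ,
                                 ( d.day3_reg, d.day3_ot1, d.day3_ot2 ) ,
                                 ( d.day4_reg, d.day4_ot1, d.day4_ot2 ) ,
                                 ( d.day5_reg, d.day5_ot1, d.day5_ot2 ) ,
                                 ( d.day6_reg, d.day6_ot1, d.day6_ot2 ) ,
                                 ( d.day7_reg, d.day7_ot1, d.day7_ot2 )
                        ) v ( RegHrs, OT1Hrs, OT2Hrs )
    WHERE   h.tc_status = 'P'
            AND h.weekEnding BETWEEN '2010-01-01' AND '2010-10-27'
    GROUP BY h.empid ,
             h.weekEnding;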

  • J Livingston SQL (10/28/2011)


    but my real question is: you say you need this in "real time"....but with 170k results, what is anyone going to do in real time that will change within a sub-10-second query refresh?

    ...maybe I have lost the plot :w00t:

    Ah - "real time" as opposed to "data warehouse time", eg no time lag at all between committing data and being able to report on it.

    7 seconds, huh? Now I'm starting to wonder about something other than the query itself and data size being the reason I'm seeing the times I'm seeing.

    Thanks for the reality check. I have some digging to do.

    -Ki

  • Gah... Now that I've had time to look at it, I see that my problem with generating the test data was that not all usages of the Tally Table were constrained by a WHERE clause.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (10/28/2011)


    Gah... Now that I've had time to look at it, I see that my problem with generating the test data was that not all usages of the Tally Table were constrained by a WHERE clause.

    Yup. That's why I apologized and changed the test script.

    Sorry about that!

    -Ki

  • I would suggest that you do not run this kind of reporting against a busy OLTP database. Use replication to copy the relevant tables to a reporting database and do the report query there. Much cleaner, and you will get less contention for resources. To make it even faster you could then load the replicated database into a set of pre-aggregated tables, then union them with the most recent data as it comes in.
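
    A minimal sketch of the pre-aggregation idea; rpt.WeeklyHoursSummary, dbo.vWeeklyHours and @cutoff are hypothetical names, not objects from the thread, and the load logic (clearing or merging already-summarized weeks) is omitted:

    DECLARE @cutoff date = '2010-10-01';

    -- Periodically load closed weeks into a summary table on the reporting side.
    INSERT INTO rpt.WeeklyHoursSummary ( empid, weekEnding, RegHours, OT1Hours, OT2Hours )
    SELECT empid, weekEnding, RegHours, OT1Hours, OT2Hours
    FROM   dbo.vWeeklyHours            -- the aggregation query above, wrapped in a view
    WHERE  weekEnding < @cutoff;

    -- Reports then union the summary with a live aggregation of anything newer.
    SELECT empid, weekEnding, RegHours, OT1Hours, OT2Hours
    FROM   rpt.WeeklyHoursSummary
    UNION ALL
    SELECT empid, weekEnding, RegHours, OT1Hours, OT2Hours
    FROM   dbo.vWeeklyHours
    WHERE  weekEnding >= @cutoff;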

  • andersg98 (10/28/2011)


    I would suggest that you do not run this kind of reporting against a busy OLTP database. Use replication to copy the relevant tables to a reporting database and do the report query there. Much cleaner, and you will get less contention for resources. To make it even faster you could then load the replicated database into a set of pre-aggregated tables, then union them with the most recent data as it comes in.

    I know that - and I'd absolutely love to do things that way.

    However.

    I can't use traditional replication with the application that creates these tables.

    We're in the middle of an infrastructure overhaul, and until that's finished, I won't have the hardware to support a log shipping replica.

    So I'm one of those folks who gets to make what I have work for now. Trust me - I'll be changing the infrastructure as soon as I possibly can. In the meantime, I deal with the situation I've got in front of me.

    -Ki
