Help with performance - aggregating a lot of data

  • Kiara,

    Any interest in a bit of table redesign that will greatly improve performance for this particular query? One that will give the reporting performance you seek without interfering with the OLTP nature of these two tables?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Here's a few preliminary results from queries posted so far, and another which I'm guessing is Jeff's preaggregate. The preaggregate works fastest but not by a fat lot. I've not yet tested the alternative cluster of the child table - it's taking a while...

    SET STATISTICS TIME, IO OFF;

    PRINT '===== Query 1 add the aggregates =================================================================================';
    SET STATISTICS TIME ON;

    DECLARE @Nothing INT;

    SELECT @Nothing = empid
    FROM (
        SELECT
            a.empid,
            a.weekEnding,
            reg = (a.day1_reg + a.day2_reg + a.day3_reg + a.day4_reg + a.day5_reg + a.day6_reg + a.day7_reg),
            ot1 = (a.day1_ot1 + a.day2_ot1 + a.day3_ot1 + a.day4_ot1 + a.day5_ot1 + a.day6_ot1 + a.day7_ot1),
            ot2 = (a.day1_ot2 + a.day2_ot2 + a.day3_ot2 + a.day4_ot2 + a.day5_ot2 + a.day6_ot2 + a.day7_ot2)
        FROM (
            SELECT
                h.empid,
                h.weekEnding,
                day1_reg = SUM(d.day1_reg), day2_reg = SUM(d.day2_reg), day3_reg = SUM(d.day3_reg), day4_reg = SUM(d.day4_reg),
                day5_reg = SUM(d.day5_reg), day6_reg = SUM(d.day6_reg), day7_reg = SUM(d.day7_reg),
                day1_ot1 = SUM(d.day1_ot1), day2_ot1 = SUM(d.day2_ot1), day3_ot1 = SUM(d.day3_ot1), day4_ot1 = SUM(d.day4_ot1),
                day5_ot1 = SUM(d.day5_ot1), day6_ot1 = SUM(d.day6_ot1), day7_ot1 = SUM(d.day7_ot1),
                day1_ot2 = SUM(d.day1_ot2), day2_ot2 = SUM(d.day2_ot2), day3_ot2 = SUM(d.day3_ot2), day4_ot2 = SUM(d.day4_ot2),
                day5_ot2 = SUM(d.day5_ot2), day6_ot2 = SUM(d.day6_ot2), day7_ot2 = SUM(d.day7_ot2)
            FROM dbo.TimeDetail d (NOLOCK)
            INNER JOIN dbo.TimeHeader h ON d.docnbr = h.docnbr
            WHERE h.tc_status = 'P'
              AND h.weekEnding BETWEEN '2010-01-01' AND '2010-10-27'
            GROUP BY h.empid, h.weekEnding
        ) a
    ) n;

    SET STATISTICS TIME OFF;

    PRINT '===== Query 2 aggregate the rows =================================================================================';
    SET STATISTICS TIME ON;

    SELECT @Nothing = empid
    FROM (
        SELECT
            h.empid,
            h.weekEnding,
            reg = SUM(d.day1_reg + d.day2_reg + d.day3_reg + d.day4_reg + d.day5_reg + d.day6_reg + d.day7_reg),
            ot1 = SUM(d.day1_ot1 + d.day2_ot1 + d.day3_ot1 + d.day4_ot1 + d.day5_ot1 + d.day6_ot1 + d.day7_ot1),
            ot2 = SUM(d.day1_ot2 + d.day2_ot2 + d.day3_ot2 + d.day4_ot2 + d.day5_ot2 + d.day6_ot2 + d.day7_ot2)
        FROM dbo.TimeDetail d (NOLOCK)
        INNER JOIN dbo.TimeHeader h ON d.docnbr = h.docnbr
        WHERE h.tc_status = 'P'
          AND h.weekEnding BETWEEN '2010-01-01' AND '2010-10-27'
        GROUP BY h.empid, h.weekEnding
    ) n;

    SET STATISTICS TIME OFF;

    PRINT '===== Query 3 preaggregate TimeDetail on docnbr =================================================================================';
    SET STATISTICS TIME ON;

    SELECT @Nothing = empid
    FROM (
        SELECT
            h.empid,
            h.weekEnding,
            reg = SUM(a.day1_reg + a.day2_reg + a.day3_reg + a.day4_reg + a.day5_reg + a.day6_reg + a.day7_reg),
            ot1 = SUM(a.day1_ot1 + a.day2_ot1 + a.day3_ot1 + a.day4_ot1 + a.day5_ot1 + a.day6_ot1 + a.day7_ot1),
            ot2 = SUM(a.day1_ot2 + a.day2_ot2 + a.day3_ot2 + a.day4_ot2 + a.day5_ot2 + a.day6_ot2 + a.day7_ot2)
        FROM (
            SELECT
                d.docnbr,
                day1_reg = SUM(day1_reg), day2_reg = SUM(day2_reg), day3_reg = SUM(day3_reg), day4_reg = SUM(day4_reg),
                day5_reg = SUM(day5_reg), day6_reg = SUM(day6_reg), day7_reg = SUM(day7_reg),
                day1_ot1 = SUM(day1_ot1), day2_ot1 = SUM(day2_ot1), day3_ot1 = SUM(day3_ot1), day4_ot1 = SUM(day4_ot1),
                day5_ot1 = SUM(day5_ot1), day6_ot1 = SUM(day6_ot1), day7_ot1 = SUM(day7_ot1),
                day1_ot2 = SUM(day1_ot2), day2_ot2 = SUM(day2_ot2), day3_ot2 = SUM(day3_ot2), day4_ot2 = SUM(day4_ot2),
                day5_ot2 = SUM(day5_ot2), day6_ot2 = SUM(day6_ot2), day7_ot2 = SUM(day7_ot2)
            FROM dbo.TimeDetail d
            GROUP BY d.docnbr
        ) a
        INNER JOIN dbo.TimeHeader h ON h.docnbr = a.docnbr
        WHERE h.tc_status = 'P'
          AND h.weekEnding BETWEEN '2010-01-01' AND '2010-10-27'
        GROUP BY h.empid, h.weekEnding
    ) n;

    SET STATISTICS TIME OFF;

    PRINT '===== Query 4 preaggregate TimeDetail on docnbr with #temptable =================================================================================';
    SET STATISTICS TIME ON;

    IF OBJECT_ID('tempdb..#TimeDetail') IS NOT NULL
        DROP TABLE tempdb..#TimeDetail;

    SELECT
        d.docnbr,
        day1_reg = SUM(day1_reg), day2_reg = SUM(day2_reg), day3_reg = SUM(day3_reg), day4_reg = SUM(day4_reg),
        day5_reg = SUM(day5_reg), day6_reg = SUM(day6_reg), day7_reg = SUM(day7_reg),
        day1_ot1 = SUM(day1_ot1), day2_ot1 = SUM(day2_ot1), day3_ot1 = SUM(day3_ot1), day4_ot1 = SUM(day4_ot1),
        day5_ot1 = SUM(day5_ot1), day6_ot1 = SUM(day6_ot1), day7_ot1 = SUM(day7_ot1),
        day1_ot2 = SUM(day1_ot2), day2_ot2 = SUM(day2_ot2), day3_ot2 = SUM(day3_ot2), day4_ot2 = SUM(day4_ot2),
        day5_ot2 = SUM(day5_ot2), day6_ot2 = SUM(day6_ot2), day7_ot2 = SUM(day7_ot2)
    INTO #TimeDetail
    FROM dbo.TimeDetail d
    GROUP BY d.docnbr;

    CREATE UNIQUE CLUSTERED INDEX [CX_docnbr] ON #TimeDetail ([docnbr] ASC);

    SELECT @Nothing = empid
    FROM (
        SELECT
            h.empid,
            h.weekEnding,
            reg = SUM(a.day1_reg + a.day2_reg + a.day3_reg + a.day4_reg + a.day5_reg + a.day6_reg + a.day7_reg),
            ot1 = SUM(a.day1_ot1 + a.day2_ot1 + a.day3_ot1 + a.day4_ot1 + a.day5_ot1 + a.day6_ot1 + a.day7_ot1),
            ot2 = SUM(a.day1_ot2 + a.day2_ot2 + a.day3_ot2 + a.day4_ot2 + a.day5_ot2 + a.day6_ot2 + a.day7_ot2)
        FROM #TimeDetail a
        INNER JOIN dbo.TimeHeader h ON h.docnbr = a.docnbr
        WHERE h.tc_status = 'P'
          AND h.weekEnding BETWEEN '2010-01-01' AND '2010-10-27'
        GROUP BY h.empid, h.weekEnding
    ) n;

    IF OBJECT_ID('tempdb..#TimeDetail') IS NOT NULL
        DROP TABLE tempdb..#TimeDetail;

    SET STATISTICS TIME OFF;

    PRINT '===== Query 5 Original query =================================================================================';
    SET STATISTICS TIME ON;

    SELECT @Nothing = empid
    FROM (
        SELECT
            DailyHours.empid,
            DailyHours.weekEnding,
            RegHours = SUM(DailyHours.RegHrs),
            OT1Hours = SUM(DailyHours.OT1Hours),
            OT2Hours = SUM(DailyHours.OT2Hours)
        FROM (
            SELECT empid,
                   weekEnding,
                   RegHrs,
                   OT1Hours,
                   OT2Hours
            FROM (
                SELECT h.empid,
                       h.weekEnding,
                       d.day1_reg, d.day1_ot1, d.day1_ot2,
                       d.day2_reg, d.day2_ot1, d.day2_ot2,
                       d.day3_reg, d.day3_ot1, d.day3_ot2,
                       d.day4_reg, d.day4_ot1, d.day4_ot2,
                       d.day5_reg, d.day5_ot1, d.day5_ot2,
                       d.day6_reg, d.day6_ot1, d.day6_ot2,
                       d.day7_reg, d.day7_ot1, d.day7_ot2
                FROM dbo.TimeDetail d
                INNER JOIN dbo.TimeHeader h ON d.docnbr = h.docnbr
                WHERE h.tc_status = 'P'
                  AND h.weekEnding BETWEEN '2010-01-01' AND '2010-10-27'
            ) hrs
            CROSS APPLY (
                SELECT day1_reg, day1_ot1, day1_ot2 UNION ALL
                SELECT day2_reg, day2_ot1, day2_ot2 UNION ALL
                SELECT day3_reg, day3_ot1, day3_ot2 UNION ALL
                SELECT day4_reg, day4_ot1, day4_ot2 UNION ALL
                SELECT day5_reg, day5_ot1, day5_ot2 UNION ALL
                SELECT day6_reg, day6_ot1, day6_ot2 UNION ALL
                SELECT day7_reg, day7_ot1, day7_ot2
            ) h (RegHrs, OT1Hours, OT2Hours)
        ) DailyHours
        GROUP BY DailyHours.empid,
                 DailyHours.weekEnding
    ) n;

    SET STATISTICS TIME OFF;

    PRINT '===== finished =================================================================================';

    And the results:

    ===== Query 1 add the aggregates =================================================================================
     SQL Server Execution Times:
       CPU time = 12339 ms, elapsed time = 26314 ms.
    ===== Query 2 aggregate the rows =================================================================================
     SQL Server Execution Times:
       CPU time = 13838 ms, elapsed time = 32414 ms.
    ===== Query 3 preaggregate TimeDetail on docnbr =================================================================================
     SQL Server Execution Times:
       CPU time = 10967 ms, elapsed time = 25788 ms.
    ===== Query 4 preaggregate TimeDetail on docnbr with #temptable =================================================================================
     SQL Server Execution Times:
       CPU time = 0 ms, elapsed time = 71 ms.
     SQL Server Execution Times:
       CPU time = 26941 ms, elapsed time = 32355 ms.
    (624000 row(s) affected)
     SQL Server parse and compile time:
       CPU time = 187 ms, elapsed time = 363 ms.
     SQL Server Execution Times:
       CPU time = 2933 ms, elapsed time = 3211 ms.
     SQL Server Execution Times:
       CPU time = 3120 ms, elapsed time = 3650 ms.
     SQL Server Execution Times:
       CPU time = 1092 ms, elapsed time = 701 ms.
     SQL Server Execution Times:
       CPU time = 0 ms, elapsed time = 42 ms.
     SQL Server Execution Times:
       CPU time = 0 ms, elapsed time = 138 ms.
    ===== Query 5 Original query =================================================================================
     SQL Server Execution Times:
       CPU time = 32137 ms, elapsed time = 34796 ms.
    ===== finished =================================================================================


    Low-hanging fruit picker and defender of the moggies

    For better assistance in answering your questions, please read this.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • ChrisM@home (10/30/2011)


    Here's a few preliminary results from queries posted so far, and another which I'm guessing is Jeff's preaggregate. The preaggregate works fastest but not by a fat lot.

    I ran into the same problem... that's why I'm suggesting a table redesign. The biggest problem here is that the detail table must be joined to the header table to find the detail rows in the date range... it's just not going to be fast for such a thing because of the Clustered Index on the Docno... it spreads the data all over hell's half acre and SQL Server correctly makes the assumption that a clustered index scan is going to provide the best performance.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (10/30/2011)


    ChrisM@home (10/30/2011)


    Here's a few preliminary results from queries posted so far, and another which I'm guessing is Jeff's preaggregate. The preaggregate works fastest but not by a fat lot.

    I ran into the same problem... that's why I'm suggesting a table redesign. The biggest problem here is that the detail table must be joined to the header table to find the detail rows in the date range... it's just not going to be fast for such a thing because of the Clustered Index on the Docno... it spreads the data all over hell's half acre and SQL Server correctly makes the assumption that a clustered index scan is going to provide the best performance.

    That's what I figured too. You get a clustered index seek if you narrow the date filter down to about three months, but the performance still sucks.

    I've dropped the clustered index and created a new one over empid and weekEnding. To put that another way: after adding the two columns to the detail table (slow), updating them from the header (fairly quick), defragging (which took hours), and running the scripts above, I kicked off a reindex early this morning... maybe it will be finished by the time I get home :hehe:
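    For anyone who wants to repeat the experiment, here's a minimal sketch of those steps. The column types, and the index names, are my assumptions based on the queries earlier in the thread rather than the actual schema, so treat it as illustrative and try it on a test copy first.

    -- Sketch only: assumes empid is INT and weekEnding is DATETIME, as implied by the queries above.
    ALTER TABLE dbo.TimeDetail ADD empid INT NULL, weekEnding DATETIME NULL;

    -- Copy the header values down to the detail rows (slow on a table this size).
    UPDATE d
       SET d.empid      = h.empid,
           d.weekEnding = h.weekEnding
      FROM dbo.TimeDetail d
      INNER JOIN dbo.TimeHeader h ON h.docnbr = d.docnbr;

    -- The existing cluster on docnbr has to go first; its real name isn't known here,
    -- so it's shown as a commented placeholder (if it backs the PK, drop the constraint instead).
    -- DROP INDEX CX_TimeDetail_docnbr ON dbo.TimeDetail;
    CREATE CLUSTERED INDEX CX_TimeDetail_empid_weekEnding
        ON dbo.TimeDetail (empid, weekEnding);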

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Jeff Moden (10/29/2011)


    Kiara,

    Any interest in a bit of table redesign that will greatly improve performance for this particular query? One that will give the reporting performance you seek without interfering with the OLTP nature of these two tables?

    Jeff,

    I'd love to - but I can't touch the table structure in this particular case. The application that owns them will break if I do.

    -Ki

  • Jeff Moden (10/30/2011)


    ChrisM@home (10/30/2011)


    Here's a few preliminary results from queries posted so far, and another which I'm guessing is Jeff's preaggregate. The preaggregate works fastest but not by a fat lot.

    I ran into the same problem... that's why I'm suggesting a table redesign. The biggest problem here is that the detail table must be joined to the header table to find the detail rows in the date range... it's just not going to be fast for such a thing because of the Clustered Index on the Docno... it spreads the data all over hell's half acre and SQL Server correctly makes the assumption that a clustered index scan is going to provide the best performance.

    Thanks guys. You've confirmed what I was afraid of - given that I can't change the table structure here, running this query against the actual data tables will, in the long run, make me a very, very unhappy camper. I'd love to change the clustered index, but I'm pretty sure that will just give me problems in another part of system.

    Since the application responsible for these tables is going to be around for a long time, I think my best bet is going to be some variation on a very simple data warehouse along with a very carefully written trigger to grab the data as it changes.
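    As a rough illustration of that idea only: a reporting table keyed the way the report needs, fed by a trigger on the header table, might look something like the sketch below. Every object name, data type, and the assumption that a timecard only needs re-summarising when its header row is processed ('P') are mine, not anything settled in this thread; a real version would also have to handle changes and deletes on the detail rows.

    -- Hypothetical reporting table: one row per employee per week, clustered for date-range queries.
    CREATE TABLE dbo.TimeSummary
    (
        empid      INT          NOT NULL,
        weekEnding DATETIME     NOT NULL,
        reg        DECIMAL(9,2) NOT NULL,
        ot1        DECIMAL(9,2) NOT NULL,
        ot2        DECIMAL(9,2) NOT NULL,
        CONSTRAINT PK_TimeSummary PRIMARY KEY CLUSTERED (weekEnding, empid)
    );
    GO
    -- Hypothetical trigger: re-summarise only the employee/weeks touched by the current statement.
    CREATE TRIGGER dbo.trg_TimeHeader_Summarize ON dbo.TimeHeader
    AFTER INSERT, UPDATE
    AS
    BEGIN
        SET NOCOUNT ON;

        -- Throw away the summary rows for any employee/week touched by this statement...
        DELETE s
          FROM dbo.TimeSummary s
          INNER JOIN inserted i ON i.empid = s.empid
                               AND i.weekEnding = s.weekEnding;

        -- ...and rebuild them from the processed timecards for those employee/weeks.
        INSERT dbo.TimeSummary (empid, weekEnding, reg, ot1, ot2)
        SELECT h.empid, h.weekEnding,
               SUM(d.day1_reg + d.day2_reg + d.day3_reg + d.day4_reg + d.day5_reg + d.day6_reg + d.day7_reg),
               SUM(d.day1_ot1 + d.day2_ot1 + d.day3_ot1 + d.day4_ot1 + d.day5_ot1 + d.day6_ot1 + d.day7_ot1),
               SUM(d.day1_ot2 + d.day2_ot2 + d.day3_ot2 + d.day4_ot2 + d.day5_ot2 + d.day6_ot2 + d.day7_ot2)
          FROM dbo.TimeHeader h
          INNER JOIN dbo.TimeDetail d ON d.docnbr = h.docnbr
         WHERE h.tc_status = 'P'
           AND EXISTS (SELECT 1 FROM inserted i
                        WHERE i.empid = h.empid
                          AND i.weekEnding = h.weekEnding)
         GROUP BY h.empid, h.weekEnding;
    END;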

    I really don't have the words to express how very much I appreciate all your assistance on this. Thanks so very, very much.

    -Ki

  • ChrisM@home (10/30/2011)


    Here's a few preliminary results from queries posted so far, and another which I'm guessing is Jeff's preaggregate. The preaggregate works fastest but not by a fat lot. I've not yet tested the alternative cluster of the child table - it's taking a while...

    And the results:

    ===== Query 1 add the aggregates =================================================================================
     SQL Server Execution Times:
       CPU time = 12339 ms, elapsed time = 26314 ms.
    ===== Query 2 aggregate the rows =================================================================================
     SQL Server Execution Times:
       CPU time = 13838 ms, elapsed time = 32414 ms.
    ===== Query 3 preaggregate TimeDetail on docnbr =================================================================================
     SQL Server Execution Times:
       CPU time = 10967 ms, elapsed time = 25788 ms.
    ===== Query 4 preaggregate TimeDetail on docnbr with #temptable =================================================================================
     SQL Server Execution Times:
       CPU time = 0 ms, elapsed time = 71 ms.
     SQL Server Execution Times:
       CPU time = 26941 ms, elapsed time = 32355 ms.
    (624000 row(s) affected)
     SQL Server parse and compile time:
       CPU time = 187 ms, elapsed time = 363 ms.
     SQL Server Execution Times:
       CPU time = 2933 ms, elapsed time = 3211 ms.
     SQL Server Execution Times:
       CPU time = 3120 ms, elapsed time = 3650 ms.
     SQL Server Execution Times:
       CPU time = 1092 ms, elapsed time = 701 ms.
     SQL Server Execution Times:
       CPU time = 0 ms, elapsed time = 42 ms.
     SQL Server Execution Times:
       CPU time = 0 ms, elapsed time = 138 ms.
    ===== Query 5 Original query =================================================================================
     SQL Server Execution Times:
       CPU time = 32137 ms, elapsed time = 34796 ms.
    ===== finished =================================================================================

    Thanks, Chris. I'll play around with this structure and see how it behaves against the actual data set involved. I really, really appreciate all your assistance here.

    -Ki

  • Kiara (10/31/2011)


    Since the application responsible for these tables is going to be around for a long time, I think my best bet is going to be some variation on a very simple data warehouse along with a very carefully written trigger to grab the data as it changes.

    I'd say that the mini-datawarehouse is probably a good idea given your restrictions. I'm not real fond of triggers for this sort of thing, but I've had to use them as a last resort. This, alas, may be one of those situations.

    If your data is never deleted or updated after inserting, you could possibly set up a polling-type Agent job to get the new records and pump them into your mini-warehouse.
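    A bare-bones version of that polling step might look like the sketch below. It assumes docnbr is an ever-increasing integer that can act as the watermark and that a warehouse table called dbo.TimeDetail_Agg exists; both are assumptions of mine, not anything confirmed in this thread.

    -- Hypothetical incremental load for an insert-only source, run from a SQL Agent job schedule.
    DECLARE @LastDoc INT;

    -- Watermark: the highest document number already loaded (assumes docnbr only ever increases).
    SELECT @LastDoc = ISNULL(MAX(docnbr), 0)
      FROM dbo.TimeDetail_Agg;

    INSERT dbo.TimeDetail_Agg (docnbr, empid, weekEnding, reg, ot1, ot2)
    SELECT h.docnbr, h.empid, h.weekEnding,
           SUM(d.day1_reg + d.day2_reg + d.day3_reg + d.day4_reg + d.day5_reg + d.day6_reg + d.day7_reg),
           SUM(d.day1_ot1 + d.day2_ot1 + d.day3_ot1 + d.day4_ot1 + d.day5_ot1 + d.day6_ot1 + d.day7_ot1),
           SUM(d.day1_ot2 + d.day2_ot2 + d.day3_ot2 + d.day4_ot2 + d.day5_ot2 + d.day6_ot2 + d.day7_ot2)
      FROM dbo.TimeHeader h
      INNER JOIN dbo.TimeDetail d ON d.docnbr = h.docnbr
     WHERE h.docnbr > @LastDoc
       AND h.tc_status = 'P'
     GROUP BY h.docnbr, h.empid, h.weekEnding;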

    Todd Fifield

  • Kiara (10/31/2011)


    Jeff Moden (10/29/2011)


    Kiara,

    Any interest in a bit of table redesign that will greatly improve performance for this particular query? One that will give the reporting performance you seek without interfering with the OLTP nature of these two tables?

    Jeff,

    I'd love to - but I can't touch the table structure in this particular case. The application that owns them will break if I do.

    Not if you build a couple of "pass through" views of the same name as the tables and rename the tables. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (10/31/2011)


    Kiara (10/31/2011)


    Jeff Moden (10/29/2011)


    Kiara,

    Any interest in a bit of table redesign that will greatly improve performance for this particular query? One that will give the reporting performance you seek without interfering with the OLTP nature of these two tables?

    Jeff,

    I'd love to - but I can't touch the table structure in this particular case. The application that owns them will break if I do.

    Not if you build a couple of "pass through" views of the same name as the tables and rename the tables. 😉

    Jeff, you'd have to materialize those views to see any real benefit, afaik, and that might cause unsustainable transactional slowdowns.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions | Forum Netiquette
    For index/tuning help, follow these directions. | Tally Tables

    Twitter: @AnyWayDBA

  • Evil Kraig F (10/31/2011)


    Jeff Moden (10/31/2011)


    Kiara (10/31/2011)


    Jeff Moden (10/29/2011)


    Kiara,

    Any interest in a bit of table redesign that will greatly improve performance for this particular query? One that will give the reporting performance you seek without interfering with the OLTP nature of these two tables?

    Jeff,

    I'd love to - but I can't touch the table structure in this particular case. The application that owns them will break if I do.

    Not if you build a couple of "pass through" views of the same name as the tables and rename the tables. 😉

    Jeff, you'd have to materialize those views to see any real benefit, afaik, and that might cause unsustainable transactional slowdowns.

    You're thinking of the wrong type of view. Think of it more as a synonym with named columns. It's an age-old trick for doing things like using BULK INSERT to import to a table that has more columns than the data without using a BCP Format File. It's nasty fast and the app will never know the difference.
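    For anyone unfamiliar with the trick being described, the general shape is sketched below. The _Base suffix is a placeholder of mine and the column list is abbreviated; the real view would expose every original column, in the original order, and nothing else.

    -- Sketch of a "pass-through" view: the app keeps querying dbo.TimeDetail,
    -- but that name now resolves to a view over the renamed base table.
    EXEC sp_rename 'dbo.TimeDetail', 'TimeDetail_Base';
    GO
    CREATE VIEW dbo.TimeDetail
    AS
    SELECT docnbr,
           day1_reg, day1_ot1, day1_ot2,
           day2_reg, day2_ot1, day2_ot2,
           -- ...day3 through day6 columns elided for brevity...
           day7_reg, day7_ot1, day7_ot2
      FROM dbo.TimeDetail_Base;   -- any new reporting columns are simply left out
    GO

    Inserts and updates from the app keep working through a single-table view like this, provided any columns added to the base table are nullable or have defaults.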

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (10/31/2011)


    Evil Kraig F (10/31/2011)


    Jeff Moden (10/31/2011)


    Kiara (10/31/2011)


    Jeff Moden (10/29/2011)


    Kiara,

    Any interest in a bit of table redesign that will greatly improve performance for this particular query? One that will give the reporting performance you seek without interfering with the OLTP nature of these two tables?

    Jeff,

    I'd love to - but I can't touch the table structure in this particular case. The application that owns them will break if I do.

    Not if you build a couple of "pass through" views of the same name as the tables and rename the tables. 😉

    Jeff, you'd have to materialize those views to see any real benefit, afaik, and that might cause unsustainable transactional slowdowns.

    You're thinking of the wrong type of view. Think of it more as a synonym with named columns. It's an age-old trick for doing things like using BULK INSERT to import to a table that has more columns than the data without using a BCP Format File. It's nasty fast and the app will never know the difference.

    That's what I thought you meant at first, but I can't see how combining those into this will give performance gains when you still end up working against the base indexing. I'll be hanging out with the dust bunnies in the corner watching what you're coming up with. 🙂


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions | Forum Netiquette
    For index/tuning help, follow these directions. | Tally Tables

    Twitter: @AnyWayDBA

  • Evil Kraig F (10/31/2011)


    That's what I thought you meant at first, but I can't see how combining those into this will give performance gains when you still end up working against the base indexing. I'll be hanging out with the dust bunnies in the corner watching what you're coming up with. 🙂

    I actually wasn't going to work on this anymore because everyone seems to think the passthrough views and the modified underlying tables would break the app. :hehe: I don't believe anything could be further from the truth (because I've done it before), but I can't (and won't) force someone to try something when there's fear involved, especially on such a large table.

    I believe the two tables for this problem have been "over engineered". These aren't "relational tables"... they're simple, historical, reporting tables. I was simply going to add the date, EmpID, and 3 calculated columns to the detail table and hide them from the app using a passthrough view with the same name as the original table. Of course, the name of the detail table would change and the app would do all its work through the passthrough view being none the wiser to the underlying change that was made. And, yes... there'd be a key change to the indexing... the date would become the new clustered index column.

    Once that was up and running, the next step would be to partition the table using either Partitioned Views (Standard Edition) or Partitioned Tables (Enterprise Edition), which would greatly reduce the resources used for index maintenance and further improve temporal query performance on these tables.
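    On Standard Edition, the partitioned-view flavour of that idea looks roughly like the sketch below. The yearly split, the table and constraint names, and the column types are illustrative assumptions only; the day1_reg through day7_ot2 columns are elided, and dbo.TimeDetail_2011 would be built the same way as 2010.

    -- Sketch of a local partitioned view split by weekEnding year (all names hypothetical).
    CREATE TABLE dbo.TimeDetail_2010
    (
        docnbr     INT      NOT NULL,
        empid      INT      NOT NULL,
        weekEnding DATETIME NOT NULL
            CONSTRAINT CK_TimeDetail_2010_weekEnding
            CHECK (weekEnding >= '20100101' AND weekEnding < '20110101')
        -- , day1_reg ... day7_ot2 columns as in the original table
    );
    CREATE CLUSTERED INDEX CX_TimeDetail_2010 ON dbo.TimeDetail_2010 (weekEnding, empid);
    GO
    -- The reporting view unions the member tables. With trusted CHECK constraints,
    -- a date-range predicate only touches the member tables that can hold those dates.
    CREATE VIEW dbo.TimeDetail_ByYear
    AS
    SELECT * FROM dbo.TimeDetail_2010
    UNION ALL
    SELECT * FROM dbo.TimeDetail_2011;
    GO

    Making such a view updatable has further requirements (for example, the partitioning column must be part of each member table's primary key), so for anything beyond read-only reporting the rules need checking first.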

    Yeah... we probably couldn't actually combine the header and detail tables because of that bloody "line number" column and, yeah, we'd probably need a trigger on the header table to update the detail table with date and empID, but I believe that would be really, really small potatoes compared to the massive performance gains in temporal queries and nightly maintenance.

    As a sidebar, it's one of the main reasons I don't "tibble" (tbl) my tables or use any form of Hungarian Notation on any of my objects... I might want to build a pass-through view from one of them or change a view to a table.

    Crud... like I don't have enough to do... I smell another article in here somewhere.:pinch:

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (10/31/2011)


    I was simply going to add the date, EmpID, and 3 calculated columns to the detail table and hide them from the app using a passthrough view with the same name as the original table. Of course, the name of the detail table would change and the app would do all its work through the passthrough view being none the wiser to the underlying change that was made. And, yes... there'd be a key change to the indexing... the date would become the new clustered index column.

    Now I grok. I was going the other direction, expecting you to use a passthrough for the new process, not sidestepping app issues with it and changing the base schema. Vendor updates will break if you do that, because their code changes are going to come through with an ALTER TABLE script, not ALTER VIEW. Yes, you could reverse-engineer it each time, but that is ugly. At least, I'm assuming this is a vendor app, because of this statement:

    I'm dealing with source data where the table structure is fixed, and TPTB want this data real time, so moving to a data warehouse isn't an option at the moment.

    ... heh, so, sorry I wasn't following your chain of thought there. I shall blame it on ... umm... umm... Look, an elephant! *POOF*


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions | Forum Netiquette
    For index/tuning help, follow these directions. | Tally Tables

    Twitter: @AnyWayDBA

  • Jeff Moden (10/31/2011)


    Evil Kraig F (10/31/2011)


    That's what I thought you meant at first, but I can't see how combining those into this will give performance gains when you still end up working against the base indexing. I'll be hanging out with the dust bunnies in the corner watching what you're coming up with. 🙂

    I actually wasn't going to work on this anymore because everyone seems to think the passthrough views and the modified underlying tables would break the app. :hehe: I don't believe anything could be further from the truth (because I've done it before), but I can't (and won't) force someone to try something when there's fear involved, especially on such a large table.

    I believe the two tables for this problem have been "over engineered". These aren't "relational tables"... they're simple, historical, reporting tables. I was simply going to add the date, EmpID, and 3 calculated columns to the detail table and hide them from the app using a passthrough view with the same name as the original table. Of course, the name of the detail table would change and the app would do all its work through the passthrough view being none the wiser to the underlying change that was made. And, yes... there'd be a key change to the indexing... the date would become the new clustered index column.

    Once that was up and running, the next step would be to partition the table using either Partitioned Views (Standard Edition) or Partitioned Tables (Enterprise Edition), which would greatly reduce the resources used for index maintenance and further improve temporal query performance on these tables.

    Yeah... we probably couldn't actually combine the header and detail tables because of that bloody "line number" column and, yeah, we'd probably need a trigger on the header table to update the detail table with date and empID, but I believe that would be really, really small potatoes compared to the massive performance gains in temporal queries and nightly maintenance.

    As a sidebar, it's one of the main reasons I don't "tibble" (tbl) my tables or use any form of Hungarian Notation on any of my objects... I might want to build a pass-through view from one of them or change a view to a table.

    Crud... like I don't have enough to do... I smell another article in here somewhere.:pinch:

    Oh, I'm game to experiment. I'm not afraid of my test environment in the slightest, and I won't move something into production unless I'm convinced it will work well.

    The "can't change the table structure" issue is that the app expects a very specific table structure. Any deviation from that and the application screens accessing those tables crash. But I only know for certain that happens when the object named, for example, "TimeDetail" changes structure. I've never tried to fool the application by giving it something *else* named "TimeDetail" to pull data from. So I get to learn something new in the process as well, whether the actual experiment works or not.

    I'll make some time tomorrow and see if I can make this work, based on your suggestions above. You've all already spent a lot of very much appreciated time helping me out on this.

    (Standard edition at the moment, unfortunately, although I may get to upgrade to Enterprise in the next few months. I'm getting new hardware under the hood here first, though.)

    -Ki
