Tricky Aggregation

  • Hello

    I'm trying to figure out the magic SQL to produce a tricky aggregation. Here's what the source table looks like...

    CREATE TABLE [dbo].[Items](

    [ID] [int] NOT NULL, -- FK to another table

    [int] NOT NULL,

    [BEGIN] [varchar](5) NOT NULL,

    [END] [int] NOT NULL,

    [MULT1] [int] NOT NULL,

    [MULT2] [int] NOT NULL,

    [MULT3] [int] NOT NULL,

    [MULT4] [int] NOT NULL

    )

    Here's some sample data that I'm working with...

    ID CODE BEGIN END MULT1 MULT2 MULT3 MULT4

    1112340 19900501207812314500450045004500

    111239824820050701200609301000100010001000

    111239824820061001200612310000

    111239824820070101200703311000100010001000

    111239824820070401200709300000

    111239824820071001200803311000100010001000

    111239824820080401200809300000

    111239824820081001200903312000020000

    111239824820090401207812312000200020002000

    The BEGIN and END are the effective dates for when a CODE is valid. All of the MULTs for CODEs within an ID needed to be added up for the dates that overlap.

    So using the data above, here's what the results should look like...

    11123 19900501 20050700 4500 4500 4500 4500 (only CODE 40 is used)

    11123 20050701 20060930 5500 5500 5500 5500 (CODE 40 and CODE 98248 added for each MULT)

    11123 20061001 20061231 4500 4500 4500 4500 (CODE 40 and CODE 98248 added for each MULT)

    11123 20070101 20070331 5500 5500 5500 5500 (CODE 40 and CODE 98248 added for each MULT)

    11123 20070401 20070930 4500 4500 4500 4500 (CODE 40 and CODE 98248 added for each MULT)

    11123 20071001 20080331 5500 5500 5500 5500 (CODE 40 and CODE 98248 added for each MULT)

    11123 20080401 20080930 4500 4500 4500 4500 (CODE 40 and CODE 98248 added for each MULT)

    11123 20081001 20090331 6500 4500 6500 4500 (CODE 40 and CODE 98248 added for each MULT)

    11123 20090401 20781231 6500 6500 6500 6500 (CODE 40 and CODE 98248 added for each MULT)

    Any ideas on the SQL to produce this result?

    Thanks!

  • Please read and follow the instructions given in the article referenced in the first link in my signature.

    This will allow us to work on a solution instead of trying to set up some sample data to work with.

    Also, please include what you've tried so far and where you got stuck.



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • Something to start with.

    if object_id('tempdb..#Items')is not null drop table #Items

    CREATE TABLE #Items(

    [ID] [int] NOT NULL, -- FK to another table

    [int] NOT NULL,

    [BEGIN] [int] NOT NULL, -- I think this was not supposed to be varchar(5)

    [END] [int] NOT NULL,

    [MULT1] [int] NOT NULL,

    [MULT2] [int] NOT NULL,

    [MULT3] [int] NOT NULL,

    [MULT4] [int] NOT NULL

    )

    insert #items

    select '11123','40','19900501','20781231','4500','4500','4500','4500' union all

    select '11123','98248','20050701','20060930','1000','1000','1000','1000' union all

    select '11123','98248','20061001','20061231','0','0','0','0' union all

    select '11123','98248','20070101','20070331','1000','1000','1000','1000' union all

    select '11123','98248','20070401','20070930','0','0','0','0' union all

    select '11123','98248','20071001','20080331','1000','1000','1000','1000' union all

    select '11123','98248','20080401','20080930','0','0','0','0' union all

    select '11123','98248','20081001','20090331','2000','0','2000','0' union all

    select '11123','98248','20090401','20781231','2000','2000','2000','2000'

    select * from #items

  • OK, trying again...

    Here's some SQL to get everything setup

    declare @table table(ID int,

    CODE int,

    [BEGIN] int,

    [END] int,

    MULT1 int,

    MULT2 int,

    MULT3 int,

    MULT4 int)

    insert @table

    select 11123,40,19900501,20781231,4500,4500,4500,4500 union all

    select 11123,98248,20050701,20060930,1000,1000,1000,1000 union all

    select 11123,98248,20061001,20061231,0,0,0,0 union all

    select 11123,98248,20070101,20070331,1000,1000,1000,1000 union all

    select 11123,98248,20070401,20070930,0,0,0,0 union all

    select 11123,98248,20071001,20080331,1000,1000,1000,1000 union all

    select 11123,98248,20080401,20080930,0,0,0,0 union all

    select 11123,98248,20081001,20090331,2000,0,2000,0 union all

    select 11123,98248,20090401,20781231,2000,2000,2000,2000

    So far I've played with trying a kind of self-join using the IDs, CODEs, BEGINs and ENDs. But I'm getting too many rows (i.e. 16 instead of 9).

    Thanks!

  • Found some stuff online regarding overlapping or intersecting dates. May want to check that out. I'm sure you've been looking around. Below is what I came up with trying to do as much as I could without changing the table structure.

    if object_id('tempdb..#Items')is not null drop table #Items

    CREATE TABLE #Items(

    [ID] [int] NOT NULL, -- FK to another table

    [int] NOT NULL,

    [BEGIN] [int] NOT NULL, -- I think this was not supposed to be varchar(5)

    [END] [int] NOT NULL,

    [MULT1] [int] NOT NULL,

    [MULT2] [int] NOT NULL,

    [MULT3] [int] NOT NULL,

    [MULT4] [int] NOT NULL

    )

    insert #items

    select '11123','40','19900501','20781231','4500','4500','4500','4500' union all

    select '11123','98248','20050701','20060930','1000','1000','1000','1000' union all

    select '11123','98248','20061001','20061231','0','0','0','0' union all

    select '11123','98248','20070101','20070331','1000','1000','1000','1000' union all

    select '11123','98248','20070401','20070930','0','0','0','0' union all

    select '11123','98248','20071001','20080331','1000','1000','1000','1000' union all

    select '11123','98248','20080401','20080930','0','0','0','0' union all

    select '11123','98248','20081001','20090331','2000','0','2000','0' union all

    select '11123','98248','20090401','20781231','2000','2000','2000','2000'

    select (cast(a.id as varchar) + cast(a.code as varchar) + cast(a.[begin] as varchar) + cast(a.[end] as varchar)),*,(cast(b.id as varchar) + cast(b.code as varchar) + cast(b.[begin] as varchar) + cast(b.[end] as varchar))

    --select a.id,a.code,a.[begin],a.[end],a.mult1 + b.mult1,a.mult2 + b.mult2,a.mult3 + b.mult3,a.mult4 + b.mult4

    from #items a, #items b

    WHERE b.[end] >= a.[begin] and b.[begin] <= a.[end]

    and (cast(a.id as varchar) + cast(a.code as varchar) + cast(a.[begin] as varchar) + cast(a.[end] as varchar)) <>

    (cast(b.id as varchar) + cast(b.code as varchar) + cast(b.[begin] as varchar) + cast(b.[end] as varchar))

    order by a.[begin]

    Would you be able to add an identity column to your original table, if it helps? I have no idea if it will, just trying things out.

  • I've tried something similar without much luck. Thanks for trying!

    I'm now playing around with CTEs to see if that will work. Anybody else have any suggestions?

    Thanks!

  • cgreathouse (5/20/2010)


    OK, trying again...

    Here's some SQL to get everything setup

    declare @table table(ID int,

    CODE int,

    [BEGIN] int,

    [END] int,

    MULT1 int,

    MULT2 int,

    MULT3 int,

    MULT4 int)

    insert @table

    select 11123,40,19900501,20781231,4500,4500,4500,4500 union all

    select 11123,98248,20050701,20060930,1000,1000,1000,1000 union all

    select 11123,98248,20061001,20061231,0,0,0,0 union all

    select 11123,98248,20070101,20070331,1000,1000,1000,1000 union all

    select 11123,98248,20070401,20070930,0,0,0,0 union all

    select 11123,98248,20071001,20080331,1000,1000,1000,1000 union all

    select 11123,98248,20080401,20080930,0,0,0,0 union all

    select 11123,98248,20081001,20090331,2000,0,2000,0 union all

    select 11123,98248,20090401,20781231,2000,2000,2000,2000

    So far I've played with trying a kind of self-join using the IDs, CODEs, BEGINs and ENDs. But I'm getting too many rows (i.e. 16 instead of 9).

    Thanks!

    Try this

    SELECT a.ID,a.[BEGIN],a.[END],

    SUM(b.MULT1) AS MULT1,

    SUM(b.MULT2) AS MULT2,

    SUM(b.MULT3) AS MULT3,

    SUM(b.MULT4) AS MULT4

    FROM @table a

    INNER JOIN @table b ON b.ID=a.ID

    AND b.[BEGIN]<=a.[BEGIN] AND b.[END]>=a.[END]

    GROUP BY a.ID,a.[BEGIN],a.[END]

    ORDER BY a.ID,a.[BEGIN],a.[END]

    ____________________________________________________

    Deja View - The strange feeling that somewhere, sometime you've optimised this query before

    How to get the best help on a forum

    http://www.sqlservercentral.com/articles/Best+Practices/61537
  • cgreathouse (5/21/2010)


    I've tried something similar without much luck. Thanks for trying!

    I'm now playing around with CTEs to see if that will work. Anybody else have any suggestions?

    Thanks!

    Yep, if you can change your date columns to a DATE type.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • That's excellent!!! Thank you!

  • Oops, guess I spoke too soon. It's close but there's one more thing to take care of.

    The previous solution will give these results

    IDBEGINENDMULT1MULT2MULT3MULT4

    1112319900501207812314500450045004500 -- (Code 40)

    1112320050701200609305500550055005500 -- (Code 40 & 98248)

    1112320061001200612314500450045004500 -- (Code 40 & 98248)

    1112320070101200703315500550055005500 -- (Code 40 & 98248)

    1112320070401200709304500450045004500 -- (Code 40 & 98248)

    1112320071001200803315500550055005500 -- (Code 40 & 98248)

    1112320080401200809304500450045004500 -- (Code 40 & 98248)

    1112320081001200903316500450065004500 -- (Code 40 & 98248)

    1112320090401207812316500650065006500 -- (Code 40 & 98248)

    The first row needs to look like this (different END value)...

    IDBEGINENDMULT1MULT2MULT3MULT4

    1112319900501200506304500450045004500 -- (Code 40)

    The reason is the second row's BEGIN is 20050701. The BEGIN and END intervals for a given ID can't overlap.

    Any thoughts?

    Thanks!

  • I would suggest you first copy your data into another temp table with sorting out overlapping date ranges ad then proceed with the query.

    But if you are lazzy you can try this:

    ;with nextMinBegin

    as

    (select i.ID, MIN(i.[BEGIN]) as MinBegin

    from #items i

    join (select m.ID, MIN(m.[BEGIN]) rmBegin

    from #items m

    group by m.ID

    ) rm on rm.ID = i.ID

    where i.[BEGIN] > rm.rmBegin

    group by i.ID

    )

    SELECT a.ID,a.[BEGIN]

    ,CASE WHEN a.[BEGIN] < nmb.MinBegin AND a.[END] > nmb.MinBegin

    THEN cast(convert(varchar,cast(cast(nmb.MinBegin as varchar(30)) as datetime)- 1,112) as int)

    ELSE a.[END] END as [END],

    SUM(b.MULT1) AS MULT1,

    SUM(b.MULT2) AS MULT2,

    SUM(b.MULT3) AS MULT3,

    SUM(b.MULT4) AS MULT4

    FROM #items a

    INNER JOIN #items b ON b.ID=a.ID

    AND b.[BEGIN]<=a.[BEGIN] AND b.[END]>=a.[END]

    LEFT JOIN nextMinBegin nmb ON nmb.ID = a.ID

    GROUP BY a.ID,a.[BEGIN],CASE WHEN a.[BEGIN] < nmb.MinBegin AND a.[END] > nmb.MinBegin

    THEN cast(convert(varchar,cast(cast(nmb.MinBegin as varchar(30)) as datetime)- 1,112) as int)

    ELSE a.[END] END

    ORDER BY a.ID,a.[BEGIN],CASE WHEN a.[BEGIN] < nmb.MinBegin AND a.[END] > nmb.MinBegin

    THEN cast(convert(varchar,cast(cast(nmb.MinBegin as varchar(30)) as datetime)- 1,112) as int)

    ELSE a.[END] END

    I hate the above code...:Whistling:

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • I'm glad someone else came up with that. I was going in an entirely different direction. I ended up hooking this into a date table and was trying to pull the values after tying each one to a specific day. Although I was close, I think I'm going to wait to see if that meets the specs.

  • This looks pretty good! I'll test it out on the real dataset and see how it goes. I did simplify the code a little. Here's what I'm trying out...

    ;with nextMinBegin

    as

    (select i.ID, MIN(i.[BEGIN]) as MinBegin

    from @table i

    join ( select m.ID, MIN(m.[BEGIN]) rmBegin

    from @table m

    group by m.ID

    ) rm on rm.ID = i.ID

    where i.[BEGIN] > rm.rmBegin

    group by i.ID

    )

    SELECT a.ID,a.[BEGIN]

    ,CASE WHEN a.[BEGIN] < nmb.MinBegin AND a.[END] > nmb.MinBegin

    THEN cast(convert(varchar,cast(cast(nmb.MinBegin as varchar(30)) as datetime)- 1,112) as int)

    ELSE a.[END] END as [END],

    SUM(b.MULT1) AS MULT1,

    SUM(b.MULT2) AS MULT2,

    SUM(b.MULT3) AS MULT3,

    SUM(b.MULT4) AS MULT4

    FROM @table a

    INNER JOIN @table b ON b.ID=a.ID

    AND b.[BEGIN]<=a.[BEGIN] AND b.[END]>=a.[END]

    LEFT JOIN nextMinBegin nmb ON nmb.ID = a.ID

    GROUP BY a.ID,a.[BEGIN], nmb.MinBegin, a.[end]

    I don't need the ORDER BY and I made the GROUP BY a little simpler. I'll report back...

    Thanks!

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply