Code help, may be can be acheived through CTE

  • I have my records like this:

    IDDateP_IDP_UUID CM

    24842011-07-27 00:00:00.00036926349348432436

    24842011-08-08 00:00:00.00036926349348613149

    24842011-08-10 00:00:00.00036926349348613151

    27262011-07-29 00:00:00.0003406026373248395973

    36292011-07-26 00:00:00.0004982026462548392343

    36292011-08-01 00:00:00.00049926462548419374

    36292011-08-04 00:00:00.0004982026462548535510

    36292011-08-15 00:00:00.0004982026462548590566

    And I want results like this:

    IDMin Date Max_DateP_IDP_UUID CM

    24842011-07-272099-12-3136926349348613151 – Since the P_ID is same, though ON different day

    27262011-07-292099-12-313406026373248395973

    36292011-07-262011-07-314982026462548392343

    36292011-08-012011-08-03 49926462548419374

    36292011-08-042099-12-31 4982026462548590566 – Since P_ID is same, it is flattened out

    I know, through cursor we can achieve this…Any other ways?

    Thanks for your help and feedback..

    -VG

  • VGish (9/26/2011)


    I have my records like this:

    IDDateP_IDP_UUID CM

    24842011-07-27 00:00:00.00036926349348432436

    24842011-08-08 00:00:00.00036926349348613149

    24842011-08-10 00:00:00.00036926349348613151

    27262011-07-29 00:00:00.0003406026373248395973

    36292011-07-26 00:00:00.0004982026462548392343

    36292011-08-01 00:00:00.00049926462548419374

    36292011-08-04 00:00:00.0004982026462548535510

    36292011-08-15 00:00:00.0004982026462548590566

    And I want results like this:

    IDMin Date Max_DateP_IDP_UUID CM

    24842011-07-272099-12-3136926349348613151 – Since the P_ID is same, though ON different day

    27262011-07-292099-12-313406026373248395973

    36292011-07-262011-07-314982026462548392343

    36292011-08-012011-08-03 49926462548419374

    36292011-08-042099-12-31 4982026462548590566 – Since P_ID is same, it is flattened out

    I know, through cursor we can achieve this…Any other ways?

    Thanks for your help and feedback..

    -VG

    Yes, data islanding, date fill methodologies, and grouping techniques... and probably a few windows functions.

    How many rows are you looking at needing to process?

    So I understand your business rules:

    1) Locate each continuous section of P_ID as per the DATE column as your ordering.

    2) Collapse those rows into a begin/end. If it is the last record for the ID, use 12/31/2099 as the ending date. Otherwise, subtract one from the beginning date of the next section and use that as the tail date.

    3) for each section, min/max the P_UUID and the CM fields.

    That sound about right? Any chance we could get a consumable dataset in a temp table to play with?


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • VG...

    For future posts, please consider the methods for posting "readily consumable" data which are taught at the first link in my signature line below. Since you're relatively new here, I'll do the honors the first time. The following code uses a slightly different method for posting "readily consumable" data and I took the liberty of adding a UNIQUE CLUSTERED INDEX which might actually serve as a composite PK for this table...

    --===== Create the test data. This is NOT a part of the solution

    SELECT ID,Date = CAST(Date AS DATETIME),P_ID,P_UUID,CM

    INTO dbo.TestTable

    FROM (

    SELECT 2484,'2011-07-27',369,263493,48432436 UNION ALL

    SELECT 2484,'2011-08-08',369,263493,48613149 UNION ALL

    SELECT 2484,'2011-08-10',369,263493,48613151 UNION ALL

    SELECT 2726,'2011-07-29',34060,263732,48395973 UNION ALL

    SELECT 3629,'2011-07-26',49820,264625,48392343 UNION ALL

    SELECT 3629,'2011-08-01',499,264625,48419374 UNION ALL

    SELECT 3629,'2011-08-04',49820,264625,48535510 UNION ALL

    SELECT 3629,'2011-08-15',49820,264625,48590566

    ) d (ID, Date, P_ID, P_UUID, CM)

    ;

    CREATE UNIQUE CLUSTERED INDEX IX_TestTable_Composite01

    ON dbo.TestTable (ID, Date, P_ID)

    ;

    Here's one possible solution for solving your problem. The key is to recognize that you have to create "temporal groups" based on the ID and P_ID according to date. The rest is pretty straight forward and is documented in the comments in the code below...

    [font="Arial Black"]{EDIT} Although the code below works for this particular set of data, I found an "opening" where an incorrect sort could occur. Please see the corrected code in a separate post further down in this thread. Thanks.[/font]--===== This is one possible solution

    WITH

    cteEndDate AS

    ( --=== Calculate the "end date" according to the rules given

    -- and calculate "temporal groups" of ID's

    SELECT GroupNum = ROW_NUMBER() OVER (ORDER BY ID,Date)

    - ROW_NUMBER() OVER (PARTITION BY ID,P_ID ORDER BY ID,P_ID),

    EndDate = ISNULL(( --=== NextDate-1 or NULL if no next date

    SELECT MIN(gr2.Date)-1

    FROM dbo.TestTable gr2

    WHERE gr2.ID = gr1.ID

    AND gr2.Date > gr1.Date)

    ,'2099-12-31'),

    ID, Date, P_ID, P_UUID, CM

    FROM dbo.TestTable gr1

    ) --=== Then we just need to find the min and max dates according to the groups

    SELECT ID = MAX(ID),

    MinDate = MIN(Date),

    MaxDate = MAX(EndDate),

    P_ID = MAX(P_ID),

    P_UUID = MAX(P_UUID),

    CM = MAX(CM)

    FROM cteEndDate

    GROUP BY GroupNum

    ORDER BY ID, MinDate

    ;

    My apologies...I didn't have the time to make my normal "million row test" so the code might not be as optimized as it could be.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi Jeff,

    This works perfectly like what I wanted...But, I didn’t understand this part …Why are we doing this?

    ROW_NUMBER() OVER (ORDER BY ID,Date) - ROW_NUMBER() OVER (PARTITION BY ID,P_ID ORDER BY ID,P_ID)

    THanks,

    VG

  • Actually, I made a small mistake. While the code works perfectly for the data given, I left an "opening" for an incorrect sort. Please use the following code, instead (test data creation code and solution code combined just to make life a little simpler :-))...

    --===== Create the test data. This is NOT a part of the solution

    SELECT ID,Date = CAST(Date AS DATETIME),P_ID,P_UUID,CM

    INTO dbo.TestTable

    FROM (

    SELECT 2484,'2011-07-27',369,263493,48432436 UNION ALL

    SELECT 2484,'2011-08-08',369,263493,48613149 UNION ALL

    SELECT 2484,'2011-08-10',369,263493,48613151 UNION ALL

    SELECT 2726,'2011-07-29',34060,263732,48395973 UNION ALL

    SELECT 3629,'2011-07-26',49820,264625,48392343 UNION ALL

    SELECT 3629,'2011-08-01',499,264625,48419374 UNION ALL

    SELECT 3629,'2011-08-04',49820,264625,48535510 UNION ALL

    SELECT 3629,'2011-08-15',49820,264625,48590566

    ) d (ID, Date, P_ID, P_UUID, CM)

    ;

    CREATE UNIQUE CLUSTERED INDEX IX_TestTable_Composite01

    ON dbo.TestTable (ID, Date, P_ID)

    ;

    --===== This is one possible solution

    WITH

    cteEndDate AS

    ( --=== Calculate the "end date" according to the rules given

    -- and calculate "temporal groups" of ID's

    SELECT GroupNum = ROW_NUMBER() OVER (ORDER BY ID,Date) --DateNum

    - ROW_NUMBER() OVER (PARTITION BY ID,P_ID ORDER BY ID,P_ID,Date), --PidNum

    EndDate = ISNULL(( --=== NextDate-1 or NULL if no next date

    SELECT MIN(gr2.Date)-1

    FROM dbo.TestTable gr2

    WHERE gr2.ID = gr1.ID

    AND gr2.Date > gr1.Date)

    ,'2099-12-31'),

    ID, Date, P_ID, P_UUID, CM

    FROM dbo.TestTable gr1

    ) --=== Then we just need to find the min and max dates according to the groups

    SELECT ID = MAX(ID),

    MinDate = MIN(Date),

    MaxDate = MAX(EndDate),

    P_ID = MAX(P_ID),

    P_UUID = MAX(P_UUID),

    CM = MAX(CM)

    FROM cteEndDate

    GROUP BY ID,P_ID,GroupNum

    ORDER BY ID, MinDate

    ;

    I got home kind of late tonight... I'll try to find the time to explain how the "magic" row numbers work to solve this problem tomorrow night.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • VGish (9/27/2011)


    Hi Jeff,

    This works perfectly like what I wanted...But, I didn’t understand this part …Why are we doing this?

    ROW_NUMBER() OVER (ORDER BY ID,Date) - ROW_NUMBER() OVER (PARTITION BY ID,P_ID ORDER BY ID,P_ID)

    THanks,

    VG

    Take a look at the corrected code... Notice the name of the column becomes GroupNum.

    Now, look at the GROUP BY in the code... I'll try to explain more tomorrow night.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Sorry folks... I've been burning some midnight oil at the job and the answer to the question of how this works is a bit of a long one. To be real brief, the first ROW_NUMBER creates a simple list of sequential numbers in order by date. The second ROW_NUMBER creates a slightly more complicated list of sequential numbers for each ID and P_ID ordered by Date.

    The difference between the two sets of sequenctial numbers forms a group number for each "set" of contiguous ID's and P_ID's. The rest is simple... we find the min and max of items by ID, P_ID, and GroupNum.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply