Grouping islands of data

  • Hi all,

    Can anyone please help with grouping islands of data? I am revisiting an issue we wound up using a recursive CTE to solve previously because we weren't sure how to do it any other way. But now it's just way too slow. I had to cancel it after 30 minutes when trying to run a day's worth of data (~200k records).

    Here's some sample data:

    CREATE TABLE #t (ID int, filenm int, docnum int, pg char(1));

    GO

    INSERT INTO #t

    SELECT 140675885,072915107,1580,'D' UNION ALL

    SELECT 140675886,072915107,11,'D' UNION ALL

    SELECT 140675887,072915107,1580,'D' UNION ALL

    SELECT 140675888,072915107,109,'D' UNION ALL

    SELECT 140675889,072915107,1511,'D' UNION ALL

    SELECT 140675890,072915107,221,'D' UNION ALL

    SELECT 140675891,072915107,221,'F' UNION ALL

    SELECT 140675892,072915107,46,'D' UNION ALL

    SELECT 140675893,072915107,20,'D' UNION ALL

    SELECT 140675894,072915107,20,'F' UNION ALL

    SELECT 140675895,072915107,20,'F' UNION ALL

    SELECT 140675896,072915107,20,'F' UNION ALL

    SELECT 140675897,072915107,20,'F' UNION ALL

    SELECT 140675898,072915107,20,'F' UNION ALL

    SELECT 140675899,072915107,14,'D' UNION ALL

    SELECT 140675900,072915107,14,'F' UNION ALL

    SELECT 140675901,072915107,14,'F' UNION ALL

    SELECT 140675887,072915205,1580,'D' UNION ALL

    SELECT 140675887,072915205,1580,'F' UNION ALL

    SELECT 140675887,072915205,1580,'D'

    ;

    Essentially a 'pg' of D is the start of a new group or island, and any Fs following that D for the given filenm and docnum in ID order are part of that group.

    I have tried Row_number and Dense_rank. Dense_rank gets sort of close, but the fact that an island can include both D and F records really has me stumped. I have a feeling this should be easy, but I am just not seeing it.

    Attached is an example of how the sample data should be grouped.

    Let me know if there's anything I can clarify or provide additional information about and thank you in advance!

  • Are there supposed to be duplicates in the data? It seems that this row is repeated:

    SELECT 140675887, 072915205, 1580, 'D'

    Cheers!

    EDIT: Fixed a typo.

  • Apologies! No, that's an oversight on my part. Here is the corrected sample data and attached example result set:

    SELECT 140675885,072915107,1580,'D' UNION ALL

    SELECT 140675886,072915107,11,'D' UNION ALL

    SELECT 140675887,072915107,1580,'D' UNION ALL

    SELECT 140675888,072915107,109,'D' UNION ALL

    SELECT 140675889,072915107,1511,'D' UNION ALL

    SELECT 140675890,072915107,221,'D' UNION ALL

    SELECT 140675891,072915107,221,'F' UNION ALL

    SELECT 140675892,072915107,46,'D' UNION ALL

    SELECT 140675893,072915107,20,'D' UNION ALL

    SELECT 140675894,072915107,20,'F' UNION ALL

    SELECT 140675895,072915107,20,'F' UNION ALL

    SELECT 140675896,072915107,20,'F' UNION ALL

    SELECT 140675897,072915107,20,'F' UNION ALL

    SELECT 140675898,072915107,20,'F' UNION ALL

    SELECT 140675899,072915107,14,'D' UNION ALL

    SELECT 140675900,072915107,14,'F' UNION ALL

    SELECT 140675901,072915107,14,'F' UNION ALL

    SELECT 140675902,072915205,1580,'D' UNION ALL

    SELECT 140675903,072915205,1580,'F' UNION ALL

    SELECT 140675904,072915205,1580,'D'

    Thank you for noting that!

  • jessica.green2312 (8/21/2015)


    Apologies! No, that's an oversight on my part. Here is the corrected sample data and attached example result set:

    SELECT 140675885,072915107,1580,'D' UNION ALL

    SELECT 140675886,072915107,11,'D' UNION ALL

    SELECT 140675887,072915107,1580,'D' UNION ALL

    SELECT 140675888,072915107,109,'D' UNION ALL

    SELECT 140675889,072915107,1511,'D' UNION ALL

    SELECT 140675890,072915107,221,'D' UNION ALL

    SELECT 140675891,072915107,221,'F' UNION ALL

    SELECT 140675892,072915107,46,'D' UNION ALL

    SELECT 140675893,072915107,20,'D' UNION ALL

    SELECT 140675894,072915107,20,'F' UNION ALL

    SELECT 140675895,072915107,20,'F' UNION ALL

    SELECT 140675896,072915107,20,'F' UNION ALL

    SELECT 140675897,072915107,20,'F' UNION ALL

    SELECT 140675898,072915107,20,'F' UNION ALL

    SELECT 140675899,072915107,14,'D' UNION ALL

    SELECT 140675900,072915107,14,'F' UNION ALL

    SELECT 140675901,072915107,14,'F' UNION ALL

    SELECT 140675902,072915205,1580,'D' UNION ALL

    SELECT 140675903,072915205,1580,'F' UNION ALL

    SELECT 140675904,072915205,1580,'D'

    Thank you for noting that!

    do you only have "D" and "F" or are there other variants?

    anyways...try this

    SELECT

    ID

    , filenm

    , docnum

    , pg

    , CASE

    WHEN pg = 'F' THEN

    ROW_NUMBER() OVER (ORDER BY id) - ROW_NUMBER() OVER (PARTITION BY pg ORDER BY id)

    ELSE

    ROW_NUMBER() OVER (PARTITION BY pg ORDER BY id)

    END AS jls

    FROM #t

    ORDER BY

    id;

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • Yes, we only ever have Ds and Fs. This is perfect - thank you so much!!!

  • jessica.green2312 (8/21/2015)


    Yes, we only ever have Ds and Fs. This is perfect - thank you so much!!!

    you are welcome....how long did it take to run?

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • It hardly takes any time at all. The craziness that runs immediately before takes just shy of 2 minutes and when I add this afterwards it was still finished at ~1:56. 🙂

  • jessica.green2312 (8/21/2015)


    It hardly takes any time at all. The craziness that runs immediately before takes just shy of 2 minutes and when I add this afterwards it was still finished at ~1:56. 🙂

    To do what? Return it all to the screen? Also, what is the clustered index on the table?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply