Select record from group

  • ChrisM@home (9/12/2013)


    paulcauchon (9/12/2013)


    I see the hitch up here...you need to see the min date and max date from each of the consecutive color/ID group pairings.

    Originally, we had been treating the grouping value (business key) as the pairing of color and ID. The key is actually the pairing of color and ID over sequential, uninterrupted AUTOID values.

    In this case, the simple solution that the group has previously been advocating will not work. At first glance, this is probably going to require either really complicated logic or row by row processing. I'll keep considering the "complicated logic" approach (as to not anger Lord Moden), and report back if I come up with anything.

    Not that complicated;

    SELECT ID, DATEFROM = MIN(DATEFROM), DATETO = MAX(DATETO), COLOR

    FROM (

    SELECT autoid, ID, DATEFROM, DATETO, COLOR,

    NewGroup = ROW_NUMBER() OVER(PARTITION BY ID ORDER BY COLOR) - AUTOID

    FROM #mytable

    ) d

    GROUP BY ID, COLOR, NewGroup

    ORDER BY ID, MIN(DATEFROM)

    Thanks Chris. I have used that method before for sets but didn't even think of it here. Sweet!!!

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Chris, thank you again. This is a magic.

    I need to learn

    "ROW_NUMBER() OVER(PARTITION BY ID ORDER BY COLOR) - AUTOID"

    Would you mind to explain more about this statement?

  • ChrisM@home (9/12/2013)


    Not that complicated;

    Indeed, not that complicated. But terribly clever!

    If I'm understanding this correctly, we're lining up all of the ID and Color groups in order, and then ascribing row numbers to them. From these rownumbers, we subtract the AUTOID. It's easiest to understand if we line up side by side the AUTOID column, the output of row_number(), and the NewGroup column:

    SELECTautoid

    ,RowNumber = ROW_NUMBER() OVER(PARTITION BY ID ORDER BY COLOR)

    ,NewGroup = row_number() over(partition by ID order by color) - autoID

    ,ID, DATEFROM, DATETO, COLOR -- record contents

    FROM #mytable

    AUTOID rowNumber NewGroup ID DATEFROM DATETO COLOR

    ----------- ----------- ----------- ---- ----------------------- ----------------------- ----------

    1 1 0 111 2012-01-10 00:00:00.000 2012-01-11 00:00:00.000 BLUE

    2 2 0 111 2012-02-02 00:00:00.000 2012-02-02 00:00:00.000 BLUE

    3 3 0 111 2012-03-15 00:00:00.000 2012-03-16 00:00:00.000 BLUE

    7 4 -3 111 2012-07-17 00:00:00.000 2012-07-17 00:00:00.000 BLUE

    8 5 -3 111 2012-09-08 00:00:00.000 2012-10-10 00:00:00.000 BLUE

    17 6 -11 111 2012-01-10 00:00:00.000 2012-03-16 00:00:00.000 BLUE

    19 7 -12 111 2012-07-17 00:00:00.000 2012-10-10 00:00:00.000 BLUE

    18 8 -10 111 2012-04-05 00:00:00.000 2012-06-16 00:00:00.000 GREEN

    4 9 5 111 2012-04-05 00:00:00.000 2012-04-05 00:00:00.000 GREEN

    5 10 5 111 2012-05-25 00:00:00.000 2012-05-26 00:00:00.000 GREEN

    6 11 5 111 2012-06-06 00:00:00.000 2012-06-16 00:00:00.000 GREEN

    15 1 -14 222 2013-07-17 00:00:00.000 2013-07-17 00:00:00.000 BLUE

    16 2 -14 222 2013-09-08 00:00:00.000 2013-10-10 00:00:00.000 BLUE

    22 3 -19 222 2013-07-17 00:00:00.000 2013-10-10 00:00:00.000 BLUE

    21 4 -17 222 2013-04-05 00:00:00.000 2013-06-16 00:00:00.000 GREEN

    12 5 -7 222 2013-04-05 00:00:00.000 2013-04-05 00:00:00.000 GREEN

    13 6 -7 222 2013-05-25 00:00:00.000 2013-05-26 00:00:00.000 GREEN

    14 7 -7 222 2013-06-06 00:00:00.000 2013-06-16 00:00:00.000 GREEN

    20 8 -12 222 2013-01-10 00:00:00.000 2013-03-16 00:00:00.000 RED

    9 9 0 222 2013-01-10 00:00:00.000 2013-01-11 00:00:00.000 RED

    10 10 0 222 2013-02-02 00:00:00.000 2013-02-02 00:00:00.000 RED

    11 11 0 222 2013-03-15 00:00:00.000 2013-03-16 00:00:00.000 RED

    (22 row(s) affected)

    As the rowNumber increments, the difference between the AUTOID and rowNumber remains consistent across groups. We can then use this in our outer query as the third member of the business key (Color, ID, uninterrupted consecutive AUTOIDs).

    What I would love to understand is why we shouldn't include in the ORDER BY statement of the OVER clause something that will ensure that the records within a "NewGroup" group are processed in order, a la:

    FROM (

    SELECT autoid, ID, DATEFROM, DATETO, COLOR,

    NewGroup = ROW_NUMBER() OVER(PARTITION BY ID ORDER BY COLOR, AUTOID) - AUTOID

    FROM #mytable

    ) d

    Am I correct in assuming that if we did not have the clustered index on the table we would need to have the AUTOID in the ORDER BY statement?

  • ChrisM@home (9/12/2013)


    paulcauchon (9/12/2013)


    I see the hitch up here...you need to see the min date and max date from each of the consecutive color/ID group pairings.

    Originally, we had been treating the grouping value (business key) as the pairing of color and ID. The key is actually the pairing of color and ID over sequential, uninterrupted AUTOID values.

    In this case, the simple solution that the group has previously been advocating will not work. At first glance, this is probably going to require either really complicated logic or row by row processing. I'll keep considering the "complicated logic" approach (as to not anger Lord Moden), and report back if I come up with anything.

    Not that complicated;

    SELECT ID, DATEFROM = MIN(DATEFROM), DATETO = MAX(DATETO), COLOR

    FROM (

    SELECT autoid, ID, DATEFROM, DATETO, COLOR,

    NewGroup = ROW_NUMBER() OVER(PARTITION BY ID ORDER BY COLOR) - AUTOID

    FROM #mytable

    ) d

    GROUP BY ID, COLOR, NewGroup

    ORDER BY ID, MIN(DATEFROM)

    +1


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • paulcauchon (9/12/2013)


    ChrisM@home (9/12/2013)


    Not that complicated;

    ...

    If I'm understanding this correctly, we're lining up all of the ID and Color groups in order, and then ascribing row numbers to them. From these rownumbers, we subtract the AUTOID. It's easiest to understand if we line up side by side the AUTOID column, the output of row_number(), and the NewGroup column:

    SELECTautoid

    ,RowNumber = ROW_NUMBER() OVER(PARTITION BY ID ORDER BY COLOR)

    ,NewGroup = row_number() over(partition by ID order by color) - autoID

    ,ID, DATEFROM, DATETO, COLOR -- record contents

    FROM #mytable

    Paul - if you order the output by AutoID, it all becomes clear. There's a description of how the method works here[/url]. The inventor, as far as I know, was Itzik Ben-Gan - but I don't have a reference, sorry.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • ChrisM@Work (9/13/2013)


    ...

    Paul - if you order the output by AutoID, it all becomes clear. There's a description of how the method works here[/url]. The inventor, as far as I know, was Itzik Ben-Gan - but I don't have a reference, sorry.

    Whoa...excellent article there. Very in depth and thorough. Feeling kinda like Neo!

    This tool (and the rest of the methods mentioned in that article) are excellent to have in my back pocket.

    Thanks for the explanation!

  • Pretty good explaination. Worth spending time on the read.

  • paulcauchon (9/13/2013)


    ChrisM@Work (9/13/2013)


    ...

    Paul - if you order the output by AutoID, it all becomes clear. There's a description of how the method works here[/url]. The inventor, as far as I know, was Itzik Ben-Gan - but I don't have a reference, sorry.

    Whoa...excellent article there. Very in depth and thorough. Feeling kinda like Neo!

    This tool (and the rest of the methods mentioned in that article) are excellent to have in my back pocket.

    Thanks for the explanation!

    Bwaaahaaaa!!! That's too funny Paul!


    [font="Arial"]Low-hanging fruit picker and defender of the moggies[/font]

    For better assistance in answering your questions, please read this[/url].


    Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

    Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

Viewing 8 posts - 16 through 22 (of 22 total)

You must be logged in to reply to this topic. Login to reply