How to group based on value and rank

  • Hello All

    I need some pointers on how to get started with a query. The problem is how to group some data in this scenario:

    Say I have a table like so:

    value date

    501/01/2011 00:00:00

    502/01/2011 00:00:00

    503/01/2011 00:00:00

    1004/01/2011 00:00:00

    1005/01/2011 00:00:00

    1006/01/2011 00:00:00

    507/01/2011 00:00:00

    508/01/2011 00:00:00

    509/01/2011 00:00:00

    I would like to group on 'value' but take into account the 'clustering', so the result would be for example:

    value count

    5 3

    10 3

    5 3

    I'm guessing I have to create some dummy value for each 'cluster' but have no idea where to start.

    Cheers all!

    Philippe

  • How do you define cluster of rows in your table? If it’s the order of insertion then you are going on wrong track. The order of rows in table is not guaranteed. Please identify a good logic for clustering / grouping you data & add one identifier in your table (even can be calculated at runtime).

  • Hi Dev

    The criteria for a cluster is that the records are identical (time field excepted) and contiguous when ordered by time. Records that are identical but not part of the same date sequence are grouped separately.

    Hope that makes sense

    Philippe

  • Maybe this one will do. I added the max. date to the group for sorting.

    DECLARE @T TABLE ([Value] INT, [Date] DATETIME)

    INSERT INTO @T([Value], [Date]) VALUES (5, '01/01/2011 00:00:00')

    INSERT INTO @T([Value], [Date]) VALUES (5, '02/01/2011 00:00:00')

    INSERT INTO @T([Value], [Date]) VALUES (5, '03/01/2011 00:00:00')

    INSERT INTO @T([Value], [Date]) VALUES (10, '04/01/2011 00:00:00')

    INSERT INTO @T([Value], [Date]) VALUES (10, '05/01/2011 00:00:00')

    INSERT INTO @T([Value], [Date]) VALUES (10, '06/01/2011 00:00:00')

    INSERT INTO @T([Value], [Date]) VALUES (5, '07/01/2011 00:00:00')

    INSERT INTO @T([Value], [Date]) VALUES (5, '08/01/2011 00:00:00')

    INSERT INTO @T([Value], [Date]) VALUES (5, '09/01/2011 00:00:00')

    INSERT INTO @T([Value], [Date]) VALUES (10, '10/01/2011 00:00:00')

    INSERT INTO @T([Value], [Date]) VALUES (10, '10/01/2011 00:00:00')

    SELECT

    [Value], MAX([Date]), COUNT(*) [Count]

    FROM

    (

    SELECT

    ROW_NUMBER() OVER (ORDER BY [Date]) - ROW_NUMBER() OVER (PARTITION BY [Value] ORDER BY [Date]) GrpID,

    *

    FROM

    @T

    ) Q

    GROUP BY

    [Value], GrpID

    ORDER BY

    2, 1

  • DROP TABLE #Sample

    CREATE TABLE #Sample (avalue INT, adate DATETIME)

    INSERT INTO #Sample (avalue, adate)

    SELECT 5, '01/01/2011 00:00:00' UNION ALL

    SELECT 5, '02/01/2011 00:00:00' UNION ALL

    SELECT 5, '03/01/2011 00:00:00' UNION ALL

    SELECT 10, '04/01/2011 00:00:00' UNION ALL

    SELECT 10, '05/01/2011 00:00:00' UNION ALL

    SELECT 10, '06/01/2011 00:00:00' UNION ALL

    SELECT 5, '07/01/2011 00:00:00' UNION ALL

    SELECT 5, '08/01/2011 00:00:00' UNION ALL

    SELECT 5, '09/01/2011 00:00:00'

    ;WITH OrderedData AS (

    SELECT *,

    rn1 = ROW_NUMBER() OVER (ORDER BY aDate),

    rn2 = ROW_NUMBER() OVER (PARTITION BY avalue ORDER BY aDate)

    FROM #Sample

    )

    SELECT

    avalue,

    aCount = COUNT(*)

    FROM OrderedData

    GROUP BY rn1-rn2, avalue

    ORDER BY MAX(aDate)

    -- it doesn't make much sense without an ordering column, e.g. MAX(aDate)

    Quick today, Peter. I see you've also commented on a sort column 😎


    [font="Arial"]Low-hanging fruit picker and defender of the moggies[/font]

    For better assistance in answering your questions, please read this[/url].


    Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

    Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

  • ChrisM@home (12/14/2011)


    Quick today, Peter. I see you've also commented on a sort column 😎

    Funny, almost same query, same execution plan. Note that the solutions probably do not scale very well due to the 4 sort operations (one for each row_number() expression, one for the GROUP BY and one for the ORDER BY). A proper index on the date column (including the value column) might help a little.

  • Blinding! well done guys and many thanks

    It's

    ORDER BY [Date]) - ROW_NUMBER()

    that I'd failed to see before as my dummy grouping

    Thanks again

    Philippe

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply