How to grouping/collapse multiple rows into a single row?

  • Hello folks,

    I am faced with a challenging T-SQL problem for which I am unable to find a set based idea/solution that I am satisfied with. Your inputs are greatly appreciated.

    Sample code:

    DECLARE @T TABLE (

    Id int identity(1, 1),

    NKey int,

    P1 int, IsP1Set bit,

    P2 int, IsP2Set bit,

    P3 int, IsP3Set bit,

    LastModified datetime

    )

    INSERT INTO @T

    VALUES

    (1, 1, 1, null, 0, null, 0, '2010-11-11 01:00:00'),

    (1, null, 0, 22, 1, null, 0, '2010-11-11 02:00:00'),

    (1, 11, 1, 222, 1, null, 0, '2010-11-11 03:00:00'),

    (1, null, 0, null, 0, 33, 1, '2010-11-11 04:00:00'),

    (1, 111, 1, null, 0, null, 0, '2010-11-11 05:00:00'),

    (1, null, 0, 2222, 1, null, 0, '2010-11-11 06:00:00'),

    (1, 1111, 1, null, 0, null, 0, '2010-11-11 07:00:00')

    select * from @T

    /*

    Expected result:

    111111222213312010-11-11 07:00:00

    --rows (7) (6) (4) MAX(LastModified)

    */

    ;with Result(NKey, P1, IsP1Set, P1Idx, P2, IsP2Set, P2Idx, P3, IsP3Set, P3Idx, LastModified) as (

    select NKey,

    P1, IsP1Set, ROW_NUMBER() OVER (PARTITION BY IsP1Set ORDER BY LastModified DESC) AS P1Idx,

    P2, IsP2Set, ROW_NUMBER() OVER (PARTITION BY IsP2Set ORDER BY LastModified DESC) AS P2Idx,

    P3, IsP3Set, ROW_NUMBER() OVER (PARTITION BY IsP3Set ORDER BY LastModified DESC) AS P3Idx,

    LastModified

    from @T

    )

    select *

    from Result

    where (IsP1Set = 1 and P1Idx = 1)

    OR (IsP2Set = 1 and P2Idx = 1)

    OR (IsP3Set = 1 and P3Idx = 1)

    Here is the problem (sample code given below):

    We are a datawarehouse team sourcing data from multiple systems in our org.

    One of our sources is an auditing/logging SQL database.

    The entities in the source have a property-bag design which look likes this:

    TableA(NatuarlKey, Property1, IsProperty1Set, Property2, IsPropert2Set,....., LastModified)

    where,

    NaturakKey is typically a uniqueidentifier

    Property1...n can be any datatype

    IsProperty1Set...IsPropertynSet are bit fields which indicate if a value for this property has been set

    n ranges from 1 to 100

    And, for understandable obvious reasons, they have optimized this logging system for writes such that, for every change to a property (or a group of properties) they insert a new row with *only* the updated properties.

    The datawarehouse problem is to get the "most recent" value for each of the properties into a single record.

    I am able to write a query that filters out intermediate records, but I am puzzled with how to group the rows efficiently into a single record. For example, the code below generates 3 rows (1 for each of the 3 properties defined) with the most recent value of each of the properties.

    Your help is deeply appreciated.

    Thanks,

    Manju.

  • Manjeshwar,

    I've used a form of the serial update here which should cure your pain on this specific instance. If you want the 'full rules' to keep you safe from harm in this instance, you'll want to look at the following two items:

    The Quirky Update article: http://www.sqlservercentral.com/articles/T-SQL/68467/

    The Safety Check wrapper: http://www.sqlservercentral.com/Forums/Topic802558-203-4.aspx#bm980118

    Please note I do not follow the full rules in either of these articles, and you should understand both what I'm doing and a review of those to fully encapsulate this for safety if this is not a one-shot set of results.

    However, this is successful across a series of NKeys and LastModifieds. Please be aware, I removed the 'ID' field from @T because it was unused in our scenario here, and you weren't accounting for it in the INSERT INTO statement.

    DECLARE @T TABLE (

    -- Id int identity(1, 1),

    NKey int,

    P1 int, IsP1Set bit,

    P2 int, IsP2Set bit,

    P3 int, IsP3Set bit,

    LastModified datetime

    )

    INSERT INTO @T

    SELECT 1, 1, 1, null, 0, null, 0, '2010-11-11 01:00:00' UNION ALL

    SELECT 1, null, 0, 22, 1, null, 0, '2010-11-11 02:00:00' UNION ALL

    SELECT 1, 11, 1, 222, 1, null, 0, '2010-11-11 03:00:00' UNION ALL

    SELECT 1, null, 0, null, 0, 33, 1, '2010-11-11 04:00:00' UNION ALL

    SELECT 1, 111, 1, null, 0, null, 0, '2010-11-11 05:00:00' UNION ALL

    SELECT 1, null, 0, 2222, 1, null, 0, '2010-11-11 06:00:00' UNION ALL

    SELECT 1, 1111, 1, null, 0, null, 0, '2010-11-11 07:00:00'

    IF Object_ID('tempdb..#serialUpdate') IS NOT NULL

    DROP TABLE #serialUpdate

    CREATE TABLE #serialUpdate

    (NKey int,

    P1 int, IsP1Set bit,

    P2 int, IsP2Set bit,

    P3 int, IsP3Set bit,

    LastModified datetime

    )

    CREATE CLUSTERED INDEX idx_serialUpdate ON #serialUpdate (NKey, LastModified)

    DECLARE @NKeyINT,

    @P1INT,

    @P2INT,

    @P3INT

    INSERT INTO #serialUpdate

    SELECT * FROM @T

    UPDATE#serialUpdate

    SET@p1 = p1 = CASE WHEN Nkey = @NKey

    THEN CASEWHEN p1 IS NULL

    THEN @p1

    ELSE p1 END

    ELSE p1 END,

    @p2 = p2 = CASE WHEN Nkey = @NKey

    THEN CASEWHEN p2 IS NULL

    THEN @p2

    ELSE p2 END

    ELSE p2 END,

    @p3 = p3 = CASE WHEN Nkey = @NKey

    THEN CASEWHEN p3 IS NULL

    THEN @p3

    ELSE p3 END

    ELSE p3 END,

    @Nkey = NKey

    OPTION (MAXDOP 1)

    SELECT

    su1.NKey,

    su1.P1,

    su1.P2,

    su1.P3,

    su1.LastModified

    FROM

    #SerialUpdate AS su1

    JOIN

    (SELECT

    NKey, MAX( LastModified) AS MaxLM

    FROM

    #SerialUpdate AS su2

    GROUP BY

    NKey

    ) AS drv

    ONsu1.NKey = drv.NKey

    AND su1.LastModified = drv.MaxLM


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • I doubt it's anywhere near as fast but...

    ;with Result(NKey, P1, IsP1Set, P1Idx, P2, IsP2Set, P2Idx, P3, IsP3Set, P3Idx, LastModified) as (

    select NKey,

    P1, IsP1Set, ROW_NUMBER() OVER (PARTITION BY IsP1Set ORDER BY LastModified DESC) AS P1Idx,

    P2, IsP2Set, ROW_NUMBER() OVER (PARTITION BY IsP2Set ORDER BY LastModified DESC) AS P2Idx,

    P3, IsP3Set, ROW_NUMBER() OVER (PARTITION BY IsP3Set ORDER BY LastModified DESC) AS P3Idx,

    LastModified

    from @T

    )

    select NKey,

    max(ISNULL(P1,'')) as P1,

    max(ISNULL(P2,'')) as P2,

    max(ISNULL(P3,'')) as P3

    from Result

    where (IsP1Set = 1 and P1Idx = 1)

    OR (IsP2Set = 1 and P2Idx = 1)

    OR (IsP3Set = 1 and P3Idx = 1)

    group by NKey

  • Thanks for the response David.

    But I don't really need the MAX value. Instead, what I am looking for is to get the most recent value of any of the properties.

    "OOOOOPS"! I, now, got what you were suggesting; oh, well.

  • Thanks Craig.

    This looks interesting; will explore further more.

    Thanks for the response.

  • In 2008 you have a couple of options. You can cross apply a query to select the top 1 row for each ordered by date descending. This would give you the most recent row. Althernatively, you can use a cte to assign a Row_Number() to each row partioned by your grouping and then ordered by date descending. Then just use a where clause to filter out all rows from the cte where Row_Number() > 1.

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • Do you mean a solution similar to what David is suggeting above?

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply