Filter a table to get unique rows in a certain logic

  • ColdCoffee (1/22/2011)


    @Jeff, u are absolutely right.. i mis-read the requirement; from morning (its morning in INDIA now, as i write) i was thinking about that only.. So i sat and put together my piece of code to find the unique row . Hope this turns out to be correct

    ; WITH ROW_NUMBERED_LIST AS

    (

    SELECT ROW_NUM = ROW_NUMBER() OVER (ORDER BY (SELECT 0)) ,

    idMvt1,idMvt2,idMvt3

    FROM MVTS

    ),

    Unpivoted_List AS

    (

    SELECT ROW_NUM , ValInCol

    FROM

    ( SELECT ROW_NUM ,idMvt1,idMvt2,idMvt3 FROM ROW_NUMBERED_LIST ) AS PIVOT_TABLE

    UNPIVOT

    ( ValInCol FOR ColName IN (idMvt1,idMvt2,idMvt3) ) AS PIVOT_HANDLE

    --ORDER BY ROW_NUM

    ) ,

    Ranked_List AS

    (

    SELECT ROW_NUM , ValInCol,

    RANKING = ROW_NUMBER() OVER( PARTITION BY ValInCol ORDER BY ROW_NUM )

    FROM UNPIVOTED_LIST

    ) ,

    UNIQUE_IDS AS

    (

    SELECT ROW_NUM

    FROM Ranked_List

    GROUP BY ROW_NUM

    HAVING COUNT( DISTINCT RANKING ) = 1

    )

    SELECT R_N_LST.ROW_NUM, R_N_LST.idMvt1,R_N_LST.idMvt2,R_N_LST.idMvt3

    FROM ROW_NUMBERED_LIST R_N_LST

    INNER JOIN UNIQUE_IDS UNIQ

    ON R_N_LST.ROW_NUM = UNIQ.ROW_NUM

    Very cool, CC... that correctly produces rows 1 and 4 on the original test data. Now I have to study what you did because, frankly, I couldn't get my cold-medicine-induced head around this one. 🙂

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Gah! Sorry CC... although your code works with the original data, it doesn't work when we insert a new row. Try yours with the following data...

    DROP TABLE MVTS

    CREATE TABLE MVTS

    (

    ID INT IDENTITY(1,1),

    idMvt1 int,

    idMvt2 int,

    idMvt3 int

    );

    INSERT INTO MVTS

    (idMvt1,idMvt2,idMvt3)

    SELECT 271, 204, 136 UNION ALL --1 This will always show up because it's the "first"

    SELECT 271, 202, 136 UNION ALL --2 This should NOT show because 271 and 136 is contained in 1 (which is the only "active" row at this point)

    SELECT 268, 253, 204 UNION ALL --3 This should NOT show because 204 is contained in 1 (which is the only "active" row at this point)

    SELECT 268, 001, 204 UNION ALL [font="Arial Black"]--New row added to see if Cold Coffee's algo works. It doesn't.[/font]

    SELECT 268, 253, 202 UNION ALL --4 This SHOULD show because even though 268 and 253 are in the previous row,

    -- none of the elements live in 1 which is the only active row at this point.

    SELECT 268, 256, 204 UNION ALL --5 Should NOT show because at least 1 element lives in 1 or 4 which are "active"

    SELECT 268, 256, 202 --6 Should NOT show because at least 1 element lives in 4 which is "active"

    Heh... the good part about this is that I don't feel like such an idiot on this problem, now. 😛

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Can I throw in a bone too?

    --===== Peso's code

    ;WITH cteSource(ID, SequenceID)

    AS (

    SELECTu.ID,

    ROW_NUMBER() OVER (PARTITION BY u.theValue ORDER BY u.ID) AS SequenceID

    FROMdbo.Mvts AS s

    UNPIVOT(

    theValue

    FOR theCol IN (s.idMvt1, s.idMvt2, s.idMvt3)

    ) AS u

    )

    SELECTID

    FROMcteSource

    GROUP BYID

    --HAVINGSUM(SequenceID) = MIN(SequenceID) * COUNT(*)

    HAVINGMIN(SequenceID) = MAX(SequenceID)


    N 56°04'39.16"
    E 12°55'05.25"

  • SwePeso (1/23/2011)


    Can I throw in a bone too?

    --===== Peso's code

    ;WITH cteSource(ID, SequenceID)

    AS (

    SELECTu.ID,

    ROW_NUMBER() OVER (PARTITION BY u.theValue ORDER BY u.ID) AS SequenceID

    FROMdbo.Mvts AS s

    UNPIVOT(

    theValue

    FOR theCol IN (s.idMvt1, s.idMvt2, s.idMvt3)

    ) AS u

    )

    SELECTID

    FROMcteSource

    GROUP BYID

    --HAVINGSUM(SequenceID) = MIN(SequenceID) * COUNT(*)

    HAVINGMIN(SequenceID) = MAX(SequenceID)

    Yep. It's just the bone is missing a bit of marrow. With the following data, it should return ID's 1 AND 5 (labeled as "4" in the comments because I added an extra row to break CC's code).

    DROP TABLE MVTS

    CREATE TABLE MVTS

    (

    ID INT IDENTITY(1,1),

    idMvt1 int,

    idMvt2 int,

    idMvt3 int

    );

    INSERT INTO MVTS

    (idMvt1,idMvt2,idMvt3)

    SELECT 271, 204, 136 UNION ALL --1 This will always show up because it's the "first"

    SELECT 271, 202, 136 UNION ALL --2 This should NOT show because 271 and 136 is contained in 1 (which is the only "active" row at this point)

    SELECT 268, 253, 204 UNION ALL --3 This should NOT show because 204 is contained in 1 (which is the only "active" row at this point)

    SELECT 268, 001, 204 UNION ALL --New row added to see if Cold Coffee's algo works. It doesn't.

    SELECT 268, 253, 202 UNION ALL --4 This SHOULD show because even though 268 and 253 are in the previous row,

    -- none of the elements live in 1 which is the only active row at this point.

    SELECT 268, 256, 204 UNION ALL --5 Should NOT show because at least 1 element lives in 1 or 4 which are "active"

    SELECT 268, 256, 202 --6 Should NOT show because at least 1 element lives in 4 which is "active"

    --===== Peso's code

    ;WITH cteSource(ID, SequenceID)

    AS (

    SELECT u.ID,

    ROW_NUMBER() OVER (PARTITION BY u.theValue ORDER BY u.ID) AS SequenceID

    FROM dbo.Mvts AS s

    UNPIVOT (

    theValue

    FOR theCol IN (s.idMvt1, s.idMvt2, s.idMvt3)

    ) AS u

    )

    SELECT ID

    FROM cteSource

    GROUP BY ID

    --HAVING SUM(SequenceID) = MIN(SequenceID) * COUNT(*)

    HAVING MIN(SequenceID) = MAX(SequenceID)

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • DECLARE@Sample TABLE

    (

    ID INT IDENTITY(1, 1) PRIMARY KEY CLUSTERED,

    idMvt1 int,

    idMvt2 int,

    idMvt3 int

    )

    INSERT@Sample

    (

    idMvt1,

    idMvt2,

    idMvt3

    )

    SELECT271, 204, 136 UNION ALL --1 This will always show up because it's the "first"

    SELECT271, 202, 136 UNION ALL --2 This should NOT show because 271 and 136 is contained in 1 (which is the only "active" row at this point)

    SELECT268, 253, 204 UNION ALL --3 This should NOT show because 204 is contained in 1 (which is the only "active" row at this point)

    SELECT268, 001, 204 UNION ALL --4 New row added to see if Cold Coffee's algo works. It doesn't.

    SELECT268, 253, 202 UNION ALL --5 This SHOULD show because even though 268 and 253 are in the previous row,

    -- none of the elements live in 1 which is the only active row at this point.

    SELECT268, 256, 204 UNION ALL --6 Should NOT show because at least 1 element lives in 1 or 4 which are "active"

    SELECT268, 256, 202 --7 Should NOT show because at least 1 element lives in 4 which is "active"

    --===== Peso's code

    ;WITH cteSource

    AS (

    SELECTID,

    idMvt1,

    idMvt2,

    idMvt3

    FROM@Sample

    WHEREID = 1

    UNION ALL

    SELECTID,

    idMvt1,

    idMvt2,

    idMvt3

    FROM(

    SELECTs.ID,

    s.idMvt1,

    s.idMvt2,

    s.idMvt3,

    ROW_NUMBER() OVER (ORDER BY s.ID) AS YakID

    FROM@Sample AS s

    INNER JOINcteSource AS c ON c.ID < s.ID

    WHEREs.idMvt1 NOT IN (c.idMvt1, c.idMvt2, c.idMvt3)

    AND s.idMvt2 NOT IN (c.idMvt1, c.idMvt2, c.idMvt3)

    AND s.idMvt3 NOT IN (c.idMvt1, c.idMvt2, c.idMvt3)

    ) AS d

    WHEREYakID = 1

    )

    SELECTID,

    idMvt1,

    idMvt2,

    idMvt3

    FROMcteSource


    N 56°04'39.16"
    E 12°55'05.25"

  • SwePeso (1/23/2011)


    ...

    Sorry Peso, but while testing mine for more complicated datasets, I came across this one that your latest does not cope with...

    INSERT INTO MVTS

    (idMvt1,idMvt2,idMvt3)

    SELECT 1, 2, 3 UNION ALL

    SELECT 4, 5, 6 UNION ALL

    SELECT 2, 1, 3 -- this row should NOT be selected.

    Hope you can work around this because mine is way too slooooowww! and ugly!

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • Darned good try, Peter. But it doesn't hold up for other data ({edit} Magoo beat me to it while I was writing this post). I know... it's a tough problem to do even with a RBAR loop. Even a "Lasagna" loop is going to take a long time across, say, 5 million rows.

    DECLARE @Sample TABLE

    (

    ID INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,

    idMvt1 int,

    idMvt2 int,

    idMvt3 int

    );

    --===== Build the test data

    INSERT INTO @Sample

    (idMvt1, idMvt2, idMvt3)

    SELECT TOP 20

    idMvt1 = ABS(CHECKSUM(NEWID()))%10+1,

    idMvt2 = ABS(CHECKSUM(NEWID()))%10+1,

    idMvt3 = ABS(CHECKSUM(NEWID()))%10+1

    FROM sys.All_Columns ac1,

    sys.All_Columns ac2

    ;

    -----------------------------------------------------------------------------------------------------------------------

    --===== Peso's code

    ;WITH cteSource

    AS (

    SELECT ID,

    idMvt1,

    idMvt2,

    idMvt3

    FROM @Sample

    WHERE ID = 1

    UNION ALL

    SELECT ID,

    idMvt1,

    idMvt2,

    idMvt3

    FROM (

    SELECT s.ID,

    s.idMvt1,

    s.idMvt2,

    s.idMvt3,

    ROW_NUMBER() OVER (ORDER BY s.ID) AS YakID

    FROM @Sample AS s

    INNER JOIN cteSource AS c ON c.ID < s.ID

    WHERE s.idMvt1 NOT IN (c.idMvt1, c.idMvt2, c.idMvt3)

    AND s.idMvt2 NOT IN (c.idMvt1, c.idMvt2, c.idMvt3)

    AND s.idMvt3 NOT IN (c.idMvt1, c.idMvt2, c.idMvt3)

    ) AS d

    WHERE YakID = 1

    )

    SELECT ID,

    idMvt1,

    idMvt2,

    idMvt3

    FROM cteSource

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • mister.magoo (1/23/2011)


    Hope you can work around this because mine is way too slooooowww! and ugly!

    Glad to hear that my code isn't the only slow and ugly one on this problem. :blush:

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (1/23/2011)


    mister.magoo (1/23/2011)


    Hope you can work around this because mine is way too slooooowww! and ugly!

    Glad to hear that my code isn't the only slow and ugly one on this problem. :blush:

    At the moment, my code works but has been running for three minutes and not completed on a 1M row set....:pinch:

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • Jeff Moden (1/23/2011)


    mister.magoo (1/23/2011)


    Hope you can work around this because mine is way too slooooowww! and ugly!

    Glad to hear that my code isn't the only slow and ugly one on this problem. :blush:

    Ok, this may be ugly, but it processes 1M rows in 1 second and correctly processes all the test cases posted so far.

    IF OBJECT_ID ('tempdb..#accepted') IS NOT NULL DROP TABLE #accepted;

    CREATE TABLE #accepted

    (

    ID INT IDENTITY(1,1) ,

    VALUE INT

    );

    CREATE INDEX ix1 ON #accepted(VALUE);

    --= allow identity inserts - this is key to the solution

    SET IDENTITY_INSERT #accepted ON;

    --= pop the first row in

    INSERT #accepted(ID,VALUE)

    SELECT ID,VALUE

    FROM mvts AS o

    UNPIVOT (VALUE FOR c IN (idMvt1,idMvt2,idMvt3)) upvt

    WHERE id=1;

    --== now do a yucky WHILE loop to process the rows one at a time into the accepted table.

    --== ROWCOUNT=3 because I unpivot the values

    WHILE @@ROWCOUNT=3

    INSERT #accepted(ID,VALUE)

    SELECT ID, VALUE

    FROM (

    SELECT TOP 1 ID,idMvt1,idMvt2,idMvt3

    FROM mvts AS o

    WHERE NOT EXISTS (SELECT 1 FROM #accepted AS a WHERE a.value = o.idMvt1)

    AND NOT EXISTS ( SELECT1 FROM #accepted AS a WHEREa.value = o.idMvt2)

    AND NOT EXISTS ( SELECT 1 FROM #accepted AS a WHERE a.value = o.idMvt3)

    AND id>SCOPE_IDENTITY() --= using SCOPE_IDENTITY to tell me the last ID I stored in the table.

    ORDER BY id

    ) NextRow(ID,idMvt1,idMvt2,idMvt3)

    UNPIVOT (

    VALUE FOR c IN (idMvt1,idMvt2,idMvt3)

    ) upvt

    SET IDENTITY_INSERT #accepted OFF;

    SELECT mvts.id,mvts.idMvt1,mvts.idMvt2,mvts.idMvt3

    FROM mvts AS mvts

    WHERE EXISTS (SELECT 1 FROM #accepted AS a WHERE a.id=mvts.id)

    ORDER BY mvts.id

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • Ugly is OK... especially if Ugly works fast. I'll give it a try.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • OMG! How did the old cartoon series used to say it? "Ahhhh, Magoo... You've done it again!". Well done. NOW I can go to bed! 😛

    Even on my 9 year old, single 1.8Ghz P4 CPU, it still only takes 14 seconds on a million rows. Freakin' awesome job, Magoo. 🙂

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Ingenious use. 🙂

    One word of warning though set MAXDOP 1 , scope_identity can return incorrect results in a parallel query.



    Clear Sky SQL
    My Blog[/url]

  • Dave Ballantyne (1/23/2011)


    Ingenious use. 🙂

    One word of warning though set MAXDOP 1 , scope_identity can return incorrect results in a parallel query.

    And using partitions, right?


    N 56°04'39.16"
    E 12°55'05.25"

  • Thx everyone I would never hoped for this number of solutions, I will shrotly post the results.

  • Viewing 15 posts - 16 through 30 (of 75 total)

    You must be logged in to reply to this topic. Login to reply