Filter a table to get unique rows in a certain logic

  • mister.magoo (1/24/2011)


    Dave,

    I haven't tested yours, but can see that it does not meet with Sir Jeff's strict guidelines....

    BWAA-HAA!!!... I'm just the humble tester here. The OP posted the original guidelines. 😛

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (1/24/2011)


    mister.magoo (1/24/2011)


    Dave,

    I haven't tested yours, but can see that it does not meet with Sir Jeff's strict guidelines....

    BWAA-HAA!!!... I'm just the humble tester here. The OP posted the original guidelines. 😛

    Thought you might like that!

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • mister.magoo (1/24/2011)


    TABLOCKX on mvts knock about 0.5 second off the time (14%) for a Pool of 10,000

    I will add this tweak to the code when I run the performance tests later.

    Good call Wayne!

    I have to admit... the problem initially had me thinking "quirky update" until I realized that it wouldn't handle it. But the QU stuck in my head, and with the talk about the partitioning and the maxdop 1, it popped back up to the top, and I started wondering if any of the hints in that process would help out here. I'm glad it makes a difference, and I'm interested to see the results of it on the larger set. (Just for the heck of it - try it on the temp table also!)

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • WayneS (1/24/2011)


    I have to admit... the problem initially had me thinking "quirky update" until I realized that it wouldn't handle it. But the QU stuck in my head, and with the talk about the partitioning and the maxdop 1, it popped back up to the top, and I started wondering if any of the hints in that process would help out here. I'm glad it makes a difference, and I'm interested to see the results of it on the larger set. (Just for the heck of it - try it on the temp table also!)

    I also started out thinking QU but soon realised that the potentially massive recursion required would be a double-hard problem for that.

    What is needed is a "Double-Barrelled Quirky Update" or an "Ultimate Quirky Update" or something....

    I will try the TABLOCKX on both tables and on larger sets.

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • Here is another approach. Instead of searching for "no duplicate" I keep track of which values that have been used.

    SET NOCOUNT ON

    CREATE TABLE#Sample

    (

    ID INT IDENTITY(1, 1) PRIMARY KEY CLUSTERED,

    idMvt1 INT NOT NULL,

    idMvt2 INT NOT NULL,

    idMvt3 INT NOT NULL

    )

    INSERT#Sample

    (

    idMvt1,

    idMvt2,

    idMvt3

    )

    SELECTTOP(1000000)

    ABS(CHECKSUM(NEWID())) % 10000 + 1 AS idMvt1,

    ABS(CHECKSUM(NEWID())) % 10000 + 1 AS idMvt2,

    ABS(CHECKSUM(NEWID())) % 10000 + 1 AS idMvt3

    FROMsys.All_Columns ac1,

    sys.All_Columns ac2

    -- Solution here

    CREATE TABLE#Values

    (

    idMvt INT PRIMARY KEY NONCLUSTERED

    )

    INSERT#Values

    (

    idMvt

    )

    SELECT DISTINCTu.idMvt

    FROM#Sample AS s

    UNPIVOT(

    idMvt

    FOR Col IN (s.idMvt1, s.idMvt2, s.idMvt3)

    ) AS u

    CREATE TABLE#Result

    (

    ID INT PRIMARY KEY CLUSTERED

    )

    DECLARE@CurrID INT,

    @OldID INT,

    @idMvt1 INT,

    @idMvt2 INT,

    @idMvt3 INT

    SELECT TOP(1)@CurrID = ID,

    @idMvt1 = idMvt1,

    @idMvt2 = idMvt2,

    @idMvt3 = idMvt3

    FROM#Sample

    ORDER BYID

    INSERT#Result

    (

    ID

    )

    VALUES(

    @CurrID

    )

    DELETE

    FROM#Values

    WHEREidMvt IN (@idMvt1, @idMvt2, @idMvt3)

    WHILE @CurrID IS NOT NULL

    BEGIN

    SELECT@OldID = @CurrID,

    @CurrID = NULL

    SELECT TOP(1)@CurrID = s.ID,

    @idMvt1 = s.idMvt1,

    @idMvt2 = s.idMvt2,

    @idMvt3 = s.idMvt3

    FROM#Sample AS s

    WHEREEXISTS(SELECT * FROM #Values AS v WHERE v.idMvt = s.idMvt1)

    AND EXISTS(SELECT * FROM #Values AS v WHERE v.idMvt = s.idMvt2)

    AND EXISTS(SELECT * FROM #Values AS v WHERE v.idMvt = s.idMvt3)

    AND s.ID > @OldID

    ORDER BYs.ID

    IF @CurrID IS NOT NULL

    BEGIN

    INSERT#Result

    (

    ID

    )

    VALUES(

    @CurrID

    )

    DELETE

    FROM#Values

    WHEREidMvt IN (@idMvt1, @idMvt2, @idMvt3)

    END

    END

    SELECTID

    FROM#Result

    DROP TABLE#Sample,

    #Values,

    #Result


    N 56°04'39.16"
    E 12°55'05.25"

  • Look at us, all doing WHILE loops ! Someone should take a photograph to preserve the moment 😛

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • Performance Stats as promised

    First my solution WITH and WITHOUT TABLOCKX on MVTS and #accepted:

    And next a comparison with SwePeso's solution which shows an interesting variance between the two solutions for small and large sets....

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • How about this as a solution?

    SET NOCOUNT ON

    CREATE TABLE #Sample

    (

    ID INT IDENTITY(1, 1) PRIMARY KEY CLUSTERED,

    idMvt1 INT NOT NULL,

    idMvt2 INT NOT NULL,

    idMvt3 INT NOT NULL

    )

    INSERT #Sample

    (

    idMvt1,

    idMvt2,

    idMvt3

    )

    SELECT TOP(1000000)

    ABS(CHECKSUM(NEWID())) % 10000 + 1 AS idMvt1,

    ABS(CHECKSUM(NEWID())) % 10000 + 1 AS idMvt2,

    ABS(CHECKSUM(NEWID())) % 10000 + 1 AS idMvt3

    FROM sys.all_columns ac1,

    sys.all_columns ac2;

    -- Solution here

    set statistics io on;

    set statistics time on;

    with MyIDs (

    Mvt,

    ID

    ) as (

    select

    idMvt1,

    ID

    from

    #Sample

    union

    select

    idMvt2,

    ID

    from

    #Sample

    union

    select

    idMvt3,

    ID

    from

    #Sample

    ),

    UniqueMvtIDs (

    Mvt,

    ID

    ) as (

    select

    Mvt,

    min(ID)

    from

    MyIDs

    group by

    Mvt

    )

    select distinct

    s.*

    from

    #Sample s

    inner join UniqueMvtIDs umi

    on (s.ID = umi.ID)

    ;

    set statistics time off;

    set statistics io off;

    drop table #Sample;

    Simple testing using statistics time on and io on:

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#Sample_____________________________________________________________________________________________________________000000000050'. Scan count 4, logical reads 9403, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 5875 ms, elapsed time = 6341 ms.

    !!!! Nevermind, doesn't work !!!!

  • Lynn Pettis (1/24/2011)


    How about this as a solution?

    SET NOCOUNT ON

    CREATE TABLE #Sample

    (

    ID INT IDENTITY(1, 1) PRIMARY KEY CLUSTERED,

    idMvt1 INT NOT NULL,

    idMvt2 INT NOT NULL,

    idMvt3 INT NOT NULL

    )

    INSERT #Sample

    (

    idMvt1,

    idMvt2,

    idMvt3

    )

    SELECT TOP(1000000)

    ABS(CHECKSUM(NEWID())) % 10000 + 1 AS idMvt1,

    ABS(CHECKSUM(NEWID())) % 10000 + 1 AS idMvt2,

    ABS(CHECKSUM(NEWID())) % 10000 + 1 AS idMvt3

    FROM sys.all_columns ac1,

    sys.all_columns ac2;

    -- Solution here

    set statistics io on;

    set statistics time on;

    with MyIDs (

    Mvt,

    ID

    ) as (

    select

    idMvt1,

    ID

    from

    #Sample

    union

    select

    idMvt2,

    ID

    from

    #Sample

    union

    select

    idMvt3,

    ID

    from

    #Sample

    ),

    UniqueMvtIDs (

    Mvt,

    ID

    ) as (

    select

    Mvt,

    min(ID)

    from

    MyIDs

    group by

    Mvt

    )

    select distinct

    s.*

    from

    #Sample s

    inner join UniqueMvtIDs umi

    on (s.ID = umi.ID)

    ;

    set statistics time off;

    set statistics io off;

    drop table #Sample;

    Simple testing using statistics time on and io on:

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#Sample_____________________________________________________________________________________________________________000000000050'. Scan count 4, logical reads 9403, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 5875 ms, elapsed time = 6341 ms.

    !!!! Nevermind, doesn't work !!!!

    The problem is, it doesn't work correctly. I changed the number of rows to 20 and the random number generators to 10 and here's what I got very first time out...

    ID idMvt1 idMvt2 idMvt3

    ----------- ----------- ----------- -----------

    1 8 9 1

    2 10 3 10

    3 10 1 4

    4 5 2 6

    7 7 9 6

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Agh! Sorry, Lynn, I missed where you said...

    !!!! Nevermind, doesn't work !!!!

    ... at the bottom of your post.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (1/24/2011)


    Agh! Sorry, Lynn, I missed where you said...

    !!!! Nevermind, doesn't work !!!!

    ... at the bottom of your post.

    That's okay, I think I fixed it!

    Here is the corrected code:

    with MyIDs (

    Mvt,

    ID

    ) as (

    select

    idMvt1,

    ID

    from

    #Sample

    union all

    select

    idMvt2,

    ID

    from

    #Sample

    union all

    select

    idMvt3,

    ID

    from

    #Sample

    ),

    UniqueMvtIDs (

    Mvt,

    ID

    ) as (

    select

    Mvt,

    min(ID)

    from

    MyIDs

    group by

    Mvt

    )

    select distinct

    s.*

    into

    #Results

    from

    #Sample s

    where

    s.idMvt1 in (select umi.Mvt from UniqueMvtIDs umi where s.ID <= umi.ID)

    and s.idMvt2 in (select umi.Mvt from UniqueMvtIDs umi where s.ID <= umi.ID)

    and s.idMvt3 in (select umi.Mvt from UniqueMvtIDs umi where s.ID <= umi.ID)

    ;

    Use the original test suite from my previous post.

    And here is the simple timing of the code with output to the screen:

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#Sample_____________________________________________________________________________________________________________000000000059'. Scan count 30, logical reads 31020, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 22015 ms, elapsed time = 12492 ms.

    And writing to a temp table (#Results):

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#Sample_____________________________________________________________________________________________________________00000000005C'. Scan count 30, logical reads 31020, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 21873 ms, elapsed time = 11964 ms.

  • Excellent solution. Good Job Mr. Magoo

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • CirquedeSQLeil (1/26/2011)


    Excellent solution. Good Job Mr. Magoo

    Thank you - it would be good to know how the OP got on..hint.hint...

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • Okay, tested my solution against that of mister.magoo and his won. Mine returns the same result set with the same sample data.

    I guess I'm going to have to start playing more with pivot and unpivot.

    Well done, sir!

  • Hi everyone !!

    like I've said I opted for a CLR solution, even if we do prefer a sql one. it was basically because I was short in time. but right now I'm working on these solutions and I have to say The results were quite a surprise!

    I attached the set I'm using for testing, it contains 64178 rows, this is far from a set with millions of rows but I have the expected results here

    I tested 4 solutions :

    - Jeff Moden & Peso's solution within 12 sec it throws a max recursivity exception

    - the first solution of Mister Magoo and Lynn Pettis's solutions were good but doesn't return the expected result

    - ColdCoffee's first solutio turn in 608 ms but it returns only 158 rows out of the expected 163

    - Mister Magoo's final solution was perfect (even if I couldn't turn it on temporay table or table variable) it returns the 163 combinations in 3 seconds

    I will shortly post the result on a larger scale.

  • Viewing 15 posts - 61 through 75 (of 75 total)

    You must be logged in to reply to this topic. Login to reply