Filter a table to get unique rows in a certain logic

  • Agreed. "It Depends". I've found that my old box does some things quite differently especially when compared to something like a laptop.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (1/24/2011)


    Agreed. "It Depends". I've found that my old box does some things quite differently especially when compared to something like a laptop.

    Ok, had a brief moment during lunch to test this and found that for 1M rows:

    Number Pool Size: 100

    CROSS APPLY : CPU: 10826 Reads: 2947278 Writes: 0 Duration 10939

    UNPIVOT : CPU: 11357 Reads: 2948994 Writes: 1 Duration 11511

    Number Pool Size: 1000

    CROSS APPLY : CPU: 3604 Reads: 2065474 Writes: 0 Duration 3728

    UNPIVOT : CPU: 3042 Reads: 2063737 Writes: 0 Duration 3143

    Number Pool Size: 10000

    CROSS APPLY : CPU: 1295 Reads: 204247 Writes: 51 Duration 1395

    UNPIVOT : CPU: 1279 Reads: 204260 Writes: 46 Duration 1424

    So, it seems on my laptop (Core2 Duo 2.4Ghz SQL2008 32nit 3GB RAM) there is very little between them.

    Now, the big game changer is when the number pool is 100,000 - I need to find a way to handle that because at the moment my code is worse than using a pen and paper - at 11 minutes + and still going I cancelled...:angry:

    EDIT: Jeff - any chance you could post or PM me the code you used to generate 1M rows with 100,000 unique ids ? I can't think why yours finished in 14 secs and mine is out of control...

    EDIT2: Scrap that - problem found and fixed - new faster code coming soon....

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • New Code :

    1. modified the index on the temp table #accepted to be unique

    2. added FASTFIRSTROW table hint

    3. replaced UNPIVOT with CROSS APPLY/SELECT UNION ALL (might be slightly quicker - might not)

    4. Added OPTION(MAXDOP 1) for safety.

    SET NOCOUNT ON;

    IF OBJECT_ID ('tempdb..#accepted') IS NOT NULL DROP TABLE #accepted;

    CREATE TABLE #accepted

    (

    ID INT IDENTITY(1,1) ,

    VALUE INT

    );

    --= Modified to be UNIQUE - speed increase on large Number Pools (100,000+)

    CREATE UNIQUE INDEX ix1 ON #accepted(VALUE);

    --= allow identity inserts - this is key to the solution

    SET IDENTITY_INSERT #accepted ON;

    --= pop the first row in

    INSERT #accepted(ID,VALUE)

    SELECT ID,VALUE

    FROM mvts AS o

    UNPIVOT (VALUE FOR c IN (idMvt1,idMvt2,idMvt3)) upvt

    WHERE id=1;

    --== now do a yucky WHILE loop to process the rows one at a time into the accepted table.

    --== ROWCOUNT=3 because I unpivot the values - 3 rows per ID

    WHILE @@ROWCOUNT=3

    INSERT #accepted(ID,VALUE)

    SELECT ID, VALUE

    FROM (

    SELECT TOP 1 ID,idMvt1,idMvt2,idMvt3

    FROM mvts AS o WITH(FASTFIRSTROW)

    WHERE NOT EXISTS (SELECT 1 FROM #accepted AS a WHERE a.value = o.idMvt1)

    AND NOT EXISTS ( SELECT1 FROM #accepted AS a WHEREa.value = o.idMvt2)

    AND NOT EXISTS ( SELECT 1 FROM #accepted AS a WHERE a.value = o.idMvt3)

    AND id>SCOPE_IDENTITY() --= using SCOPE_IDENTITY to tell me the last ID I stored in the table.

    ORDER BY id

    ) NextRow(ID,idMvt1,idMvt2,idMvt3)

    CROSS APPLY(SELECT idMvt1 as VALUE union all SELECT idMvt2 union all SELECT idMvt3) upvt

    OPTION(MAXDOP 1); --== Added to prevent issues with SCOPE_IDENTITY and parallelism - Just In Case.

    SET IDENTITY_INSERT #accepted OFF;

    SELECT mvts.id,mvts.idMvt1,mvts.idMvt2,mvts.idMvt3

    FROM mvts AS mvts

    WHERE EXISTS (SELECT 1 FROM #accepted AS a WHERE a.id=mvts.id)

    ORDER BY mvts.id;

    This version happily chunks it way through 1M rows made up from a pool of 100,000 unique numbers in about 6 seconds on my Core2 Duo 2.4Ghz Laptop.

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • And the other Pool sizes? 100, 1000 and 10000.

    And just in case ID doesn't start with 1, changeWHERE id=1;toWHERE id=(select min(id) from mvts);


    N 56°04'39.16"
    E 12°55'05.25"

  • SwePeso (1/24/2011)


    And the other Pool sizes? 100, 1000 and 10000.

    They are fine..?? if that's what you mean? The smaller pool sizes were ok anyway - adding the UNIQUEness to the index was something I should have done straight-away and only really affects the larger pool significantly.

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • I meant, what is the new timings for all pool sizes; 100, 1000, 10000 and 100000 unique records on a million sample dataset?


    N 56°04'39.16"
    E 12°55'05.25"

  • Heh, I rather hoped you didn't - as I was timing them in profiler, pausing between each one so that I could test a few tweaks along the way and just near then end I accidentally clicked STOP instead of pause, so the next time I started the trace it cleared all the previous results...

    I will measure again later, but off the top of my head, they were mostly around the three second mark.

    One thing I did find was that for a Pool of 100 numbers, it was quicker to include an OPTION(RECOMPILE) on the INSERT in the WHILE loop, but as that was detrimental to larger pools I left it out.

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • MM - very impressive solution. Very well done!

    Out of curiosity... did you try the tablockx hint? (I don't think it would make a difference.)

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • WayneS (1/24/2011)


    MM - very impressive solution. Very well done!

    Out of curiosity... did you try the tablockx hint? (I don't think it would make a difference.)

    No, I didn't. Are you suggesting that for the #accepted table?

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • mister.magoo (1/24/2011)


    WayneS (1/24/2011)


    MM - very impressive solution. Very well done!

    Out of curiosity... did you try the tablockx hint? (I don't think it would make a difference.)

    No, I didn't. Are you suggesting that for the #accepted table?

    That is what I was thinking, and being a local temporary table (no one else has access to it), I don't think it would make any difference.

    However, your question got me wondering if it would help any on the mvts table...

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • WayneS (1/24/2011)


    mister.magoo (1/24/2011)


    WayneS (1/24/2011)


    MM - very impressive solution. Very well done!

    Out of curiosity... did you try the tablockx hint? (I don't think it would make a difference.)

    No, I didn't. Are you suggesting that for the #accepted table?

    That is what I was thinking, and being a local temporary table (no one else has access to it), I don't think it would make any difference.

    However, your question got me wondering if it would help any on the mvts table...

    I will give it a try in a bit...and let you know.

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • TABLOCKX on mvts knock about 0.5 second off the time (14%) for a Pool of 10,000

    I will add this tweak to the code when I run the performance tests later.

    Good call Wayne!

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • Hmm , not sure if ive understood the problem correctly as im seeing different results...

    But , just to add to the discussion ....

    A couple of sorts im not to happy with, but are the results correct ?

    with cteExplode

    as

    (

    Select id as pkid,

    case a.a when 1 then idMvt1

    when 2 then idMvt2

    when 3 then idMvt3 end as idMvt

    from #sample cross join (select 1 as a union all select 2 union all select 3) as a

    )

    ,

    cteMins

    as(

    select idmvt,MIN(pkid) as pkid

    from cteExplode

    group by idMvt

    )

    Select pkid

    from cteMins

    group by pkid

    having COUNT(*) = 3



    Clear Sky SQL
    My Blog[/url]

  • Dave,

    I haven't tested yours, but can see that it does not meet with Sir Jeff's strict guidelines....

    Seriously, a simple test is this:

    INSERT MVTS(IdMvt1,idMvt2,idMvt3)

    SELECT 1,2,3 UNION ALL

    SELECT 3,4,5 UNION ALL

    SELECT 4,5,6

    Rows 1 and 3 should be selected

    Row 2 is not selected because it is not the first occurrence of "3"

    Row 3 is selected because Row 2 is not, so Row 3's "4" and "5" are now the first occurrences of those numbers.

    There are a few posts with test data in that you can use.

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • Ok , thanks for clarifying



    Clear Sky SQL
    My Blog[/url]

  • Viewing 15 posts - 46 through 60 (of 75 total)

    You must be logged in to reply to this topic. Login to reply