Filter a table to get unique rows in a certain logic

  • Jeff Moden (1/23/2011)


    OMG! How did the old cartoon series used to say it? "Ahhhh, Magoo... You've done it again!". Well done. NOW I can go to bed! 😛

    Even on my 9 year old, single 1.8Ghz P4 CPU, it still only takes 14 seconds on a million rows. Freakin' awesome job, Magoo. 🙂

    Yes - and I went to bed after that, too!

    Dave - Good point, thanks - do you think SQL will ever Parallelise (?) a while loop - not that I am arguing against MAXDOP 1, just curious as to functionality of the optimiser?

    Peso - Any chance you could elaborate about the effects of partitioning on scope_identity() for me - that one is beyond my current knowledge.

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • mister.magoo (1/24/2011)


    Dave - Good point, thanks - do you think SQL will ever Parallelise (?) a while loop - not that I am arguing against MAXDOP 1, just curious as to functionality of the optimiser?

    Its not the looping that will parallelise , but the insert statement itself.

    TBH , i very very very much doubt if it will hit here, but , having being bitten once , im twice shy :).

    http://connect.microsoft.com/SQLServer/feedback/details/328811/scope-identity-sometimes-returns-incorrect-value



    Clear Sky SQL
    My Blog[/url]

  • Dave Ballantyne (1/24/2011)


    mister.magoo (1/24/2011)


    Dave - Good point, thanks - do you think SQL will ever Parallelise (?) a while loop - not that I am arguing against MAXDOP 1, just curious as to functionality of the optimiser?

    Its not the looping that will parallelise , but the insert statement itself.

    TBH , i very very very much doubt if it will hit here, but , having being bitten once , im twice shy :).

    http://connect.microsoft.com/SQLServer/feedback/details/328811/scope-identity-sometimes-returns-incorrect-value

    Thanks Dave! Very useful information.

    In this case I feel it is very unlikely as the insert will only ever insert one row at a time - unless I am very mistaken about the problem!

    But, I will update the code to include MAXDOP 1 as soon as I get a moment - as we all know "unlikely" still means possible!

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • Im speaking from a point of pure ignorance about your code, ie i havent run in through, 😉 , but

    WHILE @@ROWCOUNT=3

    So , you are inserting 3 rows at a time , correct ?



    Clear Sky SQL
    My Blog[/url]

  • Dave Ballantyne (1/24/2011)


    WHILE @@ROWCOUNT=3

    Even i have a doubt in the code , Magoo... Do we really need UNPIVOT ? i guess, your code will work even without the UNPIVOT , which of course will drag down the processing time further. Please shed some light on it...

  • Dave Ballantyne (1/24/2011)


    Im speaking from a point of pure ignorance about your code, ie i havent run in through, 😉 , but

    WHILE @@ROWCOUNT=3

    So , you are inserting 3 rows at a time , correct ?

    As Magoo UNPIVOTs the data to be inserted, each ID (which will constitute 3 columns UNPIVOTed to 3 rows ) which meet the requirement will end up with 3 rows. If UNPIVOT could be avoided (which i believe can be) then he can change it something else.. Am i right , Mister Magoo ?

    BTW, fantastic piece of code that..

  • Dave Ballantyne (1/24/2011)


    Im speaking from a point of pure ignorance about your code, ie i havent run in through, 😉 , but

    WHILE @@ROWCOUNT=3

    So , you are inserting 3 rows at a time , correct ?

    Well, sort of yes! Three rows, but they all have the same IDENTITY value - bit of a sneaky trick 😉

    So, no matter which value of ID that SCOPE_IDENTITY returned - they are all the same!

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • ColdCoffee (1/24/2011)


    Dave Ballantyne (1/24/2011)


    WHILE @@ROWCOUNT=3

    Even i have a doubt in the code , Magoo... Do we really need UNPIVOT ? i guess, your code will work even without the UNPIVOT , which of course will drag down the processing time further. Please shed some light on it...

    The UNPIVOT (or a CROSS APPLY / SELECT UNION ) is required to populate the #accepted table with all VALUES that have been used.

    This is so that the query can check the three idMvt? columns in the current row against a simple one column table of used values to exclude rows that contain duplicates.

    If you can find another way to do that check as quickly (or quicker) then I am all ears - ever eager to learn 😛

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • Addendum:

    To clarify , the WHILE @@ROWCOUNT=3 is just a way of looping until there are no more rows, it could just as easily have been WHILE @@ROWCOUNT<>0, but I prefer to make positive selections where possible.

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • mister.magoo (1/24/2011)


    Dave Ballantyne (1/24/2011)


    Im speaking from a point of pure ignorance about your code, ie i havent run in through, 😉 , but

    WHILE @@ROWCOUNT=3

    So , you are inserting 3 rows at a time , correct ?

    Well, sort of yes! Three rows, but they all have the same IDENTITY value - bit of a sneaky trick 😉

    So, no matter which value of ID that SCOPE_IDENTITY returned - they are all the same!

    OH yes, i get it.. i totally forgot about the temp table ; sorry 🙂

  • Yes , very sneaky 🙂

    Ok , so that shouldn't be a problem....

    Getting back to the original issue though , although there may be X millions rows , how many unique values of id are there in the real data ?



    Clear Sky SQL
    My Blog[/url]

  • Dave Ballantyne (1/24/2011)


    Yes , very sneaky 🙂

    Ok , so that shouldn't be a problem....

    Getting back to the original issue though , although there may be X millions rows , how many unique values of id are there in the real data ?

    Obviously it is for the OP to answer that, but it is my understanding in the original data , "one row is one unique id"

    The sample data did not include an id - I think Jeff added an identity to his test data...

    So it may need a pre-process row_number to assign one...however there has to be something by which we can sort the data otherwise the question needs redefining...

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • moadh.bs (1/24/2011)


    Thx everyone I would never hoped for this number of solutions, I will shrotly post the results.

    Moad,

    The final solution that Magoo posted is the right one. It both works correctly and it's nasty fast. I tested it with a million rows. With 10,000 unique ID's in the idMVT columns, it returns in 14 seconds. With 100,000 unique ID's in the idMVT columns, it returns in 22 seconds. And THAT, my friend, is on a 9 year old, single 1.8 GHz CPU!:w00t:

    Dave Ballantyne is correct though. Add OPTION(MAXDOP 1) to the very end of the query.

    And, to be sure, you DO need an IDENTITY column on the original data to preserve the "original order of insert" or this just isn't going to work. In fact, no method will be guaranteed to work without it.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • mister.magoo (1/24/2011)


    ColdCoffee (1/24/2011)


    Dave Ballantyne (1/24/2011)


    WHILE @@ROWCOUNT=3

    Even i have a doubt in the code , Magoo... Do we really need UNPIVOT ? i guess, your code will work even without the UNPIVOT , which of course will drag down the processing time further. Please shed some light on it...

    The UNPIVOT (or a CROSS APPLY / SELECT UNION ) is required to populate the #accepted table with all VALUES that have been used.

    This is so that the query can check the three idMvt? columns in the current row against a simple one column table of used values to exclude rows that contain duplicates.

    If you can find another way to do that check as quickly (or quicker) then I am all ears - ever eager to learn 😛

    Just an FYI... when I used the CROSS APPLY/SELECT UNION ALL method last night, it shave about 7% of the time off on my old box.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I guess this is another "depends" moment re: unpivot vs cross apply/select union all.

    in my initial tests unpivot was faster. i will re-test.

    thanks Jeff

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

    Viewing 15 posts - 31 through 45 (of 75 total)

    You must be logged in to reply to this topic. Login to reply