Filter a table to get unique rows in a certain logic

Question

Filter a table to get unique rows in a certain logic

Forum Etiquette: How to post Reporting Services problems

[/url]

Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
[/url]

How to Post Performance Problems - by Gail Shaw
[/url]

mister.magoo SSC-Forever Points: 47174 More actions · Answer 1

Jeff Moden (1/23/2011)
OMG! How did the old cartoon series used to say it? "Ahhhh, Magoo... You've done it again!". Well done. NOW I can go to bed! 😛
Even on my 9 year old, single 1.8Ghz P4 CPU, it still only takes 14 seconds on a million rows. Freakin' awesome job, Magoo. 🙂

Yes - and I went to bed after that, too!

Dave - Good point, thanks - do you think SQL will ever Parallelise (?) a while loop - not that I am arguing against MAXDOP 1, just curious as to functionality of the optimiser?

Peso - Any chance you could elaborate about the effects of partitioning on scope_identity() for me - that one is beyond my current knowledge.

MM

select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

Dave Ballantyne SSC-Dedicated Points: 33667 More actions · Answer 2

mister.magoo (1/24/2011)
Dave - Good point, thanks - do you think SQL will ever Parallelise (?) a while loop - not that I am arguing against MAXDOP 1, just curious as to functionality of the optimiser?

Its not the looping that will parallelise , but the insert statement itself.

TBH , i very very very much doubt if it will hit here, but , having being bitten once , im twice shy :).

http://connect.microsoft.com/SQLServer/feedback/details/328811/scope-identity-sometimes-returns-incorrect-value

Clear Sky SQL
My Blog[/url]

mister.magoo SSC-Forever Points: 47174 More actions · Answer 3

Dave Ballantyne (1/24/2011)
mister.magoo (1/24/2011)
Dave - Good point, thanks - do you think SQL will ever Parallelise (?) a while loop - not that I am arguing against MAXDOP 1, just curious as to functionality of the optimiser?
Its not the looping that will parallelise , but the insert statement itself.
TBH , i very very very much doubt if it will hit here, but , having being bitten once , im twice shy :).
http://connect.microsoft.com/SQLServer/feedback/details/328811/scope-identity-sometimes-returns-incorrect-value

Thanks Dave! Very useful information.

In this case I feel it is very unlikely as the insert will only ever insert one row at a time - unless I am very mistaken about the problem!

But, I will update the code to include MAXDOP 1 as soon as I get a moment - as we all know "unlikely" still means possible!

MM

select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

Dave Ballantyne SSC-Dedicated Points: 33667 More actions · Answer 4

Im speaking from a point of pure ignorance about your code, ie i havent run in through, 😉 , but

WHILE @@ROWCOUNT=3

So , you are inserting 3 rows at a time , correct ?

Clear Sky SQL
My Blog[/url]

ColdCoffee SSC-Dedicated Points: 39972 More actions · Answer 5

Dave Ballantyne (1/24/2011)
WHILE @@ROWCOUNT=3

Even i have a doubt in the code , Magoo... Do we really need UNPIVOT ? i guess, your code will work even without the UNPIVOT , which of course will drag down the processing time further. Please shed some light on it...

ColdCoffee SSC-Dedicated Points: 39972 More actions · Answer 6

Dave Ballantyne (1/24/2011)
Im speaking from a point of pure ignorance about your code, ie i havent run in through, 😉 , but
WHILE @@ROWCOUNT=3
So , you are inserting 3 rows at a time , correct ?

As Magoo UNPIVOTs the data to be inserted, each ID (which will constitute 3 columns UNPIVOTed to 3 rows ) which meet the requirement will end up with 3 rows. If UNPIVOT could be avoided (which i believe can be) then he can change it something else.. Am i right , Mister Magoo ?

BTW, fantastic piece of code that..

mister.magoo SSC-Forever Points: 47174 More actions · Answer 7

Dave Ballantyne (1/24/2011)
Im speaking from a point of pure ignorance about your code, ie i havent run in through, 😉 , but
WHILE @@ROWCOUNT=3
So , you are inserting 3 rows at a time , correct ?

Well, sort of yes! Three rows, but they all have the same IDENTITY value - bit of a sneaky trick 😉

So, no matter which value of ID that SCOPE_IDENTITY returned - they are all the same!

MM

select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

mister.magoo SSC-Forever Points: 47174 More actions · Answer 8

ColdCoffee (1/24/2011)
Dave Ballantyne (1/24/2011)
WHILE @@ROWCOUNT=3
Even i have a doubt in the code , Magoo... Do we really need UNPIVOT ? i guess, your code will work even without the UNPIVOT , which of course will drag down the processing time further. Please shed some light on it...

The UNPIVOT (or a CROSS APPLY / SELECT UNION ) is required to populate the #accepted table with all VALUES that have been used.

This is so that the query can check the three idMvt? columns in the current row against a simple one column table of used values to exclude rows that contain duplicates.

If you can find another way to do that check as quickly (or quicker) then I am all ears - ever eager to learn 😛

MM

select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

mister.magoo SSC-Forever Points: 47174 More actions · Answer 9

Addendum:

To clarify , the WHILE @@ROWCOUNT=3 is just a way of looping until there are no more rows, it could just as easily have been WHILE @@ROWCOUNT<>0, but I prefer to make positive selections where possible.

MM

select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

ColdCoffee SSC-Dedicated Points: 39972 More actions · Answer 10

mister.magoo (1/24/2011)
Dave Ballantyne (1/24/2011)
Im speaking from a point of pure ignorance about your code, ie i havent run in through, 😉 , but
WHILE @@ROWCOUNT=3
So , you are inserting 3 rows at a time , correct ?
Well, sort of yes! Three rows, but they all have the same IDENTITY value - bit of a sneaky trick 😉
So, no matter which value of ID that SCOPE_IDENTITY returned - they are all the same!

OH yes, i get it.. i totally forgot about the temp table ; sorry 🙂

Dave Ballantyne SSC-Dedicated Points: 33667 More actions · Answer 11

Yes , very sneaky 🙂

Ok , so that shouldn't be a problem....

Getting back to the original issue though , although there may be X millions rows , how many unique values of id are there in the real data ?

Clear Sky SQL
My Blog[/url]

mister.magoo SSC-Forever Points: 47174 More actions · Answer 12

Dave Ballantyne (1/24/2011)
Yes , very sneaky 🙂
Ok , so that shouldn't be a problem....
Getting back to the original issue though , although there may be X millions rows , how many unique values of id are there in the real data ?

Obviously it is for the OP to answer that, but it is my understanding in the original data , "one row is one unique id"

The sample data did not include an id - I think Jeff added an identity to his test data...

So it may need a pre-process row_number to assign one...however there has to be something by which we can sort the data otherwise the question needs redefining...

MM

select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

Jeff Moden SSC Guru Points: 1004439 More actions · Answer 13

moadh.bs (1/24/2011)
Thx everyone I would never hoped for this number of solutions, I will shrotly post the results.

Moad,

The final solution that Magoo posted is the right one. It both works correctly and it's nasty fast. I tested it with a million rows. With 10,000 unique ID's in the idMVT columns, it returns in 14 seconds. With 100,000 unique ID's in the idMVT columns, it returns in 22 seconds. And THAT, my friend, is on a 9 year old, single 1.8 GHz CPU!:w00t:

Dave Ballantyne is correct though. Add OPTION(MAXDOP 1) to the very end of the query.

And, to be sure, you DO need an IDENTITY column on the original data to preserve the "original order of insert" or this just isn't going to work. In fact, no method will be guaranteed to work without it.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004439 More actions · Answer 14

mister.magoo (1/24/2011)
ColdCoffee (1/24/2011)
Dave Ballantyne (1/24/2011)
WHILE @@ROWCOUNT=3
Even i have a doubt in the code , Magoo... Do we really need UNPIVOT ? i guess, your code will work even without the UNPIVOT , which of course will drag down the processing time further. Please shed some light on it...
The UNPIVOT (or a CROSS APPLY / SELECT UNION ) is required to populate the #accepted table with all VALUES that have been used.
This is so that the query can check the three idMvt? columns in the current row against a simple one column table of used values to exclude rows that contain duplicates.
If you can find another way to do that check as quickly (or quicker) then I am all ears - ever eager to learn 😛

Just an FYI... when I used the CROSS APPLY/SELECT UNION ALL method last night, it shave about 7% of the time off on my old box.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

mister.magoo SSC-Forever Points: 47174 More actions · Answer 15

I guess this is another "depends" moment re: unpivot vs cross apply/select union all.

in my initial tests unpivot was faster. i will re-test.

thanks Jeff

MM

select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);