January 24, 2011 at 3:12 am
Jeff Moden (1/23/2011)
OMG! How did the old cartoon series used to say it? "Ahhhh, Magoo... You've done it again!". Well done. NOW I can go to bed! 😛Even on my 9 year old, single 1.8Ghz P4 CPU, it still only takes 14 seconds on a million rows. Freakin' awesome job, Magoo. 🙂
Yes - and I went to bed after that, too!
Dave - Good point, thanks - do you think SQL will ever Parallelise (?) a while loop - not that I am arguing against MAXDOP 1, just curious as to functionality of the optimiser?
Peso - Any chance you could elaborate about the effects of partitioning on scope_identity() for me - that one is beyond my current knowledge.
MM
select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);
January 24, 2011 at 4:53 am
mister.magoo (1/24/2011)
Dave - Good point, thanks - do you think SQL will ever Parallelise (?) a while loop - not that I am arguing against MAXDOP 1, just curious as to functionality of the optimiser?
Its not the looping that will parallelise , but the insert statement itself.
TBH , i very very very much doubt if it will hit here, but , having being bitten once , im twice shy :).
January 24, 2011 at 5:03 am
Dave Ballantyne (1/24/2011)
mister.magoo (1/24/2011)
Dave - Good point, thanks - do you think SQL will ever Parallelise (?) a while loop - not that I am arguing against MAXDOP 1, just curious as to functionality of the optimiser?Its not the looping that will parallelise , but the insert statement itself.
TBH , i very very very much doubt if it will hit here, but , having being bitten once , im twice shy :).
Thanks Dave! Very useful information.
In this case I feel it is very unlikely as the insert will only ever insert one row at a time - unless I am very mistaken about the problem!
But, I will update the code to include MAXDOP 1 as soon as I get a moment - as we all know "unlikely" still means possible!
MM
select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);
January 24, 2011 at 5:13 am
Im speaking from a point of pure ignorance about your code, ie i havent run in through, 😉 , but
WHILE @@ROWCOUNT=3
So , you are inserting 3 rows at a time , correct ?
January 24, 2011 at 5:19 am
Dave Ballantyne (1/24/2011)
WHILE @@ROWCOUNT=3
Even i have a doubt in the code , Magoo... Do we really need UNPIVOT ? i guess, your code will work even without the UNPIVOT , which of course will drag down the processing time further. Please shed some light on it...
January 24, 2011 at 5:22 am
Dave Ballantyne (1/24/2011)
Im speaking from a point of pure ignorance about your code, ie i havent run in through, 😉 , but
WHILE @@ROWCOUNT=3
So , you are inserting 3 rows at a time , correct ?
As Magoo UNPIVOTs the data to be inserted, each ID (which will constitute 3 columns UNPIVOTed to 3 rows ) which meet the requirement will end up with 3 rows. If UNPIVOT could be avoided (which i believe can be) then he can change it something else.. Am i right , Mister Magoo ?
BTW, fantastic piece of code that..
January 24, 2011 at 5:26 am
Dave Ballantyne (1/24/2011)
Im speaking from a point of pure ignorance about your code, ie i havent run in through, 😉 , but
WHILE @@ROWCOUNT=3
So , you are inserting 3 rows at a time , correct ?
Well, sort of yes! Three rows, but they all have the same IDENTITY value - bit of a sneaky trick 😉
So, no matter which value of ID that SCOPE_IDENTITY returned - they are all the same!
MM
select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);
January 24, 2011 at 5:30 am
ColdCoffee (1/24/2011)
Dave Ballantyne (1/24/2011)
WHILE @@ROWCOUNT=3
Even i have a doubt in the code , Magoo... Do we really need UNPIVOT ? i guess, your code will work even without the UNPIVOT , which of course will drag down the processing time further. Please shed some light on it...
The UNPIVOT (or a CROSS APPLY / SELECT UNION ) is required to populate the #accepted table with all VALUES that have been used.
This is so that the query can check the three idMvt? columns in the current row against a simple one column table of used values to exclude rows that contain duplicates.
If you can find another way to do that check as quickly (or quicker) then I am all ears - ever eager to learn 😛
MM
select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);
January 24, 2011 at 5:32 am
Addendum:
To clarify , the WHILE @@ROWCOUNT=3 is just a way of looping until there are no more rows, it could just as easily have been WHILE @@ROWCOUNT<>0, but I prefer to make positive selections where possible.
MM
select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);
January 24, 2011 at 5:35 am
mister.magoo (1/24/2011)
Dave Ballantyne (1/24/2011)
Im speaking from a point of pure ignorance about your code, ie i havent run in through, 😉 , but
WHILE @@ROWCOUNT=3
So , you are inserting 3 rows at a time , correct ?
Well, sort of yes! Three rows, but they all have the same IDENTITY value - bit of a sneaky trick 😉
So, no matter which value of ID that SCOPE_IDENTITY returned - they are all the same!
OH yes, i get it.. i totally forgot about the temp table ; sorry 🙂
January 24, 2011 at 5:48 am
Yes , very sneaky 🙂
Ok , so that shouldn't be a problem....
Getting back to the original issue though , although there may be X millions rows , how many unique values of id are there in the real data ?
January 24, 2011 at 5:54 am
Dave Ballantyne (1/24/2011)
Yes , very sneaky 🙂Ok , so that shouldn't be a problem....
Getting back to the original issue though , although there may be X millions rows , how many unique values of id are there in the real data ?
Obviously it is for the OP to answer that, but it is my understanding in the original data , "one row is one unique id"
The sample data did not include an id - I think Jeff added an identity to his test data...
So it may need a pre-process row_number to assign one...however there has to be something by which we can sort the data otherwise the question needs redefining...
MM
select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);
January 24, 2011 at 5:56 am
moadh.bs (1/24/2011)
Thx everyone I would never hoped for this number of solutions, I will shrotly post the results.
Moad,
The final solution that Magoo posted is the right one. It both works correctly and it's nasty fast. I tested it with a million rows. With 10,000 unique ID's in the idMVT columns, it returns in 14 seconds. With 100,000 unique ID's in the idMVT columns, it returns in 22 seconds. And THAT, my friend, is on a 9 year old, single 1.8 GHz CPU!:w00t:
Dave Ballantyne is correct though. Add OPTION(MAXDOP 1) to the very end of the query.
And, to be sure, you DO need an IDENTITY column on the original data to preserve the "original order of insert" or this just isn't going to work. In fact, no method will be guaranteed to work without it.
--Jeff Moden
Change is inevitable... Change for the better is not.
January 24, 2011 at 6:00 am
mister.magoo (1/24/2011)
ColdCoffee (1/24/2011)
Dave Ballantyne (1/24/2011)
WHILE @@ROWCOUNT=3
Even i have a doubt in the code , Magoo... Do we really need UNPIVOT ? i guess, your code will work even without the UNPIVOT , which of course will drag down the processing time further. Please shed some light on it...
The UNPIVOT (or a CROSS APPLY / SELECT UNION ) is required to populate the #accepted table with all VALUES that have been used.
This is so that the query can check the three idMvt? columns in the current row against a simple one column table of used values to exclude rows that contain duplicates.
If you can find another way to do that check as quickly (or quicker) then I am all ears - ever eager to learn 😛
Just an FYI... when I used the CROSS APPLY/SELECT UNION ALL method last night, it shave about 7% of the time off on my old box.
--Jeff Moden
Change is inevitable... Change for the better is not.
January 24, 2011 at 6:08 am
I guess this is another "depends" moment re: unpivot vs cross apply/select union all.
in my initial tests unpivot was faster. i will re-test.
thanks Jeff
MM
select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);
Viewing 15 posts - 31 through 45 (of 75 total)
You must be logged in to reply to this topic. Login to reply