January 24, 2011 at 6:18 am
Agreed. "It Depends". I've found that my old box does some things quite differently especially when compared to something like a laptop.
--Jeff Moden
Change is inevitable... Change for the better is not.
January 24, 2011 at 7:31 am
Jeff Moden (1/24/2011)
Agreed. "It Depends". I've found that my old box does some things quite differently especially when compared to something like a laptop.
Ok, had a brief moment during lunch to test this and found that for 1M rows:
Number Pool Size: 100
CROSS APPLY : CPU: 10826 Reads: 2947278 Writes: 0 Duration 10939
UNPIVOT : CPU: 11357 Reads: 2948994 Writes: 1 Duration 11511
Number Pool Size: 1000
CROSS APPLY : CPU: 3604 Reads: 2065474 Writes: 0 Duration 3728
UNPIVOT : CPU: 3042 Reads: 2063737 Writes: 0 Duration 3143
Number Pool Size: 10000
CROSS APPLY : CPU: 1295 Reads: 204247 Writes: 51 Duration 1395
UNPIVOT : CPU: 1279 Reads: 204260 Writes: 46 Duration 1424
So, it seems on my laptop (Core2 Duo 2.4Ghz SQL2008 32nit 3GB RAM) there is very little between them.
Now, the big game changer is when the number pool is 100,000 - I need to find a way to handle that because at the moment my code is worse than using a pen and paper - at 11 minutes + and still going I cancelled...:angry:
EDIT: Jeff - any chance you could post or PM me the code you used to generate 1M rows with 100,000 unique ids ? I can't think why yours finished in 14 secs and mine is out of control...
EDIT2: Scrap that - problem found and fixed - new faster code coming soon....
MM
select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);
January 24, 2011 at 8:00 am
New Code :
1. modified the index on the temp table #accepted to be unique
2. added FASTFIRSTROW table hint
3. replaced UNPIVOT with CROSS APPLY/SELECT UNION ALL (might be slightly quicker - might not)
4. Added OPTION(MAXDOP 1) for safety.
SET NOCOUNT ON;
IF OBJECT_ID ('tempdb..#accepted') IS NOT NULL DROP TABLE #accepted;
CREATE TABLE #accepted
(
ID INT IDENTITY(1,1) ,
VALUE INT
);
--= Modified to be UNIQUE - speed increase on large Number Pools (100,000+)
CREATE UNIQUE INDEX ix1 ON #accepted(VALUE);
--= allow identity inserts - this is key to the solution
SET IDENTITY_INSERT #accepted ON;
--= pop the first row in
INSERT #accepted(ID,VALUE)
SELECT ID,VALUE
FROM mvts AS o
UNPIVOT (VALUE FOR c IN (idMvt1,idMvt2,idMvt3)) upvt
WHERE id=1;
--== now do a yucky WHILE loop to process the rows one at a time into the accepted table.
--== ROWCOUNT=3 because I unpivot the values - 3 rows per ID
WHILE @@ROWCOUNT=3
INSERT #accepted(ID,VALUE)
SELECT ID, VALUE
FROM (
SELECT TOP 1 ID,idMvt1,idMvt2,idMvt3
FROM mvts AS o WITH(FASTFIRSTROW)
WHERE NOT EXISTS (SELECT 1 FROM #accepted AS a WHERE a.value = o.idMvt1)
AND NOT EXISTS ( SELECT1 FROM #accepted AS a WHEREa.value = o.idMvt2)
AND NOT EXISTS ( SELECT 1 FROM #accepted AS a WHERE a.value = o.idMvt3)
AND id>SCOPE_IDENTITY() --= using SCOPE_IDENTITY to tell me the last ID I stored in the table.
ORDER BY id
) NextRow(ID,idMvt1,idMvt2,idMvt3)
CROSS APPLY(SELECT idMvt1 as VALUE union all SELECT idMvt2 union all SELECT idMvt3) upvt
OPTION(MAXDOP 1); --== Added to prevent issues with SCOPE_IDENTITY and parallelism - Just In Case.
SET IDENTITY_INSERT #accepted OFF;
SELECT mvts.id,mvts.idMvt1,mvts.idMvt2,mvts.idMvt3
FROM mvts AS mvts
WHERE EXISTS (SELECT 1 FROM #accepted AS a WHERE a.id=mvts.id)
ORDER BY mvts.id;
This version happily chunks it way through 1M rows made up from a pool of 100,000 unique numbers in about 6 seconds on my Core2 Duo 2.4Ghz Laptop.
MM
select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);
January 24, 2011 at 8:13 am
And the other Pool sizes? 100, 1000 and 10000.
And just in case ID doesn't start with 1, changeWHERE id=1;
toWHERE id=(select min(id) from mvts);
N 56°04'39.16"
E 12°55'05.25"
January 24, 2011 at 8:18 am
SwePeso (1/24/2011)
And the other Pool sizes? 100, 1000 and 10000.
They are fine..?? if that's what you mean? The smaller pool sizes were ok anyway - adding the UNIQUEness to the index was something I should have done straight-away and only really affects the larger pool significantly.
MM
select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);
January 24, 2011 at 8:22 am
I meant, what is the new timings for all pool sizes; 100, 1000, 10000 and 100000 unique records on a million sample dataset?
N 56°04'39.16"
E 12°55'05.25"
January 24, 2011 at 8:52 am
Heh, I rather hoped you didn't - as I was timing them in profiler, pausing between each one so that I could test a few tweaks along the way and just near then end I accidentally clicked STOP instead of pause, so the next time I started the trace it cleared all the previous results...
I will measure again later, but off the top of my head, they were mostly around the three second mark.
One thing I did find was that for a Pool of 100 numbers, it was quicker to include an OPTION(RECOMPILE) on the INSERT in the WHILE loop, but as that was detrimental to larger pools I left it out.
MM
select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);
January 24, 2011 at 8:58 am
MM - very impressive solution. Very well done!
Out of curiosity... did you try the tablockx hint? (I don't think it would make a difference.)
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
January 24, 2011 at 9:05 am
WayneS (1/24/2011)
MM - very impressive solution. Very well done!Out of curiosity... did you try the tablockx hint? (I don't think it would make a difference.)
No, I didn't. Are you suggesting that for the #accepted table?
MM
select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);
January 24, 2011 at 9:14 am
mister.magoo (1/24/2011)
WayneS (1/24/2011)
MM - very impressive solution. Very well done!Out of curiosity... did you try the tablockx hint? (I don't think it would make a difference.)
No, I didn't. Are you suggesting that for the #accepted table?
That is what I was thinking, and being a local temporary table (no one else has access to it), I don't think it would make any difference.
However, your question got me wondering if it would help any on the mvts table...
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
January 24, 2011 at 9:26 am
WayneS (1/24/2011)
mister.magoo (1/24/2011)
WayneS (1/24/2011)
MM - very impressive solution. Very well done!Out of curiosity... did you try the tablockx hint? (I don't think it would make a difference.)
No, I didn't. Are you suggesting that for the #accepted table?
That is what I was thinking, and being a local temporary table (no one else has access to it), I don't think it would make any difference.
However, your question got me wondering if it would help any on the mvts table...
I will give it a try in a bit...and let you know.
MM
select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);
January 24, 2011 at 9:32 am
TABLOCKX on mvts knock about 0.5 second off the time (14%) for a Pool of 10,000
I will add this tweak to the code when I run the performance tests later.
Good call Wayne!
MM
select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);
January 24, 2011 at 12:00 pm
Hmm , not sure if ive understood the problem correctly as im seeing different results...
But , just to add to the discussion ....
A couple of sorts im not to happy with, but are the results correct ?
with cteExplode
as
(
Select id as pkid,
case a.a when 1 then idMvt1
when 2 then idMvt2
when 3 then idMvt3 end as idMvt
from #sample cross join (select 1 as a union all select 2 union all select 3) as a
)
,
cteMins
as(
select idmvt,MIN(pkid) as pkid
from cteExplode
group by idMvt
)
Select pkid
from cteMins
group by pkid
having COUNT(*) = 3
January 24, 2011 at 12:41 pm
Dave,
I haven't tested yours, but can see that it does not meet with Sir Jeff's strict guidelines....
Seriously, a simple test is this:
INSERT MVTS(IdMvt1,idMvt2,idMvt3)
SELECT 1,2,3 UNION ALL
SELECT 3,4,5 UNION ALL
SELECT 4,5,6
Rows 1 and 3 should be selected
Row 2 is not selected because it is not the first occurrence of "3"
Row 3 is selected because Row 2 is not, so Row 3's "4" and "5" are now the first occurrences of those numbers.
There are a few posts with test data in that you can use.
MM
select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);
January 24, 2011 at 1:19 pm
Ok , thanks for clarifying
Viewing 15 posts - 46 through 60 (of 75 total)
You must be logged in to reply to this topic. Login to reply