August 24, 2007 at 9:01 am
LOL Jeff
Actually, the results from my tests above don't show that the data is 'more random' with checksum - only that the queries are faster for my data.
I got the 'more random' point from the post below from Adam Machanic, who got the idea from Itzik Ben-Gan (both good guys, as you probably know).
Ryan Randall
Solutions are easy. Understanding the problem, now, that's the hard part.
August 24, 2007 at 9:17 am
Adrian,
I've just had a go at implementing your suggestion on my data (I really don't have time, but I can't help myself ). I've called it method #4.
It's quite long, so I'm going to post 3 messages - this one, 1 for the code I used (so you can tell me if I've implemented anything incorrectly), and 1 for results (so you can see the raw data for yourselves).
The upshot (for my data) is that #3 is faster in general, but #4 is pretty much the same for small data sets (it's generally slower for 1-10 rows, and slightly faster for 10-about 2000 rows).
Please post if you do tests on your own data, I'd be interested to see any differing results.
Ryan
Ryan Randall
Solutions are easy. Understanding the problem, now, that's the hard part.
August 24, 2007 at 9:20 am
Here's the code I used (revised from the previous code to include method #4)...
--table to hold results
CREATE TABLE #Results (NumberOfTableRows INT, NumberOfRows INT, ProbabilityOfEachRow FLOAT,
Method1_TimeInMs INT, Method2_TimeInMs INT, Method3_TimeInMs INT, Method4_TimeInMs INT)
SET NOCOUNT ON
DECLARE @Counter TINYINT
SET @Counter = 0
WHILE @Counter <= 100 --perform the comparison this many times
BEGIN
DECLARE @NumberOfRows INT
SET @NumberOfRows = power(10, rand()*5) --random sample size to test with
--
DECLARE @d DATETIME, @t1 INT, @t2 INT, @t3 INT, @t4 INT
SET @d = GETDATE();
--Method 1
SELECT TOP (@NumberOfRows) MessageId FROM dbo.Messages ORDER BY NEWID()
--
SET @t1 = DATEDIFF(ms, @d, GETDATE()); SET @d = GETDATE();
--Method 2
SELECT TOP (@NumberOfRows) MessageId FROM dbo.Messages ORDER BY CHECKSUM(NEWID())
--
SET @t2 = DATEDIFF(ms, @d, GETDATE()); SET @d = GETDATE();
--Method 3
DECLARE @NumberOfTableRows INT, @ProbabilityOfEachRow FLOAT
SELECT @NumberOfTableRows = COUNT(*) from dbo.Messages
SET @ProbabilityOfEachRow = (1+SQRT(1.0/@NumberOfRows)*10) * @NumberOfRows / @NumberOfTableRows --weighted overestimate
SELECT TOP (@NumberOfRows) MessageId FROM Messages --select required number from overestimated sample
WHERE @ProbabilityOfEachRow >= CAST(CHECKSUM(NEWID(), MessageId) & 0x7fffffff AS FLOAT) / CAST (0x7fffffff AS INT)
--
DECLARE @TV table(rand_num int)
DELETE FROM @TV
SET @t3 = DATEDIFF(ms, @d, GETDATE()); SET @d = GETDATE();
--Method 4
SELECT @NumberOfTableRows = COUNT(*) from dbo.Messages;
DECLARE @i int
SET @i = 0
WHILE (@i < @NumberOfRows)
BEGIN
INSERT @TV values (ceiling(rand() * @NumberOfTableRows))
SET @i = @i + 1
END
SELECT MessageId FROM
(SELECT ROW_NUMBER() OVER (ORDER BY MessageId) row, * from dbo.Messages) a
WHERE a.row in (SELECT rand_num from @TV)
--
SET @t4 = DATEDIFF(ms, @d, GETDATE()); SET @d = GETDATE();
--Insert results
INSERT #Results SELECT @NumberOfTableRows, @NumberOfRows, @ProbabilityOfEachRow, @t1, @t2, @t3, @t4
--
SET @Counter = @Counter + 1
END
--select results and tidy up
PRINT 'Summary'
SELECT AVG(Method1_TimeInMs) AS 'Method1', AVG(Method2_TimeInMs) AS 'Method2',
AVG(Method3_TimeInMs) AS 'Method3', AVG(Method4_TimeInMs) AS 'Method4'
FROM #Results
PRINT 'Summary: NumberOfRows < 2000'
SELECT AVG(Method1_TimeInMs) AS 'Method1', AVG(Method2_TimeInMs) AS 'Method2',
AVG(Method3_TimeInMs) AS 'Method3', AVG(Method4_TimeInMs) AS 'Method4'
FROM #Results WHERE NumberOfRows < 2000
PRINT 'Summary: NumberOfRows BETWEEN 10 AND 2000'
SELECT AVG(Method1_TimeInMs) AS 'Method1', AVG(Method2_TimeInMs) AS 'Method2',
AVG(Method3_TimeInMs) AS 'Method3', AVG(Method4_TimeInMs) AS 'Method4'
FROM #Results WHERE NumberOfRows BETWEEN 10 AND 2000
PRINT 'Summary: NumberOfRows < 10'
SELECT AVG(Method1_TimeInMs) AS 'Method1', AVG(Method2_TimeInMs) AS 'Method2',
AVG(Method3_TimeInMs) AS 'Method3', AVG(Method4_TimeInMs) AS 'Method4'
FROM #Results WHERE NumberOfRows < 10
PRINT 'All data (100 rows)'
SELECT NumberOfTableRows, NumberOfRows, Method1_TimeInMs, Method2_TimeInMs, Method3_TimeInMs, Method4_TimeInMs FROM #Results
--DROP TABLE #Results
Ryan Randall
Solutions are easy. Understanding the problem, now, that's the hard part.
August 24, 2007 at 9:23 am
And here's the results data (for 100 separate tests)...
/*
Summary
Method1 Method2 Method3 Method4
----------- ----------- ----------- -----------
2539 1976 1057 2476
Summary: NumberOfRows < 2000
Method1 Method2 Method3 Method4
----------- ----------- ----------- -----------
2214 1624 719 780
Summary: NumberOfRows BETWEEN 10 AND 2000
Method1 Method2 Method3 Method4
----------- ----------- ----------- -----------
2235 1629 847 808
Summary: NumberOfRows < 10
Method1 Method2 Method3 Method4
----------- ----------- ----------- -----------
2151 1608 317 694
All data (100 rows)
NumberOfTableRows NumberOfRows Method1_TimeInMs Method2_TimeInMs Method3_TimeInMs Method4_TimeInMs
----------------- ------------ ---------------- ---------------- ---------------- ----------------
1079894 9 2140 1563 623 610
1079894 9943 2533 2046 1356 2766
1079894 12562 2656 2233 1376 4563
1079894 85 2106 1563 733 843
1079894 1850 2093 1626 1203 1016
1079894 1 2170 1580 153 283
1079894 13529 2936 2170 1393 4296
1079894 221 2186 1640 953 860
1079894 349 2186 1623 1016 733
1079894 9233 2533 1966 1393 2296
1079894 15335 2826 2266 1406 3560
1079894 9720 2486 2170 1360 2563
1079894 26 2153 1566 826 780
1079894 32 2173 1610 686 640
1079894 73 2156 1656 780 640
1079894 19701 3220 2780 1423 4716
1079894 1171 2236 1716 1186 1266
1079894 26 2143 1576 623 673
1079894 1427 2106 1596 1216 1046
1079894 11913 2593 2046 1390 3626
1079894 295 2203 1610 1000 890
1079894 9 2093 1563 593 656
1079894 2 2076 1546 186 330
1079894 10604 2500 2013 1376 2813
1079894 1777 2156 1640 1250 1140
1079894 10 2220 1530 423 653
1079894 1653 2470 1640 1186 1050
1079894 19 2200 1610 563 640
1079894 4 2126 2030 326 610
1079894 20 2656 1593 593 626
1079894 232 2076 1576 1000 660
1079894 3564 2420 1686 1283 1403
1079894 3986 2393 1793 1330 1813
1079894 17862 3000 2656 1420 6390
1079894 4 2080 1546 280 873
1079894 3955 2220 1690 1310 2080
1079894 73174 5543 4813 3860 16436
1079894 4032 2390 1783 1313 2560
1079894 29 2283 1593 716 703
1079894 63555 5766 4906 2750 13513
1079894 6 2233 1580 360 1560
1079894 25 2156 1580 530 640
1079894 69 2283 1576 780 656
1079894 2895 2203 1640 1296 1486
1079894 102 2296 1716 796 690
1079894 19 2076 1563 500 640
1079894 11 2203 1610 326 623
1079894 236 2160 1606 923 733
1079894 9 2140 1563 296 610
1079894 2166 2173 1640 1280 1093
1079894 247 2156 1593 920 690
1079894 12269 2640 2123 1393 3903
1079894 20852 3360 2593 1470 4843
1079894 8740 2516 2046 1360 2623
1079894 13 2140 1563 390 733
1079894 70059 5000 4783 3406 16530
1079894 17 2906 2123 610 1733
1079894 1109 2093 1593 1203 906
1079894 9069 2673 1920 1376 2936
1079894 1 2140 1580 360 780
1079894 119 2060 1593 800 640
1079894 111 2123 1626 920 656
1079894 10 2140 1610 593 640
1079894 2 2203 1563 296 453
1079894 4 2156 1576 313 546
1079894 4 2173 1640 406 500
1079894 14 2360 1560 376 610
1079894 478 2543 1596 1093 750
1079894 577 2233 1563 1046 813
1079894 168 2623 2110 936 656
1079894 49114 4390 3890 2406 10346
1079894 1 2123 1530 250 1156
1079894 127 2216 1626 860 656
1079894 78744 6170 5656 3000 17216
1079894 18300 2876 2326 1406 5390
1079894 4427 2236 1780 1343 1890
1079894 68 2190 1593 703 733
1079894 1493 2280 1626 1296 983
1079894 253 2500 1703 970 686
1079894 1 2186 1720 156 653
1079894 11 2283 1576 406 623
1079894 21056 2813 2860 1470 5093
1079894 1293 2170 1533 1203 1186
1079894 32914 3326 3030 1673 6983
1079894 1 2313 1593 156 830
1079894 20495 2983 2483 1436 4406
1079894 2 2076 1563 330 656
1079894 4490 2263 1796 1330 1796
1079894 6762 2376 1886 1376 2360
1079894 64 2140 1640 720 733
1079894 1377 2216 1783 1216 890
1079894 11596 2720 2190 1390 4013
1079894 1703 2233 1596 1263 1236
1079894 50737 4653 4236 2576 17970
1079894 723 2140 1576 1173 1563
1079894 748 2153 1580 1156 860
1079894 35 2140 1563 653 656
1079894 218 2236 1576 890 656
1079894 13 2250 1796 593 640
1079894 27 2250 1580 593 640
1079894 54905 4890 4516 2640 12296
*/
Ryan Randall
Solutions are easy. Understanding the problem, now, that's the hard part.
Viewing 4 posts - 16 through 18 (of 18 total)
You must be logged in to reply to this topic. Login to reply