December 1, 2010 at 2:02 pm
Thanks for the code. Working on getting a copy of VS2010 to test it out. We're still old school around here using 2005.
December 2, 2010 at 12:56 am
Can you post some execution results and timings ? (cpu , elaps, io rates)
Seems to me pulling this kind of stuff into your CLR space could easily blow up your runtime.
Johan
Learn to play, play to learn !
Dont drive faster than your guardian angel can fly ...
but keeping both feet on the ground wont get you anywhere :w00t:
- How to post Performance Problems
- How to post data/code to get the best help[/url]
- How to prevent a sore throat after hours of presenting ppt
press F1 for solution, press shift+F1 for urgent solution 😀
Need a bit of Powershell? How about this
Who am I ? Sometimes this is me but most of the time this is me
December 2, 2010 at 3:40 am
You also need to take into account, tempdb will need to be large enough to hold all the working sets data.
/* 100% equal */
Select L.IdNo
, R.IdNo
, 9999999999 as Score
, 999 as nMatches
from T_Left L
inner join T_Right R
on L.RwCheckSum = R.RwCheckSum
--where L.IdNo = -1999983358
order by L.IdNo
, R.IdNo ;
go
Select L.IdNo as L_IdNo
, R.IdNo as R_IdNo
, case when L.[col1] = R.[col1] then 1048576 else 0 end
+ case when L.[col2] = R.[col2] then 524288 else 0 end
+ case when L.[col3] = R.[col3] then 262144 else 0 end
+ case when L.[col4] = R.[col4] then 131072 else 0 end
+ case when L.[col5] = R.[col5] then 65536 else 0 end
+ case when L.[col6] = R.[col6] then 32768 else 0 end
+ case when L.[col7] = R.[col7] then 16384 else 0 end
+ case when L.[col8] = R.[col8] then 8192 else 0 end
+ case when L.[col9] = R.[col9] then 4096 else 0 end
+ case when L.[col10] = R.[col10] then 2048 else 0 end
+ case when L.[col11] = R.[col11] then 1024 else 0 end
+ case when L.[col12] = R.[col12] then 512 else 0 end
+ case when L.[col13] = R.[col13] then 256 else 0 end
+ case when L.[col14] = R.[col14] then 128 else 0 end
+ case when L.[col15] = R.[col15] then 64 else 0 end
+ case when L.[col16] = R.[col16] then 32 else 0 end
+ case when L.[col17] = R.[col17] then 16 else 0 end
+ case when L.[col18] = R.[col18] then 8 else 0 end
+ case when L.[col19] = R.[col19] then 4 else 0 end
+ case when L.[col20] = R.[col20] then 2 else 0 end
as Score
, case when L.[col1] = R.[col1] then 1 else 0 end
+ case when L.[col2] = R.[col2] then 1 else 0 end
+ case when L.[col3] = R.[col3] then 1 else 0 end
+ case when L.[col4] = R.[col4] then 1 else 0 end
+ case when L.[col5] = R.[col5] then 1 else 0 end
+ case when L.[col6] = R.[col6] then 1 else 0 end
+ case when L.[col7] = R.[col7] then 1 else 0 end
+ case when L.[col8] = R.[col8] then 1 else 0 end
+ case when L.[col9] = R.[col9] then 1 else 0 end
+ case when L.[col10] = R.[col10] then 1 else 0 end
+ case when L.[col11] = R.[col11] then 1 else 0 end
+ case when L.[col12] = R.[col12] then 1 else 0 end
+ case when L.[col13] = R.[col13] then 1 else 0 end
+ case when L.[col14] = R.[col14] then 1 else 0 end
+ case when L.[col15] = R.[col15] then 1 else 0 end
+ case when L.[col16] = R.[col16] then 1 else 0 end
+ case when L.[col17] = R.[col17] then 1 else 0 end
+ case when L.[col18] = R.[col18] then 1 else 0 end
+ case when L.[col19] = R.[col19] then 1 else 0 end
+ case when L.[col20] = R.[col20] then 1 else 0 end
as nMatches
from T_Left L
inner join T_Right R
on R.RwCheckSum <> L.RwCheckSum
-- Where L.IdNo = -1999983358
order by L_IdNo, Score desc, nMatches
I ran this on a 20000 vs 90000 set.
first query:
SQL Server parse and compile time:
CPU time = 61 ms, elapsed time = 61 ms.
(40005 row(s) affected)
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'T_Right'. Scan count 1, logical reads 200, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'T_Left'. Scan count 1, logical reads 62, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 265 ms, elapsed time = 2097 ms.
second query still running after 30minutes ....
Johan
Learn to play, play to learn !
Dont drive faster than your guardian angel can fly ...
but keeping both feet on the ground wont get you anywhere :w00t:
- How to post Performance Problems
- How to post data/code to get the best help[/url]
- How to prevent a sore throat after hours of presenting ppt
press F1 for solution, press shift+F1 for urgent solution 😀
Need a bit of Powershell? How about this
Who am I ? Sometimes this is me but most of the time this is me
December 2, 2010 at 5:23 am
ALZDBA (12/2/2010)
Can you post some execution results and timings ? (cpu , elaps, io rates)Seems to me pulling this kind of stuff into your CLR space could easily blow up your runtime.
i tried this with a CLR stored procedure and it was god awfull slow.
use a new windows console application, much better
the part where you send stuff back to the database may need some work too, with a 300k row dataset may indeed kill the tempdb. please feel free to change the code and post back!
--
Thiago Dantas
@DantHimself
December 2, 2010 at 7:15 am
Since OP stated order is of importance, maybe a join on col1 helps to restrict the cartesian product as well as the where-clause for nMatches > 5
/* 100% equal */
Select L.IdNo
, R.IdNo
, 9999999999 as Score
, 999 as nMatches
from T_Left L
inner join T_Right R
on L.RwCheckSum = R.RwCheckSum
--where L.IdNo = -1999983358
order by L.IdNo
, R.IdNo ;
go
Select L.IdNo as L_IdNo
, R.IdNo as R_IdNo
, case when L.[col1] = R.[col1] then 1048576 else 0 end
+ case when L.[col2] = R.[col2] then 524288 else 0 end
+ case when L.[col3] = R.[col3] then 262144 else 0 end
+ case when L.[col4] = R.[col4] then 131072 else 0 end
+ case when L.[col5] = R.[col5] then 65536 else 0 end
+ case when L.[col6] = R.[col6] then 32768 else 0 end
+ case when L.[col7] = R.[col7] then 16384 else 0 end
+ case when L.[col8] = R.[col8] then 8192 else 0 end
+ case when L.[col9] = R.[col9] then 4096 else 0 end
+ case when L.[col10] = R.[col10] then 2048 else 0 end
+ case when L.[col11] = R.[col11] then 1024 else 0 end
+ case when L.[col12] = R.[col12] then 512 else 0 end
+ case when L.[col13] = R.[col13] then 256 else 0 end
+ case when L.[col14] = R.[col14] then 128 else 0 end
+ case when L.[col15] = R.[col15] then 64 else 0 end
+ case when L.[col16] = R.[col16] then 32 else 0 end
+ case when L.[col17] = R.[col17] then 16 else 0 end
+ case when L.[col18] = R.[col18] then 8 else 0 end
+ case when L.[col19] = R.[col19] then 4 else 0 end
+ case when L.[col20] = R.[col20] then 2 else 0 end
as Score
, case when L.[col1] = R.[col1] then 1 else 0 end
+ case when L.[col2] = R.[col2] then 1 else 0 end
+ case when L.[col3] = R.[col3] then 1 else 0 end
+ case when L.[col4] = R.[col4] then 1 else 0 end
+ case when L.[col5] = R.[col5] then 1 else 0 end
+ case when L.[col6] = R.[col6] then 1 else 0 end
+ case when L.[col7] = R.[col7] then 1 else 0 end
+ case when L.[col8] = R.[col8] then 1 else 0 end
+ case when L.[col9] = R.[col9] then 1 else 0 end
+ case when L.[col10] = R.[col10] then 1 else 0 end
+ case when L.[col11] = R.[col11] then 1 else 0 end
+ case when L.[col12] = R.[col12] then 1 else 0 end
+ case when L.[col13] = R.[col13] then 1 else 0 end
+ case when L.[col14] = R.[col14] then 1 else 0 end
+ case when L.[col15] = R.[col15] then 1 else 0 end
+ case when L.[col16] = R.[col16] then 1 else 0 end
+ case when L.[col17] = R.[col17] then 1 else 0 end
+ case when L.[col18] = R.[col18] then 1 else 0 end
+ case when L.[col19] = R.[col19] then 1 else 0 end
+ case when L.[col20] = R.[col20] then 1 else 0 end
as nMatches
from T_Left L
inner join T_Right R
on R.RwCheckSum <> L.RwCheckSum
and L.[col1] = R.[col1]
-- Where L.IdNo = -1999983358
where case when L.[col1] = R.[col1] then 1 else 0 end
+ case when L.[col2] = R.[col2] then 1 else 0 end
+ case when L.[col3] = R.[col3] then 1 else 0 end
+ case when L.[col4] = R.[col4] then 1 else 0 end
+ case when L.[col5] = R.[col5] then 1 else 0 end
+ case when L.[col6] = R.[col6] then 1 else 0 end
+ case when L.[col7] = R.[col7] then 1 else 0 end
+ case when L.[col8] = R.[col8] then 1 else 0 end
+ case when L.[col9] = R.[col9] then 1 else 0 end
+ case when L.[col10] = R.[col10] then 1 else 0 end
+ case when L.[col11] = R.[col11] then 1 else 0 end
+ case when L.[col12] = R.[col12] then 1 else 0 end
+ case when L.[col13] = R.[col13] then 1 else 0 end
+ case when L.[col14] = R.[col14] then 1 else 0 end
+ case when L.[col15] = R.[col15] then 1 else 0 end
+ case when L.[col16] = R.[col16] then 1 else 0 end
+ case when L.[col17] = R.[col17] then 1 else 0 end
+ case when L.[col18] = R.[col18] then 1 else 0 end
+ case when L.[col19] = R.[col19] then 1 else 0 end
+ case when L.[col20] = R.[col20] then 1 else 0 end > 5
order by L_IdNo, Score desc, nMatches
/*
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
(40005 row(s) affected)
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'T_Right'. Scan count 1, logical reads 200, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'T_Left'. Scan count 1, logical reads 62, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 453 ms, elapsed time = 17497 ms.
SQL Server parse and compile time:
CPU time = 125 ms, elapsed time = 289 ms.
(147596 row(s) affected)
Table 'T_Left'. Scan count 5, logical reads 742, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'T_Right'. Scan count 5, logical reads 1223, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 871860 ms, elapsed time = 678345 ms.
*/
So it needed i.e. CPU 00:14:31. elaps 00:11:18 on my little test box.
( 4core 32bit 1.7GB ram for sqlserver slow disks )
Johan
Learn to play, play to learn !
Dont drive faster than your guardian angel can fly ...
but keeping both feet on the ground wont get you anywhere :w00t:
- How to post Performance Problems
- How to post data/code to get the best help[/url]
- How to prevent a sore throat after hours of presenting ppt
press F1 for solution, press shift+F1 for urgent solution 😀
Need a bit of Powershell? How about this
Who am I ? Sometimes this is me but most of the time this is me
December 2, 2010 at 9:19 am
ALZDBA (12/2/2010)
Since OP stated order is of importance, maybe a join on col1 helps to restrict the cartesian product as well as the where-clause for nMatches > 5So it needed i.e. CPU 00:14:31. elaps 00:11:18 on my little test box.
( 4core 32bit 1.7GB ram for sqlserver slow disks )
makes sense, those stats are for how many rows?
--
Thiago Dantas
@DantHimself
December 2, 2010 at 9:49 am
@dant12 - I created a console app based on your code. However, I commented out the final updated back to SQL. I'm was only concerned about the matching part. I ran it on both my dev box (i7 quad with HT and 8GB ram) and the production server (16 processors & 128GB ram).
Dev box 100 left, 300K right = 92 sec
Dev box 500 left, 300k right = 462 sec
Dev box 1000 left, 300K right, 921 sec
Prod 100 left, 300K right, 104 sec (prod box has lots of other stuff running on it.)
Seems pretty linear, so using 300,000 in my left table it would take 76hrs (
921 - 1000 records
* 100 - get to 100,000
* 3 - get to 300,000
/ 60 - minutes
/ 60 - hours
------------------
76.75 hours
What kind of times are you seeing?
December 2, 2010 at 10:07 am
@alzdba - I found an issue with using the Checksum for the equal compare. I have 2 differing rows that produce the same checksum. They have the same values, just in a different order. Since order matters, I don't think I can use it.
select checksum(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20)
select checksum(1,2,11,4,5,6,7,8,9,10,3,12,13,14,15,16,17,18,19,20)
Both produce -2003203156
select checksum(1,2,12345,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20)
select checksum(1,2,11,4,5,6,7,8,9,10,12345,12,13,14,15,16,17,18,19,20)
Both product -2003138548
December 2, 2010 at 10:37 am
gaestes (12/2/2010)
@ALZDBA - I found an issue with using the Checksum for the equal compare. I have 2 differing rows that produce the same checksum. They have the same values, just in a different order. Since order matters, I don't think I can use it.
select checksum(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20)
select checksum(1,2,11,4,5,6,7,8,9,10,3,12,13,14,15,16,17,18,19,20)
Both produce -2003203156
select checksum(1,2,12345,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20)
select checksum(1,2,11,4,5,6,7,8,9,10,12345,12,13,14,15,16,17,18,19,20)
Both product -2003138548
Thank you for sharing this as it is important and emphasizes the fact that it is only a hash value.
I didn't encounter this in the few tests I did.
It proves you also need to add the case statement in that query.
However the rwchecksum column will speed up these tests for rows that generate the same hash value, hence are plausable to be equal.
So it would still make sence, if you are searching for those rows.
Avoid the cart. prod.
Johan
Learn to play, play to learn !
Dont drive faster than your guardian angel can fly ...
but keeping both feet on the ground wont get you anywhere :w00t:
- How to post Performance Problems
- How to post data/code to get the best help[/url]
- How to prevent a sore throat after hours of presenting ppt
press F1 for solution, press shift+F1 for urgent solution 😀
Need a bit of Powershell? How about this
Who am I ? Sometimes this is me but most of the time this is me
December 2, 2010 at 10:42 am
dant12 (12/2/2010)
ALZDBA (12/2/2010)
Since OP stated order is of importance, maybe a join on col1 helps to restrict the cartesian product as well as the where-clause for nMatches > 5So it needed i.e. CPU 00:14:31. elaps 00:11:18 on my little test box.
( 4core 32bit 1.7GB ram for sqlserver slow disks )
makes sense, those stats are for how many rows?
As shown at the bottom of the script ... hidden because to long :blush:
/*
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
(40005 row(s) affected)
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'T_Right'. Scan count 1, logical reads 200, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'T_Left'. Scan count 1, logical reads 62, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 453 ms, elapsed time = 17497 ms.
SQL Server parse and compile time:
CPU time = 125 ms, elapsed time = 289 ms.
(147596 row(s) affected)
Table 'T_Left'. Scan count 5, logical reads 742, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'T_Right'. Scan count 5, logical reads 1223, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 871860 ms, elapsed time = 678345 ms.
*/
Source tables 40000 left, 90000 right ..... so not so hopeful
Johan
Learn to play, play to learn !
Dont drive faster than your guardian angel can fly ...
but keeping both feet on the ground wont get you anywhere :w00t:
- How to post Performance Problems
- How to post data/code to get the best help[/url]
- How to prevent a sore throat after hours of presenting ppt
press F1 for solution, press shift+F1 for urgent solution 😀
Need a bit of Powershell? How about this
Who am I ? Sometimes this is me but most of the time this is me
December 2, 2010 at 10:46 am
for perfect matches HASHBYTES can be used instead of checksum
SELECT HASHBYTES('SHA1','123')
SELECT HASHBYTES('SHA1','321')
--
Thiago Dantas
@DantHimself
Viewing 11 posts - 16 through 25 (of 25 total)
You must be logged in to reply to this topic. Login to reply