Need ideas for a process

  • Thanks for the code. Working on getting a copy of VS2010 to test it out. We're still old school around here using 2005.

  • Can you post some execution results and timings ? (cpu , elaps, io rates)

    Seems to me pulling this kind of stuff into your CLR space could easily blow up your runtime.

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • You also need to take into account, tempdb will need to be large enough to hold all the working sets data.

    /* 100% equal */

    Select L.IdNo

    , R.IdNo

    , 9999999999 as Score

    , 999 as nMatches

    from T_Left L

    inner join T_Right R

    on L.RwCheckSum = R.RwCheckSum

    --where L.IdNo = -1999983358

    order by L.IdNo

    , R.IdNo ;

    go

    Select L.IdNo as L_IdNo

    , R.IdNo as R_IdNo

    , case when L.[col1] = R.[col1] then 1048576 else 0 end

    + case when L.[col2] = R.[col2] then 524288 else 0 end

    + case when L.[col3] = R.[col3] then 262144 else 0 end

    + case when L.[col4] = R.[col4] then 131072 else 0 end

    + case when L.[col5] = R.[col5] then 65536 else 0 end

    + case when L.[col6] = R.[col6] then 32768 else 0 end

    + case when L.[col7] = R.[col7] then 16384 else 0 end

    + case when L.[col8] = R.[col8] then 8192 else 0 end

    + case when L.[col9] = R.[col9] then 4096 else 0 end

    + case when L.[col10] = R.[col10] then 2048 else 0 end

    + case when L.[col11] = R.[col11] then 1024 else 0 end

    + case when L.[col12] = R.[col12] then 512 else 0 end

    + case when L.[col13] = R.[col13] then 256 else 0 end

    + case when L.[col14] = R.[col14] then 128 else 0 end

    + case when L.[col15] = R.[col15] then 64 else 0 end

    + case when L.[col16] = R.[col16] then 32 else 0 end

    + case when L.[col17] = R.[col17] then 16 else 0 end

    + case when L.[col18] = R.[col18] then 8 else 0 end

    + case when L.[col19] = R.[col19] then 4 else 0 end

    + case when L.[col20] = R.[col20] then 2 else 0 end

    as Score

    , case when L.[col1] = R.[col1] then 1 else 0 end

    + case when L.[col2] = R.[col2] then 1 else 0 end

    + case when L.[col3] = R.[col3] then 1 else 0 end

    + case when L.[col4] = R.[col4] then 1 else 0 end

    + case when L.[col5] = R.[col5] then 1 else 0 end

    + case when L.[col6] = R.[col6] then 1 else 0 end

    + case when L.[col7] = R.[col7] then 1 else 0 end

    + case when L.[col8] = R.[col8] then 1 else 0 end

    + case when L.[col9] = R.[col9] then 1 else 0 end

    + case when L.[col10] = R.[col10] then 1 else 0 end

    + case when L.[col11] = R.[col11] then 1 else 0 end

    + case when L.[col12] = R.[col12] then 1 else 0 end

    + case when L.[col13] = R.[col13] then 1 else 0 end

    + case when L.[col14] = R.[col14] then 1 else 0 end

    + case when L.[col15] = R.[col15] then 1 else 0 end

    + case when L.[col16] = R.[col16] then 1 else 0 end

    + case when L.[col17] = R.[col17] then 1 else 0 end

    + case when L.[col18] = R.[col18] then 1 else 0 end

    + case when L.[col19] = R.[col19] then 1 else 0 end

    + case when L.[col20] = R.[col20] then 1 else 0 end

    as nMatches

    from T_Left L

    inner join T_Right R

    on R.RwCheckSum <> L.RwCheckSum

    -- Where L.IdNo = -1999983358

    order by L_IdNo, Score desc, nMatches

    I ran this on a 20000 vs 90000 set.

    first query:

    SQL Server parse and compile time:

    CPU time = 61 ms, elapsed time = 61 ms.

    (40005 row(s) affected)

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'T_Right'. Scan count 1, logical reads 200, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'T_Left'. Scan count 1, logical reads 62, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 265 ms, elapsed time = 2097 ms.

    second query still running after 30minutes ....

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • ALZDBA (12/2/2010)


    Can you post some execution results and timings ? (cpu , elaps, io rates)

    Seems to me pulling this kind of stuff into your CLR space could easily blow up your runtime.

    i tried this with a CLR stored procedure and it was god awfull slow.

    use a new windows console application, much better

    the part where you send stuff back to the database may need some work too, with a 300k row dataset may indeed kill the tempdb. please feel free to change the code and post back!

    --
    Thiago Dantas
    @DantHimself

  • Since OP stated order is of importance, maybe a join on col1 helps to restrict the cartesian product as well as the where-clause for nMatches > 5

    /* 100% equal */

    Select L.IdNo

    , R.IdNo

    , 9999999999 as Score

    , 999 as nMatches

    from T_Left L

    inner join T_Right R

    on L.RwCheckSum = R.RwCheckSum

    --where L.IdNo = -1999983358

    order by L.IdNo

    , R.IdNo ;

    go

    Select L.IdNo as L_IdNo

    , R.IdNo as R_IdNo

    , case when L.[col1] = R.[col1] then 1048576 else 0 end

    + case when L.[col2] = R.[col2] then 524288 else 0 end

    + case when L.[col3] = R.[col3] then 262144 else 0 end

    + case when L.[col4] = R.[col4] then 131072 else 0 end

    + case when L.[col5] = R.[col5] then 65536 else 0 end

    + case when L.[col6] = R.[col6] then 32768 else 0 end

    + case when L.[col7] = R.[col7] then 16384 else 0 end

    + case when L.[col8] = R.[col8] then 8192 else 0 end

    + case when L.[col9] = R.[col9] then 4096 else 0 end

    + case when L.[col10] = R.[col10] then 2048 else 0 end

    + case when L.[col11] = R.[col11] then 1024 else 0 end

    + case when L.[col12] = R.[col12] then 512 else 0 end

    + case when L.[col13] = R.[col13] then 256 else 0 end

    + case when L.[col14] = R.[col14] then 128 else 0 end

    + case when L.[col15] = R.[col15] then 64 else 0 end

    + case when L.[col16] = R.[col16] then 32 else 0 end

    + case when L.[col17] = R.[col17] then 16 else 0 end

    + case when L.[col18] = R.[col18] then 8 else 0 end

    + case when L.[col19] = R.[col19] then 4 else 0 end

    + case when L.[col20] = R.[col20] then 2 else 0 end

    as Score

    , case when L.[col1] = R.[col1] then 1 else 0 end

    + case when L.[col2] = R.[col2] then 1 else 0 end

    + case when L.[col3] = R.[col3] then 1 else 0 end

    + case when L.[col4] = R.[col4] then 1 else 0 end

    + case when L.[col5] = R.[col5] then 1 else 0 end

    + case when L.[col6] = R.[col6] then 1 else 0 end

    + case when L.[col7] = R.[col7] then 1 else 0 end

    + case when L.[col8] = R.[col8] then 1 else 0 end

    + case when L.[col9] = R.[col9] then 1 else 0 end

    + case when L.[col10] = R.[col10] then 1 else 0 end

    + case when L.[col11] = R.[col11] then 1 else 0 end

    + case when L.[col12] = R.[col12] then 1 else 0 end

    + case when L.[col13] = R.[col13] then 1 else 0 end

    + case when L.[col14] = R.[col14] then 1 else 0 end

    + case when L.[col15] = R.[col15] then 1 else 0 end

    + case when L.[col16] = R.[col16] then 1 else 0 end

    + case when L.[col17] = R.[col17] then 1 else 0 end

    + case when L.[col18] = R.[col18] then 1 else 0 end

    + case when L.[col19] = R.[col19] then 1 else 0 end

    + case when L.[col20] = R.[col20] then 1 else 0 end

    as nMatches

    from T_Left L

    inner join T_Right R

    on R.RwCheckSum <> L.RwCheckSum

    and L.[col1] = R.[col1]

    -- Where L.IdNo = -1999983358

    where case when L.[col1] = R.[col1] then 1 else 0 end

    + case when L.[col2] = R.[col2] then 1 else 0 end

    + case when L.[col3] = R.[col3] then 1 else 0 end

    + case when L.[col4] = R.[col4] then 1 else 0 end

    + case when L.[col5] = R.[col5] then 1 else 0 end

    + case when L.[col6] = R.[col6] then 1 else 0 end

    + case when L.[col7] = R.[col7] then 1 else 0 end

    + case when L.[col8] = R.[col8] then 1 else 0 end

    + case when L.[col9] = R.[col9] then 1 else 0 end

    + case when L.[col10] = R.[col10] then 1 else 0 end

    + case when L.[col11] = R.[col11] then 1 else 0 end

    + case when L.[col12] = R.[col12] then 1 else 0 end

    + case when L.[col13] = R.[col13] then 1 else 0 end

    + case when L.[col14] = R.[col14] then 1 else 0 end

    + case when L.[col15] = R.[col15] then 1 else 0 end

    + case when L.[col16] = R.[col16] then 1 else 0 end

    + case when L.[col17] = R.[col17] then 1 else 0 end

    + case when L.[col18] = R.[col18] then 1 else 0 end

    + case when L.[col19] = R.[col19] then 1 else 0 end

    + case when L.[col20] = R.[col20] then 1 else 0 end > 5

    order by L_IdNo, Score desc, nMatches

    /*

    SQL Server parse and compile time:

    CPU time = 0 ms, elapsed time = 0 ms.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    SQL Server parse and compile time:

    CPU time = 0 ms, elapsed time = 0 ms.

    (40005 row(s) affected)

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'T_Right'. Scan count 1, logical reads 200, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'T_Left'. Scan count 1, logical reads 62, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 453 ms, elapsed time = 17497 ms.

    SQL Server parse and compile time:

    CPU time = 125 ms, elapsed time = 289 ms.

    (147596 row(s) affected)

    Table 'T_Left'. Scan count 5, logical reads 742, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'T_Right'. Scan count 5, logical reads 1223, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 871860 ms, elapsed time = 678345 ms.

    */

    So it needed i.e. CPU 00:14:31. elaps 00:11:18 on my little test box.

    ( 4core 32bit 1.7GB ram for sqlserver slow disks )

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • ALZDBA (12/2/2010)


    Since OP stated order is of importance, maybe a join on col1 helps to restrict the cartesian product as well as the where-clause for nMatches > 5

    So it needed i.e. CPU 00:14:31. elaps 00:11:18 on my little test box.

    ( 4core 32bit 1.7GB ram for sqlserver slow disks )

    makes sense, those stats are for how many rows?

    --
    Thiago Dantas
    @DantHimself

  • @dant12 - I created a console app based on your code. However, I commented out the final updated back to SQL. I'm was only concerned about the matching part. I ran it on both my dev box (i7 quad with HT and 8GB ram) and the production server (16 processors & 128GB ram).

    Dev box 100 left, 300K right = 92 sec

    Dev box 500 left, 300k right = 462 sec

    Dev box 1000 left, 300K right, 921 sec

    Prod 100 left, 300K right, 104 sec (prod box has lots of other stuff running on it.)

    Seems pretty linear, so using 300,000 in my left table it would take 76hrs (

    921 - 1000 records

    * 100 - get to 100,000

    * 3 - get to 300,000

    / 60 - minutes

    / 60 - hours

    ------------------

    76.75 hours

    What kind of times are you seeing?

  • @alzdba - I found an issue with using the Checksum for the equal compare. I have 2 differing rows that produce the same checksum. They have the same values, just in a different order. Since order matters, I don't think I can use it.

    select checksum(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20)

    select checksum(1,2,11,4,5,6,7,8,9,10,3,12,13,14,15,16,17,18,19,20)

    Both produce -2003203156

    select checksum(1,2,12345,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20)

    select checksum(1,2,11,4,5,6,7,8,9,10,12345,12,13,14,15,16,17,18,19,20)

    Both product -2003138548

  • gaestes (12/2/2010)


    @ALZDBA - I found an issue with using the Checksum for the equal compare. I have 2 differing rows that produce the same checksum. They have the same values, just in a different order. Since order matters, I don't think I can use it.

    select checksum(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20)

    select checksum(1,2,11,4,5,6,7,8,9,10,3,12,13,14,15,16,17,18,19,20)

    Both produce -2003203156

    select checksum(1,2,12345,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20)

    select checksum(1,2,11,4,5,6,7,8,9,10,12345,12,13,14,15,16,17,18,19,20)

    Both product -2003138548

    Thank you for sharing this as it is important and emphasizes the fact that it is only a hash value.

    I didn't encounter this in the few tests I did.

    It proves you also need to add the case statement in that query.

    However the rwchecksum column will speed up these tests for rows that generate the same hash value, hence are plausable to be equal.

    So it would still make sence, if you are searching for those rows.

    Avoid the cart. prod.

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • dant12 (12/2/2010)


    ALZDBA (12/2/2010)


    Since OP stated order is of importance, maybe a join on col1 helps to restrict the cartesian product as well as the where-clause for nMatches > 5

    So it needed i.e. CPU 00:14:31. elaps 00:11:18 on my little test box.

    ( 4core 32bit 1.7GB ram for sqlserver slow disks )

    makes sense, those stats are for how many rows?

    As shown at the bottom of the script ... hidden because to long :blush:

    /*

    SQL Server parse and compile time:

    CPU time = 0 ms, elapsed time = 0 ms.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    SQL Server parse and compile time:

    CPU time = 0 ms, elapsed time = 0 ms.

    (40005 row(s) affected)

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'T_Right'. Scan count 1, logical reads 200, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'T_Left'. Scan count 1, logical reads 62, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 453 ms, elapsed time = 17497 ms.

    SQL Server parse and compile time:

    CPU time = 125 ms, elapsed time = 289 ms.

    (147596 row(s) affected)

    Table 'T_Left'. Scan count 5, logical reads 742, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'T_Right'. Scan count 5, logical reads 1223, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 871860 ms, elapsed time = 678345 ms.

    */

    Source tables 40000 left, 90000 right ..... so not so hopeful

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • for perfect matches HASHBYTES can be used instead of checksum

    SELECT HASHBYTES('SHA1','123')

    SELECT HASHBYTES('SHA1','321')

    --
    Thiago Dantas
    @DantHimself

Viewing 11 posts - 16 through 25 (of 25 total)

You must be logged in to reply to this topic. Login to reply