August 6, 2013 at 12:31 pm
Is there any tool that can scan through a set of scripts and spit out a list of the ones that are the same?
August 6, 2013 at 12:37 pm
WinMerge, which is free, can compare folders vs. folders or files vs. files and identify the differences:
Will that do what you want?
Lowell
August 6, 2013 at 1:42 pm
Lowell (8/6/2013)
WinMerge, which is free, can compare folders vs. folders or files vs. files and identify the differences: will that do what you want?
Actually, I do use WinMerge. Basically, I have 10 scripts, and it's possible that many of them contain the same code. I just need to know which ones are the same, without doing a one-to-many comparison for each script.
August 12, 2013 at 9:35 pm
curious_sqldba (8/6/2013)
Actually, I do use WinMerge. Basically, I have 10 scripts, and it's possible that many of them contain the same code. I just need to know which ones are the same, without doing a one-to-many comparison for each script.
Although there are better tools for this, it is possible to check for duplication and even produce a count of which scripts are "identical" using T-SQL as a "hammer".
The first question is what you mean by "might have same code". If you mean the scripts are absolutely identical, including any white space, control characters (CR, LF, Tab, etc.), and casing, then you could load the scripts into a table, use HASHBYTES to produce a hash for each, and then compare the hashes. Note that HASHBYTES works on the raw bytes, so a difference in casing changes the hash even under a case-insensitive collation; if casing shouldn't matter, apply UPPER or LOWER to the text before hashing.
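As a rough sketch of the hash-and-compare idea, here it is in Python rather than T-SQL (an assumption made only to keep the example self-contained; in T-SQL the hash would come from HASHBYTES). The function name `group_identical` and the sample script names are hypothetical. It hashes each script's raw bytes and groups the names that share a hash, so there is no one-to-many comparison to do by hand.

```python
import hashlib
from collections import defaultdict


def group_identical(scripts):
    """Group script names whose contents are byte-for-byte identical.

    `scripts` maps a script name to its text (a hypothetical input shape).
    """
    by_hash = defaultdict(list)
    for name, text in scripts.items():
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        by_hash[digest].append(name)
    # Keep only the hashes shared by more than one script.
    return [sorted(names) for names in by_hash.values() if len(names) > 1]


scripts = {
    "a.sql": "SELECT 1;",
    "b.sql": "SELECT 1;",
    "c.sql": "select 1;",  # differs only in casing, so it hashes differently
}
print(group_identical(scripts))
```

Because the hash is taken over the exact bytes, any difference at all, including casing, puts a script in a different group.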
If the code differs only in white space or control characters, you'll need to replace the control characters with spaces and collapse each run of spaces into a single space before the HASHBYTES conversion.
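The clean-up step might look something like the following, again sketched in Python as an assumption rather than in T-SQL. The helper name `normalized_hash` and the sample strings are hypothetical. It collapses every run of spaces, tabs, carriage returns, and line feeds into one space before hashing.

```python
import hashlib
import re


def normalized_hash(text):
    """Hash a script after collapsing every whitespace run to one space."""
    # \s matches spaces, tabs, carriage returns, and line feeds.
    collapsed = re.sub(r"\s+", " ", text).strip()
    return hashlib.sha256(collapsed.encode("utf-8")).hexdigest()


left = "SELECT  col\r\n\tFROM dbo.t"
right = "SELECT col FROM dbo.t"
print(normalized_hash(left) == normalized_hash(right))
```

One caveat with this sketch: it also collapses white space inside string literals, so two scripts that differ only there would be reported as the same.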
Also keep in mind that the above process does not strip out comments. If the comments differ, so will the HASHBYTES result. Removing the comments would be much more difficult in T-SQL. It could be done, but you'd have to weigh the time needed to develop such a thing against doing the task manually. Even a difference as simple as one script including a single semicolon where the others do not will change the HASHBYTES result. Of course, you could treat semicolons as if they were control characters in this case.
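Treating semicolons like control characters could be sketched as below, in Python under the same assumption as before. The name `canonical_form` is hypothetical. It does not attempt to strip comments, so scripts whose comments differ will still come out as different.

```python
import re


def canonical_form(text):
    """Reduce semicolons and whitespace runs in a script to single spaces."""
    # Treat ';' like a control character: turn it into a space first.
    without_semicolons = text.replace(";", " ")
    return re.sub(r"\s+", " ", without_semicolons).strip()


with_terminator = "SELECT 1;\nSELECT 2;"
without_terminator = "SELECT 1\nSELECT 2"
print(canonical_form(with_terminator) == canonical_form(without_terminator))
```

The canonical text would then be hashed and compared in the same way as the raw text. A semicolon inside a string literal is replaced too, which is acceptable only if that kind of difference doesn't matter for the comparison.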
--Jeff Moden
Change is inevitable... Change for the better is not.