June 9, 2005 at 9:47 am
Let me start by saying I have read all the articles on sql comparison tools and scripts in respect of comparing data from 2 tables. I have used 3 or 4 of these and they work fine. BUT....
All of the tables in our datawarehouse have update date and time columns. If I check live with test every row in the table differs because of these timestamps. We cannot simply update one to the other as there are millions of rows and other people need these dates and times for other testing.
I want to be able to compare a list of columns from two tables instead leaving out those I know will differ. A view would have been the best way but all these third party products only work directly with tables as far as I can see/have tried out.
If anyone can help, I would be deeply grateful.
Just to re-iterate, it is only the data I am interested in, NOT the structure.
June 9, 2005 at 10:12 am
Something like this?
Select * from dbo.Table1 T1 inner join dbo.Table2 T2 on T1.id = T2.id and (T1.DiffCol1 T2.DiffCol1 or T1.DiffCol2 T2.DiffCol2)
June 9, 2005 at 10:27 am
Unfortunately we have fact tables with in the region of 1000 columns. Just to be that extra bit picky, my boss wants to see a visual report of the differences with the actual difference highlighted rather than the row.
June 9, 2005 at 11:00 am
You change your signature to : The Boss is always right - when I'm not wrong!
Select T1.id, T2.id, T1.col1, T2.Col1, case when T1.Coli = T2.Col THEN '=' ELSE '' END as Col1Equal.... repeat 1000 times .
from Table1 join Table2
You'd have to make sure that this always handles nulls correctly.
Also you'll have to use some pretty good concatenation technics to make sure you get all the columns. I don't need to tell you how shitty slow this is gonna be????
June 9, 2005 at 11:01 am
Or you could always do the comparaison client side... but it's still gonn be slow as hell.
June 10, 2005 at 2:33 am
Thanks, Remi. Unfortunately, whi is it that the boss IS always right, especially when they ask for so much detail?
I have a text comparison tool that runs like a pig, but does display diffs on screen and in a report. I guess I'll have to continue to output to a text file and let it run overnight. At least there is a minimum amount of coding involved.
Thanks for the tips though. I will use the case technique if I know which columns have potantial errors.
June 10, 2005 at 4:12 am
Just a thought on constructing large repetitive SQL statements... I tend to use Excel a lot to construct the repetitive parts using formulas against lists etc. and then copy & paste back into Query Analyzer.
Does nothing for the performance of the query itself though!
June 10, 2005 at 6:35 am
it's even easier to do query generation on the server using QA. You just got to know your way around the sys* tables.. or the information_schema views
June 10, 2005 at 6:54 am
I have built queries before around the sys tables, so i will probably go that way if needed.
June 10, 2005 at 7:00 am
ya.. you're gonna have fun though... 1000 columns will certainly break the 8000 char limit .
I'm wondering how I would get around that problem elegantly.
June 10, 2005 at 7:32 am
Have you tried checksum? First you need a unique or primary key. Then calculate a checksum for each row based upon the selected column list for each primary key (or rows returned by a view of the selected columns?) Rows with differing checksums will be rows with differences which can then be checked to determine exactly what is different. Checksum calculations run pretty fast. Might be worth a try.
June 10, 2005 at 7:36 am
That's a nice idea, but if I remember correctly, his boss wants to see all the lines, and all the differences in all the columns (and leave the good ones there too).
Maybe you would be willing to have only the lines with different values...
Viewing 12 posts - 1 through 11 (of 11 total)
You must be logged in to reply to this topic. Login to reply