January 19, 2009 at 10:19 am
I have two tables 'Tbl1' and 'Tbl2' in the same database. Both tables have different number of rows and columns. Typically the number of rows is in the range 3 to 5 million.
I want to check (and list) what rows in Tbl1 do not exist in Tbl2.
The check will be based on 3 columns in each table with same data types; i.e. ColumnA in both tables is 'varchar(20)', ColumnB in both tables is 'datetime' (2009-01-16 21:43:35), and ColumnC in both cases is 'int'.
In case any one of the three columns does not match then that row in Tbl1 does not exist in Tbl2. A deviation to this criteria is that a difference of approx 5 minutes above or below the value in ColumnB of Tbl2 with respect to Tbl1 is acceptable
Could someone give me the sql code?
January 19, 2009 at 10:44 am
hi rahydri,
To me what you described here is a Left Outer Join.
select column
from tableA
left outer join table b
on tableA.columnA = tableB.columnA
where tableB.anycolumn is null
I dont know your level of comfort with SQL queries so do you need help to write the actually query or?
In your data is it possible that a record exists only in the second table?
added missing where clause...
January 19, 2009 at 10:47 am
This sounds a lot like homework?
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
January 19, 2009 at 11:30 am
Maxim has pointed you in the right direction. Keep in mind that you can use all comparison operators in join's.
Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question
January 19, 2009 at 11:30 pm
Hi Maxim
thanks for the reply. replys to your questions are as under
Yes, I would appreciate your help in writing the actual quary
Yes, there are data rows in Tbl2 which do not exist in Tbl1
thanks once again
January 20, 2009 at 1:20 am
Jack Corbett (1/19/2009)
Maxim has pointed you in the right direction. Keep in mind that you can use all comparison operators in join's.
jack isn't is good to use WHERE EXISTS clause instead of JOIN, as it took less CPU time as well as elasped time.
Abhijit - http://abhijitmore.wordpress.com
January 20, 2009 at 6:27 am
Yes, WHERE EXISTS/NOT EXISTS can perform slightly better. Until recently I was more comfortable reading the LEFT JOIN syntax, and in this case both will work, one, IMO, the LEFT JOIN, is easier for less experienced SQL folks to read.
Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question
January 20, 2009 at 6:41 am
hi again!
The left join will include all records from first table even if no records match in the second table. All fields from second table are listed as NULL values when no match was found. You can use that as a condition to keep or filter records that do not exist.
When you join two tables you can list multiple predicates and since your columns are of the same datatype in both tables there is no need to do any conversion.
To compare two dates you can use the datediff function. The first parameter is the part of the date you want to compare. In this case I chose ms (millisecond) for a better precision.
select
Tbl1.ColumnA
from
Tbl1 left outer join Tbl2
on Tbl1.ColumnA = Tbl2.ColumnA
and datediff(ms,Tbl1.ColumnB,Tbl2.ColumnB) > -300001
and datediff(ms,Tbl1.ColumnB,Tbl2.ColumnB) < 300001
and Tbl1.ColumnC = Tbl2.ColumnC
where
Tbl2.ColumnA is null
If this is part of a homework as RBY suggests well I guess you will have to understand how to write a query like this because sooner or later you will be facing an exam and I am pretty sure you wont have access to SSC!
Let me know if you have more questions!
January 21, 2009 at 2:22 am
Hi maxim
Thanks again for the reply. I do not have to appear in any exam so no issue on that account. I am new to SQL. am learning it on my own, I want to learn it due to the job i have recently switched.
I think I may be able to do the remaining code myself. If I can't I would request you again.
Regards ans best wishes
January 21, 2009 at 8:44 am
Forget homework. This problem, AS STATED, is simply not possible to solve. The problem is how to determine that any two given records are a matched set, when the criteria quite clearly might allow one record in one table to match multiple records in the other table. Until you can guarantee the ability to generate a list of matched pairs of records, the rest of the problem is irrelevant.
However, with the ASSUMPTION that the +/- 5 minutes scenario will NEVER cause a 1 to many situation, you generate the list of Tbl1 records that have matches in Tbl2, and then select those records in Tbl1 that aren't in that list. That last will require comparing ALL fields between Tbl1 and the list.
Steve
(aka smunson)
:):):)
rahydri (1/19/2009)
I have two tables 'Tbl1' and 'Tbl2' in the same database. Both tables have different number of rows and columns. Typically the number of rows is in the range 3 to 5 million.I want to check (and list) what rows in Tbl1 do not exist in Tbl2.
The check will be based on 3 columns in each table with same data types; i.e. ColumnA in both tables is 'varchar(20)', ColumnB in both tables is 'datetime' (2009-01-16 21:43:35), and ColumnC in both cases is 'int'.
In case any one of the three columns does not match then that row in Tbl1 does not exist in Tbl2. A deviation to this criteria is that a difference of approx 5 minutes above or below the value in ColumnB of Tbl2 with respect to Tbl1 is acceptable
Could someone give me the sql code?
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
January 22, 2009 at 1:35 am
Dear Smunson
Thanks for the interest
If you analyze the problem i formulated, I do not want to check the matched items. I want to list the rows from Tbl1 which do not exist in Tbl2. It does not matter if a row in Tbl1 exists once, twice or morer times in Tbl2. The tolerance value of +/- five minutes is acceptable to us, it does not matter how many records match in this duration.
Regards
January 22, 2009 at 1:50 am
select tb1.*
from tb1 left outer join tb2 on (tb1.key = tb2.key)
where tb2.key is null
January 22, 2009 at 7:17 am
The method remains the same. One still needs to identify whether or not a given record in Tbl1 has a "match" in Tbl2 or not, and if not, output it. Try this:
;WITH MATCH_TWO AS (
SELECT T1.*
FROM TBL1 AS T1 INNER JOIN TBL2 AS T2
ON T1.INTFIELD = T2.INTFIELD AND
T1.VARCHARFIELD = T2.VARCHARFIELD AND
T1.DATEFIELD BETWEEN DATEADD(minute,-5,T2.DATEFIELD) AND DATEADD(minute,5,T2.DATEFIELD)
)
SELECT T1.*
FROM TBL1 AS T1 INNER JOIN MATCH_TWO AS MT
ON T1.INTFIELD <> MT.INTFIELD AND
T1.VARCHARFIELD <> MT.VARCHARFIELD AND'
T1.DATEFIELD <> MT.DATEFIELD AND
T1.OTHERFIELD1 <> MT.OTHERFIELD1 AND ...
.
.
.
This is untested, and you'll have to add EVERY field in TBL1 to the final query. I suspect there may be a better way to do this, but without any test data to work with, I have nothing to go on.
Steve
(aka smunson)
:):):)
rahydri (1/22/2009)
Dear SmunsonThanks for the interest
If you analyze the problem i formulated, I do not want to check the matched items. I want to list the rows from Tbl1 which do not exist in Tbl2. It does not matter if a row in Tbl1 exists once, twice or morer times in Tbl2. The tolerance value of +/- five minutes is acceptable to us, it does not matter how many records match in this duration.
Regards
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
January 22, 2009 at 11:00 am
;WITH MATCH_TWO AS (
SELECT T1.*
FROM TBL1 AS T1 INNER JOIN TBL2 AS T2
ON T1.INTFIELD = T2.INTFIELD AND
T1.VARCHARFIELD = T2.VARCHARFIELD AND
T1.DATEFIELD BETWEEN DATEADD(minute,-5,T2.DATEFIELD) AND DATEADD(minute,5,T2.DATEFIELD)
)
SELECT T1.*
FROM TBL1 AS T1 INNER JOIN MATCH_TWO AS MT
ON T1.INTFIELD <> MT.INTFIELD AND
T1.VARCHARFIELD <> MT.VARCHARFIELD AND'
T1.DATEFIELD <> MT.DATEFIELD AND
T1.OTHERFIELD1 <> MT.OTHERFIELD1 AND ...
.
.
.
The query you wrote here is basically a triangular join.
The CTE MATCH_TWO returns the records that match between Tbl1 and Tbl2.
The query you wrote after is a CROSS JOIN between Tbl1 and MATCH_TWO but you filter out the records where fields match in both tables.
It will not only perform poorly on big tables but it also does not return the correct records... You will get all records from Tbl1 and a duplicate for all those that do not exist in Tbl2.
I have test tables and data setup for this and I stand behind my query as one of the possible correct answers for the problem as described.
January 22, 2009 at 11:36 am
Oops... I find it difficult to code correctly when I have nothing to work with. Anyway, here's another shot at it. I'm not sure if it's any better on the performance issue, but it should at least work:
;WITH TABLE_ONE AS (
SELECT ROW_NUMBER() OVER(ORDER BY INTFIELD, VARCHARFIELD, DATEFIELD) AS RN, T1.*
FROM TBL1
),
MATCH_TWO AS (
SELECT T1.*
FROM TABLE_ONE AS T1 INNER JOIN TBL2 AS T2
ON T1.INTFIELD = T2.INTFIELD AND
T1.VARCHARFIELD = T2.VARCHARFIELD AND
T1.DATEFIELD BETWEEN DATEADD(minute,-5,T2.DATEFIELD) AND DATEADD(minute,5,T2.DATEFIELD)
)
SELECT *
FROM TABLE_ONE
WHERE RN NOT IN (SELECT RN FROM MATCH_TWO)
Steve
(aka smunson)
:):):)
Maxim Picard (1/22/2009)
;WITH MATCH_TWO AS (
SELECT T1.*
FROM TBL1 AS T1 INNER JOIN TBL2 AS T2
ON T1.INTFIELD = T2.INTFIELD AND
T1.VARCHARFIELD = T2.VARCHARFIELD AND
T1.DATEFIELD BETWEEN DATEADD(minute,-5,T2.DATEFIELD) AND DATEADD(minute,5,T2.DATEFIELD)
)
SELECT T1.*
FROM TBL1 AS T1 INNER JOIN MATCH_TWO AS MT
ON T1.INTFIELD <> MT.INTFIELD AND
T1.VARCHARFIELD <> MT.VARCHARFIELD AND'
T1.DATEFIELD <> MT.DATEFIELD AND
T1.OTHERFIELD1 <> MT.OTHERFIELD1 AND ...
.
.
.
The query you wrote here is basically a triangular join.
The CTE MATCH_TWO returns the records that match between Tbl1 and Tbl2.
The query you wrote after is a CROSS JOIN between Tbl1 and MATCH_TWO but you filter out the records where fields match in both tables.
It will not only perform poorly on big tables but it also does not return the correct records... You will get all records from Tbl1 and a duplicate for all those that do not exist in Tbl2.
I have test tables and data setup for this and I stand behind my query as one of the possible correct answers for the problem as described.
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
Viewing 15 posts - 1 through 15 (of 17 total)
You must be logged in to reply to this topic. Login to reply