April 2, 2014 at 7:15 pm
I was going to report the same thing. The article should be updated to add rows to both tables, and to explain the usefulness of the joins, in restricting the except to only compare columns that exist ( as defined by the primary key ) in both tables.
January 9, 2015 at 4:15 am
Hi,
I'm interested to learn whether this generates a high or low-cost queryplan?
have you been comparing the entire dataset with a fresher copy each time, or just checking against a newly loaded subset?
I'd guess that performance ought to be somewhat similar to join performance which forces a table lookup on both sides [to get the column data] ...
clustering both tables over the pkey should be quickest, since ultimately it will have to check all the data across both tables.
what's your performance experience been like?
January 9, 2015 at 5:24 am
To echo what other posters have said, this is a HUGE win, when comparing data. There have been countless times where I need to do exactly what you're describing, and this method is so unbelievably simple, it made my day.
Cheers!
January 9, 2015 at 5:31 am
Just wanted to share this trick that I've used with a lot of success over the years:
[font="Courier New"]
SELECT 'In A not B' location,
f1,
f2,
f3
FROM a
EXCEPT
SELECT 'In A not B' location,
f1,
f2,
f3
FROM B
UNION ALL
SELECT 'In B not A' location,
f1,
f2,
f3
FROM B
EXCEPT
SELECT 'In B not A' location,
f1,
f2,
f3
FROM A
UNION ALL
(
SELECT 'In Both' location,
f1,
f2,
f3
FROM B
INTERSECT
SELECT 'In Both' location,
f1,
f2,
f3
FROM A
)
[/font]
January 9, 2015 at 5:44 am
I have three tables I sometimes need to sync (development, acceptance and production) and use the following sort of query to find all differences. This is not quite what the author wanted which is differences for rows in both/all tables but could easily be modified to do so. Also note that my tables are relatively small and I have not had to worry about performance. Here is a query that compare constraints column usage across instances:
select sum(src) as srcnum,
case sum(src) when 1 then 'D__'
when 2 then '_A_'
when 3 then 'DA_'
when 4 then '__P'
when 5 then 'D_P'
when 6 then '_AP'
when 7 then 'DAP' end as src,
a.TABLE_SCHEMA, a.TABLE_NAME, COLUMN_NAME, CONSTRAINT_SCHEMA, CONSTRAINT_NAME
from (
select 1 src, TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, CONSTRAINT_SCHEMA, CONSTRAINT_NAME
from Dev.information_schema.CONSTRAINT_COLUMN_USAGE
union
select 2 src, TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, CONSTRAINT_SCHEMA, CONSTRAINT_NAME
from Acc.information_schema.CONSTRAINT_COLUMN_USAGE
union
select 4 src, TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, CONSTRAINT_SCHEMA, CONSTRAINT_NAME
from Pro.information_schema.CONSTRAINT_COLUMN_USAGE
) a
group by data_grp, a.TABLE_SCHEMA, a.TABLE_NAME, COLUMN_NAME, CONSTRAINT_SCHEMA, CONSTRAINT_NAME
having sum(src) < 7
order by data_grp, a.TABLE_SCHEMA, a.TABLE_NAME, COLUMN_NAME, CONSTRAINT_NAME, sum(src);
For data table I typically simplify the inner union queries as 'select 1 src, *'. I know use of * is often discouraged. However, if it bombs because my columns are not in the same order or of different data types, that can be useful information as well.
January 9, 2015 at 5:53 am
victordba (3/26/2014)
Thank you, @james.jensen1350 for your reply.I've been using MERGE for many years to do the upsert. And my source and target tables are large, with millions of records. So, MERGE takes it's sweet time to compare large tables to find the differences between those tables and to do upsert. But MERGE is not perfect, and you have to be very careful with it and also not to forget to take case of NULLs in every field. So, I am always on the lookout for some interesting and more efficient ways to do my upserts. And this article got my attention. I did some quick test comparisons between MERGE and EXCEPT to find changes between 2 tables with about 10 million records in each, and EXCEPT took about 35 seconds, while MERGE about 7 minutes. But I just ran EXCEPT once, to find changes in matching records. I would have to do a full test to also find new and missing records, so I would have to run EXCEPT 2 more times, and also run INSERT, UPDATE and DELETE commands when I find those differences. So, more results to follow. I was just wondering if anyone else tried some other ways to do UPSERTs besides MERGE command on large tables.
@victordba - Did you ever prove whether the all-inclusive MERGE statement performed better or worse than all the separate statements needed when using the EXCEPT approach?
January 9, 2015 at 6:53 am
Regarding the query optimization, only the first join is needed, it ensures you are only taking the Source items that are also found on Target. The second one is not necessary.
Actually, on this situation you could even avoid using joins and table aliases:
SELECT ID
, [Item]
, [Price]
, [OrderDate]
, [Units]
, [ShipmentDate]
FROM [#Source]
WHERE [ID] IN (SELECT [ID] FROM #Target)
EXCEPT
SELECT [ID]
, [Item]
, [Price]
, [OrderDate]
, [Units]
, [ShipmentDate]
FROM [#Target]
January 9, 2015 at 7:00 am
I don't know that where I could use the MERGE I would use EXCEPT. I've used EXCEPT a couple of times on permanent queries. The first was to take a list of thousands of phone numbers and add only the new ones to a list containing millions of phone numbers. The insert performed so quickly that I saw no need to try to find a different way to do it.
I used EXCEPT to do the same thing and also found it very fast. This produced a source recordset that was then inserted into the table. As I only needed to insert as you can't update a phone number (a person's phone number yes, but not an actual phone number) I can't think that a MERGE would provide any advantage.
January 9, 2015 at 7:01 am
I have been using EXCEPT in ETL MERGE statements because it handles NULL values without the need for picking a suitable COALESCE default (TIME datatypes ?). I join the source and target rows using their primary key, and perform an EXISTS on the result of the EXCEPT statement between source and target. However I have found that for large tables, EXCEPT becomes less efficient. I can only assume this is because EXCEPT always compares all the column pairs in the the two rows, whereas I suspect individual ...OR src.ColA <> tgt.ColA... comparisons can return a result at the first TRUE condition without making any further comparisons.
January 9, 2015 at 7:28 am
Very nice. By using LEFT JOIN rather than INNER, the example also returns the unmatched rows.
DELETE FROM #Target WHERE Item = 'Fig';
GO
INSERT INTO #Source
( [Item], [Price], [OrderDate], [Units], [ShipmentDate] )
VALUES ( 'Prune', 2.49, '1/1/2001', NULL, '1/02/2001' );
GO
SELECT .[ID]
, .[Item]
, .[Price]
, .[OrderDate]
, .[Units]
, .[ShipmentDate]
FROM [#Source]
LEFT JOIN [#Target] [T]
ON .[ID] = [T].[ID]
EXCEPT
SELECT [T].[ID]
, [T].[Item]
, [T].[Price]
, [T].[OrderDate]
, [T].[Units]
, [T].[ShipmentDate]
FROM [#Target] [T]
LEFT JOIN [#Source]
ON [T].[ID] = .[ID];
returns 5 rows rather than the 3 from the example.
There are no facts, only interpretations.
Friedrich Nietzsche
January 9, 2015 at 8:15 am
I use this approach :
MERGE statement and comparing columns with SET ANSI NULLS OFF.
Any objections against using SET ANSI NULLS OFF ?
January 9, 2015 at 9:22 am
What do you think about using this in a trigger to find what has actually changed during a DML statement?
January 9, 2015 at 10:23 am
I scanned through the articles, but I may have missed it if someone discussed the "proper" way to do upserts with MERGE. I recommend http://www.made2mentor.com/2013/05/writing-t-sql-merge-statements-the-right-way/ , and for a lot of interesting details on query processor operators that aren't explicitly available in T-SQL you should read http://sqlblog.com/blogs/paul_white/archive/2011/06/22/undocumented-query-plans-equality-comparisons.aspx . The upshot is that NOT EXISTS (SELECT S.* INTERSECT T.*) is a good way to check for row equivalence in a MERGE.
When making changes to stored procedures or complicated queries, I frequently want to ensure that I'm not changing the result sets. In these situations, I'm not worried about duplicate rows, so EXCEPT works well. I generally use the following syntax for these sort of "diff" queries:
WITH
DSO AS (
SELECT *
FROM #Old
),
DSN AS (
SELECT *
FROM #New
)
SELECT 'Old' AS Tbl, * FROM (SELECT * FROM DSO EXCEPT SELECT * FROM DSN) AS U1
UNION
SELECT 'New' AS Tbl, * FROM (SELECT * FROM DSN EXCEPT SELECT * FROM DSO) AS U2
ORDER BY foo, Tbl;
I use the above structure because if I need to ignore certain columns, I can do so in one place in the CTEs rather than having to repeat myself in the pair if EXCEPT sub-queries. In the ORDER BY, "foo" should be replaced by the candidate key columns to get changes to pair up nicely.
January 9, 2015 at 5:09 pm
Does anyone have any information on the performance impact of using this method to create type 2 slowly changing dimensions versus using a merge statement?
January 9, 2015 at 5:27 pm
I'm with the others that don't see a need for the join
the query
SELECT * FROM dbo.source
except
SELECT * FROM dbo.target
gives me the same results! so? ...
I was told to add a row to Source and try again...
the extra row shows up in the results.... that is what you would want, yes/no?
IDItemPriceOrderDateUnitsShipmentDate
2Coconut0.992003-03-031.3500NULL
3Eggplant22.442005-05-05NULL2005-05-06
6Lychee0.292012-12-121000.55002012-12-14
7GrApple2.222001-01-01NULL2001-01-02 <--this row in Source, that does not exist in Target showed up.
So the inner join, does remove the GrApple row.
example:
SELECT * FROM dbo.source INNER JOIN dbo.target ON dbo.source.ID=dbo.target.id
except
SELECT * FROM dbo.target INNER JOIN dbo.source ON dbo.source.ID=dbo.target.id
returns just the 3 rows:
2Coconut0.992003-03-031.3500NULL
3Eggplant22.442005-05-05NULL2005-05-06
6Lychee0.292012-12-121000.55002012-12-14
so the point is that you are only looking for rows that are DIFFERENT, that should be the same, excluding rows with new ID's which have been produced in the production environment.... but you are still able to verify that your copy is accurate, as far as what you have.....
yes?
Thanks!
Peter A.R. Johnson
DBA / Data Therapist / Federal Way Public Schools
Viewing 15 posts - 46 through 60 (of 72 total)
You must be logged in to reply to this topic. Login to reply