T-SQL queries to simulate INTERSECT and EXCEPT but with records in the same table

Question

T-SQL queries to simulate INTERSECT and EXCEPT but with records in the same table

gacanepa

Old Hand

Points: 352
More actions
June 4, 2015 at 7:07 pm

#301241

Hi everyone,
Here's the scenario. I have a table (let's call it MyTable) that consists of four fields: Id, Source, FirstField, and SecondField, where Source only takes one of two values: Source1 and Source2.
The records in this table look as follows:
I need to return, using 3 different T-SQL queries:
1) Products that exist only in Source2 (in red above)
2) Products that exist only in Source1 (in green above)
3) Products that exist both in Source1 and Source2 (in black above)
For 1) so far I've been doing something along the lines of
SELECT * FROM MyTable WHERE Source=Source1 AND FirstField NOT IN (SELECT DISTINCT(FirstField) FROM MyTable WHERE Source=Source2)
Not being a T-SQL expert myself, I'm wondering if this is the right or more efficient way to go. I have read about INTERSECT and EXCEPT, but I am a little unclear if they could be applied in this case out of the box.
I hope I've explained myself, and would appreciate any tips or suggestions anyone can share. Thanks in advance.

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply

Dwain Camps SSC Guru Points: 86948 More actions · Answer 1

gacanepa (6/4/2015)
Hi everyone,
Here's the scenario. I have a table (let's call it MyTable) that consists of four fields: Id, Source, FirstField, and SecondField, where Source only takes one of two values: Source1 and Source2.
The records in this table look as follows:
I need to return, using 3 different T-SQL queries:
1) Products that exist only in Source2 (in red above)
2) Products that exist only in Source1 (in green above)
3) Products that exist both in Source1 and Source2 (in black above)
For 1) so far I've been doing something along the lines of
SELECT * FROM MyTable WHERE Source=Source1 AND FirstField NOT IN (SELECT DISTINCT(FirstField) FROM MyTable WHERE Source=Source2)
Not being a T-SQL expert myself, I'm wondering if this is the right or more efficient way to go. I have read about INTERSECT and EXCEPT, but I am a little unclear if they could be applied in this case out of the box.
I hope I've explained myself, and would appreciate any tips or suggestions anyone can share. Thanks in advance.

First of all, never use DISTINCT when you're constructing a sub-query result to the right of IN. It is not a high performance thing to do because IN will operate just as fast whether there's duplicates in the set or not.

Edit: Sorry if that sounded like a rant but I see it way too many times and it is one of my pet peeves.

Otherwise, you're on the right track to use EXCEPT and INTERSECT.

-- Red

SELECT *

FROM MyTable

WHERE Source='Source1'

EXCEPT

SELECT *

FROM MyTable

WHERE Source='Source2';

-- Green

SELECT *

FROM MyTable

WHERE Source='Source2'

EXCEPT

SELECT *

FROM MyTable

WHERE Source='Source1';

-- Black

SELECT *

FROM MyTable

WHERE Source='Source2'

INTERSECT

SELECT *

FROM MyTable

WHERE Source='Source1';

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

gacanepa Old Hand Points: 352 More actions · Answer 2

Thank you for taking the time to write! I believe your suggestion has definitely put me on the right track.

However, let me ask you a further question.

If I do either an EXCEPT or an INTERSECT between the results of 2 SELECT * queries in this case, the Id column will affect the overall result as there are no records with the same Id value.

So I though of doing adding the relevant columns only in the SELECT statements, but in that case the performance is worse (the query takes 28 secs) than before (~15 secs) and causes ~2M logical reads:

Scan count 2, logical reads 2047535, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Any ideas?

sgmunson SSC Guru Points: 110640 More actions · Answer 3

gacanepa (6/5/2015)
Thank you for taking the time to write! I believe your suggestion has definitely put me on the right track.
However, let me ask you a further question.
If I do either an EXCEPT or an INTERSECT between the results of 2 SELECT * queries in this case, the Id column will affect the overall result as there are no records with the same Id value.
So I though of doing adding the relevant columns only in the SELECT statements, but in that case the performance is worse (the query takes 28 secs) than before (~15 secs) and causes ~2M logical reads:
Scan count 2, logical reads 2047535, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Any ideas?

Let me know how the following works:

DECLARE @MyTable AS TABLE (

Id int IDENTITY(1, 1) NOT NULL PRIMARY KEY CLUSTERED,

[Source] varchar(10) NOT NULL,

FirstField varchar(20),

SecondField varchar(30)

);

INSERT INTO @MyTable ([Source], FirstField, SecondField)

SELECT 'Source1' AS [Source], 'Product 3 name' AS FirstField, 'Product 3 description' AS SecondField UNION ALL

SELECT 'Source2', 'Product 1 name', 'Product 1 description' UNION ALL

SELECT 'Source1', 'Product 1 name', 'Product 1 description' UNION ALL

SELECT 'Source2', 'Product 4 name', 'Product 4 description' UNION ALL

SELECT 'Source1', 'Product 5 name', 'Product 5 description' UNION ALL

SELECT 'Source2', 'Product 6 name', 'Product 6 description';

SELECT MT.*

FROM @MyTable AS MT

LEFT OUTER JOIN @MyTable AS MT2

ON MT.FirstField = MT2.FirstField

AND MT.[Source] <> MT2.[Source]

WHERE MT.[Source] = 'Source2'

AND MT2.FirstField IS NULL

ORDER BY MT.[Source], MT.FirstField;

SELECT MT.*

FROM @MyTable AS MT

LEFT OUTER JOIN @MyTable AS MT2

ON MT.FirstField = MT2.FirstField

AND MT.[Source] <> MT2.[Source]

WHERE MT.[Source] = 'Source1'

AND MT2.FirstField IS NULL

ORDER BY MT.[Source], MT.FirstField;

SELECT MT.*

FROM @MyTable AS MT

INNER JOIN @MyTable AS MT2

ON MT.FirstField = MT2.FirstField

AND MT.[Source] <> MT2.[Source]

ORDER BY MT.[Source], MT.FirstField;

Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 4

Just another way to do it as well:

DECLARE @MyTable AS TABLE (

Id int IDENTITY(1, 1) NOT NULL PRIMARY KEY CLUSTERED,

[Source] varchar(10) NOT NULL,

FirstField varchar(20),

SecondField varchar(30)

);

INSERT INTO @MyTable ([Source], FirstField, SecondField)

SELECT 'Source1' AS [Source], 'Product 3 name' AS FirstField, 'Product 3 description' AS SecondField UNION ALL

SELECT 'Source2', 'Product 1 name', 'Product 1 description' UNION ALL

SELECT 'Source1', 'Product 1 name', 'Product 1 description' UNION ALL

SELECT 'Source2', 'Product 4 name', 'Product 4 description' UNION ALL

SELECT 'Source1', 'Product 5 name', 'Product 5 description' UNION ALL

SELECT 'Source2', 'Product 6 name', 'Product 6 description';

with Source1Only as (

select

FirstField, SecondField

from

@MyTable

where

[Source] = 'Source1'

except

select

FirstField, SecondField

from

@MyTable

where

[Source] = 'Source2'

)

select mt.Source, mt.FirstField, mt.SecondField from @MyTable mt where exists(select 1 from Source1Only so where so.FirstField = mt.FirstField and so.SecondField = mt.SecondField)

order by mt.Source, mt.FirstField, mt.SecondField;

with Source2Only as (

select

FirstField, SecondField

from

@MyTable

where

[Source] = 'Source2'

except

select

FirstField, SecondField

from

@MyTable

where

[Source] = 'Source1'

)

select mt.Source, mt.FirstField, mt.SecondField from @MyTable mt where exists(select 1 from Source2Only so where so.FirstField = mt.FirstField and so.SecondField = mt.SecondField)

order by mt.Source, mt.FirstField, mt.SecondField;

with CommonItems as (

select

FirstField, SecondField

from

@MyTable

where

[Source] = 'Source1'

intersect

select

FirstField, SecondField

from

@MyTable

where

[Source] = 'Source2'

)

select mt.Source, mt.FirstField, mt.SecondField from @MyTable mt where exists(select 1 from CommonItems ci where ci.FirstField = mt.FirstField and ci.SecondField = mt.SecondField)

order by mt.Source, mt.FirstField, mt.SecondField;

ScottPletcher SSC Guru Points: 101098 More actions · Answer 5

Is there some reason you want to use multiple queries? One query can do this just fine. I've properly clustered the table to match lookup requirements.

DECLARE @MyTable AS TABLE (

Id int IDENTITY(1, 1) NOT NULL,

[Source] varchar(10) NOT NULL,

FirstField varchar(20),

SecondField varchar(30),

UNIQUE CLUSTERED ( [Source], FirstField )

);

SET NOCOUNT ON

INSERT INTO @MyTable ([Source], FirstField, SecondField)

SELECT 'Source1' AS [Source], 'Product 3 name' AS FirstField, 'Product 3 description' AS SecondField UNION ALL

SELECT 'Source2', 'Product 1 name', 'Product 1 description' UNION ALL

SELECT 'Source1', 'Product 1 name', 'Product 1 description' UNION ALL

SELECT 'Source2', 'Product 4 name', 'Product 4 description' UNION ALL

SELECT 'Source1', 'Product 5 name', 'Product 5 description' UNION ALL

SELECT 'Source2', 'Product 6 name', 'Product 6 description';

SET NOCOUNT OFF

SELECT Id, Source, FirstField, SecondField,

CASE WHEN Source = 'Source1'

THEN CASE WHEN EXISTS(SELECT 1 FROM @MyTable mt2 WHERE mt2.Source = 'Source2' AND mt2.FirstField = mt.FirstField)

THEN 'Both' ELSE 'Source1 Only' END

ELSE CASE WHEN EXISTS(SELECT 1 FROM @MyTable mt2 WHERE mt2.Source = 'Source1' AND mt2.FirstField = mt.FirstField)

THEN 'Both' ELSE 'Source2 Only' END

END AS Matching_State

FROM @MyTable mt

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".