June 4, 2015 at 7:07 pm
Hi everyone,
Here's the scenario. I have a table (let's call it MyTable) that consists of four fields: Id, Source, FirstField, and SecondField, where Source only takes one of two values: Source1 and Source2.
The records in this table look as follows:
I need to return, using 3 different T-SQL queries:
1) Products that exist only in Source2 (in red above)
2) Products that exist only in Source1 (in green above)
3) Products that exist both in Source1 and Source2 (in black above)
For 1) so far I've been doing something along the lines of
SELECT * FROM MyTable WHERE Source=Source1 AND FirstField NOT IN (SELECT DISTINCT(FirstField) FROM MyTable WHERE Source=Source2)
Not being a T-SQL expert myself, I'm wondering if this is the right or more efficient way to go. I have read about INTERSECT and EXCEPT, but I am a little unclear if they could be applied in this case out of the box.
I hope I've explained myself, and would appreciate any tips or suggestions anyone can share. Thanks in advance.
June 4, 2015 at 7:20 pm
gacanepa (6/4/2015)
Hi everyone,Here's the scenario. I have a table (let's call it MyTable) that consists of four fields: Id, Source, FirstField, and SecondField, where Source only takes one of two values: Source1 and Source2.
The records in this table look as follows:
I need to return, using 3 different T-SQL queries:
1) Products that exist only in Source2 (in red above)
2) Products that exist only in Source1 (in green above)
3) Products that exist both in Source1 and Source2 (in black above)
For 1) so far I've been doing something along the lines of
SELECT * FROM MyTable WHERE Source=Source1 AND FirstField NOT IN (SELECT DISTINCT(FirstField) FROM MyTable WHERE Source=Source2)
Not being a T-SQL expert myself, I'm wondering if this is the right or more efficient way to go. I have read about INTERSECT and EXCEPT, but I am a little unclear if they could be applied in this case out of the box.
I hope I've explained myself, and would appreciate any tips or suggestions anyone can share. Thanks in advance.
First of all, never use DISTINCT when you're constructing a sub-query result to the right of IN. It is not a high performance thing to do because IN will operate just as fast whether there's duplicates in the set or not.
Edit: Sorry if that sounded like a rant but I see it way too many times and it is one of my pet peeves.
Otherwise, you're on the right track to use EXCEPT and INTERSECT.
-- Red
SELECT *
FROM MyTable
WHERE Source='Source1'
EXCEPT
SELECT *
FROM MyTable
WHERE Source='Source2';
-- Green
SELECT *
FROM MyTable
WHERE Source='Source2'
EXCEPT
SELECT *
FROM MyTable
WHERE Source='Source1';
-- Black
SELECT *
FROM MyTable
WHERE Source='Source2'
INTERSECT
SELECT *
FROM MyTable
WHERE Source='Source1';
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
June 5, 2015 at 7:10 am
Thank you for taking the time to write! I believe your suggestion has definitely put me on the right track.
However, let me ask you a further question.
If I do either an EXCEPT or an INTERSECT between the results of 2 SELECT * queries in this case, the Id column will affect the overall result as there are no records with the same Id value.
So I though of doing adding the relevant columns only in the SELECT statements, but in that case the performance is worse (the query takes 28 secs) than before (~15 secs) and causes ~2M logical reads:
Scan count 2, logical reads 2047535, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Any ideas?
June 5, 2015 at 9:21 am
gacanepa (6/5/2015)
Thank you for taking the time to write! I believe your suggestion has definitely put me on the right track.However, let me ask you a further question.
If I do either an EXCEPT or an INTERSECT between the results of 2 SELECT * queries in this case, the Id column will affect the overall result as there are no records with the same Id value.
So I though of doing adding the relevant columns only in the SELECT statements, but in that case the performance is worse (the query takes 28 secs) than before (~15 secs) and causes ~2M logical reads:
Scan count 2, logical reads 2047535, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Any ideas?
Let me know how the following works:
DECLARE @MyTable AS TABLE (
Id int IDENTITY(1, 1) NOT NULL PRIMARY KEY CLUSTERED,
[Source] varchar(10) NOT NULL,
FirstField varchar(20),
SecondField varchar(30)
);
INSERT INTO @MyTable ([Source], FirstField, SecondField)
SELECT 'Source1' AS [Source], 'Product 3 name' AS FirstField, 'Product 3 description' AS SecondField UNION ALL
SELECT 'Source2', 'Product 1 name', 'Product 1 description' UNION ALL
SELECT 'Source1', 'Product 1 name', 'Product 1 description' UNION ALL
SELECT 'Source2', 'Product 4 name', 'Product 4 description' UNION ALL
SELECT 'Source1', 'Product 5 name', 'Product 5 description' UNION ALL
SELECT 'Source2', 'Product 6 name', 'Product 6 description';
SELECT MT.*
FROM @MyTable AS MT
LEFT OUTER JOIN @MyTable AS MT2
ON MT.FirstField = MT2.FirstField
AND MT.[Source] <> MT2.[Source]
WHERE MT.[Source] = 'Source2'
AND MT2.FirstField IS NULL
ORDER BY MT.[Source], MT.FirstField;
SELECT MT.*
FROM @MyTable AS MT
LEFT OUTER JOIN @MyTable AS MT2
ON MT.FirstField = MT2.FirstField
AND MT.[Source] <> MT2.[Source]
WHERE MT.[Source] = 'Source1'
AND MT2.FirstField IS NULL
ORDER BY MT.[Source], MT.FirstField;
SELECT MT.*
FROM @MyTable AS MT
INNER JOIN @MyTable AS MT2
ON MT.FirstField = MT2.FirstField
AND MT.[Source] <> MT2.[Source]
ORDER BY MT.[Source], MT.FirstField;
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
June 5, 2015 at 9:42 am
Just another way to do it as well:
DECLARE @MyTable AS TABLE (
Id int IDENTITY(1, 1) NOT NULL PRIMARY KEY CLUSTERED,
[Source] varchar(10) NOT NULL,
FirstField varchar(20),
SecondField varchar(30)
);
INSERT INTO @MyTable ([Source], FirstField, SecondField)
SELECT 'Source1' AS [Source], 'Product 3 name' AS FirstField, 'Product 3 description' AS SecondField UNION ALL
SELECT 'Source2', 'Product 1 name', 'Product 1 description' UNION ALL
SELECT 'Source1', 'Product 1 name', 'Product 1 description' UNION ALL
SELECT 'Source2', 'Product 4 name', 'Product 4 description' UNION ALL
SELECT 'Source1', 'Product 5 name', 'Product 5 description' UNION ALL
SELECT 'Source2', 'Product 6 name', 'Product 6 description';
with Source1Only as (
select
FirstField, SecondField
from
@MyTable
where
[Source] = 'Source1'
except
select
FirstField, SecondField
from
@MyTable
where
[Source] = 'Source2'
)
select mt.Source, mt.FirstField, mt.SecondField from @MyTable mt where exists(select 1 from Source1Only so where so.FirstField = mt.FirstField and so.SecondField = mt.SecondField)
order by mt.Source, mt.FirstField, mt.SecondField;
with Source2Only as (
select
FirstField, SecondField
from
@MyTable
where
[Source] = 'Source2'
except
select
FirstField, SecondField
from
@MyTable
where
[Source] = 'Source1'
)
select mt.Source, mt.FirstField, mt.SecondField from @MyTable mt where exists(select 1 from Source2Only so where so.FirstField = mt.FirstField and so.SecondField = mt.SecondField)
order by mt.Source, mt.FirstField, mt.SecondField;
with CommonItems as (
select
FirstField, SecondField
from
@MyTable
where
[Source] = 'Source1'
intersect
select
FirstField, SecondField
from
@MyTable
where
[Source] = 'Source2'
)
select mt.Source, mt.FirstField, mt.SecondField from @MyTable mt where exists(select 1 from CommonItems ci where ci.FirstField = mt.FirstField and ci.SecondField = mt.SecondField)
order by mt.Source, mt.FirstField, mt.SecondField;
June 5, 2015 at 9:45 am
Is there some reason you want to use multiple queries? One query can do this just fine. I've properly clustered the table to match lookup requirements.
DECLARE @MyTable AS TABLE (
Id int IDENTITY(1, 1) NOT NULL,
[Source] varchar(10) NOT NULL,
FirstField varchar(20),
SecondField varchar(30),
UNIQUE CLUSTERED ( [Source], FirstField )
);
SET NOCOUNT ON
INSERT INTO @MyTable ([Source], FirstField, SecondField)
SELECT 'Source1' AS [Source], 'Product 3 name' AS FirstField, 'Product 3 description' AS SecondField UNION ALL
SELECT 'Source2', 'Product 1 name', 'Product 1 description' UNION ALL
SELECT 'Source1', 'Product 1 name', 'Product 1 description' UNION ALL
SELECT 'Source2', 'Product 4 name', 'Product 4 description' UNION ALL
SELECT 'Source1', 'Product 5 name', 'Product 5 description' UNION ALL
SELECT 'Source2', 'Product 6 name', 'Product 6 description';
SET NOCOUNT OFF
SELECT Id, Source, FirstField, SecondField,
CASE WHEN Source = 'Source1'
THEN CASE WHEN EXISTS(SELECT 1 FROM @MyTable mt2 WHERE mt2.Source = 'Source2' AND mt2.FirstField = mt.FirstField)
THEN 'Both' ELSE 'Source1 Only' END
ELSE CASE WHEN EXISTS(SELECT 1 FROM @MyTable mt2 WHERE mt2.Source = 'Source1' AND mt2.FirstField = mt.FirstField)
THEN 'Both' ELSE 'Source2 Only' END
END AS Matching_State
FROM @MyTable mt
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
Viewing 6 posts - 1 through 5 (of 5 total)
You must be logged in to reply to this topic. Login to reply