December 19, 2012 at 7:58 am
Hi All
Some DDL
--Table 1
CREATE TABLE [dbo].[Indexing2](
[Col1] [int] IDENTITY(1,1) NOT NULL,
[Col2] [int] NULL,
[Col3] [int] NULL,
[Col4] [int] NULL
)
--Indexes
CREATE UNIQUE CLUSTERED INDEX [CX] ON [dbo].[Indexing2] ([Col1])
CREATE NONCLUSTERED INDEX [NCX] ON [dbo].[Indexing2] ([Col4])
--Table2
CREATE TABLE [dbo].[Indexing](
[ID] [int] IDENTITY(1,1) NOT NULL,
[Col2] [int] NULL
)
--Indexes
CREATE UNIQUE CLUSTERED INDEX [CX] ON [dbo].[Indexing] ([ID])
CREATE NONCLUSTERED INDEX [NCIX] ON [dbo].[Indexing] ([Col2])
And consider the following selects
select Col4 from Indexing2
inner join Indexing
on Indexing2.Col4 = Indexing.Col2
where Col4 between '200' and '250'
select Col4 from Indexing2
where exists (select * from Indexing where Indexing.Col2 = Indexing2.Col4)
and Col4 between '200' and '250'
Essentially, the 2 queries are the same. Why does the 1st query return duplicates and the 2nd query doesn't?
Also, the the execution plans are identical besides the 2nd query having a stream aggregate. Why is this being added?
Thanks
December 19, 2012 at 8:04 am
Sample data would be good as well
Using the following sample data
insert into Indexing values (199),(200),(210),(220),(230),(240),(250),(251)
insert into Indexing2 (Col4) values (199),(200),(210),(220),(230),(240),(250),(251)
I get no duplicates and now stream aggregate, the only difference is a left semi join on the exists vs a inner join.
Also post your execution plans in a SQLPLAN format so we can take a look.
December 19, 2012 at 8:05 am
Essentially, the 2 queries are the same.
Fraid not ,
if a value is used multiple time in Indexing for Col2 then you will get "duplicate" rows back
select col2,count(*)
from indexing
group by col2
having count(*) >1
Will tell you which values are used multiple times.
the exists clause only cares that there is at least one matching row.
December 19, 2012 at 8:15 am
...
And consider the following selects
select Col4 from Indexing2
inner join Indexing
on Indexing2.Col4 = Indexing.Col2
where Col4 between '200' and '250'
select Col4 from Indexing2
where exists (select * from Indexing where Indexing.Col2 = Indexing2.Col4)
and Col4 between '200' and '250'
Essentially, the 2 queries are the same. Why does the 1st query return duplicates and the 2nd query doesn't?
Also, the the execution plans are identical besides the 2nd query having a stream aggregate. Why is this being added?
Thanks
Your two query are essentially very different!
Having the same filter in the WHERE clause, still your first query returns Col4 from Indexing2 for every matching record from Indexing, but the second query only returns one row from Indexing2, regardless how many matching records found in Indexing.
If you want query one to behave same way as your query two you need to use DISTINCT in select.
December 20, 2012 at 1:44 am
anthony.green (12/19/2012)
Sample data would be good as wellUsing the following sample data
insert into Indexing values (199),(200),(210),(220),(230),(240),(250),(251)
insert into Indexing2 (Col4) values (199),(200),(210),(220),(230),(240),(250),(251)
I get no duplicates and now stream aggregate, the only difference is a left semi join on the exists vs a inner join.
Also post your execution plans in a SQLPLAN format so we can take a look.
Thanks
I have 30Mil rows in this table
December 20, 2012 at 1:47 am
anthony.green (12/19/2012)
Sample data would be good as wellUsing the following sample data
insert into Indexing values (199),(200),(210),(220),(230),(240),(250),(251)
insert into Indexing2 (Col4) values (199),(200),(210),(220),(230),(240),(250),(251)
I get no duplicates and now stream aggregate, the only difference is a left semi join on the exists vs a inner join.
Also post your execution plans in a SQLPLAN format so we can take a look.
When I say duplicate, I mean that for Col4 there is more that one row with the same value. The inner join query is returning all those rows while the correlated query doesn't.
December 20, 2012 at 1:50 am
SQLSACT (12/19/2012)
Hi AllSome DDL
--Table 1
CREATE TABLE [dbo].[Indexing2](
[Col1] [int] IDENTITY(1,1) NOT NULL,
[Col2] [int] NULL,
[Col3] [int] NULL,
[Col4] [int] NULL
)
--Indexes
CREATE UNIQUE CLUSTERED INDEX [CX] ON [dbo].[Indexing2] ([Col1])
CREATE NONCLUSTERED INDEX [NCX] ON [dbo].[Indexing2] ([Col4])
--Table2
CREATE TABLE [dbo].[Indexing](
[ID] [int] IDENTITY(1,1) NOT NULL,
[Col2] [int] NULL
)
--Indexes
CREATE UNIQUE CLUSTERED INDEX [CX] ON [dbo].[Indexing] ([ID])
CREATE NONCLUSTERED INDEX [NCIX] ON [dbo].[Indexing] ([Col2])
And consider the following selects
select Col4 from Indexing2
inner join Indexing
on Indexing2.Col4 = Indexing.Col2
where Col4 between '200' and '250'
select Col4 from Indexing2
where exists (select * from Indexing where Indexing.Col2 = Indexing2.Col4)
and Col4 between '200' and '250'
Essentially, the 2 queries are the same. Why does the 1st query return duplicates and the 2nd query doesn't?
Also, the the execution plans are identical besides the 2nd query having a stream aggregate. Why is this being added?
Thanks
Plan Attached
December 21, 2012 at 2:52 pm
SQLSACT (12/19/2012)
Why does the 1st query return duplicates and the 2nd query doesn't?
... the 2nd query having a stream aggregate. Why is this being added?
Thanks
The first query does a JOIN; joins return all matching rows, whether duplicate matches or not.
In the second query, you used EXISTS() rather than join. EXISTS() just verifies whether or not a row exists. Since SQL has chosen to use a JOIN to generate the final results, SQL uses the stream aggregate to eliminate duplicate join values, thus insuring that only a single match occurs for each join to the other table.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
December 21, 2012 at 3:14 pm
You have two very different queries there (different in meaning), so it shouldn't be much of a surprise that they return different results.
See http://sqlinthewild.co.za/index.php/2010/01/12/in-vs-inner-join/ (IN and EXISTS use the same mechanism)
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
Viewing 9 posts - 1 through 8 (of 8 total)
You must be logged in to reply to this topic. Login to reply