Correlated Subquery - Stream Aggregate

Question

Post reply

Correlated Subquery - Stream Aggregate

SQLSACT

SSC-Insane

Points: 21838
More actions
December 19, 2012 at 7:58 am

#266528

Hi All
Some DDL
--Table 1
CREATE TABLE [dbo].[Indexing2](
[Col1] [int] IDENTITY(1,1) NOT NULL,
[Col2] [int] NULL,
[Col3] [int] NULL,
[Col4] [int] NULL
)
--Indexes
CREATE UNIQUE CLUSTERED INDEX [CX] ON [dbo].[Indexing2] ([Col1])
CREATE NONCLUSTERED INDEX [NCX] ON [dbo].[Indexing2] ([Col4])
--Table2
CREATE TABLE [dbo].[Indexing](
[ID] [int] IDENTITY(1,1) NOT NULL,
[Col2] [int] NULL
)
--Indexes
CREATE UNIQUE CLUSTERED INDEX [CX] ON [dbo].[Indexing] ([ID])
CREATE NONCLUSTERED INDEX [NCIX] ON [dbo].[Indexing] ([Col2])
And consider the following selects
select Col4 from Indexing2
inner join Indexing
on Indexing2.Col4 = Indexing.Col2
where Col4 between '200' and '250'
select Col4 from Indexing2
where exists (select * from Indexing where Indexing.Col2 = Indexing2.Col4)
and Col4 between '200' and '250'
Essentially, the 2 queries are the same. Why does the 1st query return duplicates and the 2nd query doesn't?
Also, the the execution plans are identical besides the 2nd query having a stream aggregate. Why is this being added?
Thanks

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply

Ant-Green SSC Guru Points: 113507 More actions · Answer 1

Sample data would be good as well

Using the following sample data

insert into Indexing values (199),(200),(210),(220),(230),(240),(250),(251)

insert into Indexing2 (Col4) values (199),(200),(210),(220),(230),(240),(250),(251)

I get no duplicates and now stream aggregate, the only difference is a left semi join on the exists vs a inner join.

Also post your execution plans in a SQLPLAN format so we can take a look.

Dave Ballantyne SSC-Dedicated Points: 33667 More actions · Answer 2

Essentially, the 2 queries are the same.

Fraid not ,

if a value is used multiple time in Indexing for Col2 then you will get "duplicate" rows back

select col2,count(*)

from indexing

group by col2

having count(*) >1

Will tell you which values are used multiple times.

the exists clause only cares that there is at least one matching row.

Clear Sky SQL
My Blog[/url]

Eugene Elutin SSC Guru Points: 59322 More actions · Answer 3

...
And consider the following selects
select Col4 from Indexing2
inner join Indexing
on Indexing2.Col4 = Indexing.Col2
where Col4 between '200' and '250'
select Col4 from Indexing2
where exists (select * from Indexing where Indexing.Col2 = Indexing2.Col4)
and Col4 between '200' and '250'
Essentially, the 2 queries are the same. Why does the 1st query return duplicates and the 2nd query doesn't?
Also, the the execution plans are identical besides the 2nd query having a stream aggregate. Why is this being added?
Thanks

Your two query are essentially very different!

Having the same filter in the WHERE clause, still your first query returns Col4 from Indexing2 for every matching record from Indexing, but the second query only returns one row from Indexing2, regardless how many matching records found in Indexing.

If you want query one to behave same way as your query two you need to use DISTINCT in select.

_____________________________________________
"The only true wisdom is in knowing you know nothing"
"O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
(So many miracle inventions provided by MS to us...)

How to post your question to get the best and quick help[/url]

SQLSACT SSC-Insane Points: 21838 More actions · Answer 4

anthony.green (12/19/2012)
Sample data would be good as well
Using the following sample data
insert into Indexing values (199),(200),(210),(220),(230),(240),(250),(251)
insert into Indexing2 (Col4) values (199),(200),(210),(220),(230),(240),(250),(251)
I get no duplicates and now stream aggregate, the only difference is a left semi join on the exists vs a inner join.
Also post your execution plans in a SQLPLAN format so we can take a look.

Thanks

I have 30Mil rows in this table

SQLSACT SSC-Insane Points: 21838 More actions · Answer 5

anthony.green (12/19/2012)
Sample data would be good as well
Using the following sample data
insert into Indexing values (199),(200),(210),(220),(230),(240),(250),(251)
insert into Indexing2 (Col4) values (199),(200),(210),(220),(230),(240),(250),(251)
I get no duplicates and now stream aggregate, the only difference is a left semi join on the exists vs a inner join.
Also post your execution plans in a SQLPLAN format so we can take a look.

When I say duplicate, I mean that for Col4 there is more that one row with the same value. The inner join query is returning all those rows while the correlated query doesn't.

SQLSACT SSC-Insane Points: 21838 More actions · Answer 6

SQLSACT (12/19/2012)
Hi All
Some DDL
--Table 1
CREATE TABLE [dbo].[Indexing2](
[Col1] [int] IDENTITY(1,1) NOT NULL,
[Col2] [int] NULL,
[Col3] [int] NULL,
[Col4] [int] NULL
)
--Indexes
CREATE UNIQUE CLUSTERED INDEX [CX] ON [dbo].[Indexing2] ([Col1])
CREATE NONCLUSTERED INDEX [NCX] ON [dbo].[Indexing2] ([Col4])
--Table2
CREATE TABLE [dbo].[Indexing](
[ID] [int] IDENTITY(1,1) NOT NULL,
[Col2] [int] NULL
)
--Indexes
CREATE UNIQUE CLUSTERED INDEX [CX] ON [dbo].[Indexing] ([ID])
CREATE NONCLUSTERED INDEX [NCIX] ON [dbo].[Indexing] ([Col2])
And consider the following selects
select Col4 from Indexing2
inner join Indexing
on Indexing2.Col4 = Indexing.Col2
where Col4 between '200' and '250'
select Col4 from Indexing2
where exists (select * from Indexing where Indexing.Col2 = Indexing2.Col4)
and Col4 between '200' and '250'
Essentially, the 2 queries are the same. Why does the 1st query return duplicates and the 2nd query doesn't?
Also, the the execution plans are identical besides the 2nd query having a stream aggregate. Why is this being added?
Thanks

Plan Attached

ScottPletcher SSC Guru Points: 101126 More actions · Answer 7

SQLSACT (12/19/2012)
Why does the 1st query return duplicates and the 2nd query doesn't?
... the 2nd query having a stream aggregate. Why is this being added?
Thanks

The first query does a JOIN; joins return all matching rows, whether duplicate matches or not.

In the second query, you used EXISTS() rather than join. EXISTS() just verifies whether or not a row exists. Since SQL has chosen to use a JOIN to generate the final results, SQL uses the stream aggregate to eliminate duplicate join values, thus insuring that only a single match occurs for each join to the other table.

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 8

You have two very different queries there (different in meaning), so it shouldn't be much of a surprise that they return different results.

See http://sqlinthewild.co.za/index.php/2010/01/12/in-vs-inner-join/ (IN and EXISTS use the same mechanism)

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass