December 30, 2008 at 12:06 am
Well here is something interesting. Unfortunately, I'm still learning about the underlying database engine (I'm only half way through Karen Delaney's excellent Query Tuning and Optimization book for SQL Server 2005...) but this is what I'm seeing when I use Jeff's test data tables (btw, thanks Jeff!):
SET STATISTICS IO ON
SET STATISTICS TIME ON
SELECT o.SomeInt
FROM #MyInTable AS o
WHERE o.SomeInt in
(
SELECT top 1 p.SomeInt FROM JBMTest AS p
WHERE p.SomeInt = o.SomeInt
)
SET STATISTICS IO OFF
SET STATISTICS TIME OFF
(1000 row(s) affected)
Table 'Worktable'. Scan count 1000, logical reads 2920331, physical reads 0, read-ahead reads 0,
lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'JBMTest'. Scan count 1, logical reads 15186, physical reads 0, read-ahead reads 0, lob logical reads 0,
lob physical reads 0, lob read-ahead reads 0.
Table '#MyInTable ___ 000000000081'. Scan count 1, logical reads 4, physical reads 0, read-ahead reads 0,
lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(1 row(s) affected)
SQL Server Execution Times:
CPU time = 4469 ms, elapsed time = 4478 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.
SET STATISTICS IO ON
SET STATISTICS TIME ON
SELECT o.SomeInt
FROM #MyInTable AS o
WHERE exists --o.SomeInt in
(
SELECT top 1 p.SomeInt FROM JBMTest AS p
WHERE p.SomeInt = o.SomeInt
)
SET STATISTICS IO OFF
SET STATISTICS TIME OFF
(1000 row(s) affected)
Table '#MyInTable___000000000081'. Scan count 9, logical reads 7, physical reads 0, read-ahead reads 0,
lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'JBMTest'. Scan count 9, logical reads 15296, physical reads 0, read-ahead reads 0, lob logical reads 0,
lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0,
lob physical reads 0, lob read-ahead reads 0.
(1 row(s) affected)
SQL Server Execution Times:
CPU time = 406 ms, elapsed time = 59 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.
Look at the logical reads on the first query as compared to the second query!!!!
December 30, 2008 at 12:08 am
December 30, 2008 at 12:13 am
You'd have to shorten the lines in the code box you made... but you're ok.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 30, 2008 at 1:51 am
December 30, 2008 at 6:35 am
One thing that I noticed is that you have reversed the roles of the tables in Jeff's example. Originally, Jeff had #MyInTable as the subordinate table, but you are using it as the primary table.
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
December 30, 2008 at 6:51 am
RBarryYoung (12/30/2008)
One thing that I noticed is that you have reversed the roles of the tables in Jeff's example. Originally, Jeff had #MyInTable as the subordinate table, but you are using it as the primary table.
I did this at work... I'll go back to my workstation tomorrow and have a look to see I did everything correctly... but I thought that's what I said 🙁 100,000 people in the OrgUnit... where #MyInTable is 1000 records (thus it is OrgUnit) and 100,000 records for JBMTest (Person table).
I may have been a tad unclear.
December 30, 2008 at 1:08 pm
Peculiar... on our 2008 development box (which is pretty quiet this week), I ran Jeff's original test and all three ran almost neck and neck. EXISTS squeaked out a win over Join and IN came in third.
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
December 30, 2008 at 5:48 pm
Bob Hovious (12/30/2008)
Peculiar... on our 2008 development box (which is pretty quiet this week), I ran Jeff's original test and all three ran almost neck and neck. EXISTS squeaked out a win over Join and IN came in third.
Whew! Glad I'm not the only one!
--Jeff Moden
Change is inevitable... Change for the better is not.
December 30, 2008 at 6:53 pm
It's interesting looking at the execution plans for the two queries:
SELECT o.SomeInt
FROM #MyInTable AS o
WHERE o.SomeInt in
(
SELECT top 1 p.SomeInt FROM JBMTest AS p
WHERE p.SomeInt = o.SomeInt
)
|--Nested Loops(Left Semi Join, OUTER REFERENCES: ([o].[SomeInt]))
|--Index Scan(OBJECT: ([tempdb].[dbo].[#MyInTable] AS [o]))
|--Filter(WHERE: ([ISS_V8_CS].[dbo].[JBMTest].[SomeInt] as [p].[SomeInt]=
[tempdb].[dbo].[#MyInTable].[SomeInt] as [o].[SomeInt]))
|--Top(TOP EXPRESSION: ((1)))
|--Index Spool(SEEK: ([p].[SomeInt]=
[tempdb].[dbo].[#MyInTable].[SomeInt] as [o].[SomeInt]))
|--Clustered Index Scan
(OBJECT: ([ISS_V8_CS].[dbo].[JBMTest].[PK__JBMTest__7EAD8B99] AS [p]))
SELECT o.SomeInt
FROM #MyInTable AS o
WHERE EXISTS
(
SELECT top 1 p.SomeInt FROM JBMTest AS p
WHERE p.SomeInt = o.SomeInt
)
|--Parallelism(Gather Streams)
|--Hash Match(Left Semi Join, HASH: ([o].[SomeInt])=([p].[SomeInt]),
RESIDUAL: ([ISS_V8_CS].[dbo].[JBMTest].[SomeInt] as [p].[SomeInt]
=[tempdb].[dbo].[#MyInTable].[SomeInt] as [o].[SomeInt]))
|--Bitmap(HASH: ([o].[SomeInt]), DEFINE: ([Bitmap1004]))
| |--Parallelism(Repartition Streams, Hash Partitioning,
PARTITION COLUMNS: ([o].[SomeInt]))
| |--Index Scan(OBJECT: ([tempdb].[dbo].[#MyInTable] AS [o]))
|--Parallelism(Repartition Streams, Hash Partitioning,
PARTITION COLUMNS: ([p].[SomeInt]), WHERE: (PROBE([Bitmap1004])=TRUE))
|--Clustered Index Scan
(OBJECT: ([ISS_V8_CS].[dbo].[JBMTest].[PK__JBMTest__7EAD8B99] AS [p]))
So what I'm seeing is that a hash match is being run on the WHERE EXISTS version, and a nested loop is being used for the IN version. Because I've not set MAXDOP hint to 0 in the WHERE EXISTS version, this is trying to setup parallelism. I'm not really clear what the Bitmap operator is doing, though.
My understanding of things is that high degrees of parallelism is in general a good thing for data warehouses, but not necessarily something you want to have occur in a heavy use OLTP situation. However, I also thought that a hash match is probably a better idea for large amounts of data, but then again the nested join doesn't block where the hash match does.
So I guess it's all a massive trade off, and depends on your environment. For a heavy use OLTP system, I think I'd use the IN statement for the semi join, or alternatively if it was a large amount of data (as in the case of 100,000 records in the outer query) I'd use the WHERE EXISTS version but with a restricted MAXDOP hint.
What do people thinK?
December 30, 2008 at 7:10 pm
OK, now I'm slightly confused. It actually looks like the TOP 1 in the IN statement does make a difference!
Check out the execution plan:
SELECT o.SomeInt
FROM #MyInTable AS o
WHERE o.SomeInt in
(
SELECT p.SomeInt FROM JBMTest AS p
WHERE p.SomeInt = o.SomeInt
)
|--Parallelism(Gather Streams)
|--Hash Match(Left Semi Join, HASH: ([o].[SomeInt])=([p].[SomeInt]),
RESIDUAL: ([ISS_V8_CS].[dbo].[JBMTest].[SomeInt] as [p].[SomeInt]=
[tempdb].[dbo].[#MyInTable].[SomeInt] as [o].[SomeInt]
AND [ISS_V8_CS].[dbo].[JBMTest].[SomeInt] as [p].[SomeInt]=
[tempdb].[dbo].[#MyInTable].[SomeInt] as [o].[SomeInt]))
|--Bitmap(HASH: ([o].[SomeInt]), DEFINE: ([Bitmap1004]))
| |--Parallelism(Repartition Streams, Hash Partitioning,
PARTITION COLUMNS: ([o].[SomeInt]))
| |--Index Scan(OBJECT: ([tempdb].[dbo].[#MyInTable] AS [o]))
|--Parallelism(Repartition Streams, Hash Partitioning,
PARTITION COLUMNS: ([p].[SomeInt]), WHERE: (PROBE([Bitmap1004])=TRUE))
|--Clustered Index Scan(
OBJECT: ([ISS_V8_CS].[dbo].[JBMTest].[PK__JBMTest__7EAD8B99] AS [p]))
This exhibits pretty much the same behaviour now as the WHERE EXISTS query!
I would love to know why this occurs.
Viewing 10 posts - 16 through 24 (of 24 total)
You must be logged in to reply to this topic. Login to reply