December 9, 2011 at 5:54 am
Hi,
Please could someone help me understand why query A below is a lot slower than query B
A] select * from table where field >= 10 --- only values >=10 are 10 to 20 inclusive
B] select * from table where field in (10,11,12,13,14,15,16,17,18,19,20)
I am running them on a 116million row table and the execution plans seem to be the same
Thanks,
Jason
December 9, 2011 at 6:07 am
I assume it's going for Index Scan (if table has any index on that field). Your query requirements can be satisfied with Index Seek.
Try field = 10, it will seek on index. (Just trying to explain... I don’t want to change your business requirements :-)).
December 9, 2011 at 6:13 am
Here's a 1 million row attempt to reproduce what you're describing.
BEGIN TRAN
SET NOCOUNT ON
IF object_id('tempdb..#testEnvironment') IS NOT NULL
BEGIN
DROP TABLE #testEnvironment
END
--1,000,000 Random rows of data
SELECT TOP 1000000 IDENTITY(INT,1,1) AS ID,
(ABS(CHECKSUM(NEWID())) % 20) + 1 AS number_1_to_20
INTO #testEnvironment
FROM master.dbo.syscolumns sc1, master.dbo.syscolumns sc2, master.dbo.syscolumns sc3
PRINT '========== BASELINE =========='
SET STATISTICS TIME ON
SELECT COUNT(*) FROM #testEnvironment
SET STATISTICS TIME OFF
PRINT REPLICATE('=',80)
PRINT '========== >= =========='
SET STATISTICS TIME ON
SELECT COUNT(*) FROM #testEnvironment
WHERE number_1_to_20 >= 10
SET STATISTICS TIME OFF
PRINT REPLICATE('=',80)
PRINT '========== IN =========='
SET STATISTICS TIME ON
SELECT COUNT(*) FROM #testEnvironment
WHERE number_1_to_20 IN (10,11,12,13,14,15,16,17,18,19,20)
SET STATISTICS TIME OFF
PRINT REPLICATE('=',80)
ROLLBACK
Which returns
========== BASELINE ==========
SQL Server Execution Times:
CPU time = 78 ms, elapsed time = 68 ms.
================================================================================
========== >= ==========
SQL Server Execution Times:
CPU time = 78 ms, elapsed time = 85 ms.
================================================================================
========== IN ==========
SQL Server Execution Times:
CPU time = 235 ms, elapsed time = 236 ms.
================================================================================
Let's add an index.
BEGIN TRAN
SET NOCOUNT ON
IF object_id('tempdb..#testEnvironment') IS NOT NULL
BEGIN
DROP TABLE #testEnvironment
END
--1,000,000 Random rows of data
SELECT TOP 1000000 IDENTITY(INT,1,1) AS ID,
(ABS(CHECKSUM(NEWID())) % 20) + 1 AS number_1_to_20
INTO #testEnvironment
FROM master.dbo.syscolumns sc1, master.dbo.syscolumns sc2, master.dbo.syscolumns sc3
CREATE NONCLUSTERED INDEX [test_index]
ON #testEnvironment (number_1_to_20)
PRINT '========== BASELINE =========='
SET STATISTICS TIME ON
SELECT COUNT(*) FROM #testEnvironment
SET STATISTICS TIME OFF
PRINT REPLICATE('=',80)
PRINT '========== >= =========='
SET STATISTICS TIME ON
SELECT COUNT(*) FROM #testEnvironment
WHERE number_1_to_20 >= 10
SET STATISTICS TIME OFF
PRINT REPLICATE('=',80)
PRINT '========== IN =========='
SET STATISTICS TIME ON
SELECT COUNT(*) FROM #testEnvironment
WHERE number_1_to_20 IN (10,11,12,13,14,15,16,17,18,19,20)
SET STATISTICS TIME OFF
PRINT REPLICATE('=',80)
ROLLBACK
========== BASELINE ==========
SQL Server Execution Times:
CPU time = 63 ms, elapsed time = 68 ms.
================================================================================
========== >= ==========
SQL Server Execution Times:
CPU time = 47 ms, elapsed time = 43 ms.
================================================================================
========== IN ==========
SQL Server Execution Times:
CPU time = 47 ms, elapsed time = 43 ms.
================================================================================
Since I obviously can't replicate your observations, I'm going to suggest that it's something on your database that is adversely affecting the queries.
December 9, 2011 at 7:09 am
Ulysses_Infinity (12/9/2011)
A] select * from table where field >= 10 --- only values >=10 are 10 to 20 inclusiveB] select * from table where field in (10,11,12,13,14,15,16,17,18,19,20)
Did you run them both multiple times and ignore the first executions? If not, then the first would have been slower due to reading the data off disk anc caching it.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
December 9, 2011 at 7:22 am
I see that you are measuring your duration, but to see the likely culprit of the difference, you should use:
SET STATISTICS IO ON
The info you'll get will likely show you where the IO issue is, if indeed the difference is IO.
If the durations are substantially different, you'll see a lot more logical reads in one of the queries. But, be aware of GilaMonster's comment on reading from disk vs cache.
If you want to force the disk reads, before each execution you could run:
DBCC DropCleanBuffers
That will force SQL to pull the information from disk since the data buffers will be emptied.
Just some thoughts on troubleshooting.
--------------------------------------------------------------------------
When you realize you've dug yourself into a hole....Step 1...stop digging.
Viewing 5 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply