simple SELECT query takes ten seconds to return 5000 rows - can I optimize?

Question

Post reply

simple SELECT query takes ten seconds to return 5000 rows - can I optimize?

Jesse Reich

Hall of Fame

Points: 3020
More actions
September 15, 2010 at 7:04 pm

#229302

I have a table and index:
CREATE TABLE X (
HistoryID INT NOT NULL IDENTITY(1,1) PRIMARY KEY,
ObjID INT NOT NULL,
Date1 DATETIME2,
Date2 DATETIME2,
Info1 CHAR(400),
Info2 CHAR(400),
Int1 INT,
Int2 INT,
Int3 INT
)
CREATE INDEX X_IX1 ON X (ObjID)
The table has about 4 million rows and is 854MB in size.
The ObjID value repeats and there are about 5000 unique ObjIDs in the table.
When I run this query:
DBCC DROPCLEANBUFFERS
SELECT
X.*
FROM X
INNER JOIN (
SELECT
ObjID,
MAX(HistoryID) AS MaxID
FROM X
GROUP BY ObjID
) X2
ON X.HistoryID = X2.MaxID
It takes about 10 seconds to return 5000 records which seems like forever considering it's a local database on my laptop (Intel Core Duo - 4GB RAM).
My questions:
1) Is 10 seconds a reasonable amount of time for this query to run?
2) Is there anything I can do to speed it up?
I've included the query plan as well:
|--Nested Loops(Inner Join, OUTER REFERENCES ([Expr1006]))
|--Stream Aggregate(GROUP BY ([LoadTest].[dbo].[X].[OBJ_ID]) DEFINE ([Expr1006]=MAX
([LoadTest].[dbo].[X].[ID])))
| |--Index Scan(OBJECT ([LoadTest].[dbo].[X].[X_IX1]), ORDERED FORWARD)
|--Clustered Index Seek(OBJECT ([LoadTest].[dbo].[X].[PK__X__3214EC273F466844]),
SEEK ([LoadTest].[dbo].[X].[ID]=[Expr1006]) ORDERED FORWARD)

Viewing 15 posts - 1 through 15 (of 24 total)

You must be logged in to reply to this topic. Login to reply

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 1

I know this might seem stupid to even try but it made the code run twice as fast on my machine...

SELECT

X.*

FROM X

INNER JOIN (

SELECT

--ObjID, --<<<<COMMENT THIS LINE OUT!!!

MAX(HistoryID) AS MaxID

FROM X

GROUP BY ObjID

) X2

ON X.HistoryID = X2.MaxID

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jesse Reich Hall of Fame Points: 3020 More actions · Answer 2

Thanks for looking at this. I tried removing the extra column but didn't see any substantial difference in the elapsed time of the query.

Although I did notice that upon removing the column, SET STATISTICS IO shows 31933 logical reads as opposed to 32095.

sanmatrix SSC Veteran Points: 250 More actions · Answer 3

SELECT X.*

FROM X

WHERE HistoryId In (

SELECT MAX(HistoryID)

FROM X

GROUP BY ObjID)

I don't know but may be it works. Did you try this... or even this can be tried...

SELECT X.* FROM

(SELECT MAX(HistoryID) AS MaxID

FROM X GROUP BY ObjID) AS X1

INNER JOIN X ON X.HistoryID = X1.MaxID

Evil Kraig F SSC Guru Points: 100851 More actions · Answer 4

8kb (9/15/2010)
CREATE INDEX X_IX1 ON X (ObjID)

Note, non-covering, non-clustered index, so we're going here, to the PrimaryKey:

|--Index Scan(OBJECT ([LoadTest].[dbo].[X].[X_IX1]), ORDERED FORWARD)

|--Clustered Index Seek(OBJECT ([LoadTest].[dbo].[X].[PK__X__3214EC273F466844]),

Add HistoryID to the index to use the index directly instead of the second lookup on the clustered index.

Next this:

The table has about 4 million rows and is 854MB in size.
(and from later...)
It takes about 10 seconds to return 5000 records which seems like forever considering it's a local database on my laptop (Intel Core Duo - 4GB RAM).

Now, in theory, you could load the entire DB to RAM, I know, but that's going to depend on how much room his local server is allowed overall. Also, if the db is local to the laptop, it's sitting on a single spindle IDE drive. You simply will NEVER get high speed performance from this. I'm kinda surprised it went that fast, actually. I'd have assumed it was holding a lot of data in cache still if it wasn't for the DROPCLEANBUFFERS you've ran.

1) Is 10 seconds a reasonable amount of time for this query to run?
2) Is there anything I can do to speed it up?

1) In the local environment, in my opinion, for 4 mill rows that're wide... yeah, that's reasonable.

2) Get it on the real server. 🙂 Also, try altering your index to be a covering index, should recover some time that way. Just make it ( ObjID, HistoryID)

- Craig Farrell

Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA

Gail Shaw SSC Guru Points: 1004514 More actions · Answer 5

Can you post the full execution plan as per http://www.sqlservercentral.com/articles/SQLServerCentral/66909/?

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Ken McKelvey SSCoach Points: 18987 More actions · Answer 6

Does this work?

SELECT *

FROM X

WHERE HistoryID = MAX(HistoryID) OVER (PARTITION BY [ObjID])

thetodaisies Ten Centuries Points: 1387 More actions · Answer 7

why not use a temp table with an index instead of using a subquery

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 8

Craig Farrell (9/16/2010)
8kb (9/15/2010)
CREATE INDEX X_IX1 ON X (ObjID)
Note, non-covering, non-clustered index, so we're going here, to the PrimaryKey:
|--Index Scan(OBJECT ([LoadTest].[dbo].[X].[X_IX1]), ORDERED FORWARD)
|--Clustered Index Seek(OBJECT ([LoadTest].[dbo].[X].[PK__X__3214EC273F466844]),
Add HistoryID to the index to use the index directly instead of the second lookup on the clustered index.

That's probably the best advice so far and would also be what I'd try next.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jesse Reich Hall of Fame Points: 3020 More actions · Answer 9

Thanks everyone for your help. To create a baseline, I've attached a script to load sample data (takes about 2 minutes if you're interested). I also attached the sqlplan for the original query.

The new table is larger than the original, so now the query takes about 30 seconds instead of 10.

I tried everyone's suggestions but noticed no substantial improvements:

Original (8kb): 30493 ms

Remove ObjID from subquery (J. Moden): 29384 ms

IN clause (sanmatrix): 29970 ms

Covering index (C. Farrell): 28879 ms

Temp table (thetodaisis): 29504 ms

I haven't yet run it on the server. The only other thing I can think of is partitioning the table...

The Dixie Flatline SSC Guru Points: 53354 More actions · Answer 10

On my laptop, I'm getting subsecond response time using data generated by your script and no additional indexing. But I'm testing by writing to a temp table. When I let it display all the rows, it runs in 10-11 seconds. Thanks to Mr. Moden for teaching me not to confuse the time to produce a result set with the time it takes the screen to scroll.

declare @timer datetime = getdate()

;with cte as ( select ObjID, max(HistoryID) as MaxID

from dbo.X

group by OBJID

)

select X.*

into #temp

from dbo.X

join cte c on c.objID = x.objID

and c.maxID = x.HistoryID

select cast(@@ROWCOUNT as varchar) as rows, DATEDIFF(MS,@timer,GETDATE()) as milliseconds

drop table #temp

__________________________________________________

Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

The Dixie Flatline SSC Guru Points: 53354 More actions · Answer 11

In the example below, a CROSS APPLY technique manages to run a hair faster with an execution plan that goes directly from an index scan into a clustered index seek, instead of index scan / NC-index seek / keylookup.

declare @timer datetime = getdate()

;with cte as ( select ObjID, max(HistoryID) as MaxID

from dbo.X

group by OBJID

)

select X.*

into #temp1

from dbo.X

join cte c on c.objID = x.objID

and c.maxID = x.HistoryID

select cast(@@ROWCOUNT as varchar) as rows, DATEDIFF(MS,@timer,GETDATE()) as milliseconds

drop table #temp1

set @timer = GETDATE()

;with cte as ( select ObjID, max(HistoryID) as MaxID

from dbo.X

group by OBJID

)

select ca.*

into #temp2

from cte c

cross apply (select * from dbo.x where objID = x.objID

and c.maxID = x.HistoryID) ca

select cast(@@ROWCOUNT as varchar) as rows, DATEDIFF(MS,@timer,GETDATE()) as milliseconds

drop table #temp2

Logical reads dropped from 36091 (JOIN) to 20770 (Cross Apply).

I tried Jeff's suggestion of dropping the ObjID from the JOIN and seemed to save another 10 or 12 milliseconds out of 800+. Without the ObjID column, the JOIN ran faster than the CROSS APPLY without ObjID. The execution plans appeared the same as before. In both cases there were 20770 logical reads.

__________________________________________________

Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

Paul White SSC Guru Points: 150468 More actions · Answer 12

The most efficient execution avoids any joins:

That's a single ordered scan of an index. One way to achieve that plan is to change the clustered index:

CREATE TABLE X

(

HistoryID INT NOT NULL IDENTITY(1,1) PRIMARY KEY NONCLUSTERED,

ObjID INT NOT NULL,

Date1 DATETIME2,

Date2 DATETIME2,

Info1 CHAR(75),

Info2 CHAR(200),

Int1 INT,

Int2 INT,

Int3 INT

)

GO

CREATE UNIQUE CLUSTERED INDEX cuq ON X (ObjID, HistoryID)

GO

Unfortunately, cost estimation overstates the cost of the plan above. I get the optimal plan by using a query hint:

SELECT X1.*

FROM dbo.X X1

WHERE X1.HistoryID =

(

SELECT MAX(X2.HistoryID)

FROM dbo.X X2

WHERE X2.ObjID = X1.ObjID

)

OPTION (FAST 100);

The more robust alternative is to force the plan shape:

SELECT X1.*

FROM dbo.X X1

WHERE X1.HistoryID =

(

SELECT MAX(X2.HistoryID)

FROM dbo.X X2

WHERE X2.ObjID = X1.ObjID

)

OPTION (

USE PLAN

N'<?xml version="1.0" encoding="utf-16"?>

<ShowPlanXML xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" Version="1.1" Build="10.0.2789.0" xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan">

<BatchSequence>

<Batch>

<Statements>

<StmtSimple StatementCompId="1" StatementEstRows="3950.43" StatementId="1" StatementOptmLevel="FULL" StatementSubTreeCost="1.23612" StatementText="SELECT X1.* FROM dbo.X X1 WHERE X1.HistoryID = ( SELECT MAX(X2.HistoryID) FROM dbo.X X2 WHERE X2.ObjID = X1.ObjID ) " StatementType="SELECT" QueryHash="0x6EA71E2F95338845" QueryPlanHash="0x93D3C5FB53F91114">

<StatementSetOptions ANSI_NULLS="true" ANSI_PADDING="true" ANSI_WARNINGS="true" ARITHABORT="true" CONCAT_NULL_YIELDS_NULL="true" NUMERIC_ROUNDABORT="false" QUOTED_IDENTIFIER="true" />

<QueryPlan CachedPlanSize="16" CompileTime="6" CompileCPU="6" CompileMemory="248">

<RelOp AvgRowSize="319" EstimateCPU="0.000879009" EstimateIO="0" EstimateRebinds="0" EstimateRewinds="0" EstimateRows="3950.43" LogicalOp="Top" NodeId="1" Parallel="false" PhysicalOp="Top" EstimatedTotalSubtreeCost="1.23612">

<OutputList>

<ColumnReference Database="[Sandpit]" Schema="[dbo]" Table="[X]" Alias="[X1]" Column="HistoryID" />

<ColumnReference Database="[Sandpit]" Schema="[dbo]" Table="[X]" Alias="[X1]" Column="ObjID" />

<ColumnReference Database="[Sandpit]" Schema="[dbo]" Table="[X]" Alias="[X1]" Column="Date1" />

<ColumnReference Database="[Sandpit]" Schema="[dbo]" Table="[X]" Alias="[X1]" Column="Date2" />

<ColumnReference Database="[Sandpit]" Schema="[dbo]" Table="[X]" Alias="[X1]" Column="Info1" />

<ColumnReference Database="[Sandpit]" Schema="[dbo]" Table="[X]" Alias="[X1]" Column="Info2" />

<ColumnReference Database="[Sandpit]" Schema="[dbo]" Table="[X]" Alias="[X1]" Column="Int1" />

<ColumnReference Database="[Sandpit]" Schema="[dbo]" Table="[X]" Alias="[X1]" Column="Int2" />

<ColumnReference Database="[Sandpit]" Schema="[dbo]" Table="[X]" Alias="[X1]" Column="Int3" />

</OutputList>

<Top RowCount="false" Rows="1" IsPercent="false" WithTies="true">

<TieColumns>

<ColumnReference Database="[Sandpit]" Schema="[dbo]" Table="[X]" Alias="[X1]" Column="ObjID" />

<ColumnReference Database="[Sandpit]" Schema="[dbo]" Table="[X]" Alias="[X1]" Column="HistoryID" />

</TieColumns>

<RelOp AvgRowSize="319" EstimateCPU="0.00351603" EstimateIO="0" EstimateRebinds="0" EstimateRewinds="0" EstimateRows="40000" LogicalOp="Segment" NodeId="2" Parallel="false" PhysicalOp="Segment" EstimatedTotalSubtreeCost="1.23524">