getting rid of sort operation

Question

getting rid of sort operation

winston Smith

SSCoach

Points: 19500
More actions
October 25, 2012 at 8:49 am

#284039

I have an issue with a large query, where the sort operation caused by a group by is costing over 80% of the entire query cost.
I want to get rid of the sort operation, if possible, using sorted indexes.
I cannot post the query here due to NDA and all that but i have a contrived example using adventureworks.
Query is
select od.DueDate ,pr.Name
from purchasing.PurchaseOrderDetail od
inner join Production.Product pr on pr.ProductID = od.ProductID
group by od.DueDate,
pr.Name
This query causes a sort on PurchaseOrderDetail.DueDate Asc, Product.Name Asc.
I have added varioius non clustered indexes (covering etc) but I cant get the sort operation to go away. If its not possible, no biggie, il figure something else out, but can anyone advise if I can actually remove this sort operation?
Query Plan XML is below (apologies if this clutters up the thread)
<?xml version="1.0" encoding="utf-16"?>
<ShowPlanXML xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" Version="1.1" Build="10.50.2500.0" xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan">
<BatchSequence>
<Batch>
<Statements>
<StmtSimple StatementCompId="1" StatementEstRows="7086.95" StatementId="1" StatementOptmLevel="FULL" StatementOptmEarlyAbortReason="GoodEnoughPlanFound" StatementSubTreeCost="0.379491" StatementText="select od.DueDate ,pr.Name from purchasing.PurchaseOrderDetail od inner join Production.Product pr on pr.ProductID = od.ProductID group by od.DueDate, pr.Name " StatementType="SELECT" QueryHash="0x866C10AEBAE3FA0E" QueryPlanHash="0x7EBA0391CA4A047D">
<StatementSetOptions ANSI_NULLS="true" ANSI_PADDING="true" ANSI_WARNINGS="true" ARITHABORT="true" CONCAT_NULL_YIELDS_NULL="true" NUMERIC_ROUNDABORT="false" QUOTED_IDENTIFIER="true" />
<QueryPlan DegreeOfParallelism="0" MemoryGrant="1920" CachedPlanSize="24" CompileTime="13" CompileCPU="13" CompileMemory="384">
<RelOp AvgRowSize="55" EstimateCPU="0.297984" EstimateIO="0.0112613" EstimateRebinds="0" EstimateRewinds="0" EstimateRows="7086.95" LogicalOp="Distinct Sort" NodeId="0" Parallel="false" PhysicalOp="Sort" EstimatedTotalSubtreeCost="0.379491">
<OutputList>
<ColumnReference Database="[AdventureWorks]" Schema="[Purchasing]" Table="[PurchaseOrderDetail]" Alias="[od]" Column="DueDate" />
<ColumnReference Database="[AdventureWorks]" Schema="[Production]" Table="[Product]" Alias="[pr]" Column="Name" />
</OutputList>
<MemoryFractions Input="1" Output="1" />
<RunTimeInformation>
<RunTimeCountersPerThread Thread="0" ActualRebinds="1" ActualRewinds="0" ActualRows="8243" ActualEndOfScans="1" ActualExecutions="1" />
</RunTimeInformation>
<Sort Distinct="true">
<OrderBy>
<OrderByColumn Ascending="true">
<ColumnReference Database="[AdventureWorks]" Schema="[Purchasing]" Table="[PurchaseOrderDetail]" Alias="[od]" Column="DueDate" />
</OrderByColumn>
<OrderByColumn Ascending="true">
<ColumnReference Database="[AdventureWorks]" Schema="[Production]" Table="[Product]" Alias="[pr]" Column="Name" />
</OrderByColumn>
</OrderBy>
<RelOp AvgRowSize="55" EstimateCPU="0.0259874" EstimateIO="0" EstimateRebinds="0" EstimateRewinds="0" EstimateRows="7545.11" LogicalOp="Inner Join" NodeId="1" Parallel="false" PhysicalOp="Merge Join" EstimatedTotalSubtreeCost="0.0702457">
<OutputList>
<ColumnReference Database="[AdventureWorks]" Schema="[Purchasing]" Table="[PurchaseOrderDetail]" Alias="[od]" Column="DueDate" />
<ColumnReference Database="[AdventureWorks]" Schema="[Production]" Table="[Product]" Alias="[pr]" Column="Name" />
</OutputList>
<RunTimeInformation>
<RunTimeCountersPerThread Thread="0" ActualRows="8845" ActualEndOfScans="1" ActualExecutions="1" />
</RunTimeInformation>
<Merge ManyToMany="false">
<InnerSideJoinColumns>
<ColumnReference Database="[AdventureWorks]" Schema="[Purchasing]" Table="[PurchaseOrderDetail]" Alias="[od]" Column="ProductID" />
</InnerSideJoinColumns>
<OuterSideJoinColumns>
<ColumnReference Database="[AdventureWorks]" Schema="[Production]" Table="[Product]" Alias="[pr]" Column="ProductID" />
</OuterSideJoinColumns>
<Residual>
<ScalarOperator ScalarString="[AdventureWorks].[Purchasing].[PurchaseOrderDetail].[ProductID] as [od].[ProductID]=[AdventureWorks].[Production].[Product].[ProductID] as [pr].[ProductID]">
<Compare CompareOp="EQ">
<ScalarOperator>
<Identifier>
<ColumnReference Database="[AdventureWorks]" Schema="[Purchasing]" Table="[PurchaseOrderDetail]" Alias="[od]" Column="ProductID" />
</Identifier>
</ScalarOperator>
<ScalarOperator>
<Identifier>
<ColumnReference Database="[AdventureWorks]" Schema="[Production]" Table="[Product]" Alias="[pr]" Column="ProductID" />
</Identifier>
</ScalarOperator>
</Compare>
</ScalarOperator>
</Residual>
<RelOp AvgRowSize="51" EstimateCPU="0.0007114" EstimateIO="0.0120139" EstimateRebinds="0" EstimateRewinds="0" EstimateRows="504" LogicalOp="Clustered Index Scan" NodeId="2" Parallel="false" PhysicalOp="Clustered Index Scan" EstimatedTotalSubtreeCost="0.0127253" TableCardinality="504">
<OutputList>
<ColumnReference Database="[AdventureWorks]" Schema="[Production]" Table="[Product]" Alias="[pr]" Column="ProductID" />
<ColumnReference Database="[AdventureWorks]" Schema="[Production]" Table="[Product]" Alias="[pr]" Column="Name" />
</OutputList>
<RunTimeInformation>
<RunTimeCountersPerThread Thread="0" ActualRows="458" ActualEndOfScans="0" ActualExecutions="1" />
</RunTimeInformation>
<IndexScan Ordered="true" ScanDirection="FORWARD" ForcedIndex="false" ForceSeek="false" ForceScan="false" NoExpandHint="false">
<DefinedValues>
<DefinedValue>
<ColumnReference Database="[AdventureWorks]" Schema="[Production]" Table="[Product]" Alias="[pr]" Column="ProductID" />
</DefinedValue>
<DefinedValue>
<ColumnReference Database="[AdventureWorks]" Schema="[Production]" Table="[Product]" Alias="[pr]" Column="Name" />
</DefinedValue>
</DefinedValues>
<Object Database="[AdventureWorks]" Schema="[Production]" Table="[Product]" Index="[PK_Product_ProductID]" Alias="[pr]" IndexKind="Clustered" />
</IndexScan>
</RelOp>
<RelOp AvgRowSize="19" EstimateCPU="0.0098865" EstimateIO="0.0216435" EstimateRebinds="0" EstimateRewinds="0" EstimateRows="8845" LogicalOp="Index Scan" NodeId="3" Parallel="false" PhysicalOp="Index Scan" EstimatedTotalSubtreeCost="0.03153" TableCardinality="8845">
<OutputList>
<ColumnReference Database="[AdventureWorks]" Schema="[Purchasing]" Table="[PurchaseOrderDetail]" Alias="[od]" Column="DueDate" />
<ColumnReference Database="[AdventureWorks]" Schema="[Purchasing]" Table="[PurchaseOrderDetail]" Alias="[od]" Column="ProductID" />
</OutputList>
<RunTimeInformation>
<RunTimeCountersPerThread Thread="0" ActualRows="8845" ActualEndOfScans="1" ActualExecutions="1" />
</RunTimeInformation>
<IndexScan Ordered="true" ScanDirection="FORWARD" ForcedIndex="false" ForceSeek="false" ForceScan="false" NoExpandHint="false">
<DefinedValues>
<DefinedValue>
<ColumnReference Database="[AdventureWorks]" Schema="[Purchasing]" Table="[PurchaseOrderDetail]" Alias="[od]" Column="DueDate" />
</DefinedValue>
<DefinedValue>
<ColumnReference Database="[AdventureWorks]" Schema="[Purchasing]" Table="[PurchaseOrderDetail]" Alias="[od]" Column="ProductID" />
</DefinedValue>
</DefinedValues>
<Object Database="[AdventureWorks]" Schema="[Purchasing]" Table="[PurchaseOrderDetail]" Index="[IDX_Sorted]" Alias="[od]" IndexKind="NonClustered" />
</IndexScan>
</RelOp>
</Merge>
</RelOp>
</Sort>
</RelOp>
</QueryPlan>
</StmtSimple>
</Statements>
</Batch>
</BatchSequence>
</ShowPlanXML>

Viewing 5 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply

winston Smith SSCoach Points: 19500 More actions · Answer 1

Update:

Think i got it! forgot to take into consideration the order of the predicates. Join predicate must be used first in the index, followed by whatever im grouping/sorting by:

CREATE NONCLUSTERED INDEX [IDX_Sorted] ON purchasing.PurchaseOrderDetail

(

ProductID ASC,

DueDate ASC

)

still interested to hear any other solutions or reasons why what im trying may not be worth the effort or any other arguments on the topic.

GSquared SSC Guru Points: 260824 More actions · Answer 2

So far as I know, Group By will require a sort, since it has to put the rows together to do whatever aggregation you're asking for. You might be able to short-circuit that by using a cleverly crafted index, but I'm not sure how much you'll actually gain from that, if anything.

Don't tune based on the estimated costs of the operations in the execution plan. Tune based on the I/O and time the query takes. The estimated costs are just a sort of "hey, you might want to look at this", and don't necessarily mean anything at all. And they can be WAY off, too.

For example, recursive CTEs and table variables will usually return very low costs, based on the assumption they will only have 1 row in them (which is what the optimizer pretty much has to assume since it has no way of knowing how many will actually be there), but they can easily end up being the slowest parts of a query.

So, try the query as-is and try with the index you want, and check a trace/extended events log, and see how long they actually took, and how much I/O, and see what you're really gaining.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

winston Smith SSCoach Points: 19500 More actions · Answer 3

GSquared (10/25/2012)
So far as I know, Group By will require a sort, since it has to put the rows together to do whatever aggregation you're asking for. You might be able to short-circuit that by using a cleverly crafted index, but I'm not sure how much you'll actually gain from that, if anything.
Don't tune based on the estimated costs of the operations in the execution plan. Tune based on the I/O and time the query takes. The estimated costs are just a sort of "hey, you might want to look at this", and don't necessarily mean anything at all. And they can be WAY off, too.
For example, recursive CTEs and table variables will usually return very low costs, based on the assumption they will only have 1 row in them (which is what the optimizer pretty much has to assume since it has no way of knowing how many will actually be there), but they can easily end up being the slowest parts of a query.
So, try the query as-is and try with the index you want, and check a trace/extended events log, and see how long they actually took, and how much I/O, and see what you're really gaining.

Thanks for the advice! Problem running the query as is, is that Its currently taking quite a while to run (minutes) which means each change i make i have to re-execute and just wait for quite some time to find out if i made a difference. Guess thats just the way it is! Thanks!

If the sort is showing though as taking 80% of the execution plan after an actual run, its pretty good indication that i should direct my efforts there, right?

GSquared SSC Guru Points: 260824 More actions · Answer 4

winston Smith (10/25/2012)
GSquared (10/25/2012)
So far as I know...
So, try the query as-is and try with the index you want, and check a trace/extended events log, and see how long they actually took, and how much I/O, and see what you're really gaining.
Thanks for the advice! Problem running the query as is, is that Its currently taking quite a while to run (minutes) which means each change i make i have to re-execute and just wait for quite some time to find out if i made a difference. Guess thats just the way it is! Thanks!
If the sort is showing though as taking 80% of the execution plan after an actual run, its pretty good indication that i should direct my efforts there, right?

On the 80% piece, you can certainly look into that. But the estimated costs and percentages are not fixed by using actual execution plans instead of estimated ones. (I think that's what you mean.)

As to how much tuning to do on the query, that will depend on what you need to achieve with it. It currently takes minutes to run, but what is the goal for tuning it? Seconds? Less minutes? Miliseconds? Is it something that runs in the background during some automated process and a few minutes isn't really hurting anything, or is it something users are having to wait for while a page loads, or something else?

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon