SQL Improvement

Question

SQL Improvement

Viewing 13 posts - 16 through 27 (of 27 total)

You must be logged in to reply to this topic. Login to reply

Jeff Moden SSC Guru Points: 1004392 More actions · Answer 1

Performace Guard (Shehap) (10/7/2011)
Since no enough information about how data entity size is there , we could come up with an oustanding wokable solution to perform it perfectly as below :
1-Breaking down into subqueries as below :
select a1, clsdt_date,wrkcpl_date
FROM wt WHERE clsdt_date = wrkcpl_date
union
select a1, clsdt_date,wrkcpl_date
FROM wt where clsdt_date is null
UNION
select a1, clsdt_date,wrkcpl_date
FROM wt WHERE wrkcpl_date is null
2-Then create 3 indexes ( 2 filtered index + one index ) :
Create nonclustered index wt_index1 on wt
(clsdt_date asc,
wrkcpl_date asc)include (a1) where clsdt_date is null
Create nonclustered index wt_index2 on wt
(clsdt_date asc,
wrkcpl_date asc)include (a1) where wrkcpl_date is null
Create nonclustered index wt_index3 on wt
(clsdt_date asc,
wrkcpl_date asc)include (a1)
3-Then apply page compression for the 3 indexes to afford sufficiently more data entity size

Test it... see if the indexes show up in the execution plan. If they don't, they can't help.

{Edit} Corrected based on findings below.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

David Moutray SSCertifiable Points: 5956 More actions · Answer 2

Sure thing, Jeff. I'll try to be really rigorous instead of just kinda rigorous! 😎

To start out, I am going to recreate everything from scratch. Here is the query I will be testing against:

select a1, clsdt_date,wrkcpl_date

FROM wt

WHERE

( clsdt_date = wrkcpl_date

OR clsdt_date is null

OR wrkcpl_date is null

);

On my laptop, this query takes about 4 seconds to return 9,862 records. 88% of the processing is taken up by the table scan. Here is the execution plan:

<?xml version="1.0" encoding="utf-16"?>

<ShowPlanXML xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" Version="1.1" Build="10.50.1617.0" xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan">

<BatchSequence>

<Batch>

<Statements>

<StmtSimple StatementCompId="1" StatementEstRows="1061940" StatementId="1" StatementOptmLevel="FULL" StatementSubTreeCost="35.9431" StatementText="select a1, clsdt_date,wrkcpl_date FROM wt WHERE ( clsdt_date = wrkcpl_date OR clsdt_date is null OR wrkcpl_date is null ) " StatementType="SELECT" QueryHash="0xFD4CB9DF1B3A6019" QueryPlanHash="0x00A53E5CC61A6032">

<StatementSetOptions ANSI_NULLS="true" ANSI_PADDING="true" ANSI_WARNINGS="true" ARITHABORT="true" CONCAT_NULL_YIELDS_NULL="true" NUMERIC_ROUNDABORT="false" QUOTED_IDENTIFIER="true" />

<QueryPlan DegreeOfParallelism="8" MemoryGrant="0" CachedPlanSize="16" CompileTime="1" CompileCPU="1" CompileMemory="144">

<RelOp AvgRowSize="27" EstimateCPU="1.07544" EstimateIO="0" EstimateRebinds="0" EstimateRewinds="0" EstimateRows="1061940" LogicalOp="Gather Streams" NodeId="0" Parallel="true" PhysicalOp="Parallelism" EstimatedTotalSubtreeCost="35.9431">

<OutputList>

<ColumnReference Database="[TEST]" Schema="[dbo]" Table="[WT]" Column="a1" />

<ColumnReference Database="[TEST]" Schema="[dbo]" Table="[WT]" Column="clsdt_date" />

<ColumnReference Database="[TEST]" Schema="[dbo]" Table="[WT]" Column="wrkcpl_date" />

</OutputList>

<RunTimeInformation>

<RunTimeCountersPerThread Thread="0" ActualRows="9862" ActualEndOfScans="1" ActualExecutions="1" />

</RunTimeInformation>

<Parallelism>

<RelOp AvgRowSize="27" EstimateCPU="2.97303" EstimateIO="28.7054" EstimateRebinds="0" EstimateRewinds="0" EstimateRows="1061940" LogicalOp="Table Scan" NodeId="1" Parallel="true" PhysicalOp="Table Scan" EstimatedTotalSubtreeCost="31.6785" TableCardinality="10810900">

<OutputList>

<ColumnReference Database="[TEST]" Schema="[dbo]" Table="[WT]" Column="a1" />

<ColumnReference Database="[TEST]" Schema="[dbo]" Table="[WT]" Column="clsdt_date" />

<ColumnReference Database="[TEST]" Schema="[dbo]" Table="[WT]" Column="wrkcpl_date" />

</OutputList>

<RunTimeInformation>

<RunTimeCountersPerThread Thread="7" ActualRows="882" ActualEndOfScans="1" ActualExecutions="1" />

<RunTimeCountersPerThread Thread="8" ActualRows="893" ActualEndOfScans="1" ActualExecutions="1" />

<RunTimeCountersPerThread Thread="6" ActualRows="846" ActualEndOfScans="1" ActualExecutions="1" />

<RunTimeCountersPerThread Thread="4" ActualRows="901" ActualEndOfScans="1" ActualExecutions="1" />

<RunTimeCountersPerThread Thread="5" ActualRows="867" ActualEndOfScans="1" ActualExecutions="1" />

<RunTimeCountersPerThread Thread="3" ActualRows="851" ActualEndOfScans="1" ActualExecutions="1" />

<RunTimeCountersPerThread Thread="2" ActualRows="3955" ActualEndOfScans="1" ActualExecutions="1" />

<RunTimeCountersPerThread Thread="1" ActualRows="667" ActualEndOfScans="1" ActualExecutions="1" />

<RunTimeCountersPerThread Thread="0" ActualRows="0" ActualEndOfScans="0" ActualExecutions="0" />

</RunTimeInformation>

<TableScan Ordered="false" ForcedIndex="false" NoExpandHint="false">

<DefinedValues>

<DefinedValue>

<ColumnReference Database="[TEST]" Schema="[dbo]" Table="[WT]" Column="a1" />

</DefinedValue>

<DefinedValue>

<ColumnReference Database="[TEST]" Schema="[dbo]" Table="[WT]" Column="clsdt_date" />

</DefinedValue>

<DefinedValue>

<ColumnReference Database="[TEST]" Schema="[dbo]" Table="[WT]" Column="wrkcpl_date" />

</DefinedValue>

</DefinedValues>

<Object Database="[TEST]" Schema="[dbo]" Table="[WT]" IndexKind="Heap" />

<Predicate>

<ScalarOperator ScalarString="[TEST].[dbo].[WT].[clsdt_date]=[TEST].[dbo].[WT].[wrkcpl_date] OR [TEST].[dbo].[WT].[clsdt_date] IS NULL OR [TEST].[dbo].[WT].[wrkcpl_date] IS NULL">

<Logical Operation="OR">

<ScalarOperator>

<Compare CompareOp="EQ">

<ScalarOperator>

<Identifier>

<ColumnReference Database="[TEST]" Schema="[dbo]" Table="[WT]" Column="clsdt_date" />

</Identifier>

</ScalarOperator>

<ScalarOperator>

<Identifier>

<ColumnReference Database="[TEST]" Schema="[dbo]" Table="[WT]" Column="wrkcpl_date" />

</Identifier>

</ScalarOperator>

</Compare>

</ScalarOperator>

<ScalarOperator>

<Compare CompareOp="IS">

<ScalarOperator>

<Identifier>

<ColumnReference Database="[TEST]" Schema="[dbo]" Table="[WT]" Column="clsdt_date" />

</Identifier>

</ScalarOperator>

<ScalarOperator>

<Const ConstValue="NULL" />

</ScalarOperator>

</Compare>

</ScalarOperator>

<ScalarOperator>

<Compare CompareOp="IS">

<ScalarOperator>

<Identifier>

<ColumnReference Database="[TEST]" Schema="[dbo]" Table="[WT]" Column="wrkcpl_date" />

</Identifier>

</ScalarOperator>

<ScalarOperator>

<Const ConstValue="NULL" />

</ScalarOperator>

</Compare>

</ScalarOperator>

</Logical>

</ScalarOperator>

</Predicate>

</TableScan>

</RelOp>

</Parallelism>

</RelOp>

</QueryPlan>

</StmtSimple>

</Statements>

</Batch>

</BatchSequence>

</ShowPlanXML>

P.S. Just copy and paste the XML into SQL Server Management Studio to see the graphical execution plan.

OK. Now let's clear the cache, create Performance Guard's suggested index and try it again:

DBCC FREEPROCCACHE;

DBCC DROPCLEANBUFFERS;

CREATE INDEX [IX_Clsdt_Wrkcpl_Date] ON WT(clsdt_date, wrkcpl_date) INCLUDE (a1);

This time when I run the query, it executes in 3 seconds, but the execution plan is exactly the same. Apparently the index is not being used at all.

OK. Now let's create the additional two indexes I suggested and try again:

CREATE INDEX [IX_CLSDT_Date] ON WT(clsdt_date) INCLUDE (wrkcpl_date);

CREATE INDEX [IX_WRKCPL_Date] ON WT(wrkcpl_date) INCLUDE (clsdt_date);

Once again, the query takes 4 seconds to run (these small differences in run times are irrelevant, apparently) and the execution plan is - once again - exactly the same. The indexes are not being used.

Now, I am going to blow everything away, and start over. The only difference is that this time, I am going to put a primary key on the table, like so:

CREATE TABLE WT

( a1 INT IDENTITY PRIMARY KEY

,clsdt_date DATETIME NULL

,wrkcpl_date DATETIME NULL

);

When I run the query again, with no indexes except that default clustered primary key index, everything is exactly the same - except now we have a clustered index scan instead of a table scan. Here is the execution plan:

<?xml version="1.0" encoding="utf-16"?>

<ShowPlanXML xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" Version="1.1" Build="10.50.1617.0" xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan">

<BatchSequence>

<Batch>

<Statements>

<StmtSimple StatementCompId="1" StatementEstRows="1062360" StatementId="1" StatementOptmLevel="FULL" StatementSubTreeCost="35.9435" StatementText="select a1, clsdt_date,wrkcpl_date FROM wt WHERE ( clsdt_date = wrkcpl_date OR clsdt_date is null OR wrkcpl_date is null ) " StatementType="SELECT" QueryHash="0xFD4CB9DF1B3A6019" QueryPlanHash="0x4517426011C6C58E">

<StatementSetOptions ANSI_NULLS="true" ANSI_PADDING="true" ANSI_WARNINGS="true" ARITHABORT="true" CONCAT_NULL_YIELDS_NULL="true" NUMERIC_ROUNDABORT="false" QUOTED_IDENTIFIER="true" />

<QueryPlan DegreeOfParallelism="8" MemoryGrant="0" CachedPlanSize="16" CompileTime="3843" CompileCPU="1303" CompileMemory="136">

<RelOp AvgRowSize="27" EstimateCPU="1.07585" EstimateIO="0" EstimateRebinds="0" EstimateRewinds="0" EstimateRows="1062360" LogicalOp="Gather Streams" NodeId="0" Parallel="true" PhysicalOp="Parallelism" EstimatedTotalSubtreeCost="35.9435">

<OutputList>

<ColumnReference Database="[TEST]" Schema="[dbo]" Table="[WT]" Column="a1" />

<ColumnReference Database="[TEST]" Schema="[dbo]" Table="[WT]" Column="clsdt_date" />

<ColumnReference Database="[TEST]" Schema="[dbo]" Table="[WT]" Column="wrkcpl_date" />

</OutputList>

<RunTimeInformation>

<RunTimeCountersPerThread Thread="0" ActualRows="9862" ActualEndOfScans="1" ActualExecutions="1" />

</RunTimeInformation>

<Parallelism>

<RelOp AvgRowSize="27" EstimateCPU="2.97305" EstimateIO="28.7053" EstimateRebinds="0" EstimateRewinds="0" EstimateRows="1062360" LogicalOp="Clustered Index Scan" NodeId="1" Parallel="true" PhysicalOp="Clustered Index Scan" EstimatedTotalSubtreeCost="31.6784" TableCardinality="10810900">

<OutputList>

<ColumnReference Database="[TEST]" Schema="[dbo]" Table="[WT]" Column="a1" />

<ColumnReference Database="[TEST]" Schema="[dbo]" Table="[WT]" Column="clsdt_date" />

<ColumnReference Database="[TEST]" Schema="[dbo]" Table="[WT]" Column="wrkcpl_date" />

</OutputList>

<RunTimeInformation>

<RunTimeCountersPerThread Thread="6" ActualRows="458" ActualEndOfScans="1" ActualExecutions="1" />

<RunTimeCountersPerThread Thread="8" ActualRows="867" ActualEndOfScans="1" ActualExecutions="1" />

<RunTimeCountersPerThread Thread="4" ActualRows="1689" ActualEndOfScans="1" ActualExecutions="1" />

<RunTimeCountersPerThread Thread="3" ActualRows="822" ActualEndOfScans="1" ActualExecutions="1" />

<RunTimeCountersPerThread Thread="5" ActualRows="1233" ActualEndOfScans="1" ActualExecutions="1" />

<RunTimeCountersPerThread Thread="2" ActualRows="458" ActualEndOfScans="1" ActualExecutions="1" />

<RunTimeCountersPerThread Thread="7" ActualRows="366" ActualEndOfScans="1" ActualExecutions="1" />

<RunTimeCountersPerThread Thread="1" ActualRows="3969" ActualEndOfScans="1" ActualExecutions="1" />

<RunTimeCountersPerThread Thread="0" ActualRows="0" ActualEndOfScans="0" ActualExecutions="0" />

</RunTimeInformation>

<IndexScan Ordered="false" ForcedIndex="false" NoExpandHint="false">

<DefinedValues>

<DefinedValue>

<ColumnReference Database="[TEST]" Schema="[dbo]" Table="[WT]" Column="a1" />

</DefinedValue>

<DefinedValue>

<ColumnReference Database="[TEST]" Schema="[dbo]" Table="[WT]" Column="clsdt_date" />

</DefinedValue>

<DefinedValue>

<ColumnReference Database="[TEST]" Schema="[dbo]" Table="[WT]" Column="wrkcpl_date" />

</DefinedValue>

</DefinedValues>

<Object Database="[TEST]" Schema="[dbo]" Table="[WT]" Index="[PK__WT__3213A9FA0EA330E9]" IndexKind="Clustered" />

<Predicate>

<ScalarOperator ScalarString="[TEST].[dbo].[WT].[clsdt_date]=[TEST].[dbo].[WT].[wrkcpl_date] OR [TEST].[dbo].[WT].[clsdt_date] IS NULL OR [TEST].[dbo].[WT].[wrkcpl_date] IS NULL">

<Logical Operation="OR">

<ScalarOperator>

<Compare CompareOp="EQ">

<ScalarOperator>

<Identifier>

<ColumnReference Database="[TEST]" Schema="[dbo]" Table="[WT]" Column="clsdt_date" />

</Identifier>

</ScalarOperator>

<ScalarOperator>

<Identifier>

<ColumnReference Database="[TEST]" Schema="[dbo]" Table="[WT]" Column="wrkcpl_date" />

</Identifier>

</ScalarOperator>

</Compare>

</ScalarOperator>

<ScalarOperator>

<Compare CompareOp="IS">

<ScalarOperator>

<Identifier>

<ColumnReference Database="[TEST]" Schema="[dbo]" Table="[WT]" Column="clsdt_date" />

</Identifier>

</ScalarOperator>

<ScalarOperator>

<Const ConstValue="NULL" />

</ScalarOperator>

</Compare>

</ScalarOperator>

<ScalarOperator>

<Compare CompareOp="IS">

<ScalarOperator>

<Identifier>

<ColumnReference Database="[TEST]" Schema="[dbo]" Table="[WT]" Column="wrkcpl_date" />

</Identifier>

</ScalarOperator>

<ScalarOperator>

<Const ConstValue="NULL" />

</ScalarOperator>

</Compare>

</ScalarOperator>

</Logical>

</ScalarOperator>

</Predicate>

</IndexScan>

</RelOp>

</Parallelism>

</RelOp>

</QueryPlan>

</StmtSimple>

</Statements>

</Batch>

</BatchSequence>

</ShowPlanXML>

OK. Next, we will add Performance Guard's index back in - without the INCLUDE - since the field, a1, is now the clustered index:

CREATE INDEX [IX_Clsdt_Wrkcpl_Date] ON WT(clsdt_date, wrkcpl_date);

Finally we have a little bit of a difference. The query runs in 0 seconds. The overall resource usage does not go down by that much - about 10%. The execution plan shows an index scan on the non-clustered index that we just created.

Now, let's recreate those two separate indexes with the INCLUDEs and try again:

Once again, the query runs in 0 seconds. Resource usage is almost the same, except now the execution plan shows that the optimizer has picked a different index for the index scan:

<?xml version="1.0" encoding="utf-16"?>

<ShowPlanXML xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" Version="1.1" Build="10.50.1617.0" xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan">

<BatchSequence>

<Batch>

<Statements>

<StmtSimple StatementCompId="1" StatementEstRows="1065370" StatementId="1" StatementOptmLevel="FULL" StatementSubTreeCost="32.9931" StatementText="select a1, clsdt_date,wrkcpl_date FROM wt WHERE ( clsdt_date = wrkcpl_date OR clsdt_date is null OR wrkcpl_date is null ) " StatementType="SELECT" QueryHash="0xFD4CB9DF1B3A6019" QueryPlanHash="0x570AB7099D491107">

<StatementSetOptions ANSI_NULLS="true" ANSI_PADDING="true" ANSI_WARNINGS="true" ARITHABORT="true" CONCAT_NULL_YIELDS_NULL="true" NUMERIC_ROUNDABORT="false" QUOTED_IDENTIFIER="true" />

<QueryPlan DegreeOfParallelism="8" MemoryGrant="0" CachedPlanSize="16" CompileTime="20" CompileCPU="1" CompileMemory="144">

<RelOp AvgRowSize="27" EstimateCPU="1.07882" EstimateIO="0" EstimateRebinds="0" EstimateRewinds="0" EstimateRows="1065370" LogicalOp="Gather Streams" NodeId="0" Parallel="true" PhysicalOp="Parallelism" EstimatedTotalSubtreeCost="32.9931">

<OutputList>

<ColumnReference Database="[TEST]" Schema="[dbo]" Table="[WT]" Column="a1" />

<ColumnReference Database="[TEST]" Schema="[dbo]" Table="[WT]" Column="clsdt_date" />

<ColumnReference Database="[TEST]" Schema="[dbo]" Table="[WT]" Column="wrkcpl_date" />

</OutputList>

<RunTimeInformation>

<RunTimeCountersPerThread Thread="0" ActualRows="9862" ActualEndOfScans="1" ActualExecutions="1" />

</RunTimeInformation>

<Parallelism>

<RelOp AvgRowSize="27" EstimateCPU="2.97305" EstimateIO="25.752" EstimateRebinds="0" EstimateRewinds="0" EstimateRows="1065370" LogicalOp="Index Scan" NodeId="1" Parallel="true" PhysicalOp="Index Scan" EstimatedTotalSubtreeCost="28.7251" TableCardinality="10810900">

<OutputList>

<ColumnReference Database="[TEST]" Schema="[dbo]" Table="[WT]" Column="a1" />

<ColumnReference Database="[TEST]" Schema="[dbo]" Table="[WT]" Column="clsdt_date" />

<ColumnReference Database="[TEST]" Schema="[dbo]" Table="[WT]" Column="wrkcpl_date" />

</OutputList>

<RunTimeInformation>

<RunTimeCountersPerThread Thread="8" ActualRows="830" ActualEndOfScans="1" ActualExecutions="1" />

<RunTimeCountersPerThread Thread="7" ActualRows="890" ActualEndOfScans="1" ActualExecutions="1" />

<RunTimeCountersPerThread Thread="5" ActualRows="826" ActualEndOfScans="1" ActualExecutions="1" />

<RunTimeCountersPerThread Thread="4" ActualRows="958" ActualEndOfScans="1" ActualExecutions="1" />

<RunTimeCountersPerThread Thread="3" ActualRows="711" ActualEndOfScans="1" ActualExecutions="1" />

<RunTimeCountersPerThread Thread="6" ActualRows="894" ActualEndOfScans="1" ActualExecutions="1" />

<RunTimeCountersPerThread Thread="1" ActualRows="764" ActualEndOfScans="1" ActualExecutions="1" />

<RunTimeCountersPerThread Thread="2" ActualRows="3989" ActualEndOfScans="1" ActualExecutions="1" />

<RunTimeCountersPerThread Thread="0" ActualRows="0" ActualEndOfScans="0" ActualExecutions="0" />

</RunTimeInformation>

<IndexScan Ordered="false" ForcedIndex="false" ForceSeek="false" NoExpandHint="false">

<DefinedValues>

<DefinedValue>

<ColumnReference Database="[TEST]" Schema="[dbo]" Table="[WT]" Column="a1" />

</DefinedValue>

<DefinedValue>

<ColumnReference Database="[TEST]" Schema="[dbo]" Table="[WT]" Column="clsdt_date" />

</DefinedValue>

<DefinedValue>

<ColumnReference Database="[TEST]" Schema="[dbo]" Table="[WT]" Column="wrkcpl_date" />

</DefinedValue>

</DefinedValues>

<Object Database="[TEST]" Schema="[dbo]" Table="[WT]" Index="[IX_WRKCPL_Date]" IndexKind="NonClustered" />

<Predicate>

<ScalarOperator ScalarString="[TEST].[dbo].[WT].[clsdt_date]=[TEST].[dbo].[WT].[wrkcpl_date] OR [TEST].[dbo].[WT].[clsdt_date] IS NULL OR [TEST].[dbo].[WT].[wrkcpl_date] IS NULL">

<Logical Operation="OR">

<ScalarOperator>

<Compare CompareOp="EQ">

<ScalarOperator>

<Identifier>

<ColumnReference Database="[TEST]" Schema="[dbo]" Table="[WT]" Column="clsdt_date" />

</Identifier>

</ScalarOperator>

<ScalarOperator>

<Identifier>

<ColumnReference Database="[TEST]" Schema="[dbo]" Table="[WT]" Column="wrkcpl_date" />

</Identifier>

</ScalarOperator>

</Compare>

</ScalarOperator>

<ScalarOperator>

<Compare CompareOp="IS">

<ScalarOperator>

<Identifier>

<ColumnReference Database="[TEST]" Schema="[dbo]" Table="[WT]" Column="clsdt_date" />

</Identifier>

</ScalarOperator>

<ScalarOperator>

<Const ConstValue="NULL" />

</ScalarOperator>

</Compare>

</ScalarOperator>

<ScalarOperator>

<Compare CompareOp="IS">

<ScalarOperator>

<Identifier>

<ColumnReference Database="[TEST]" Schema="[dbo]" Table="[WT]" Column="wrkcpl_date" />

</Identifier>

</ScalarOperator>

<ScalarOperator>

<Const ConstValue="NULL" />

</ScalarOperator>

</Compare>

</ScalarOperator>

</Logical>

</ScalarOperator>

</Predicate>

</IndexScan>

</RelOp>

</Parallelism>

</RelOp>

</QueryPlan>

</StmtSimple>

</Statements>

</Batch>

</BatchSequence>

</ShowPlanXML>

Well, now! I have to thank both Drew Allen and Jeff Moden for teaching me something! 🙂

Testing might not be sexy, but it gives you solid insight, and it is very, very hard to argue against. None of our indexes were used at all until we added a primary key with a clustered index. Once we added that primary key and clustered index, either of the indexes we added were fine, as long as both date fields were included.

Side Note: For those of you who would argue, "What about a clustered index without a primary key?", I can only say, "Who is this who darkens wisdom and hinders understanding? May Codd rebuke you!" :angry:

David Moutray SSCertifiable Points: 5956 More actions · Answer 3

Oh! I forgot to test Jeff's query!

SELECT a1, clsdt_date, wrkcpl_date

FROM wt

WHERE ISNULL(DATEDIFF(dd,clsdt_date,wrkcpl_date),0) = 0;

It's hard to say, but it did seem to run slightly faster. Resource usage was slightly less. (I'm not sure if that is significant or not.) The execution plan looked the same, but seemed to have a slightly different distribution of resource allocation.

<?xml version="1.0" encoding="utf-16"?>

<ShowPlanXML xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" Version="1.1" Build="10.50.1617.0" xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan">

<BatchSequence>

<Batch>

<Statements>

<StmtSimple StatementCompId="1" StatementEstRows="188537" StatementId="1" StatementOptmLevel="FULL" StatementSubTreeCost="30.507" StatementText="SELECT a1, clsdt_date, wrkcpl_date FROM wt WHERE ISNULL(DATEDIFF(dd,clsdt_date,wrkcpl_date),0) = 0; " StatementType="SELECT" QueryHash="0x168D8821B312D68F" QueryPlanHash="0xFF7A6E360B393BB1">

<StatementSetOptions ANSI_NULLS="true" ANSI_PADDING="true" ANSI_WARNINGS="true" ARITHABORT="true" CONCAT_NULL_YIELDS_NULL="true" NUMERIC_ROUNDABORT="false" QUOTED_IDENTIFIER="true" />

<QueryPlan DegreeOfParallelism="8" MemoryGrant="0" CachedPlanSize="16" CompileTime="44" CompileCPU="22" CompileMemory="176">

<RelOp AvgRowSize="27" EstimateCPU="0.214374" EstimateIO="0" EstimateRebinds="0" EstimateRewinds="0" EstimateRows="188537" LogicalOp="Gather Streams" NodeId="0" Parallel="true" PhysicalOp="Parallelism" EstimatedTotalSubtreeCost="30.507">

<OutputList>

<ColumnReference Database="[TEST]" Schema="[dbo]" Table="[WT]" Column="a1" />

<ColumnReference Database="[TEST]" Schema="[dbo]" Table="[WT]" Column="clsdt_date" />

<ColumnReference Database="[TEST]" Schema="[dbo]" Table="[WT]" Column="wrkcpl_date" />

</OutputList>

<RunTimeInformation>

<RunTimeCountersPerThread Thread="0" ActualRows="9862" ActualEndOfScans="1" ActualExecutions="1" />

</RunTimeInformation>

<Parallelism>

<RelOp AvgRowSize="27" EstimateCPU="2.97305" EstimateIO="25.752" EstimateRebinds="0" EstimateRewinds="0" EstimateRows="188537" LogicalOp="Index Scan" NodeId="2" Parallel="true" PhysicalOp="Index Scan" EstimatedTotalSubtreeCost="28.7251" TableCardinality="10810900">

<OutputList>

<ColumnReference Database="[TEST]" Schema="[dbo]" Table="[WT]" Column="a1" />

<ColumnReference Database="[TEST]" Schema="[dbo]" Table="[WT]" Column="clsdt_date" />

<ColumnReference Database="[TEST]" Schema="[dbo]" Table="[WT]" Column="wrkcpl_date" />

</OutputList>

<RunTimeInformation>

<RunTimeCountersPerThread Thread="8" ActualRows="827" ActualEndOfScans="1" ActualExecutions="1" />

<RunTimeCountersPerThread Thread="4" ActualRows="891" ActualEndOfScans="1" ActualExecutions="1" />

<RunTimeCountersPerThread Thread="5" ActualRows="894" ActualEndOfScans="1" ActualExecutions="1" />

<RunTimeCountersPerThread Thread="7" ActualRows="1020" ActualEndOfScans="1" ActualExecutions="1" />

<RunTimeCountersPerThread Thread="6" ActualRows="3926" ActualEndOfScans="1" ActualExecutions="1" />

<RunTimeCountersPerThread Thread="3" ActualRows="767" ActualEndOfScans="1" ActualExecutions="1" />

<RunTimeCountersPerThread Thread="1" ActualRows="901" ActualEndOfScans="1" ActualExecutions="1" />

<RunTimeCountersPerThread Thread="2" ActualRows="636" ActualEndOfScans="1" ActualExecutions="1" />

<RunTimeCountersPerThread Thread="0" ActualRows="0" ActualEndOfScans="0" ActualExecutions="0" />

</RunTimeInformation>

<IndexScan Ordered="false" ForcedIndex="false" ForceSeek="false" NoExpandHint="false">

<DefinedValues>

<DefinedValue>

<ColumnReference Database="[TEST]" Schema="[dbo]" Table="[WT]" Column="a1" />

</DefinedValue>

<DefinedValue>

<ColumnReference Database="[TEST]" Schema="[dbo]" Table="[WT]" Column="clsdt_date" />

</DefinedValue>

<DefinedValue>

<ColumnReference Database="[TEST]" Schema="[dbo]" Table="[WT]" Column="wrkcpl_date" />

</DefinedValue>

</DefinedValues>

<Object Database="[TEST]" Schema="[dbo]" Table="[WT]" Index="[IX_WRKCPL_Date]" IndexKind="NonClustered" />

<Predicate>

<ScalarOperator ScalarString="isnull(datediff(day,[TEST].[dbo].[WT].[clsdt_date],[TEST].[dbo].[WT].[wrkcpl_date]),(0))=(0)">

<Compare CompareOp="EQ">

<ScalarOperator>

<Intrinsic FunctionName="isnull">

<ScalarOperator>

<Intrinsic FunctionName="datediff">

<ScalarOperator>

<Const ConstValue="(4)" />

</ScalarOperator>

<ScalarOperator>

<Identifier>

<ColumnReference Database="[TEST]" Schema="[dbo]" Table="[WT]" Column="clsdt_date" />

</Identifier>

</ScalarOperator>

<ScalarOperator>

<Identifier>

<ColumnReference Database="[TEST]" Schema="[dbo]" Table="[WT]" Column="wrkcpl_date" />

</Identifier>

</ScalarOperator>

</Intrinsic>

</ScalarOperator>

<ScalarOperator>

<Const ConstValue="(0)" />

</ScalarOperator>

</Intrinsic>

</ScalarOperator>

<ScalarOperator>

<Const ConstValue="(0)" />

</ScalarOperator>

</Compare>

</ScalarOperator>

</Predicate>

</IndexScan>

</RelOp>

</Parallelism>

</RelOp>

</QueryPlan>

</StmtSimple>

</Statements>

</Batch>

</BatchSequence>

</ShowPlanXML>

For what it's worth, I also tried Lynn's query (with the UNION statements). I didn't see any difference in that, either. Apparently the optimizer saw no substantive difference between the three queries.

I might point out, though, that it is definitely worth checking these things out. The query optimizer DOES NOT check every possible solution to your query. It has to strike some balance between finding the perfect solution, and finding an acceptable solution quickly. I have seen minor changes to query structure have dramatic impacts on performance, even though both versions of the query were logically equivalent.

Jeff Moden SSC Guru Points: 1004392 More actions · Answer 4

Interesting stuff (hang on a minute... gotta get this shoe out of my mouth... 🙂 there, that's better)... I wouldn't have believed that this query could ever be helped by indexing. Of course, I'm going to have to go through all of what dmoutray did, but coming up with a 0 runtime on this is very cool. Thanks to dmoutray, Shehap, and Lynn for their suggestions on this.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004392 More actions · Answer 5

dmoutray (10/8/2011)
Just copy and paste the XML into SQL Server Management Studio to see the graphical execution plan.

That's just not working for me... is it because I'm using SQL Server 2005? It might also explain some of the results I got (still working on those).

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004392 More actions · Answer 6

Perhaps it's because I'm using SQL Server 2005 on a 9 year old, single CPU box, but I've tried the index suggestions from dmoutray's testing above, especially the 0 second runtime suggestions, and I'm definitly NOT getting any close to a zero runtime.

Guess I'll have to break out the laptop after all... it has 2008 on it (not R2, though).

I'll try to get back to this tonight.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jags2001 SSCommitted Points: 1529 More actions · Answer 7

Thank you Jeff and dmoutray and the other contributors...I'm in Canada, near Toronto, and it's the Thanksgiving weekend. I'm in awe to the work you have done. Keep them coming...

I like the way you generated test data. and the simplified:

SELECT a1, clsdt_date, wrkcpl_date

FROM wt

WHERE ISNULL(DATEDIFF(dd,clsdt_date,wrkcpl_date),0) = 0

More info on this... this is part of a data extraction script to the data warehouse. A1 is the primary key and it is a clustered index. We support both SQL Server and Oracle. I tested it first against SQL Server to see the performance, then work on Oracle. I don't have an 11 million row table, and now I do..I learn from you.

Thanks to people like you... I'll test the code on Tuesday when I'm back to the office. It's a very nice day and enjoy the day outside, while it is not yet snowing.

JAgs...

David Moutray SSCertifiable Points: 5956 More actions · Answer 8

Strangely enough, it doesn't work for me, either! :unsure:

I played around with it for a bit, and got it to work, though. Paste the XML into Notepad (or some other text editor), and save it with a .SQLPLAN file extension. Then, if you double-click on the file, it will open in SSMS as a graphical execution plan.

I believe that will work in any version of SSMS from 2005 and later. (I am currently using SQL 2008 R2 at home.)

David Moutray SSCertifiable Points: 5956 More actions · Answer 9

Jeff Moden (10/9/2011)
Interesting stuff (hang on a minute... gotta get this shoe out of my mouth... 🙂 there, that's better)... I wouldn't have believed that this query could ever be helped by indexing. Of course, I'm going to have to go through all of what dmoutray did, but coming up with a 0 runtime on this is very cool. Thanks to dmoutray, Shehap, and Lynn for their suggestions on this.

I think this is really interesting, too. I'd like to know why the index improves performance, even though the optimizer has to elect for an index scan instead of an index seek.

One thing I have noticed - it may even be relevant - is that the optimizer always elects for an index scan over a table scan, if an appropriate index is available. There must be a reason for that. (Maybe one of those Internals gurus would know: Kalen Delaney, Gail Shaw, Itzik Ben-Gan, et. al.)

David Moutray SSCertifiable Points: 5956 More actions · Answer 10

Jeff Moden (10/9/2011)
Perhaps it's because I'm using SQL Server 2005 on a 9 year old, single CPU box, but I've tried the index suggestions from dmoutray's testing above, especially the 0 second runtime suggestions, and I'm definitly NOT getting any close to a zero runtime.

There was an interesting presentation on this topic during the recent 24 Hours of PASS. The presentation was Hardware 301: Diving Deeper into Database Hardware. The presenter showed how to get various performance metrics, and then he pointed out that the laptop you bought yesterday just might have better performance statistics than the server your company bought two years ago.

I'll bet if you test both your server and your laptop at GeekBench, you'll see a big difference.

http://www.primatelabs.ca/geekbench

I am using a two year-old HP laptop with dual quad-core processors and 8 GB of RAM. It performs pretty well, but I am still kicking myself that I didn't pay the extra money for SSD drives! 😉

Oh, the operating system is Windows 7 (64-bit). I am running SQL 2008 R2 Enterprise, also 64-bit. (Thank you, MSDN!)

Jeff Moden SSC Guru Points: 1004392 More actions · Answer 11

David Moutray (10/9/2011)
Strangely enough, it doesn't work for me, either! :unsure:
I played around with it for a bit, and got it to work, though. Paste the XML into Notepad (or some other text editor), and save it with a .SQLPLAN file extension. Then, if you double-click on the file, it will open in SSMS as a graphical execution plan.
I believe that will work in any version of SSMS from 2005 and later. (I am currently using SQL 2008 R2 at home.)

Yep... I knew that would work... just thought you found some magic, David. 🙂

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004392 More actions · Answer 12

David Moutray (10/9/2011)
Jeff Moden (10/9/2011)
Perhaps it's because I'm using SQL Server 2005 on a 9 year old, single CPU box, but I've tried the index suggestions from dmoutray's testing above, especially the 0 second runtime suggestions, and I'm definitly NOT getting any close to a zero runtime.
There was an interesting presentation on this topic during the recent 24 Hours of PASS. The presentation was Hardware 301: Diving Deeper into Database Hardware. The presenter showed how to get various performance metrics, and then he pointed out that the laptop you bought yesterday just might have better performance statistics than the server your company bought two years ago.
I'll bet if you test both your server and your laptop at GeekBench, you'll see a big difference.
http://www.primatelabs.ca/geekbench
I am using a two year-old HP laptop with dual quad-core processors and 8 GB of RAM. It performs pretty well, but I am still kicking myself that I didn't pay the extra money for SSD drives! 😉
Oh, the operating system is Windows 7 (64-bit). I am running SQL 2008 R2 Enterprise, also 64-bit. (Thank you, MSDN!)

Heh... and now you know why I usually test on the older box... if it's fast there, it'll fly on newer hardward. I rarely do any performance testing on my laptop which is a 2 year old HP-G6 with a dual-core and 4GB of ram... and that damned thing absolutely flies! It's faster than even the latest servers at work so I just don't use the laptop for performance testing in most cases.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004392 More actions · Answer 13

Jags2001 (10/9/2011)
Thank you Jeff and dmoutray and the other contributors...I'm in Canada, near Toronto, and it's the Thanksgiving weekend. I'm in awe to the work you have done. Keep them coming...
I like the way you generated test data. and the simplified:
SELECT a1, clsdt_date, wrkcpl_date
FROM wt
WHERE ISNULL(DATEDIFF(dd,clsdt_date,wrkcpl_date),0) = 0
More info on this... this is part of a data extraction script to the data warehouse. A1 is the primary key and it is a clustered index. We support both SQL Server and Oracle. I tested it first against SQL Server to see the performance, then work on Oracle. I don't have an 11 million row table, and now I do..I learn from you.
Thanks to people like you... I'll test the code on Tuesday when I'm back to the office. It's a very nice day and enjoy the day outside, while it is not yet snowing.
JAgs...

Thank you for the feedback, JAgs... I have to caution again, though... the ISNULL/DATEDIFF code is not a practice that you should get into because even in the presence of the correct predicates, the non-SARGable code of the ISNULL/DATEDIFF solution will prevent the ability of the code to do an index seek... it will force either an index scan or a table scan in one form or another.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)