Help with the query and index.

Question

Post reply

Help with the query and index.

mayurkb

Ten Centuries

Points: 1155
More actions
August 7, 2013 at 9:03 am

#281672

I've a table which has about 15 columns, I'll just put 3 columns here in the script that we would be using in the script.
CREATE TABLE "schema"."tblTest"
(
"TestId" uniqueidentifier NOT NULL ,
"StartDateTime" datetime2 NOT NULL ,
"EndDateTime" datetime2 NULL
)
TestId is PK with a clustered index. Also, there is 1 nonclustered index on StartDateTime. No index is present on Enddatetime.
The query I need to rewrite\optimise is like this:
SELECT TOP (500) test.[TestId]
FROM tblTest as test
WHERE COALESCE(test.EndDateTime, test.StartDateTime) < @Date;
The table will have about 1 million records, and about 1/16th (about 6-7%) of that would be part of the output at any given time. Right now this query does clustered index scan. I would like it to do seeks instead.
Solutions I tried:
1) Add a computed column with definition: (case when EndDateTime is not null then EndDateTime Else StartDateTime End) and have an index on this. This works perfect but I cannot use this because we use microsoft sync framework that does not like computed columns.
2) Indexed view with computed column. Again, we cannot use this as we need to support SQLExpress.
3) Completely re-write the query by splitting it into 2 queries and union the 2 results. Create 2 separate indexes on enddatetime and startdatetime. This seems little overkill maybe.
;with cte as
(SELECT test.[TestId]
FROM schema.tblTest as test
WHERE test.EndDateTime < @Date
union all
SELECT test.[TestId]
FROM schema.tblTest as test
WHERE test.EndDateTime is null and test.StartDateTime < @Date)
select top 500 testid from cte
Any help on finding a better solution would be appreciated.
Thanks in advance.

Viewing 10 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply

Sean Lange SSC Guru Points: 286609 More actions · Answer 1

Why not just split the two conditions? No need for a cte here.

SELECT TOP (500) test.[TestId]

FROM tblTest as test

WHERE test.EndDateTime < @Date OR test.StartDateTime < @Date;

The next question is, which 500 rows do you want? Do you care? Is it ok if the order changes randomly? Unless you add an order by to this query you have no way of knowing which 500 rows it will return.

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

mayurkb Ten Centuries Points: 1155 More actions · Answer 2

Sean Lange (8/7/2013)
Why not just split the two conditions? No need for a cte here.
SELECT TOP (500) test.[TestId]
FROM tblTest as test
WHERE test.EndDateTime < @Date OR test.StartDateTime < @Date;
The next question is, which 500 rows do you want? Do you care? Is it ok if the order changes randomly? Unless you add an order by to this query you have no way of knowing which 500 rows it will return.

Thanks Sean.

We have to use StartDateTime only if Enddatetime is null. We are doing batching that's why 500. That's configurable and we do not care about ordering.

Stefan_G SSCertifiable Points: 6609 More actions · Answer 3

Solutions I tried:
1) Add a computed column with definition: (case when EndDateTime is not null then EndDateTime Else StartDateTime End) and have an index on this. This works perfect but I cannot use this because we use microsoft sync framework that does not like computed columns.
2) Indexed view with computed column. Again, we cannot use this as we need to support SQLExpress.
3) Completely re-write the query by splitting it into 2 queries and union the 2 results. Create 2 separate indexes on enddatetime and startdatetime. This seems little overkill maybe.
;with cte as
(SELECT test.[TestId]
FROM schema.tblTest as test
WHERE test.EndDateTime < @Date
union all
SELECT test.[TestId]
FROM schema.tblTest as test
WHERE test.EndDateTime is null and test.StartDateTime < @Date)
select top 500 testid from cte
Any help on finding a better solution would be appreciated.
Thanks in advance.

What is wrong with your solution 3? It seems like an ok solution to me?

Another possibility could be to create a new regular column with a non-clustered index and let a trigger populate it with (case when EndDateTime is not null then EndDateTime Else StartDateTime End)

This would of course increase the storage requirements and slow down inserts, but at lest the select would be extremely fast.

/SG

ScottPletcher SSC Guru Points: 101235 More actions · Answer 4

Do you really need only the TestId column in the result set? You don't all the columns?

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

mayurkb Ten Centuries Points: 1155 More actions · Answer 5

Thank you everybody for the reply. Sorry for replying late as I got caught up in something else.

@stefan,

I don't really mind increased storage as long as I get best performance for reads. Normal column with trigger is a good idea. We can't really use triggers as we have distributed environment, data is sync'd by MS sync framework.

@scott,

I only need TestId as I want to delete those older tests.

ScottPletcher SSC Guru Points: 101235 More actions · Answer 6

You don't need separate indexes on StartDateTime and EndDateTime, you need them in the same index.

SQL will only be able to do a seek on that index if it's headed by EndDateTime.

So, you could:

1) Create a new index on ( EndDateTime, StartDateTime). You can write the combined conditions as an "OR" and SQL should be able to do seeks on that index. For the purposes of the SELECT, this is the best option.

OR

2) Add the EndDateTime to the existing nonclustered index with StartDateTime. SQL will have to scan the entire index, but not the main table itself.

Btw, if possible, determine if SQL is actually using the existing nonclustered index -- if not, get rid of it. Naturally that would negate option 2 and leave only option 1.

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

mayurkb Ten Centuries Points: 1155 More actions · Answer 7

ScottPletcher (8/9/2013)
You don't need separate indexes on StartDateTime and EndDateTime, you need them in the same index.
SQL will only be able to do a seek on that index if it's headed by EndDateTime.
So, you could:
1) Create a new index on ( EndDateTime, StartDateTime). You can write the combined conditions as an "OR" and SQL should be able to do seeks on that index. For the purposes of the SELECT, this is the best option.
OR
2) Add the EndDateTime to the existing nonclustered index with StartDateTime. SQL will have to scan the entire index, but not the main table itself.
Btw, if possible, determine if SQL is actually using the existing nonclustered index -- if not, get rid of it. Naturally that would negate option 2 and leave only option 1.

Thanks Scott. I tried option 1 and it still does a scan on the non-clustered index with enddatetime.

ChrisM@home SSC-Insane Points: 24260 More actions · Answer 8

mayurkb (8/9/2013)
ScottPletcher (8/9/2013)
You don't need separate indexes on StartDateTime and EndDateTime, you need them in the same index.
SQL will only be able to do a seek on that index if it's headed by EndDateTime.
So, you could:
1) Create a new index on ( EndDateTime, StartDateTime). You can write the combined conditions as an "OR" and SQL should be able to do seeks on that index. For the purposes of the SELECT, this is the best option.
OR
2) Add the EndDateTime to the existing nonclustered index with StartDateTime. SQL will have to scan the entire index, but not the main table itself.
Btw, if possible, determine if SQL is actually using the existing nonclustered index -- if not, get rid of it. Naturally that would negate option 2 and leave only option 1.
Thanks Scott. I tried option 1 and it still does a scan on the non-clustered index with enddatetime.

And your problem is what exactly?

IF OBJECT_ID('tempdb..#SomeTestTable','U') IS NOT NULL

DROP TABLE #SomeTestTable;

--===== Declare some obviously named variables

DECLARE @NumberOfRows INT,

@StartDate DATEtime,

@EndDate DATEtime,

@Days DECIMAL(10,2), --This is still the "range"

@StartValue INT,

@EndValue INT,

@Range INT;

--===== Preset the variables to known values

SELECT @NumberOfRows = 1000000, -- a million rows

@StartDate = '2010', --Inclusive

@EndDate = '2020', --Exclusive

@Days = DATEDIFF(dd,@StartDate,@EndDate),

@StartValue = 100,

@EndValue = 900,

@Range = @EndValue - @StartValue + 1;

--===== Create "random constrained" integers within

-- the parameters identified in the variables above.

SELECT

ID = ROW_NUMBER() OVER(ORDER BY (SELECT NULL)),

StartDateTime,

EndDateTime = DATEADD(DD,ABS(CHECKSUM(NEWID()))%100,StartDateTime)

INTO #SomeTestTable

FROM (

SELECT TOP (@NumberOfRows)

StartDateTime = DATEADD(DAY,ABS(CHECKSUM(NEWID())) % @Days,@StartDate)

FROM sys.all_columns ac1

CROSS JOIN sys.all_columns ac2

) d;

UPDATE t SET EndDateTime = NULL

FROM #SomeTestTable t

WHERE ID IN (

SELECT TOP 100000 ID FROM #SomeTestTable ORDER BY NEWID())

CREATE UNIQUE CLUSTERED INDEX ucx_ID ON #SomeTestTable (ID)

CREATE INDEX ix_StartDateTime ON #SomeTestTable (StartDateTime)

CREATE INDEX ix_EndDateTime ON #SomeTestTable (EndDateTime)

CREATE INDEX ix_StartDateTimeEndDateTime ON #SomeTestTable (StartDateTime,EndDateTime)

CREATE INDEX ix_EndDateTimeStartDateTime ON #SomeTestTable (EndDateTime,StartDateTime)

-- #########################################################################

--

-- Q1 Index scan ix_EndDateTimeStartDateTime; Duration about 60ms

DECLARE @Date DATE = '20120513'

SELECT TOP (10000) test.ID

FROM #SomeTestTable as test

WHERE COALESCE(test.EndDateTime, test.StartDateTime) < @Date;

go 5

--

-- Q2 Index seek ix_EndDateTime; Duration about 40ms

DECLARE @Date DATE = '20120513'

SELECT TOP (10000) test.ID

FROM #SomeTestTable as test

WHERE (test.EndDateTime < @Date)

go 5

--

-- Q3 index scan ix_EndDateTimeStartDateTime; Duration about 50ms

DECLARE @Date DATE = '20120513'

SELECT TOP (10000) test.ID

FROM #SomeTestTable as test

WHERE (test.EndDateTime < @Date) OR (test.EndDateTime IS NULL AND test.StartDateTime < @Date)

go 5

--

-- Q4 index scan ix_EndDateTimeStartDateTime; Duration too short to measure

DECLARE @Date DATE = '20120513'

SELECT TOP (500) test.ID

FROM #SomeTestTable as test

WHERE COALESCE(test.EndDateTime, test.StartDateTime) < @Date;

go 5

--

-- Q5 index seek ix_EndDateTime; Duration too short to measure

DECLARE @Date DATE = '20120513'

SELECT TOP (500) test.ID

FROM #SomeTestTable as test

WHERE (test.EndDateTime < @Date)

go 5

--

-- Q6 Index scan ix_EndDateTimeStartDateTime; Duration too short to measure

DECLARE @Date DATE = '20120513'

SELECT TOP (500) test.ID

FROM #SomeTestTable as test

WHERE (test.EndDateTime < @Date) OR (test.EndDateTime IS NULL AND test.StartDateTime < @Date)

go 5

[font="Arial"]^{Low-hanging fruit picker and defender of the moggies}[/font]

For better assistance in answering your questions, please read this[/url].

Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

ScottPletcher SSC Guru Points: 101235 More actions · Answer 9

Hmm, interesting. SQL's still "dumber" than I thought :-).

We'll need to UNION the query results instead of using "OR".

Also, as always, we want the variable to be exactly the same data type as the column, to insure that no implicit conversions will be needed.

DECLARE @Date datetime2

SET @Date = '20120513'

SELECT TOP (500) test.ID

FROM (

-- index seek

SELECT TOP (500) test.ID

FROM #SomeTestTable as test

WHERE (test.EndDateTime < @Date)

UNION --sort and eliminate dups, since we're sorting in the outer query anyway

-- index seek

SELECT TOP (500) test.ID

FROM #SomeTestTable as test

WHERE test.EndDateTime IS NULL AND test.StartDateTime < @Date

) AS derived

ORDER BY

test.ID

Too bad SQL can't do that itself with the "OR" there, but I know it has to allow for far more complex variations where such a clean translation might not be possible.

Edit: Changed "datetime" to "datetime2" to match the OP's table defs.

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".