August 7, 2013 at 9:03 am
I've a table which has about 15 columns, I'll just put 3 columns here in the script that we would be using in the script.
CREATE TABLE "schema"."tblTest"
(
"TestId" uniqueidentifier NOT NULL ,
"StartDateTime" datetime2 NOT NULL ,
"EndDateTime" datetime2 NULL
)
TestId is PK with a clustered index. Also, there is 1 nonclustered index on StartDateTime. No index is present on Enddatetime.
The query I need to rewrite\optimise is like this:
SELECT TOP (500) test.[TestId]
FROM tblTest as test
WHERE COALESCE(test.EndDateTime, test.StartDateTime) < @Date;
The table will have about 1 million records, and about 1/16th (about 6-7%) of that would be part of the output at any given time. Right now this query does clustered index scan. I would like it to do seeks instead.
Solutions I tried:
1) Add a computed column with definition: (case when EndDateTime is not null then EndDateTime Else StartDateTime End) and have an index on this. This works perfect but I cannot use this because we use microsoft sync framework that does not like computed columns.
2) Indexed view with computed column. Again, we cannot use this as we need to support SQLExpress.
3) Completely re-write the query by splitting it into 2 queries and union the 2 results. Create 2 separate indexes on enddatetime and startdatetime. This seems little overkill maybe.
;with cte as
(SELECT test.[TestId]
FROM schema.tblTest as test
WHERE test.EndDateTime < @Date
union all
SELECT test.[TestId]
FROM schema.tblTest as test
WHERE test.EndDateTime is null and test.StartDateTime < @Date)
select top 500 testid from cte
Any help on finding a better solution would be appreciated.
Thanks in advance.
August 7, 2013 at 10:34 am
Why not just split the two conditions? No need for a cte here.
SELECT TOP (500) test.[TestId]
FROM tblTest as test
WHERE test.EndDateTime < @Date OR test.StartDateTime < @Date;
The next question is, which 500 rows do you want? Do you care? Is it ok if the order changes randomly? Unless you add an order by to this query you have no way of knowing which 500 rows it will return.
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
August 7, 2013 at 11:46 am
Sean Lange (8/7/2013)
Why not just split the two conditions? No need for a cte here.
SELECT TOP (500) test.[TestId]
FROM tblTest as test
WHERE test.EndDateTime < @Date OR test.StartDateTime < @Date;
The next question is, which 500 rows do you want? Do you care? Is it ok if the order changes randomly? Unless you add an order by to this query you have no way of knowing which 500 rows it will return.
Thanks Sean.
We have to use StartDateTime only if Enddatetime is null. We are doing batching that's why 500. That's configurable and we do not care about ordering.
August 7, 2013 at 2:06 pm
Solutions I tried:
1) Add a computed column with definition: (case when EndDateTime is not null then EndDateTime Else StartDateTime End) and have an index on this. This works perfect but I cannot use this because we use microsoft sync framework that does not like computed columns.
2) Indexed view with computed column. Again, we cannot use this as we need to support SQLExpress.
3) Completely re-write the query by splitting it into 2 queries and union the 2 results. Create 2 separate indexes on enddatetime and startdatetime. This seems little overkill maybe.
;with cte as
(SELECT test.[TestId]
FROM schema.tblTest as test
WHERE test.EndDateTime < @Date
union all
SELECT test.[TestId]
FROM schema.tblTest as test
WHERE test.EndDateTime is null and test.StartDateTime < @Date)
select top 500 testid from cte
Any help on finding a better solution would be appreciated.
Thanks in advance.
What is wrong with your solution 3? It seems like an ok solution to me?
Another possibility could be to create a new regular column with a non-clustered index and let a trigger populate it with (case when EndDateTime is not null then EndDateTime Else StartDateTime End)
This would of course increase the storage requirements and slow down inserts, but at lest the select would be extremely fast.
/SG
August 7, 2013 at 4:06 pm
Do you really need only the TestId column in the result set? You don't all the columns?
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
August 9, 2013 at 7:25 am
Thank you everybody for the reply. Sorry for replying late as I got caught up in something else.
I don't really mind increased storage as long as I get best performance for reads. Normal column with trigger is a good idea. We can't really use triggers as we have distributed environment, data is sync'd by MS sync framework.
I only need TestId as I want to delete those older tests.
August 9, 2013 at 9:26 am
You don't need separate indexes on StartDateTime and EndDateTime, you need them in the same index.
SQL will only be able to do a seek on that index if it's headed by EndDateTime.
So, you could:
1) Create a new index on ( EndDateTime, StartDateTime). You can write the combined conditions as an "OR" and SQL should be able to do seeks on that index. For the purposes of the SELECT, this is the best option.
OR
2) Add the EndDateTime to the existing nonclustered index with StartDateTime. SQL will have to scan the entire index, but not the main table itself.
Btw, if possible, determine if SQL is actually using the existing nonclustered index -- if not, get rid of it. Naturally that would negate option 2 and leave only option 1.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
August 9, 2013 at 10:39 am
ScottPletcher (8/9/2013)
You don't need separate indexes on StartDateTime and EndDateTime, you need them in the same index.SQL will only be able to do a seek on that index if it's headed by EndDateTime.
So, you could:
1) Create a new index on ( EndDateTime, StartDateTime). You can write the combined conditions as an "OR" and SQL should be able to do seeks on that index. For the purposes of the SELECT, this is the best option.
OR
2) Add the EndDateTime to the existing nonclustered index with StartDateTime. SQL will have to scan the entire index, but not the main table itself.
Btw, if possible, determine if SQL is actually using the existing nonclustered index -- if not, get rid of it. Naturally that would negate option 2 and leave only option 1.
Thanks Scott. I tried option 1 and it still does a scan on the non-clustered index with enddatetime.
August 9, 2013 at 11:34 am
mayurkb (8/9/2013)
ScottPletcher (8/9/2013)
You don't need separate indexes on StartDateTime and EndDateTime, you need them in the same index.SQL will only be able to do a seek on that index if it's headed by EndDateTime.
So, you could:
1) Create a new index on ( EndDateTime, StartDateTime). You can write the combined conditions as an "OR" and SQL should be able to do seeks on that index. For the purposes of the SELECT, this is the best option.
OR
2) Add the EndDateTime to the existing nonclustered index with StartDateTime. SQL will have to scan the entire index, but not the main table itself.
Btw, if possible, determine if SQL is actually using the existing nonclustered index -- if not, get rid of it. Naturally that would negate option 2 and leave only option 1.
Thanks Scott. I tried option 1 and it still does a scan on the non-clustered index with enddatetime.
And your problem is what exactly?
IF OBJECT_ID('tempdb..#SomeTestTable','U') IS NOT NULL
DROP TABLE #SomeTestTable;
--===== Declare some obviously named variables
DECLARE @NumberOfRows INT,
@StartDate DATEtime,
@EndDate DATEtime,
@Days DECIMAL(10,2), --This is still the "range"
@StartValue INT,
@EndValue INT,
@Range INT;
--===== Preset the variables to known values
SELECT @NumberOfRows = 1000000, -- a million rows
@StartDate = '2010', --Inclusive
@EndDate = '2020', --Exclusive
@Days = DATEDIFF(dd,@StartDate,@EndDate),
@StartValue = 100,
@EndValue = 900,
@Range = @EndValue - @StartValue + 1;
--===== Create "random constrained" integers within
-- the parameters identified in the variables above.
SELECT
ID = ROW_NUMBER() OVER(ORDER BY (SELECT NULL)),
StartDateTime,
EndDateTime = DATEADD(DD,ABS(CHECKSUM(NEWID()))%100,StartDateTime)
INTO #SomeTestTable
FROM (
SELECT TOP (@NumberOfRows)
StartDateTime = DATEADD(DAY,ABS(CHECKSUM(NEWID())) % @Days,@StartDate)
FROM sys.all_columns ac1
CROSS JOIN sys.all_columns ac2
) d;
UPDATE t SET EndDateTime = NULL
FROM #SomeTestTable t
WHERE ID IN (
SELECT TOP 100000 ID FROM #SomeTestTable ORDER BY NEWID())
CREATE UNIQUE CLUSTERED INDEX ucx_ID ON #SomeTestTable (ID)
CREATE INDEX ix_StartDateTime ON #SomeTestTable (StartDateTime)
CREATE INDEX ix_EndDateTime ON #SomeTestTable (EndDateTime)
CREATE INDEX ix_StartDateTimeEndDateTime ON #SomeTestTable (StartDateTime,EndDateTime)
CREATE INDEX ix_EndDateTimeStartDateTime ON #SomeTestTable (EndDateTime,StartDateTime)
-- #########################################################################
--
-- Q1 Index scan ix_EndDateTimeStartDateTime; Duration about 60ms
DECLARE @Date DATE = '20120513'
SELECT TOP (10000) test.ID
FROM #SomeTestTable as test
WHERE COALESCE(test.EndDateTime, test.StartDateTime) < @Date;
go 5
--
-- Q2 Index seek ix_EndDateTime; Duration about 40ms
DECLARE @Date DATE = '20120513'
SELECT TOP (10000) test.ID
FROM #SomeTestTable as test
WHERE (test.EndDateTime < @Date)
go 5
--
-- Q3 index scan ix_EndDateTimeStartDateTime; Duration about 50ms
DECLARE @Date DATE = '20120513'
SELECT TOP (10000) test.ID
FROM #SomeTestTable as test
WHERE (test.EndDateTime < @Date) OR (test.EndDateTime IS NULL AND test.StartDateTime < @Date)
go 5
--
-- Q4 index scan ix_EndDateTimeStartDateTime; Duration too short to measure
DECLARE @Date DATE = '20120513'
SELECT TOP (500) test.ID
FROM #SomeTestTable as test
WHERE COALESCE(test.EndDateTime, test.StartDateTime) < @Date;
go 5
--
-- Q5 index seek ix_EndDateTime; Duration too short to measure
DECLARE @Date DATE = '20120513'
SELECT TOP (500) test.ID
FROM #SomeTestTable as test
WHERE (test.EndDateTime < @Date)
go 5
--
-- Q6 Index scan ix_EndDateTimeStartDateTime; Duration too short to measure
DECLARE @Date DATE = '20120513'
SELECT TOP (500) test.ID
FROM #SomeTestTable as test
WHERE (test.EndDateTime < @Date) OR (test.EndDateTime IS NULL AND test.StartDateTime < @Date)
go 5
For better assistance in answering your questions, please read this[/url].
Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]
August 9, 2013 at 12:24 pm
Hmm, interesting. SQL's still "dumber" than I thought :-).
We'll need to UNION the query results instead of using "OR".
Also, as always, we want the variable to be exactly the same data type as the column, to insure that no implicit conversions will be needed.
DECLARE @Date datetime2
SET @Date = '20120513'
SELECT TOP (500) test.ID
FROM (
-- index seek
SELECT TOP (500) test.ID
FROM #SomeTestTable as test
WHERE (test.EndDateTime < @Date)
UNION --sort and eliminate dups, since we're sorting in the outer query anyway
-- index seek
SELECT TOP (500) test.ID
FROM #SomeTestTable as test
WHERE test.EndDateTime IS NULL AND test.StartDateTime < @Date
) AS derived
ORDER BY
test.ID
Too bad SQL can't do that itself with the "OR" there, but I know it has to allow for far more complex variations where such a clean translation might not be possible.
Edit: Changed "datetime" to "datetime2" to match the OP's table defs.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
Viewing 10 posts - 1 through 9 (of 9 total)
You must be logged in to reply to this topic. Login to reply