June 26, 2012 at 4:31 am
Hi all
I've got a performance problem with a query that uses a comparison between two columns to pull out some data. Rather than bore you with the details of the system, I've created some SQL that mirrors the issue (acknowledgments to Itzik Ben Gan for the numbers CTEs):
CREATE TABLE #Dates
(
Number int NOT NULL PRIMARY KEY,
Date1 datetime NOT NULL,
Date2 datetime NOT NULL
)
GO
DECLARE @number_of_numbers INT
SET @number_of_numbers = 100000;
;WITH
a AS (SELECT 1 AS i UNION ALL SELECT 1),
b AS (SELECT 1 AS i FROM a AS x, a AS y),
c AS (SELECT 1 AS i FROM b AS x, b AS y),
d AS (SELECT 1 AS i FROM c AS x, c AS y),
e AS (SELECT 1 AS i FROM d AS x, d AS y),
f AS (SELECT 1 AS i FROM e AS x, e AS y),
numbers AS
(
SELECT TOP(@number_of_numbers)
ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS number
FROM f
)
INSERT INTO #Dates (Number, Date1, Date2)
SELECT number, CAST(RAND(number) * 100000 AS Datetime) AS Date1,
CAST(RAND(number+1) * 100000 AS Datetime) AS Date2
FROM numbers;
GO
-- This is the bit to optimise if possible
SELECT * FROM #Dates WHERE Date1 > Date2
GO
DROP TABLE #Dates
GO
On my box, the select statement only returns two rows, which is fairly representative of the real system. The real table also has about 100000 rows.
Try as I might with adding other indexes I can't get the select to do anything other than index scans. This does make sense, but doesn't help with the performance.
So - what does everyone else do when faced with such a query? Are there any indexing possibilities that I've missed, or is it time to look at other options, for example an indexed computed column?
Any help appreciated.
Duncan
Edit: Forgot to add - it's a SQL 2005 box, if that makes any difference.
June 26, 2012 at 5:02 am
a query with Column1 > Column2 best plan is going to be a index scan, right? it has to use the index to get all values greater than to match the query;
if you were after a single value, you might get an index seek , but i'm pretty sure that the best you can do is an index scan.
now, from there, your query might benefit with some include columns so the index is more efficient, and doesn't have to go to the leaf data to get the remaining columns.
Lowell
June 26, 2012 at 5:18 am
Lowell (6/26/2012)
a query with Column1 > Column2 best plan is going to be a index scan, right? it has to use the index to get all values greater than to match the query;if you were after a single value, you might get an index seek , but i'm pretty sure that the best you can do is an index scan.
now, from there, your query might benefit with some include columns so the index is more efficient, and doesn't have to go to the leaf data to get the remaining columns.
I'd absolutely agree with this - you will always get an index scan.
Currently the temp table has a unique clustered index on number which offers an advantage to scanning an index containing date1 and date2.
If you change the clustered index to date1, date2 then the clustered index (the table) is scanned, which is slightly more expensive.
Depending on the granularity of the difference between the two dates, you might consider a computed column; the difference between the two dates, in sensible units.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
June 26, 2012 at 6:04 am
Thanks both. I didn't think there would be any way to improve it much with an index alone, but it's good to get confirmation I'm not missing anything obvious. I did manage to make a smallish (perhaps 20%) improvement using a nonclustered index on the two columns and including all other columns in the query, but that was as much as I could get.
Thanks again.
June 26, 2012 at 6:06 am
Duncan Pryde (6/26/2012)
Thanks both. I didn't think there would be any way to improve it much with an index alone, but it's good to get confirmation I'm not missing anything obvious. I did manage to make a smallish (perhaps 20%) improvement using a nonclustered index on the two columns and including all other columns in the query, but that was as much as I could get.Thanks again.
Did you try a computed column? The performance lift is dramatic:
CREATE TABLE #Dates
(
Number int NOT NULL,
Date1 datetime NOT NULL,
Date2 datetime NOT NULL,
delta AS DATEDIFF(mi,Date1, Date2)
)
GO
DECLARE @number_of_numbers INT
SET @number_of_numbers = 1000000;
;WITH
a AS (SELECT 1 AS i UNION ALL SELECT 1),
b AS (SELECT 1 AS i FROM a AS x, a AS y),
c AS (SELECT 1 AS i FROM b AS x, b AS y),
d AS (SELECT 1 AS i FROM c AS x, c AS y),
e AS (SELECT 1 AS i FROM d AS x, d AS y),
f AS (SELECT 1 AS i FROM e AS x, e AS y)
INSERT INTO #Dates (Number, Date1, Date2)
SELECT
number,
CAST(RAND(number) * 100000 AS Datetime) AS Date1,
CAST(RAND(number+1) * 100000 AS Datetime) AS Date2
FROM (SELECT TOP(@number_of_numbers) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS number FROM f) d
GO
-- This is the bit to optimise if possible
CREATE CLUSTERED INDEX [number] on [#Dates] (number)
CREATE INDEX [ix_delta] on [#Dates] (delta)
set statistics io,time on
SELECT delta FROM #Dates
WHERE delta < 0 --Date1 > Date2
set statistics io,time off
--------------------------------------------------
DROP TABLE #Dates
GO
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
June 26, 2012 at 6:39 am
ChrisM@Work (6/26/2012)
Did you try a computed column? The performance lift is dramatic:
I must confess I hadn't had a chance yet! Thanks for the feedback: the difference is, as you say, dramatic. It certainly gives me an option anyway.
Thanks again.
June 26, 2012 at 7:51 am
why isn't there just a simple index on both date columns in the order you need them ?
create index X_yourtable_dates on yourschema.yourtable (date1, date2);
might be another solution to avoid side effects for 'select *' queries suddenly showing the computed column and still provide a solution that performs well enough.
I'm just guessing your 3 column table actually holds more columns ...
Johan
Learn to play, play to learn !
Dont drive faster than your guardian angel can fly ...
but keeping both feet on the ground wont get you anywhere :w00t:
- How to post Performance Problems
- How to post data/code to get the best help[/url]
- How to prevent a sore throat after hours of presenting ppt
press F1 for solution, press shift+F1 for urgent solution 😀
Need a bit of Powershell? How about this
Who am I ? Sometimes this is me but most of the time this is me
June 26, 2012 at 8:05 am
ALZDBA (6/26/2012)
why isn't there just a simple index on both date columns in the order you need them ?
create index X_yourtable_dates on yourschema.yourtable (date1, date2);
might be another solution to avoid side effects for 'select *' queries suddenly showing the computed column and still provide a solution that performs well enough.
I'm just guessing your 3 column table actually holds more columns ...
Thanks for your reply. I tried the index you describe before posting here, with the other columns in the query as included columns. The result was a scan of the index - albeit slightly more efficient than scanning the clustered primary key (about 20% better).
The table does hold many more columns in reality, but this question was more about investigating the general principle.
You're also right in that the "select *" problem you mention is one reason why I'd have to tread quite carefully if I added the computed column.
Viewing 8 posts - 1 through 7 (of 7 total)
You must be logged in to reply to this topic. Login to reply