March 12, 2012 at 11:59 pm
I have seen and use a DATE table in a database I use for weather reporting.
The date table is typical of these, and the code below will generate an example of it with data from 2007 to 2015 (I have kept it to two columns for this message).
--drop table Dates2
go
CREATE TABLE [dbo].[Dates2](
[DateValue] [datetime] NOT NULL,
[day] [int] NULL,
CONSTRAINT [PK_Dates2] PRIMARY KEY CLUSTERED
(
[DateValue] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
go
declare @CurrentDate datetime = '2007-1-1'
while (@CurrentDate < '2015-1-1')
begin
insert into Dates2(DateValue, [day])
values (@CurrentDate, datepart(day, @CurrentDate))
set @CurrentDate= dateadd(day, 1, @CurrentDate)
end
go
UPDATE STATISTICS Dates2 with fullscan, all
go
DBCC SHOW_STATISTICS (Dates2, PK_Dates2) with histogram
go
Now, my question. I know there is exactly one record for each day, and the statistics that are generated, even though random intervals are used for the steps, are accurate for each period.
So for any range I know the row count exactly: for example, between Jan 1, 2012 and Jan 21, 2012 there are exactly 21 records.
select * from dates2 where datevalue between '2012-01-01' and '2012-01-21'
If you turn on the actual execution plan and hover over the index seek, the estimated number of rows will be correct or very close to it (20.5446).
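For anyone who would rather read the estimate straight off the plan XML than hover in SSMS, this is one way to capture it (just a convenience; the seek's EstimateRows attribute in the returned plan carries the same number):
set statistics xml on
go
select * from dates2 where datevalue between '2012-01-01' and '2012-01-21'
go
set statistics xml off
go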
But do the same for these queries, which use variables:
declare @then DateTime
declare @Today DateTime
set @Today = dateadd(dd,datediff(dd,0, getdate()),0)
set @then = dateadd(day, -6, @Today)
select @today
select DateAdd(day, -6, @Today)
select * from dates2 where datevalue between '2012-02-22' and '2012-03-11'
select * from dates2 where datevalue between DateAdd(day, -6, @Today) and @Today
select * from dates2 where datevalue between @then and @Today
select * from dates2 where datevalue between '2012-03-07' and '2012-03-13'
The hard-coded values (for me) return an estimated row count of 20.5446, but the ones with variables return an estimated row count of 262.98, which is more than ten times as many.
I am using SQL Server 2012 RTM for this test (but I also tested with SQL 2008 SP1 and got similar results).
This makes all my joins that use this table carry estimates that look wrong, and updating statistics, dropping them, freeing the proc cache, and restarting SQL Server all give the same results.
Any clues why the variables, which are probably how most people query tables, produce such different estimates?
Chris
March 15, 2012 at 6:43 am
Hello,
I've tested your data and I get the same results as you on my SS2k8 R2 RTM.
I can easily explain the estimated ~20 rows.
For the ~263 estimated rows, it seems the engine takes two steps (or the same step twice) from the histogram to compute the estimate, cannot resolve a precise figure at first, and then falls back on the fixed default guess used for a BETWEEN over unknown values, roughly 9% of the table's rows, which gives the ~263 you are seeing. (When looking at the histogram, it clearly shows that all rows between steps are unique, which makes me wonder why the engine seems to use two steps to resolve the estimate; checking the XML plan may give you more detail.)
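A quick sanity check of that arithmetic (this assumes the ~9% fixed guess; the numbers line up with the 262.98 in your plan):
-- 2007-01-01 through 2014-12-31 is 2922 daily rows; 2922 * 0.09 = 262.98
select count(*) as TotalRows,
       count(*) * 0.09 as FixedGuessRows
from dbo.Dates2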
Only when I take this into account can I reproduce the plan's estimated row count.
But I would need to dig much deeper, and I can't right now.
Hope that sheds some light for you.
March 15, 2012 at 7:36 am
This: http://sqlinthewild.co.za/index.php/2008/02/25/parameter-sniffing-pt-2/
The use of variables is not all that common. More common would be parameters, which behave similarly to constants when it comes to row estimations.
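A minimal sketch of that difference (the procedure names here are made up for illustration): a parameter's value is sniffed when the procedure is compiled, so the optimizer can estimate from the histogram just as it does for a literal, while a value assigned to a local variable inside the procedure is unknown at compile time and gets the fixed guess.
create procedure dbo.GetDates_Param
	@From datetime,
	@To datetime
as
begin
	-- the parameter values are sniffed at compile time,
	-- so the optimizer estimates from the histogram, as with literals
	select DateValue
	from dbo.Dates2
	where DateValue between @From and @To
end
go
create procedure dbo.GetDates_Var
	@From datetime,
	@To datetime
as
begin
	-- copying the parameters into local variables hides the values
	-- from the optimizer at compile time, so it falls back on the fixed guess
	declare @f datetime = @From, @t datetime = @To
	select DateValue
	from dbo.Dates2
	where DateValue between @f and @t
end
go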
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
March 15, 2012 at 10:18 am
Thank you Gila,
I will need to update that part of the sniffing issue...
March 15, 2012 at 12:27 pm
GilaMonster (3/15/2012)
This: http://sqlinthewild.co.za/index.php/2008/02/25/parameter-sniffing-pt-2/
The use of variables is not all that common. More common would be parameters, which behave similarly to constants when it comes to row estimations.
Thanks for the link - that explains it to me quite well. I have never used query hints and always thought recompile was bad.
Without the recompile hint:
declare @Today DateTime
SET @Today = dbo.F_START_OF_DAY(GETDATE())
SELECT Dates.DateValue as Date, isnull(WeatherDailyStats.RainFall,0) as Rainfall
FROM Dates LEFT OUTER JOIN
WeatherDailyStats ON Dates.DateValue = WeatherDailyStats.Date
WHERE (Dates.DateValue between DATEADD(day, - 6, @Today) and GETDATE())
and this produced this query plan
and with the recompile hint
declare @Today DateTime
SET @Today = dbo.F_START_OF_DAY(GETDATE())
SELECT Dates.DateValue as Date, isnull(WeatherDailyStats.RainFall,0) as Rainfall
FROM Dates LEFT OUTER JOIN
WeatherDailyStats ON Dates.DateValue = WeatherDailyStats.Date
WHERE (Dates.DateValue between DATEADD(day, - 6, @Today) and GETDATE())
option (recompile)
As you can see, quite a different plan is generated. Running them both and comparing the cost of each plan:
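(For reference, the read and elapsed-time counters below are the output you get with the I/O and time statistics switched on for the session; to reproduce them, enable both before running the two queries.)
set statistics io on
set statistics time on
go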
First Query
Table 'WeatherDailyStats'. Scan count 1, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Dates'. Scan count 1, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Elapsed time = 32 ms.
Second Query
Table 'WeatherDailyStats'. Scan count 0, logical reads 14, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Dates'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Elapsed time = 44ms.
I assume the second is slower since there are more logical reads, so technically the first query is better even though the statistics are not correct and it shows as the more expensive plan (74% of the batch).
If I were running this query thousands of times I would go with the first option.
Chris
March 15, 2012 at 2:52 pm
4 or 14 logical reads are comparable, and the two are too close in time to draw any broad conclusions about which is better. At that speed, the measuring inaccuracies are significant.
Personally I'd use neither; I'd make sure my query (which I assume will be in a procedure) uses the parameters that are passed in, not variables assigned within the procedure (if possible).
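For example, something like this (the procedure name is made up; the caller uses your F_START_OF_DAY function to work out the boundaries once and passes them in):
create procedure dbo.GetRainfallForRange
	@From datetime,
	@To datetime
as
begin
	-- filtering directly on the passed-in parameters lets the optimizer
	-- sniff their values and estimate from the histogram
	select Dates.DateValue as Date, isnull(WeatherDailyStats.RainFall, 0) as Rainfall
	from Dates
	left outer join WeatherDailyStats on Dates.DateValue = WeatherDailyStats.Date
	where Dates.DateValue between @From and @To
end
go
-- caller works out the range and passes it in
declare @Today datetime = dbo.F_START_OF_DAY(getdate())
declare @From datetime = dateadd(day, -6, @Today)
exec dbo.GetRainfallForRange @From = @From, @To = @Today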
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability