August 6, 2008 at 3:01 pm
I am using partitioned views with Standard Edition 2005. When I code a query filtering on the partition key (ClientID), I get partition elimination as expected. But when I use a variable to filter on the partition key, whether in an ad-hoc query or stored procedure, all tables in the partitioned view are hit. I've looked all over and found no info on this. I find it hard to believe that the only way you can get partition elimination is from hard-coded queries or stored procs that generate dynamic SQL.
Anyone have an idea?
August 6, 2008 at 3:38 pm
It's not clear what you mean (i.e., what distinction you are drawing). Please provide some examples.
-- RBarryYoung, (302)375-0451, blog: MovingSQL.com, Twitter: @RBarryYoung
Proactive Performance Solutions, Inc. "Performance is our middle name."
August 6, 2008 at 3:49 pm
The partitioned view (pvF_Call_Level1) unions several tables, each with a CHECK constraint on the ClientID column.
The queries:
Without a variable (works; the query plan shows 1 table accessed):
select dateID, count(*)
from dw.dbo.pvF_Call_Level1
where dateID between 20080508 and 20080509
and ClientID = 4
group by DateID
With a variable (doesn't work; the query plan shows all tables accessed):
declare @ClientID integer;
set @ClientID = 4;
select dateID, count(*)
from dw.dbo.pvF_Call_Level1
where dateID between 20080508 and 20080509
and ClientID = @ClientID
group by DateID
August 7, 2008 at 9:28 am
It's more likely because of how the query is parameterized. You should generally avoid using a local variable in a query predicate; the optimizer doesn't do a very good job of creating quality estimates for queries like that. Take a look at the article below, towards the bottom, to see why...
http://www.microsoft.com/technet/prodtechnol/sql/2005/qrystats.mspx
The gist of it is that SQL Server produces better cardinality estimates when the predicate value is a literal it can see when the plan is compiled; with a local variable, the value isn't known at compile time, so the optimizer falls back to a generic estimate. For example...
declare @StartOrderDate datetime
set @StartOrderDate = '20040731'
select * from Sales.SalesOrderHeader h, Sales.SalesOrderDetail d
WHERE h.SalesOrderID = d.SalesOrderId
AND h.OrderDate >= @StartOrderDate
That forces SQL Server to produce less favorable cardinality estimates than this...
SELECT * FROM Sales.SalesOrderHeader h, Sales.SalesOrderDetail d
WHERE h.SalesOrderID = d.SalesOrderId
AND h.OrderDate >= '20040731'
A possible resolution, at the cost of recompiling the query plan every time it's run, is this...
declare @StartOrderDate datetime
set @StartOrderDate = '20040731'
SELECT * FROM Sales.SalesOrderHeader h, Sales.SalesOrderDetail d
WHERE h.SalesOrderID = d.SalesOrderId
AND h.OrderDate >= @StartOrderDate
OPTION(RECOMPILE)
The cost of recompiling the query plan will probably be a whole lot less than the cost of letting SQL Server choose a sub-optimal plan based on bad cardinality estimates against large tables.
Now you may be thinking, "Well that's just wonderful! Gee whiz, how do I handle variables at execution time?" Fortunately, there are a few ways to deal with it. According to the article...
1) rewriting the query to use literals directly in queries instead of assigning them to variables and then using the variables in the queries
2) using sp_executesql or API server cursors with parameters that replace your use of local variables (see the sketch after this list)
3) using a stored procedure with parameters that replace your use of local variables
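For option 2, here is a minimal sketch against the view and columns shown earlier in this thread (dw.dbo.pvF_Call_Level1, dateID, ClientID); the value is passed as a parameter to sp_executesql instead of being referenced as a local variable inside the statement:

declare @ClientID int;
set @ClientID = 4;

exec sp_executesql
    N'select dateID, count(*)
      from dw.dbo.pvF_Call_Level1
      where dateID between 20080508 and 20080509
        and ClientID = @ClientID
      group by dateID',
    N'@ClientID int',        -- parameter definition
    @ClientID = @ClientID;   -- parameter value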
Also for stored procedures, you should generally avoid modification of parameter values within the procedure body. For example...
CREATE PROCEDURE GetRecentSales (@date datetime) AS
BEGIN
IF @date IS NULL
SET @date = DATEADD(mm, -3, (SELECT MAX(OrderDate)
FROM Sales.SalesOrderHeader))
SELECT * FROM Sales.SalesOrderHeader h, Sales.SalesOrderDetail d
WHERE h.SalesOrderID = d.SalesOrderID
AND h.OrderDate > @date
END
As written, this procedure can produce rather poor cardinality estimates: the plan is compiled for the sniffed parameter value (often NULL here), not for the value assigned inside the body. Optionally, you can re-code the procedure like so...
CREATE PROCEDURE GetRecentSales (@date datetime) AS
BEGIN
IF @date IS NULL
SET @date = DATEADD(mm, -3, (SELECT MAX(OrderDate)
FROM Sales.SalesOrderHeader))
EXEC GetRecentSalesHelper @date
END
CREATE PROCEDURE GetRecentSalesHelper (@date datetime) WITH RECOMPILE AS
BEGIN
SELECT * FROM Sales.SalesOrderHeader h, Sales.SalesOrderDetail d
WHERE h.SalesOrderID = d.SalesOrderID
AND h.OrderDate > @date -- @date is unchanged from compile time,
-- so a good plan is obtained.
END
Or you can assign the parameter value to a local variable within the procedure to avoid parameter sniffing as well.
August 7, 2008 at 9:49 am
Oh, and I almost forgot. You probably have clustered indexes on those tables that follow a naturally ascending key (e.g. a datetime column or an IDENTITY). In that case the statistics histogram only covers the values that existed when the stats were last built, so any newly inserted values lie beyond the last histogram step. More than likely you'll have to run UPDATE STATISTICS a bit more frequently to pick up the new values. AUTO UPDATE STATISTICS doesn't always do a very good job here, since the trigger is threshold-based (roughly 500 changes plus 20% of the rows in the table). 20% of a huge table is a lot of rows, so the stats probably won't be flagged for update even after a "large" insert. Many times the default sampling rate will suffice, but sometimes you'll need to specify a larger sample (or a full scan).
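For example, something along these lines; the table name is hypothetical, standing in for one of the view's member tables:

-- Full scan instead of the default sample (hypothetical member table name)
UPDATE STATISTICS dbo.F_Call_Level1_ClientA WITH FULLSCAN;

-- Or a larger-than-default sample if a full scan is too expensive
UPDATE STATISTICS dbo.F_Call_Level1_ClientA WITH SAMPLE 50 PERCENT;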
August 7, 2008 at 9:49 am
Thanks, but option 1 is no good for us because we need to use stored procedures (your option 3). And that option results in the same problem. Using RECOMPILE in a sproc or query does not help. Option 2 (dynamic SQL) works fine, but is harder to develop and maintain.
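For reference, a sketch of what that dynamic SQL might look like, using the query from earlier in the thread (whether it is built exactly this way isn't stated); the value is concatenated in as a literal so the optimizer sees a constant:

declare @ClientID int, @sql nvarchar(2000);
set @ClientID = 4;

set @sql = N'select dateID, count(*)
             from dw.dbo.pvF_Call_Level1
             where dateID between 20080508 and 20080509
               and ClientID = ' + cast(@ClientID as nvarchar(10)) + N'
             group by dateID';

exec (@sql);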
August 7, 2008 at 9:55 am
Yep. There are no easy answers, really.
Just out of curiosity, why are you using a partitioned view instead of a partitioned table with a partition scheme and function? I find that SQL Server does a somewhat better job of creating predictable query plans with the latter than the former.
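For comparison, a minimal sketch of the partition function/scheme approach; the object names, boundary values, and columns are illustrative rather than taken from the thread, and note that table partitioning requires Enterprise Edition in 2005:

-- Partition function: boundary values split the data by ClientID (illustrative)
CREATE PARTITION FUNCTION pfClientID (int)
AS RANGE LEFT FOR VALUES (1, 2, 3, 4);

-- Partition scheme: map every partition to a filegroup (all to PRIMARY here)
CREATE PARTITION SCHEME psClientID
AS PARTITION pfClientID ALL TO ([PRIMARY]);

-- Fact table created on the scheme, partitioned by ClientID
CREATE TABLE dbo.F_Call_Level1
(
    ClientID int NOT NULL,
    dateID   int NOT NULL
    -- remaining fact columns...
) ON psClientID (ClientID);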
August 7, 2008 at 10:05 am
Also, with the stored procedure, did you try...
CREATE PROCEDURE xxx @date datetime AS
DECLARE @xdate datetime
SET @xdate = @date
SELECT yyy FROM zzz WHERE aaa = @xdate
I mentioned that as the last line of my post above but failed miserably at providing an example.
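Mapped onto the view from earlier in the thread, the pattern would look something like this (the procedure name is hypothetical):

CREATE PROCEDURE GetCallCountsByClient (@ClientID int) AS
BEGIN
    -- Copy the parameter into a local variable so the plan isn't built
    -- around the sniffed parameter value
    DECLARE @xClientID int;
    SET @xClientID = @ClientID;

    SELECT dateID, COUNT(*)
    FROM dw.dbo.pvF_Call_Level1
    WHERE dateID BETWEEN 20080508 AND 20080509
      AND ClientID = @xClientID
    GROUP BY dateID;
END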
August 7, 2008 at 10:25 am
Yes, I tried that first. It really should work. I'm not using Enterprise because my company won't buy it. It looks like we're stuck with dynamic SQL.
August 7, 2008 at 10:42 am
Wow, that sucks. Well, hopefully you can use this scenario to help change their minds. Seems to me like an excellent business case for the upgrade.
You could grab Developer Edition and load a copy of the data into a partitioned table. That would make an excellent demonstration and could potentially open a few eyes.
The cost of maintaining dynamic SQL code, plus the cost of poor query plans in terms of waits, is likely much higher than the upgrade once you add up the time developers spend coding around the issue, the DBA time spent tuning queries and managing the additional tables (index maintenance, stats, storage, etc.), and the time users spend waiting on poor query plans.
Anyhow, good luck! I sure hope you can find the answer!