August 6, 2008 at 3:01 pm
I am using partitioned views with Standard Edition 2005. When I code a query filtering on the partition key (ClientID), I get partition elimination as expected. But when I use a variable to filter on the partition key, whether in an ad-hoc query or stored procedure, all tables in the partitioned view are hit. I've looked all over and found no info on this. I find it hard to believe that the only way you can get partition elimination is from hard-coded queries or stored procs that generate dynamic SQL.
Anyone have an idea?
August 6, 2008 at 3:38 pm
It's not clear what you mean (i.e., what distinction you are drawing). Please provide some examples.
-- RBarryYoung, (302)375-0451, blog: MovingSQL.com, Twitter: @RBarryYoung
Proactive Performance Solutions, Inc. "Performance is our middle name."
August 6, 2008 at 3:49 pm
The partitioned view (pvF_Call_Level1) unions several tables, each with a CHECK constraint on the ClientID column.
The queries:
Without a variable (works; the query plan shows 1 table accessed):
select dateID, count(*)
from dw.dbo.pvF_Call_Level1
where dateID between 20080508 and 20080509
and ClientID = 4
group by DateID
With a variable (doesn't work; the query plan shows all tables accessed):
declare @ClientID integer;
set @ClientID = 4;
select dateID, count(*)
from dw.dbo.pvF_Call_Level1
where dateID between 20080508 and 20080509
and ClientID = @ClientID
group by DateID
August 7, 2008 at 9:28 am
It's more likely because of how the query is parameterized. You should generally avoid using a local variable in a query predicate; the optimizer doesn't do a very good job of creating quality estimates for queries like that. Take a look at the article below, towards the bottom, to see why...
http://www.microsoft.com/technet/prodtechnol/sql/2005/qrystats.mspx
The gist of it is that SQL Server produces better cardinality estimates when the predicate value is a literal it can see when the plan is compiled; with a local variable, the value isn't known at compile time, so the optimizer falls back to a generic estimate. For example...
declare @StartOrderDate datetime
set @StartOrderDate = '20040731'
select * from Sales.SalesOrderHeader h, Sales.SalesOrderDetail d
WHERE h.SalesOrderID = d.SalesOrderId
AND h.OrderDate >= @StartOrderDate
That forces SQL Server to produce less favorable cardinality estimates than this...
SELECT * FROM Sales.SalesOrderHeader h, Sales.SalesOrderDetail d
WHERE h.SalesOrderID = d.SalesOrderId
AND h.OrderDate >= '20040731'
A possible resolution, at the cost of recompiling the query plan every time it's run, is this...
declare @StartOrderDate datetime
set @StartOrderDate = '20040731'
SELECT * FROM Sales.SalesOrderHeader h, Sales.SalesOrderDetail d
WHERE h.SalesOrderID = d.SalesOrderId
AND h.OrderDate >= @StartOrderDate
OPTION(RECOMPILE)
The cost of recompiling the query plan will probably be a whole lot less than the cost of letting SQL Server choose a sub-optimal plan based on bad cardinality estimates against large tables.
Now you may be thinking, "Well that's just wonderful! Gee whiz, how do I handle variables at execution time?" Fortunately, there are a few ways to deal with it. According to the article...
1) rewriting the query to use literals directly in queries instead of assigning them to variables and then using the variables in the queries
2) using sp_executesql or API server cursors with parameters that replace your use of local variables (see the sketch after this list)
3) using a stored procedure with parameters that replace your use of local variables
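For option 2, here is a minimal sketch against the view and columns shown earlier in this thread (dw.dbo.pvF_Call_Level1, dateID, ClientID); the value is passed as a parameter to sp_executesql instead of being referenced as a local variable inside the statement:

declare @ClientID int;
set @ClientID = 4;

exec sp_executesql
    N'select dateID, count(*)
      from dw.dbo.pvF_Call_Level1
      where dateID between 20080508 and 20080509
        and ClientID = @ClientID
      group by dateID',
    N'@ClientID int',        -- parameter definition
    @ClientID = @ClientID;   -- parameter value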
Also for stored procedures, you should generally avoid modification of parameter values within the procedure body. For example...
CREATE PROCEDURE GetRecentSales (@date datetime) AS
BEGIN
IF @date IS NULL
SET @date = DATEADD(mm, -3, (SELECT MAX(OrderDate)
FROM Sales.SalesOrderHeader))
SELECT * FROM Sales.SalesOrderHeader h, Sales.SalesOrderDetail d
WHERE h.SalesOrderID = d.SalesOrderID
AND h.OrderDate > @date
END
As written, this procedure can produce rather poor cardinality estimates: the plan is compiled for the sniffed parameter value (often NULL here), not for the value assigned inside the body. Optionally, you can re-code the procedure like so...
CREATE PROCEDURE GetRecentSales (@date datetime) AS
BEGIN
IF @date IS NULL
SET @date = DATEADD(mm, -3, (SELECT MAX(OrderDate)
FROM Sales.SalesOrderHeader))
EXEC GetRecentSalesHelper @date
END
CREATE PROCEDURE GetRecentSalesHelper (@date datetime) WITH RECOMPILE AS
BEGIN
SELECT * FROM Sales.SalesOrderHeader h, Sales.SalesOrderDetail d
WHERE h.SalesOrderID = d.SalesOrderID
AND h.OrderDate > @date -- @date is unchanged from compile time,
-- so a good plan is obtained.
END
Or you can assign the parameter value to a local variable within the procedure to avoid parameter sniffing as well.
August 7, 2008 at 9:49 am
Oh, and I almost forgot. You probably have clustered indexes on those tables that follow a naturally ascending key (e.g. a datetime column or an IDENTITY). In that case the statistics histogram only covers the values that existed when the stats were last built, so any newly inserted values lie beyond the last histogram step. More than likely you'll have to run UPDATE STATISTICS a bit more frequently to pick up the new values. AUTO UPDATE STATISTICS doesn't always do a very good job here, since the trigger is threshold-based (roughly 500 changes plus 20% of the rows in the table). 20% of a huge table is a lot of rows, so the stats probably won't be flagged for update even after a "large" insert. Many times the default sampling rate will suffice, but sometimes you'll need to specify a larger sample (or a full scan).
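For example, something along these lines; the table name is hypothetical, standing in for one of the view's member tables:

-- Full scan instead of the default sample (hypothetical member table name)
UPDATE STATISTICS dbo.F_Call_Level1_ClientA WITH FULLSCAN;

-- Or a larger-than-default sample if a full scan is too expensive
UPDATE STATISTICS dbo.F_Call_Level1_ClientA WITH SAMPLE 50 PERCENT;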
August 7, 2008 at 9:49 am
Thanks, but option 1 is no good for us because we need to use stored procedures (your option 3). And that option results in the same problem. Using RECOMPILE in a sproc or query does not help. Option 2 (dynamic SQL) works fine, but is harder to develop and maintain.
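For reference, a sketch of what that dynamic SQL might look like, using the query from earlier in the thread (whether it is built exactly this way isn't stated); the value is concatenated in as a literal so the optimizer sees a constant:

declare @ClientID int, @sql nvarchar(2000);
set @ClientID = 4;

set @sql = N'select dateID, count(*)
             from dw.dbo.pvF_Call_Level1
             where dateID between 20080508 and 20080509
               and ClientID = ' + cast(@ClientID as nvarchar(10)) + N'
             group by dateID';

exec (@sql);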
August 7, 2008 at 9:55 am
Yep. There are no easy answers, really.
Just out of curiosity, why are you using a partitioned view instead of a partitioned table with a partition scheme and function? I find that SQL Server does a somewhat better job of creating predictable query plans with the latter than the former.
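For comparison, a minimal sketch of the partition function/scheme approach; the object names, boundary values, and columns are illustrative rather than taken from the thread, and note that table partitioning requires Enterprise Edition in 2005:

-- Partition function: boundary values split the data by ClientID (illustrative)
CREATE PARTITION FUNCTION pfClientID (int)
AS RANGE LEFT FOR VALUES (1, 2, 3, 4);

-- Partition scheme: map every partition to a filegroup (all to PRIMARY here)
CREATE PARTITION SCHEME psClientID
AS PARTITION pfClientID ALL TO ([PRIMARY]);

-- Fact table created on the scheme, partitioned by ClientID
CREATE TABLE dbo.F_Call_Level1
(
    ClientID int NOT NULL,
    dateID   int NOT NULL
    -- remaining fact columns...
) ON psClientID (ClientID);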
August 7, 2008 at 10:05 am
Also, with the stored procedure, did you try...
CREATE PROCEDURE xxx @date datetime AS
DECLARE @xdate datetime
SET @xdate = @date
SELECT yyy FROM zzz WHERE aaa = @xdate
I mentioned that as the last line of my post above but failed miserably at providing an example.
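Mapped onto the view from earlier in the thread, the pattern would look something like this (the procedure name is hypothetical):

CREATE PROCEDURE GetCallCountsByClient (@ClientID int) AS
BEGIN
    -- Copy the parameter into a local variable so the plan isn't built
    -- around the sniffed parameter value
    DECLARE @xClientID int;
    SET @xClientID = @ClientID;

    SELECT dateID, COUNT(*)
    FROM dw.dbo.pvF_Call_Level1
    WHERE dateID BETWEEN 20080508 AND 20080509
      AND ClientID = @xClientID
    GROUP BY dateID;
END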
August 7, 2008 at 10:25 am
Yes, I tried that first. It really should work. I'm not using Enterprise because my company won't buy it. It looks like we're stuck with dynamic SQL.
August 7, 2008 at 10:42 am
Wow, that sucks. Well, hopefully you can use this scenario to help change their minds. Seems to me like an excellent business case for the upgrade.
You could grab Developer Edition and load a copy of the data into a partitioned table. That would make an excellent demonstration and could potentially open a few eyes.
The cost of maintaining dynamic SQL code, plus the cost of poor query plans in terms of waits, is likely much higher than the upgrade once you add up the time developers spend coding around the issue, the DBA time spent tuning queries and managing the additional tables (index maintenance, stats, storage, etc.), and the time users spend waiting on poor query plans.
Anyhow, good luck! I sure hope you can find the answer!