Partitioned View DEFINITELY not working like one

  • I've followed the table and column rules from BOL to a T. I've verified that I can insert into my partitioned view and that the inserted row is sent to the proper underlying table. It looks and smells like a partitioned view, but when I query it and explicitly reference the partitioning column in the WHERE clause (i.e., WHERE partcol = @someval), it STILL LOOKS AT EVERY UNDERLYING TABLE!

    Someone mentioned in another thread that the tables are "touched" for schema validation and whatnot, but that the touch shouldn't amount to any significant overhead. I'm not seeing that. In my estimated execution plan, I'm getting 1-2% overhead on every one of the 60+ tables that comprise the partitioned view, when the WHERE clause should be pointing it to only 6 of those 60.

    Here's a quick piece of code for you to run to see what I mean.

    create table ul1 (
        mon int not null,
        tue int not null,
        sumpin varchar(32),
        constraint [pk_ul1] primary key (mon, tue),
        constraint [ck_ul1] check (mon = 1)
    )
    go

    create table ul2 (
        mon int not null,
        tue int not null,
        sumpin varchar(32),
        constraint [pk_ul2] primary key (mon, tue),
        constraint [ck_ul2] check (mon = 2)
    )
    go

    create table ul3 (
        mon int not null,
        tue int not null,
        sumpin varchar(32),
        constraint [pk_ul3] primary key (mon, tue),
        constraint [ck_ul3] check (mon = 3)
    )
    go

    create view vw_ul as
    select mon, tue, sumpin from ul1
    union all
    select mon, tue, sumpin from ul2
    union all
    select mon, tue, sumpin from ul3
    go

    insert into vw_ul (mon, tue, sumpin) values (1, 1, 'asdf')
    insert into vw_ul (mon, tue, sumpin) values (1, 2, 'asdf')
    insert into vw_ul (mon, tue, sumpin) values (1, 3, 'asdf')
    insert into vw_ul (mon, tue, sumpin) values (2, 1, 'asdf')
    insert into vw_ul (mon, tue, sumpin) values (2, 2, 'asdf')
    insert into vw_ul (mon, tue, sumpin) values (2, 3, 'asdf')
    insert into vw_ul (mon, tue, sumpin) values (3, 7, 'asdf')
    insert into vw_ul (mon, tue, sumpin) values (3, 9, 'asdf')
    go

    select * from vw_ul where mon = 2 -- ctrl+L for execution plan
    -- you'll see that each of the underlyers accounts for 33% of the cost of the query.  why wouldn't it be 100, 0, 0?
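
    -- Optional extra check (a sketch, not part of the original repro): SHOWPLAN_TEXT
    -- returns the estimated plan as text, which makes it easy to see exactly which
    -- member tables appear in the plan. Run this before the clean-up below.
    set showplan_text on
    go
    select * from vw_ul where mon = 2
    go
    set showplan_text off
    go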

    -------------CLEAN UP:------------------------
    drop view vw_ul
    go
    drop table ul1
    drop table ul2
    drop table ul3
    go

     

    I thought I had this problem licked yesterday, but at second glance, my testing was flawed and today I'm right back where I started which is using a partitioned view that's acting like a regular old crappy view.

    Thanks, Greg

  • Hi Greg,

    Looks like this partitioned view is going to be the death of you.

    I've taken your sample code and created the tables and view as posted, but I cannot see the same problem you're referring to. The execution plan I get has 100% of the cost on ul2.

    Admittedly, I'm running this test on SQL Server 2005, so there might be a difference there. I'll test on a SQL 2000 installation when I'm home later on.

    I remember Steve mentioned that it's worth freeing up the proc cache - it could be that you've got a bad execution plan cached. Also, try actually running the query and getting the actual execution plan rather than the estimated plan - I've seen the estimated one be wrong before. (A quick sketch of both is at the end of this post.)

    I'll let you know if I get the same results on my SQL 2000 machine. It might be worth testing this on another machine (if you haven't already) to see if you're getting exactly the same results.
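
    For reference, a minimal sketch of both suggestions - clearing the plan cache (requires appropriate permissions) and getting the plan that was actually used via SET STATISTICS PROFILE - using the sample view from the first post:

    dbcc freeproccache
    go
    set statistics profile on
    go
    select * from vw_ul where mon = 2   -- output now includes the plan actually used, with real row counts
    go
    set statistics profile off
    go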

  • I just realized that if I use a hard-coded value in my WHERE part_col = XXX clause, the query plan comes out PERFECT, but when I use a variable that was set earlier in the code, it looks at all the tables (see the sketch below).

    You've got to be joking. I'm now expected to dynamically generate code so that I can hard-code a value???!!! UNGH!

    I'm going to start looking at some actual query plans instead of estimated ones and see if they're different.
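
    To make the comparison concrete with the sample tables above (a sketch; @someval is just an illustrative variable name):

    select * from vw_ul where mon = 2          -- hard-coded literal: the plan touches only ul2

    declare @someval int
    set @someval = 2
    select * from vw_ul where mon = @someval   -- local variable: the plan lists every member table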

     

  • Actual execution plans are showing 1-2% on all 60 underlyers as well.

  • dbcc freeproccache did nothing but cause SQL to take longer to come up with the same crappy plan.

    Was that how I was supposed to clear the plan cache?

  • Greg,

    When I run this, I see what you're seeing - 33% for each table.

    HOWEVER - I think this is simply because of the very low volumes of data involved. If you insert more rows into each table (see the sketch at the end of this post), you should see that the relative cost of ul2 starts to increase. It's also important to run UPDATE STATISTICS on each table, otherwise the optimizer can't predict correctly.

    What you should also notice is that although all 3 tables are touched (using an index seek), the number of matching records for ul1 and ul3 remains at 1. The number of matching records from ul2, however, increases.

    This is how partitioned views are supposed to behave, in my experience. The index seeks on ul1 and ul3 are essential in order to confirm that the search argument (WHERE mon = 2) is invalid for those tables (something that takes a fraction of a second when using an index seek). When the same index seek is used on ul2, it detects that the index range matches the search argument, and all matching records are returned.
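
    A quick way to try that with the sample tables (a sketch, assuming ul1/ul2/ul3 from the earlier script still exist):

    -- pad ul2 with extra rows so the row counts are no longer uniform
    declare @i int
    set @i = 10
    while @i < 1000
    begin
        insert into ul2 (mon, tue, sumpin) values (2, @i, 'asdf')
        set @i = @i + 1
    end
    go
    update statistics ul1
    update statistics ul2
    update statistics ul3
    go
    select * from vw_ul where mon = 2   -- the relative cost of ul2 should now dominate
    go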

  • I see the same thing on SQL 2000. Interestingly, I still see 100% of the cost on ul2 on SQL 2005.

    Upgrade to 2005?

    Philip might have a point about the low number of records. The optimizer might just decide that it's quicker to go through all the tables than to figure out which table the data is in. Or were you seeing this problem with your full, live dataset?

    What about the times? Forget the execution plan for a second.

    Yesterday you noted that the times were slower when selecting from the view, until you updated statistics. Are the times looking good now? What I'm trying to determine is whether what you're seeing in the execution plan is a red herring. It might just look like something's not right because it's touching all the tables, but that could just be the way the optimizer works, as Philip indicated. (A quick STATISTICS TIME check is sketched at the end of this post.)

    But if the performance isn't affected then it's probably nothing to worry about.
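
    One way to set the plan aside and compare elapsed times directly (a sketch using the sample view; the comparison against ul2 is just illustrative):

    set statistics time on
    go
    select * from vw_ul where mon = 2   -- through the partitioned view
    go
    select * from ul2                   -- straight from the single member table
    go
    set statistics time off
    go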

  • Thanks for the input, Philip. I considered the fact that the low number of rows could be causing the 33, 33, 33, but I'm seeing the same behavior in my production environment, where there are about 5 million records in each underlying table.

    I HAVE been able to get the partitioned view to work as I'd like, but only by hard-coding the partition column's desired value in the WHERE clause.

    IOW:

    WHERE partcol = 65 AND...  -- works perfectly; SQL considers ONLY the one appropriate underlying table

    WHERE partcol = @current_fiscal_id AND...  -- where @current_fiscal_id was set earlier in the code; this behaves as though the view is not partitioned, and all 60 underlying tables each waste 1-2% of the query's time... this is coming from the ACTUAL execution plan.

    Anyone?  Anyone?  Bueller?

    BTW, dynamically generating the SELECT in order to get an effectively hard-coded value is not showing improvements, and I can't figure out how to accurately see what it's doing in the execution plan. The time it's taking to yield results indicates that it's not working either.
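
    For reference, a sketch of the kind of dynamic SQL being described here (the view and column names are borrowed from later in this thread; the exact form is illustrative only):

    declare @current_fiscal_id int, @sql nvarchar(500)
    set @current_fiscal_id = 65
    set @sql = N'select max(cust_id) from vw_fact_revenue_gregtest where fiscal_period_id = '
             + cast(@current_fiscal_id as nvarchar(10))
    exec (@sql)   -- the literal ends up in the statement text, so it is visible at compile time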

  • I'll check the times now, SQLZ, and post them presently.

     

  • The partitioned view is used as the source of a temp table population.

    I ran the code as-is... FROM the partitioned view, WHERE we filter on the partitioning column using a BETWEEN operator.  It took about 19 minutes.

    I then created a view that would look at only the 6 underlying tables that the BETWEEN clause should have isolated.  When I ran the insert, it took about 7 minutes.

    NO RED HERRINGS.  The partitioned view is DEFINITELY not working properly.

    Going back to the requirements for a partitioned view...  How can I check the ANSI_PADDING setting for my underlying tables? 

  • Greg,

    To check the ANSI_PADDING setting, use the DATABASEPROPERTYEX function, which reports the database-level default. I suspect that it must already be on, because I'm pretty sure that without it you wouldn't have been able to create the view (worth double-checking, though).

    select databasepropertyex('mydatabase','IsAnsiPaddingEnabled')
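
    ANSI_PADDING is also recorded per column at CREATE TABLE time, so a column can be spot-checked directly (a sketch; UsesAnsiTrim returns 1 when the column was created with ANSI_PADDING ON):

    select columnproperty(object_id('ul1'), 'sumpin', 'UsesAnsiTrim') as uses_ansi_padding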

    I'll have another look at this tomorrow when I've got some more time. Curiously, on SQL 2000 I get the 33% split across the three tables irrespective of whether I use the hard-coded literal value in the WHERE clause or a parameter.

    I'll keep you posted.

  • Greg,

    If you write a query like this:

    declare @var int
    set @var = 2
    select * from vw_ul where mon = @var

    then yes, the optimizer will ASSUME that each table will contribute equally to the overall result set. This is because at the time it generates the query plan (which is BEFORE the query actually executes), @var actually has no value. As a result, the optimizer does not (cannot) know which of the tables is going to hold the relevant rows for the search argument "...WHERE mon = @var". If you hard-code the value, then the optimizer knows at compile time what the search argument expression will be, and can generate a more accurate plan.

    There's no such thing as an "actual" execution plan - an execution plan is just that, a plan based on best information available at the time. The execution plan displayed at the end of a just-run query will be exactly the same as that generated without actually running the query.

    If you want the optimizer to estimate @var accurately, you need to put your code in a stored procedure, something like this:

    create proc philproc (@var int)
    as
    select * from vw_ul where mon = @var
    go

    exec philproc @var = 2

    If you compare the two query plans for each approach, you should see:

    In-line - Estimated rowcount is equal for each table

    StoredProc - Estimated rowcount = 1 for ul1 and ul3, higher for ul2.

    If you run the two forms of the query having first issued a SET STATISTICS IO ON command, you should see IDENTICAL output. In other words, although the estimated plan appears more pessimistic for the in-line local variable version, in reality it makes no difference. This is because the query plan is an ESTIMATE only, and can be wrong. So long as the estimate isn't so badly out that it forces the optimizer to choose a totally inappropriate plan (and in this case it doesn't), then there will be no adverse effect, even though the apparent costs are wrong.
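
    For example, the comparison could look like this (a sketch using the sample view and the philproc procedure above):

    set statistics io on
    go
    declare @var int
    set @var = 2
    select * from vw_ul where mon = @var   -- in-line local variable
    go
    exec philproc @var = 2                 -- stored procedure form
    go
    set statistics io off
    go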

  • OK, here's a wrap-up. I think I was expecting a lot more out of partitioned views, so I kept trying to find reasons why they weren't working. I'm now of the opinion that they are in fact working.

    Taking a tip from Philip above, I tried the STATISTICS IO trick.  Here is what I found:

     

    SET STATISTICS IO ON
    go
    dbcc freeproccache
    go
    dbcc dropcleanbuffers
    go
    sp_refreshview vw_fact_revenue_gregtest
    go
    declare @a int
    set @a = 17
    select max(cust_id)
    from vw_fact_revenue_gregtest
    where fiscal_period_id = @a

    Table 'FACT_REVENUE_200206'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0.
    Table 'FACT_REVENUE_200205'. Scan count 1, logical reads 245232, physical reads 0, read-ahead reads 245249.
    Table 'FACT_REVENUE_200204'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0.
    Table 'FACT_REVENUE_200203'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0.
    Table 'FACT_REVENUE_200202'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0.
    Table 'FACT_REVENUE_200201'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0.

    dbcc freeproccache
    go
    dbcc dropcleanbuffers
    go
    sp_refreshview vw_fact_revenue_gregtest
    go
    select max(cust_id)
    from vw_fact_revenue_gregtest
    where fiscal_period_id = 17

    Table 'FACT_REVENUE_200205'. Scan count 1, logical reads 245232, physical reads 0, read-ahead reads 245249.

    (no other tables mentioned for the second query)

    BOTH QUERIES RETURNED AN ANSWER IN THE SAME RUN TIME OF 0:37!!! That shows me that even though the execution plan (both estimated and actual) says it's spending an even amount of time in each of the 6 underlyers, in actuality it's not. THANKS FOR THIS TIP, PHILIP.

    This doesn't solve the performance problem, but aside from a few anomalies (that will be splinters in my brain for sure...) I can rest assured that I'm at least using the partitioned views "correctly". 

    Hope this thread helps others in the future, and I'm looking forward to seeing how these partitioned views perform when we upgrade to 2005.

    thanks all.

  • Thanks for the follow-up, Greg, and I'm glad you've come to a position of at least understanding what's going on - despite a few splinters.

    This will certainly be very useful stuff for others.

  • One final, final postscript, Greg - when you upgrade to 2005, I strongly recommend you replace the partitioned views with partitioned tables. Microsoft specifically recommends this course of action, and partitioned tables have additional benefits, such as the ability to place different partitions of a single table on specific filegroups, which lets you selectively back up or restore parts of that table.
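
    In case it's useful, here's a minimal SQL 2005 partitioned-table sketch (the names, boundary values, and filegroup mapping are purely illustrative, not taken from this thread):

    create partition function pf_fiscal (int)
        as range left for values (10, 20, 30)       -- boundary values are hypothetical
    go
    create partition scheme ps_fiscal
        as partition pf_fiscal all to ([PRIMARY])   -- or map each range to its own filegroup
    go
    create table fact_revenue_partitioned (
        fiscal_period_id int not null,
        cust_id int not null
    ) on ps_fiscal (fiscal_period_id)
    go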
