Partitining an existing table

  • Ah. No I'm not going mad, just connected to the wrong instance :blush: Disregard the last post, the max id query is still taking forever.

    I'll get back to you when I get the plans.

  • OK, well I have the plans. Not entirely sure how to include them.

    The base queries:

    declare @topid bigint

    print getdate()

    select @topid = isnull(max(NotificationSequenceID),0)

    from Exchange.dbo.NotificationSequence

    print cast(getdate() as varchar) + ' Max NotificationSequenceID: ' + cast(@topid as varchar)

    select @topid = isnull(max(GBEOrderNotificationID),0)

    from Exchange.dbo.GBEOrderNotification

    print cast(getdate() as varchar) + ' Max GBEOrderNotificationID: ' + cast(@topid as varchar)

    Where both tables are partitioned on the same schema (as above). Both plans are the same and involve a clustered index scan, however whereas the first returns 1 row (as expected) the second returns the full table (actual rows 1197150000). It's almost as if, when redefining the cluster key, to include the partition field, the partition field was switched to be the first field in the key - in which case you would need to scan the whole index (table). But if that's the case, then you can't efficiently partition a unique clustered table on anything other than the original unique cluster key, which can't be right. Can it? Because it's not like you can create a non-clustered index on the original PK, as you have to include the partition field again, presumably with the same result. Or am I missing something vital?

    P.S. I've attached the .sqlplan file, but renamed as a .txt file to get around the upload restrictions (you'd need to rename it back, ovvs)

  • pbowman (6/4/2009)


    OK, well I have the plans. Not entirely sure how to include them.

    The base queries:

    declare @topid bigint

    print getdate()

    select @topid = isnull(max(NotificationSequenceID),0)

    from Exchange.dbo.NotificationSequence

    print cast(getdate() as varchar) + ' Max NotificationSequenceID: ' + cast(@topid as varchar)

    select @topid = isnull(max(GBEOrderNotificationID),0)

    from Exchange.dbo.GBEOrderNotification

    print cast(getdate() as varchar) + ' Max GBEOrderNotificationID: ' + cast(@topid as varchar)

    Where both tables are partitioned on the same schema (as above). Both plans are the same and involve a clustered index scan, however whereas the first returns 1 row (as expected) the second returns the full table (actual rows 1197150000). It's almost as if, when redefining the cluster key, to include the partition field, the partition field was switched to be the first field in the key - in which case you would need to scan the whole index (table). But if that's the case, then you can't efficiently partition a unique clustered table on anything other than the original unique cluster key, which can't be right. Can it? Because it's not like you can create a non-clustered index on the original PK, as you have to include the partition field again, presumably with the same result. Or am I missing something vital?

    P.S. I've attached the .sqlplan file, but renamed as a .txt file to get around the upload restrictions (you'd need to rename it back, ovvs)

    Try creating a NC index on GBEOrderNotificationID. I think you will find it zips along, although it is carrying 16 bytes for each row as the CI pointer. Please read documentation and web articles on partition-aligned indexes btw!

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • easiest is to zip the file and attach that. it then opens with the right app. 🙂

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Your issue ( select max(the_clustered_pk_column) ) is the cause of your huge consumption.

    It actually performs the full clustered index datapage level scan !!!!

    This is an issue of SQLServer in the way the determine this max value !!

    In stead of solving this using the index B-tree, they perform a full clustered index scan !!!

    The other dark sides of RDBMSses (e.g. DB2 on ZOs) solve this (for years now) only using the b-tree. Solving min / max of a first column of an index only takes maximum the number of IOs that equals the number of levels in the index.

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • So I created a non-clustered, non-unique index on the ID column, like so

    USE [Exchange]

    GO

    CREATE NONCLUSTERED INDEX [IX_GBEOrderNotificationID] ON [dbo].[GBEOrderNotification]

    (

    [GBEOrderNotificationId] ASC

    )WITH (STATISTICS_NORECOMPUTE = OFF

    , SORT_IN_TEMPDB = OFF

    , IGNORE_DUP_KEY = OFF

    , DROP_EXISTING = OFF

    , ONLINE = OFF

    , ALLOW_ROW_LOCKS = ON

    , ALLOW_PAGE_LOCKS = ON)

    Now the select max(GBEOrderNotificationID) query runs in 5 mins rather than 20. However its still bringing back every row, the performance gain is simply that it doesn't have to page through the whole table while doing it. (I attach the plan). However I noticed that the details of the index scan have a property of "Ordered False". Checking back the previous plan (see above) I notice that the same Ordered False applies to the clustered index scan. Now it makes sense to me that if the optimizer sees the index as not ordered that it would have to do a full scan to get a max() value.

    I see also that the previous plan for the NotificationSequence query, where the clustered index scan returns 1 row, has an Ordered True property, which also makes sense, I think. Basically I want to get a plan like...

    |--Clustered Index Scan(OBJECT:([Exchange].[dbo].[NotificationSequence].[PK128]), ORDERED BACKWARD PARTITION ID:([PtnIds1006]))

    whereas what I'm getting is the non-ordered version

    |--Index Scan(OBJECT:([Exchange].[dbo].[GBEOrderNotification].[IX_GBEOrderNotificationID]), PARTITION ID:([PtnIds1007]))

    (pardon the unintentional smilies!)

  • sorry, I'll try that attaching the plan bit again.

  • If you created your new non-clustering index without specifying a filegroup, you should see a parallel execution for that query, because will actually put this index on a per content base for every partition in the partition filegroup.

    So your index will actually reside on every partition, in stead of a single large index in another filegroup.

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • ALZDBA (6/5/2009)


    If you created your new non-clustering index without specifying a filegroup, you should see a parallel execution for that query, because will actually put this index on a per content base for every partition in the partition filegroup.

    So your index will actually reside on every partition, in stead of a single large index in another filegroup.

    Agreed. But this is my intention - to have the indexes also partitioned (and aligned).

  • Is the max value always going to be in the last partition? If so, you should be able to substantially speed up the query by using the $partition function.

  • Well, to be honest, given that I need to get a workaround at the moment (it being Friday afternoon an' all), I decided to go for a "how long is a piece of string" function to find the max element, like so

    create function dbo.max_ON ()

    returns bigint

    as

    begin

    declare @nextid bigint, @lastid bigint, @offset bigint

    select top 1 @nextid = GBEOrderNotificationId

    from dbo.GBEOrderNotification

    select @lastid = @nextid

    select @offset = sum(p.rows)/2 - 1

    from sys.partitions p

    where p.object_id = object_id('GBEOrderNotification')

    and p.index_id = 1

    while (@offset > 0)

    begin

    -- Check for presence of ID value

    select @nextid = isnull(GBEOrderNotificationId, 0)

    from dbo.GBEOrderNotification

    where GBEOrderNotificationId = @lastid + @offset

    if @@rowcount = 0 set @nextid = 0

    if @nextid = 0

    set @offset = @offset/2

    else

    set @lastid = @nextid

    end

    return @lastid

    end;

    go

    I'm sure that's not the most efficient way of doing it (and I should recurse to be flash), but it gets me the answer in < 3 seconds rather than 20 mins or even 5 mins, so it'll do for now.

    NB I have also just partitioned another table on a date field - again this meant adding the date field into the primary cluster key, since which select max(id) also does a full table scan - this does seem to be a general effect. This last case also seems pretty standard partitioning (if all the examples in the documentation are anything to go by), so I'm still a bit surprised that there's no mention of this issue.

    Using a specific $PARTITION number in the where clause does not seem to significantly improve the query time.

  • OK, so I found a relevant page on the Microsoft feedback forum at: Feedback: Partition Table using min/max functions and Top N - Index selection and performance.

    Edited highlights:

    Description: Partitioned Tables performance issues - For select statements using Min and Max functions and Top N with ordering over an index.

    Poor performance is being detected for Queries on partitioned tables utilising the min and max functions and select Top N clause with "order by" matching columns of the index.

    The candidate index is either not being used or is being used to scan or seek all rows rather than a subset.

    [...]

    Mark,

    You've provided a very nice analysis that has shown several limitations in the optimizer related to queries on partitioned tables that use MIN and TOP. We're not in a position to fix these issues as a bug fix for SQL Server 2008 because they require some significant effort. We'll keep a record of these and try to improve the behavior for the next major release of SQL Server.

    [...]

    Eric

    Posted by Microsoft on 29/01/2009 at 10:48

    So, it does actually appear to be a "feature" 🙁 (even though the above is for 2008, it looks like the same issue).

  • It cannot be sql2008 ...... the feedback registration date is 11/26/2006

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • ALZDBA (6/10/2009)


    It cannot be sql2008 ...... the feedback registration date is 11/26/2006

    I believe you are mistaken if you think that MS wasn't working on SQL Server 2008 in late 2006.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • TheSQLGuru (6/10/2009)


    ALZDBA (6/10/2009)


    It cannot be sql2008 ...... the feedback registration date is 11/26/2006

    I believe you are mistaken if you think that MS wasn't working on SQL Server 2008 in late 2006.

    You mean it takes more than one year to rewrite sqlserver ?? :hehe:

    The connect thread has market "version: SQL Server 2005 SP2 standard edition" and there have indeed only been references to sql2008 ctps but they did not get confermed except for in the end (sql2008 rtm repo)

    After re-reading it all .... this is exactly what I should have answered in my previous post :blush:

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

Viewing 15 posts - 31 through 45 (of 46 total)

You must be logged in to reply to this topic. Login to reply