Partitining an existing table

Question

Partitining an existing table

Viewing 15 posts - 31 through 45 (of 46 total)

You must be logged in to reply to this topic. Login to reply

pbowman-543344 Mr or Mrs. 500 Points: 556 More actions · Answer 1

Ah. No I'm not going mad, just connected to the wrong instance :blush: Disregard the last post, the max id query is still taking forever.

I'll get back to you when I get the plans.

pbowman-543344 Mr or Mrs. 500 Points: 556 More actions · Answer 2

OK, well I have the plans. Not entirely sure how to include them.

The base queries:

declare @topid bigint

print getdate()

select @topid = isnull(max(NotificationSequenceID),0)

from Exchange.dbo.NotificationSequence

print cast(getdate() as varchar) + ' Max NotificationSequenceID: ' + cast(@topid as varchar)

select @topid = isnull(max(GBEOrderNotificationID),0)

from Exchange.dbo.GBEOrderNotification

print cast(getdate() as varchar) + ' Max GBEOrderNotificationID: ' + cast(@topid as varchar)

Where both tables are partitioned on the same schema (as above). Both plans are the same and involve a clustered index scan, however whereas the first returns 1 row (as expected) the second returns the full table (actual rows 1197150000). It's almost as if, when redefining the cluster key, to include the partition field, the partition field was switched to be the first field in the key - in which case you would need to scan the whole index (table). But if that's the case, then you can't efficiently partition a unique clustered table on anything other than the original unique cluster key, which can't be right. Can it? Because it's not like you can create a non-clustered index on the original PK, as you have to include the partition field again, presumably with the same result. Or am I missing something vital?

P.S. I've attached the .sqlplan file, but renamed as a .txt file to get around the upload restrictions (you'd need to rename it back, ovvs)

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 3

pbowman (6/4/2009)
OK, well I have the plans. Not entirely sure how to include them.
The base queries:
declare @topid bigint
print getdate()
select @topid = isnull(max(NotificationSequenceID),0)
from Exchange.dbo.NotificationSequence
print cast(getdate() as varchar) + ' Max NotificationSequenceID: ' + cast(@topid as varchar)
select @topid = isnull(max(GBEOrderNotificationID),0)
from Exchange.dbo.GBEOrderNotification
print cast(getdate() as varchar) + ' Max GBEOrderNotificationID: ' + cast(@topid as varchar)
Where both tables are partitioned on the same schema (as above). Both plans are the same and involve a clustered index scan, however whereas the first returns 1 row (as expected) the second returns the full table (actual rows 1197150000). It's almost as if, when redefining the cluster key, to include the partition field, the partition field was switched to be the first field in the key - in which case you would need to scan the whole index (table). But if that's the case, then you can't efficiently partition a unique clustered table on anything other than the original unique cluster key, which can't be right. Can it? Because it's not like you can create a non-clustered index on the original PK, as you have to include the partition field again, presumably with the same result. Or am I missing something vital?
P.S. I've attached the .sqlplan file, but renamed as a .txt file to get around the upload restrictions (you'd need to rename it back, ovvs)

Try creating a NC index on GBEOrderNotificationID. I think you will find it zips along, although it is carrying 16 bytes for each row as the CI pointer. Please read documentation and web articles on partition-aligned indexes btw!

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 4

easiest is to zip the file and attach that. it then opens with the right app. 🙂

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

Johan Bijnens SSC Guru Points: 135389 More actions · Answer 5

Your issue ( select max(the_clustered_pk_column) ) is the cause of your huge consumption.

It actually performs the full clustered index datapage level scan !!!!

This is an issue of SQLServer in the way the determine this max value !!

In stead of solving this using the index B-tree, they perform a full clustered index scan !!!

The other dark sides of RDBMSses (e.g. DB2 on ZOs) solve this (for years now) only using the b-tree. Solving min / max of a first column of an index only takes maximum the number of IOs that equals the number of levels in the index.

Johan

Learn to play, play to learn !

Dont drive faster than your guardian angel can fly ...
but keeping both feet on the ground wont get you anywhere :w00t:

- How to post Performance Problems
- How to post data/code to get the best help[/url]

- How to prevent a sore throat after hours of presenting ppt

press F1 for solution, press shift+F1 for urgent solution 😀

Need a bit of Powershell? How about this

Who am I ? Sometimes this is me but most of the time this is me

pbowman-543344 Mr or Mrs. 500 Points: 556 More actions · Answer 6

So I created a non-clustered, non-unique index on the ID column, like so

USE [Exchange]

GO

CREATE NONCLUSTERED INDEX [IX_GBEOrderNotificationID] ON [dbo].[GBEOrderNotification]

(

[GBEOrderNotificationId] ASC

)WITH (STATISTICS_NORECOMPUTE = OFF

, SORT_IN_TEMPDB = OFF

, IGNORE_DUP_KEY = OFF

, DROP_EXISTING = OFF

, ONLINE = OFF

, ALLOW_ROW_LOCKS = ON

, ALLOW_PAGE_LOCKS = ON)

Now the select max(GBEOrderNotificationID) query runs in 5 mins rather than 20. However its still bringing back every row, the performance gain is simply that it doesn't have to page through the whole table while doing it. (I attach the plan). However I noticed that the details of the index scan have a property of "Ordered False". Checking back the previous plan (see above) I notice that the same Ordered False applies to the clustered index scan. Now it makes sense to me that if the optimizer sees the index as not ordered that it would have to do a full scan to get a max() value.

I see also that the previous plan for the NotificationSequence query, where the clustered index scan returns 1 row, has an Ordered True property, which also makes sense, I think. Basically I want to get a plan like...

|--Clustered Index Scan(OBJECT:([Exchange].[dbo].[NotificationSequence].[PK128]), ORDERED BACKWARD PARTITION ID:([PtnIds1006]))

whereas what I'm getting is the non-ordered version

|--Index Scan(OBJECT:([Exchange].[dbo].[GBEOrderNotification].[IX_GBEOrderNotificationID]), PARTITION ID:([PtnIds1007]))

(pardon the unintentional smilies!)

pbowman-543344 Mr or Mrs. 500 Points: 556 More actions · Answer 7

pbowman-543344

Mr or Mrs. 500

Points: 556

June 5, 2009 at 3:09 am

#1005091

sorry, I'll try that attaching the plan bit again.

Johan Bijnens SSC Guru Points: 135389 More actions · Answer 8

If you created your new non-clustering index without specifying a filegroup, you should see a parallel execution for that query, because will actually put this index on a per content base for every partition in the partition filegroup.

So your index will actually reside on every partition, in stead of a single large index in another filegroup.

Johan

Learn to play, play to learn !

Dont drive faster than your guardian angel can fly ...
but keeping both feet on the ground wont get you anywhere :w00t:

- How to post Performance Problems
- How to post data/code to get the best help[/url]

- How to prevent a sore throat after hours of presenting ppt

press F1 for solution, press shift+F1 for urgent solution 😀

Need a bit of Powershell? How about this

Who am I ? Sometimes this is me but most of the time this is me

pbowman-543344 Mr or Mrs. 500 Points: 556 More actions · Answer 9

ALZDBA (6/5/2009)
If you created your new non-clustering index without specifying a filegroup, you should see a parallel execution for that query, because will actually put this index on a per content base for every partition in the partition filegroup.
So your index will actually reside on every partition, in stead of a single large index in another filegroup.

Agreed. But this is my intention - to have the indexes also partitioned (and aligned).

matt stockham SSCrazy Eights Points: 9942 More actions · Answer 10

Is the max value always going to be in the last partition? If so, you should be able to substantially speed up the query by using the $partition function.

pbowman-543344 Mr or Mrs. 500 Points: 556 More actions · Answer 11

Well, to be honest, given that I need to get a workaround at the moment (it being Friday afternoon an' all), I decided to go for a "how long is a piece of string" function to find the max element, like so

create function dbo.max_ON ()

returns bigint

as

begin

declare @nextid bigint, @lastid bigint, @offset bigint

select top 1 @nextid = GBEOrderNotificationId

from dbo.GBEOrderNotification

select @lastid = @nextid

select @offset = sum(p.rows)/2 - 1

from sys.partitions p

where p.object_id = object_id('GBEOrderNotification')

and p.index_id = 1

while (@offset > 0)

begin

-- Check for presence of ID value

select @nextid = isnull(GBEOrderNotificationId, 0)

from dbo.GBEOrderNotification

where GBEOrderNotificationId = @lastid + @offset

if @@rowcount = 0 set @nextid = 0

if @nextid = 0

set @offset = @offset/2

else

set @lastid = @nextid

end

return @lastid

end;

go

I'm sure that's not the most efficient way of doing it (and I should recurse to be flash), but it gets me the answer in < 3 seconds rather than 20 mins or even 5 mins, so it'll do for now.

NB I have also just partitioned another table on a date field - again this meant adding the date field into the primary cluster key, since which select max(id) also does a full table scan - this does seem to be a general effect. This last case also seems pretty standard partitioning (if all the examples in the documentation are anything to go by), so I'm still a bit surprised that there's no mention of this issue.

Using a specific $PARTITION number in the where clause does not seem to significantly improve the query time.

pbowman-543344 Mr or Mrs. 500 Points: 556 More actions · Answer 12

OK, so I found a relevant page on the Microsoft feedback forum at: Feedback: Partition Table using min/max functions and Top N - Index selection and performance.

Edited highlights:

Description: Partitioned Tables performance issues - For select statements using Min and Max functions and Top N with ordering over an index.
Poor performance is being detected for Queries on partitioned tables utilising the min and max functions and select Top N clause with "order by" matching columns of the index.
The candidate index is either not being used or is being used to scan or seek all rows rather than a subset.
[...]
Mark,
You've provided a very nice analysis that has shown several limitations in the optimizer related to queries on partitioned tables that use MIN and TOP. We're not in a position to fix these issues as a bug fix for SQL Server 2008 because they require some significant effort. We'll keep a record of these and try to improve the behavior for the next major release of SQL Server.
[...]
Eric
Posted by Microsoft on 29/01/2009 at 10:48

So, it does actually appear to be a "feature" 🙁 (even though the above is for 2008, it looks like the same issue).

Johan Bijnens SSC Guru Points: 135389 More actions · Answer 13

It cannot be sql2008 ...... the feedback registration date is 11/26/2006

Johan

Learn to play, play to learn !

Dont drive faster than your guardian angel can fly ...
but keeping both feet on the ground wont get you anywhere :w00t:

- How to post Performance Problems
- How to post data/code to get the best help[/url]

- How to prevent a sore throat after hours of presenting ppt

press F1 for solution, press shift+F1 for urgent solution 😀

Need a bit of Powershell? How about this

Who am I ? Sometimes this is me but most of the time this is me

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 14

ALZDBA (6/10/2009)
It cannot be sql2008 ...... the feedback registration date is 11/26/2006

I believe you are mistaken if you think that MS wasn't working on SQL Server 2008 in late 2006.

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

Johan Bijnens SSC Guru Points: 135389 More actions · Answer 15

TheSQLGuru (6/10/2009)
ALZDBA (6/10/2009)
It cannot be sql2008 ...... the feedback registration date is 11/26/2006
I believe you are mistaken if you think that MS wasn't working on SQL Server 2008 in late 2006.

You mean it takes more than one year to rewrite sqlserver ?? :hehe:

The connect thread has market "version: SQL Server 2005 SP2 standard edition" and there have indeed only been references to sql2008 ctps but they did not get confermed except for in the end (sql2008 rtm repo)

After re-reading it all .... this is exactly what I should have answered in my previous post :blush:

Johan

Learn to play, play to learn !

Dont drive faster than your guardian angel can fly ...
but keeping both feet on the ground wont get you anywhere :w00t:

- How to post Performance Problems
- How to post data/code to get the best help[/url]

- How to prevent a sore throat after hours of presenting ppt

press F1 for solution, press shift+F1 for urgent solution 😀

Need a bit of Powershell? How about this

Who am I ? Sometimes this is me but most of the time this is me