divide data into partitions

  • I have a table with 100.000.000 records of statistics which is growing with 10.000.000 records a month. These statistics have a IDProject which i want to use to partition the table. The problem i have is that there are a few big projects and many small ones, so i want to create a partition for each big project, and 1 partition for all small projects.

    For examle:

    IDProject 500, 800 and 1234 are big projects, and all the others are small.

    I created the folowing partition function:

    CREATE PARTITION FUNCTION f_PartitionStatistics(int)

    AS RANGE LEFT FOR VALUES (500, 800, 1234)

    but when i do this, all statistics of project 0 - 500 will go to the first partition, 501 to 800 go to the second, etc. But what i realy want is that 0-499, 501-799, 1235-.. go to 1 partition.

    How can i do this? I thought of some construction where i use an extra column in the statistictable called partitionNr. Before i put the data in the table (from a portal table) i find out to which partition it should go, and then fill in the partitionNr.

  • Here is a good resource for Table Partitioning

    http://www.sqljunkies.com/article/f4920050-6c63-4109-93ff-c2b7eb0a5835.scuk

  • Once you have the data in the table, how are you accessing it? Is it always by Project, or do you typically look at it by date as well? You could change your partitioning key to date and create a parition for each month's worth of data.

    Here's a great resource for SQL Server 2005 partitioning:

    http://msdn.microsoft.com/en-us/library/ms345146.aspx

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

  • We access the data always by idproject, but also by date. I know partitioning by date is common and even in my case a good option. But the majority of the query will be on de data of the last 3 months and on the 3 big projects. So when the 3 customers of the 3 big projects use the data at the same time (and they will!) i get the best performanceboost if the all have their own partition, so they won't be in eachothers way.

    I don't it is possible to partition on idproject AND date, right ?? (because data older then 4 months is much less used)

  • See if this blog post about horizontal partitioning helps you

    http://weblogs.sqlteam.com/peterl/archive/2007/09/24/Horizontal-partitioning.aspx

    You can do the same with your data.

    Have a composite key with a new column named "HighProfileProject" which is 1 for project 500, 800 and 1234, and the date.

    Or have a Status column which includes all you want.

    Status = 0 means Project is 500, 800 or 1234, and date is less than three months.

    Status = 1 means Project is 500, 800 or 1234, and date is older than three months.

    Status = 2 means Project is other than 500, 800 and 1234, regardless of date

    Now create a partition scheme and function for those three statuses.


    N 56°04'39.16"
    E 12°55'05.25"

  • Does Peso's suggestion not mean that every day/week/month/quarter a job has to be run to shuffle data into a different time-based partition?

    Will the passage of time not change the evaluation of the status flag?

  • When you update the status, the SQL Server automatically rearrange the records to the correct table/partition.

    See the execution plans in the link above.


    N 56°04'39.16"
    E 12°55'05.25"

  • John Rowan (6/11/2008)


    Here's a great resource for SQL Server 2005 partitioning:

    http://msdn.microsoft.com/en-us/library/ms345146.aspx

    I read the whole document, but i still have 2 questions:

    1. My table is already filled with data, is it best to create a new table with partitioning & partitioned indexes and then replace all the data into this table, or can i use the next statement:

    Drop index IX_Statistics on Statistics with (Move To [ps_Partition] (IDProject))

    I tried this, and it seems to work, but i'm not sure the whole table is now partitioned or only the clusterd index is.

    2. If i use Peso's idea to use some kind of status (which is what i was thinking of in the first place, so i like his idea), will the execution plan use only those partitions that are needed when i only query at date and idproject. Or do i need to use the status in the query (where date = '01012008' and idproject=800 and status=3)?

  • For performance, yes. But SQL Server takes care of the filtering for you.

    See this picture. This is the execution plan when having Status = 2 in the WHERE statement.

    A nice Index seek on the only tables that needs to be touched.

    But when you don't supply Status = 2 in the WHERE clause, this is how the execution plans looks like.

    These are the popups. You will still get an index seek for every table, but this is a relatively fast operation, especially if you make a clustered index over projectid.


    N 56°04'39.16"
    E 12°55'05.25"

  • k, thnx!

    but in your example you are using viewpartitioning like the blog you posted : http://weblogs.sqlteam.com/peterl/archive/2007/09/24/Horizontal-partitioning.aspx

    When i use the SQL2005 feature of tablepartioting, this wil be the same i pressume ?

  • Since you seems to have the Enterprise Edition, go for the

    CREATE PARTITION FUNCTION over your partitionNr column,

    and have a job that runs nightly to update the partitionNr column according to projectid and date.


    N 56°04'39.16"
    E 12°55'05.25"

  • Peso (6/12/2008)


    Since you seems to have the Enterprise Edition, go for the

    CREATE PARTITION FUNCTION over your partitionNr column,

    and have a job that runs nightly to update the partitionNr column according to projectid and date.

    I don't have Enterprise yet (only on the development db), but we are trying to convince the boss to switch to Enterprise since our database is growing very fast and it is becoming more and more difficult to manage.

    But if we don't succeed in convincing the boss i will try the view method, but ofcourse the partitionfunction is definitly prefered.

  • Wow, you guys got an early start today. I'll echo everything Peso has posted. Very sound advice as usual Peter. Thanks for joining in.

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply