June 11, 2008 at 1:38 am
I have a table with 100.000.000 records of statistics which is growing with 10.000.000 records a month. These statistics have a IDProject which i want to use to partition the table. The problem i have is that there are a few big projects and many small ones, so i want to create a partition for each big project, and 1 partition for all small projects.
For examle:
IDProject 500, 800 and 1234 are big projects, and all the others are small.
I created the folowing partition function:
CREATE PARTITION FUNCTION f_PartitionStatistics(int)
AS RANGE LEFT FOR VALUES (500, 800, 1234)
but when i do this, all statistics of project 0 - 500 will go to the first partition, 501 to 800 go to the second, etc. But what i realy want is that 0-499, 501-799, 1235-.. go to 1 partition.
How can i do this? I thought of some construction where i use an extra column in the statistictable called partitionNr. Before i put the data in the table (from a portal table) i find out to which partition it should go, and then fill in the partitionNr.
June 11, 2008 at 3:49 pm
Here is a good resource for Table Partitioning
http://www.sqljunkies.com/article/f4920050-6c63-4109-93ff-c2b7eb0a5835.scuk
June 11, 2008 at 4:37 pm
Once you have the data in the table, how are you accessing it? Is it always by Project, or do you typically look at it by date as well? You could change your partitioning key to date and create a parition for each month's worth of data.
Here's a great resource for SQL Server 2005 partitioning:
June 11, 2008 at 11:57 pm
We access the data always by idproject, but also by date. I know partitioning by date is common and even in my case a good option. But the majority of the query will be on de data of the last 3 months and on the 3 big projects. So when the 3 customers of the 3 big projects use the data at the same time (and they will!) i get the best performanceboost if the all have their own partition, so they won't be in eachothers way.
I don't it is possible to partition on idproject AND date, right ?? (because data older then 4 months is much less used)
June 12, 2008 at 1:24 am
See if this blog post about horizontal partitioning helps you
http://weblogs.sqlteam.com/peterl/archive/2007/09/24/Horizontal-partitioning.aspx
You can do the same with your data.
Have a composite key with a new column named "HighProfileProject" which is 1 for project 500, 800 and 1234, and the date.
Or have a Status column which includes all you want.
Status = 0 means Project is 500, 800 or 1234, and date is less than three months.
Status = 1 means Project is 500, 800 or 1234, and date is older than three months.
Status = 2 means Project is other than 500, 800 and 1234, regardless of date
Now create a partition scheme and function for those three statuses.
N 56°04'39.16"
E 12°55'05.25"
June 12, 2008 at 3:48 am
Does Peso's suggestion not mean that every day/week/month/quarter a job has to be run to shuffle data into a different time-based partition?
Will the passage of time not change the evaluation of the status flag?
June 12, 2008 at 4:17 am
When you update the status, the SQL Server automatically rearrange the records to the correct table/partition.
See the execution plans in the link above.
N 56°04'39.16"
E 12°55'05.25"
June 12, 2008 at 6:30 am
John Rowan (6/11/2008)
Here's a great resource for SQL Server 2005 partitioning:
I read the whole document, but i still have 2 questions:
1. My table is already filled with data, is it best to create a new table with partitioning & partitioned indexes and then replace all the data into this table, or can i use the next statement:
Drop index IX_Statistics on Statistics with (Move To [ps_Partition] (IDProject))
I tried this, and it seems to work, but i'm not sure the whole table is now partitioned or only the clusterd index is.
2. If i use Peso's idea to use some kind of status (which is what i was thinking of in the first place, so i like his idea), will the execution plan use only those partitions that are needed when i only query at date and idproject. Or do i need to use the status in the query (where date = '01012008' and idproject=800 and status=3)?
June 12, 2008 at 7:12 am
For performance, yes. But SQL Server takes care of the filtering for you.
See this picture. This is the execution plan when having Status = 2 in the WHERE statement.
A nice Index seek on the only tables that needs to be touched.
But when you don't supply Status = 2 in the WHERE clause, this is how the execution plans looks like.
These are the popups. You will still get an index seek for every table, but this is a relatively fast operation, especially if you make a clustered index over projectid.
N 56°04'39.16"
E 12°55'05.25"
June 12, 2008 at 7:23 am
k, thnx!
but in your example you are using viewpartitioning like the blog you posted : http://weblogs.sqlteam.com/peterl/archive/2007/09/24/Horizontal-partitioning.aspx
When i use the SQL2005 feature of tablepartioting, this wil be the same i pressume ?
June 12, 2008 at 7:28 am
Since you seems to have the Enterprise Edition, go for the
CREATE PARTITION FUNCTION over your partitionNr column,
and have a job that runs nightly to update the partitionNr column according to projectid and date.
N 56°04'39.16"
E 12°55'05.25"
June 12, 2008 at 7:39 am
Peso (6/12/2008)
Since you seems to have the Enterprise Edition, go for theCREATE PARTITION FUNCTION over your partitionNr column,
and have a job that runs nightly to update the partitionNr column according to projectid and date.
I don't have Enterprise yet (only on the development db), but we are trying to convince the boss to switch to Enterprise since our database is growing very fast and it is becoming more and more difficult to manage.
But if we don't succeed in convincing the boss i will try the view method, but ofcourse the partitionfunction is definitly prefered.
June 12, 2008 at 9:24 am
Wow, you guys got an early start today. I'll echo everything Peso has posted. Very sound advice as usual Peter. Thanks for joining in.
Viewing 13 posts - 1 through 12 (of 12 total)
You must be logged in to reply to this topic. Login to reply