July 23, 2013 at 5:20 pm
My company is in the process of designing a new DW and from speaking to the developers there is most likely going to be one fact table that could become quite large. This fact table is going to have to hold history data going back about 9 years so I'm considering using partitioning to help maintain the data.
According to the developers this fact table will more often than not only query data from the current month, so I'm thinking of having a partition that holds data for the current month, one for the current year and one for the history (I may add more than this).
I've read that it is best practice to have a free partition at each end of the scale (I was hoping someone could clarify the reasons for this) so I'll try and incorporate this. I just wondered if what I described is the best way to go. My plan would be to switch out data from the current year partition into the history partition at the end of each financial year, and also switch out data for the current month into the year partition.
Does this sound like I'm on the right track? I haven't implemented partitioning before so I'm open to any suggestions or gotchas that I may come across.
July 23, 2013 at 6:59 pm
I haven't implemented partitioning before so I'm open to any suggestions or gotchas that I may come across.
That says it all right there. I have come across FAR too many clients and forum posters who have tried to implement partitioning in a large scale system and it has NEVER worked out well. This is a complex subsystem layered on top of a complex undertaking. You REALLY need to get a professional on board to help you understand what is possible and what is not and help you get a good design in place, implement, and set up proper monitoring and maintenance. Each phase could be a separate, short engagement which will keep the cost down. It is my strong belief that the odds of success are low if you just read a few blogs, books online, ask some forum posts and slap something together.
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
July 23, 2013 at 7:08 pm
I'm by no means looking to "slap" something together. I believe using partitioning will help us going forward so I'm just looking to make sure I do it in the right way. In an ideal world I'd sit next to a partitioning guru to go through this but I'm not going to be able to do that.
I'll be playing around with this a lot in a development environment to make sure anything we consider implementing will work.
I just wanted to make sure what I mentioned sounded like the right thing to be doing.
July 23, 2013 at 8:09 pm
Let me ask what you think partitioning will provide to "help you going forward". Most who start down this path start off with misguided ideas about what they expect to get from partitioning.
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
July 23, 2013 at 8:26 pm
Ok no problem.
We have an old MIS system that this new DW is looking to replace. In that system, they may have a table such as Policy which they will split into something like 3 separate tables - Policy, Policy_MTD and policy_YTD.
One of my aims is to get away from this so older data could be switched to a partition on a separate FG that can reside on a separate spindle. We have the ability to put current data on SSDs so I would also look to have a FG that contains that data for faster access.
I'm aware that partitioning isn't primarily a performance enhancer but I have seen examples where SQL Server will only look at one partition for the data it requires (if data was partitioned by date for example). I'm hoping this may be something that could also benefit us.
These are the some of the things that I'm looking to use it for, but I guess the main focus is on speeding up the archiving of data which down the road will become quite large. I'm also hoping we can speed up the loading of data if we switch in data from a staging table.
July 24, 2013 at 2:06 am
Chris-475469 (7/23/2013)
I'm aware that partitioning isn't primarily a performance enhancer but I have seen examples where SQL Server will only look at one partition for the data it requires (if data was partitioned by date for example). I'm hoping this may be something that could also benefit us.
Partitioning is not about performance. If you're looking for performance gains, wrong place.
These are the some of the things that I'm looking to use it for, but I guess the main focus is on speeding up the archiving of data which down the road will become quite large. I'm also hoping we can speed up the loading of data if we switch in data from a staging table.
Now that's what partitioning is for.
As for design, find out what you archive based on. Do you archive last month's data, all data older than 3 months, older than a year, etc. That guides your partition strategy because to benefit from fast switching of data in/out of a table, you move entire partitions around, hence you need to ensure that when you load/archive, you're doing it to an entire partition, not part of one.
https://www.simple-talk.com/sql/database-administration/gail-shaws-sql-server-howlers/
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
July 24, 2013 at 2:25 am
Thanks for your reply Gail.
As I said previously, I'm aware partitioning isn't for performance. I just know that there is a possibility it MAY help with queries. My main focus is helping to manage the data in these larger fact tables.
I just wanted to know that, assuming the fact table would be archived by date, would have something like 3 partitions (Current month, current year, history) sound like the right thing to do? That way at the end of a financial year for example, I'd be able to switch out data from the current year partition into the history partition.
I also wanted a little clarification around having the two empty partitions (one at each end) as best practice and what this would help with.
July 24, 2013 at 2:32 am
Chris-475469 (7/24/2013)
As I said previously, I'm aware partitioning isn't for performance. I just know that there is a possibility it MAY help with queries.
If you go and read that article, you'll see that it probably won't and may hinder performance.
I just wanted to know that, assuming the fact table would be archived by date, would have something like 3 partitions (Current month, current year, history) sound like the right thing to do? That way at the end of a financial year for example, I'd be able to switch out data from the current year partition into the history partition.
No, that's not how you'd typically do it. Typically you'd have it much the way described in that article, a partition per month and a history table (that's partitioned itself but can be a lower granularity or even just a single partition, it just needs to be partitioned to be used as a target for a partition switch)
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
July 24, 2013 at 2:34 am
Ok thanks.
I have read your article before so I'll re-visit it and go from there.
July 24, 2013 at 2:51 am
Chris-475469 (7/24/2013)
I just wanted to know that, assuming the fact table would be archived by date, would have something like 3 partitions (Current month, current year, history) sound like the right thing to do? That way at the end of a financial year for example, I'd be able to switch out data from the current year partition into the history partition.
I will disagree somewhat with Gail and say, yes, partitioning can be about performance. But in that case there is a price to pay.
The partitioning you suggest could help performance, if you put the three partitioning on disk with different performance characteristics, and your budget does not permit FusionIO across the board.
But in that case, once you a month, you will need to physically move data around, which you are likely to find painful. Then again, those older partitions could be on read-only filegroups, permitting you to reduce backup times.
I like to butt in that for this partioning scenarions, partitioned views may be a better option than partitioned tables.
Another option is to partition on a guid and then put partitions on different filegroups that are on different I/O channels. This can also improve performance.
Beware however that what I have discussed here are advanced partition scenarios. The bread-and-butter scenario for partitioning is when you want to age out old data quickly. That is, if you think that the fact table will never have more than 24 months per data, you have one partition per month, and getting rid of a month worth of data will be a snap.
[font="Times New Roman"]Erland Sommarskog, SQL Server MVP, www.sommarskog.se[/font]
July 24, 2013 at 3:39 am
Erland Sommarskog (7/24/2013)
I will disagree somewhat with Gail and say, yes, partitioning can be about performance. But in that case there is a price to pay.
It can be (which I say in that article), however that takes careful query design, probably different drives for the partitions, a workload that is IO bottlenecked and queries that read across the partitions to take advantage of parallel reads from different drives and parallelism across the partitions.
It doesn't come automatically just by partitioning a table, which is what a lot of people seem to expect.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
Viewing 11 posts - 1 through 10 (of 10 total)
You must be logged in to reply to this topic. Login to reply