May 10, 2010 at 3:49 pm
I'm in the process of designing a reporting database using transactional replication to pull data from the production database. The production database is 600 GB+ and growing at a rate of 1 GB / day. Also, I am prohibited from making any changes to the structure of the DB itself.
Here is my dilemma. 2 tables have 200 million+ rows in them. 2 additional tables have over 40 million rows in them. And 8 tables have over 5 million rows. None of the tables are partitioned. Only the 200 million row tables are in a separate filegroup.
For the reporting database I will be partitioning the very large tables. (Still trying to determine the best scheme to use there.)
At what point would you partition the tables? Any guidelines or best practices?
I was considering partitioning by quarter and here are the row counts I would be dealing with.
Year - Qtr - RowCount
2008 - 3 - 47,931,373
2008 - 4 - 52,067,688
2009 - 1 - 15,407,640
2009 - 2 - 18,157,897
2009 - 3 - 19,246,925
2009 - 4 - 19,465,122
2010 - 1 - 16,834,640
20 million rows per quarter is expected and the business is growing.
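For reference, a quarterly RANGE RIGHT partition function and scheme for date ranges like these might look like the sketch below. All object names (pf_OrderDateQuarterly, dbo.OrderFact, FG_Orders, etc.) are invented for illustration; the boundary dates and filegroup layout would need to match the actual environment.

```sql
-- Hypothetical quarterly partitioning sketch (all names invented).
-- RANGE RIGHT: each boundary date is the first day of a quarter and
-- belongs to the partition on its right.
CREATE PARTITION FUNCTION pf_OrderDateQuarterly (datetime)
AS RANGE RIGHT FOR VALUES
    ('2008-10-01', '2009-01-01', '2009-04-01', '2009-07-01',
     '2009-10-01', '2010-01-01', '2010-04-01');

-- Map every partition to one filegroup for simplicity; in practice you
-- would likely spread these across multiple filegroups/disks.
CREATE PARTITION SCHEME ps_OrderDateQuarterly
AS PARTITION pf_OrderDateQuarterly ALL TO ([FG_Orders]);

-- Creating the clustered index on the scheme partitions the table itself.
CREATE TABLE dbo.OrderFact
(
    OrderID   bigint   NOT NULL,
    OrderDate datetime NOT NULL,
    -- ... remaining columns ...
    CONSTRAINT PK_OrderFact
        PRIMARY KEY CLUSTERED (OrderDate, OrderID)
) ON ps_OrderDateQuarterly (OrderDate);
```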
Thanks in advance,
Greg
Greg Roberts
May 10, 2010 at 7:07 pm
I guess my first question would be... why hasn't 2008 been pre-aggregated and stored by month instead of storing details? The same would probably hold true for 2009 but I don't know what your reporting requirements actually are.
Also, just a reminder... you have to have the Enterprise Edition (Developer's Edition is not appropriate here) in order to do table partitioning. Please confirm that you are using the Enterprise Edition of SQL Server 2005.
As a side bar, I'd likely partition by month instead of quarter. Smaller pieces make for shorter maintenance spikes, and partitions can be rolled off more quickly because most activity stops after the current month in most applications, IMHO.
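To illustrate the roll-off idea, a monthly sliding window is typically maintained with SWITCH, MERGE, and SPLIT. This is a hedged sketch with invented object names (pf_OrderDateMonthly, dbo.OrderFact, etc.), assuming a monthly RANGE RIGHT function already exists:

```sql
-- Hypothetical sliding-window maintenance (names invented).
-- 1) Switch the oldest partition out into an empty staging table with an
--    identical structure on the same filegroup (metadata-only, nearly instant).
ALTER TABLE dbo.OrderFact
    SWITCH PARTITION 1 TO dbo.OrderFact_Archive_Staging;

-- 2) Remove the now-empty boundary at the old end of the range.
ALTER PARTITION FUNCTION pf_OrderDateMonthly()
    MERGE RANGE ('2008-07-01');

-- 3) Add a boundary for the upcoming month at the new end.
ALTER PARTITION SCHEME ps_OrderDateMonthly
    NEXT USED [FG_Orders];
ALTER PARTITION FUNCTION pf_OrderDateMonthly()
    SPLIT RANGE ('2010-06-01');
```

The SWITCH in step 1 is what makes the monthly roll-off cheap: no rows are physically moved, so archiving a month costs far less than deleting 20 million rows.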
--Jeff Moden
Change is inevitable... Change for the better is not.
May 10, 2010 at 9:25 pm
The partition scheme is really dictated by your reporting requirements. You may want a gradual gradient for the partitions: for example, monthly partitions for the current year, then quarterly thereafter (based on what is being accessed most frequently).
It sounds like you cannot change your database structure, which isn't ideal; really you would want to spread the partitions across multiple filegroups sitting on multiple disks to get the performance increase for reporting.
May 11, 2010 at 6:30 am
... and aggregate the older data if it's static.
--Jeff Moden
Change is inevitable... Change for the better is not.
May 11, 2010 at 9:13 am
Yes, we are running on Enterprise Edition.
The long term view is to preaggregate the data and roll it into an SSAS cube. But the business needs info now. The lumps in 2008 are the result of a data migration (before my time) where the consultant simply imported all of the previous data and used the date imported, instead of the original date.
I agree monthly may make more sense; I was just hoping to reduce the number of partitions to manage. But it's only 4 tables I would get that granular on. The other large tables (5 to 20 million rows) I'm planning on partitioning based on year.
Question: What happens if the date on the record changes? Will it move to the correct partition? Or do really bad things happen? 🙂
Greg Roberts
May 12, 2010 at 7:58 am
1) I would like for you to tell me exactly why you think you need/want partitioning, and also what you think you will gain by partitioning. Most people think they need it for the wrong reasons or for reasons that will never materialize.
2) "Also, I am prohibited from making any changes to the structure of the DB itself." That statement says that you cannot partition the tables, since partitioning DOES change the structure of the database.
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
May 12, 2010 at 8:22 am
2) I cannot change the structure of the production database. I have a nightly ETL that migrates the data from production to a reporting database.
1) I am looking at partitioning the data to improve query performance and better manage storage. A table with over 200 million rows can be rather painful when a table scan is performed. Even a Clustered Index Scan is painful. By partitioning the table, the clustered index is partitioned as well. This allows for better parallel execution. Disk I/O will still be an issue, but any improvement is good. The report developers do not always write the most efficient code. 😉 Nor do I for that matter. 😛
-G
Greg Roberts
May 12, 2010 at 8:35 am
Greg Roberts-134214 (5/12/2010)
2) I cannot change the structure of the production database. I have a nightly ETL that migrates the data from production to a reporting database. 1) I am looking at partitioning the data to improve query performance and better manage storage. A table with over 200 million rows can be rather painful when a table scan is performed. Even a Clustered Index Scan is painful. By partitioning the table, the clustered index is partitioned as well. This allows for better parallel execution. Disk I/O will still be an issue, but any improvement is good. The report developers do not always write the most efficient code. 😉 Nor do I for that matter. 😛
-G
1) a clustered index scan is a table scan 🙂
2) Why do you think you will not get table scans when you query the partitioned table? If your queries don't include a filter on the partitioning key you won't get any partition elimination. If your queries DO include said filter, you can often benefit just as much by having the clustered index be on that column(s).
3) You can manage storage by using a file group as well without partitioning.
4) I don't know of a parallel execution benefit to partitioning, at least in SQL 2005. Partitioning in that version is VERY limited. It is much better in SQL 2008, but STILL not fully robust and functional. Quite a shame actually . . .
5) I don't have much information about your system or code, but I would bet a day's wages that you have MANY more important things to do to improve performance and "storage management" before you bother with partitioning a few 200M row tables. Get a perf tuning pro to spend a few days with you and your system to help guide you and you will see some tremendous opportunities fall out I bet.
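Point 2 above is worth a concrete illustration. In this sketch (invented table and column names, assuming the table is partitioned on OrderDate), only queries that filter on the partitioning column can benefit from partition elimination:

```sql
-- Assuming dbo.OrderFact is partitioned on OrderDate (names invented).

-- This query filters on the partitioning key, so the optimizer can touch
-- only the partitions covering Q1 2010 (partition elimination).
SELECT COUNT(*)
FROM dbo.OrderFact
WHERE OrderDate >= '2010-01-01'
  AND OrderDate <  '2010-04-01';

-- This query filters on a non-partitioning column, so every partition
-- must be scanned -- no elimination, roughly the same cost as scanning
-- an unpartitioned table.
SELECT COUNT(*)
FROM dbo.OrderFact
WHERE CustomerID = 42;
```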
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
May 12, 2010 at 10:52 am
Filegroup Backups and Restores
When a database has multiple filegroups you can back the database up with a full backup, and you can also perform backups of specific filegroups. You can likewise restore a filegroup if the need arises, without taking the database or the table offline and only affecting the data in that filegroup, provided the primary filegroup containing the system tables is not affected.
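A filegroup backup and piecemeal restore might look like the following sketch (database, filegroup, and path names are invented; online piecemeal restore requires Enterprise Edition and an unbroken log backup chain):

```sql
-- Hypothetical filegroup backup/restore sketch (names invented).
BACKUP DATABASE ReportingDB
    FILEGROUP = 'FG_Orders_2009'
    TO DISK = 'X:\Backups\ReportingDB_FG_Orders_2009.bak';

-- Later, restore just that filegroup while the rest of the
-- database stays online.
RESTORE DATABASE ReportingDB
    FILEGROUP = 'FG_Orders_2009'
    FROM DISK = 'X:\Backups\ReportingDB_FG_Orders_2009.bak'
    WITH NORECOVERY;
-- ...then apply subsequent log backups and finish with
-- RESTORE DATABASE ReportingDB WITH RECOVERY;
```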
Spread I/O to multiple Disks
If you place filegroups on different disks you can spread the I/O, further reducing possible I/O bottlenecks. Even if you do not have additional drives to start with but partition from the beginning, you can add disks later and move a filegroup onto them if an I/O bottleneck arises.
Filegroup Maintenance
Running scheduled maintenance (updating statistics, reindexing, and integrity checks) can cause a lot of I/O. Indexes can be partitioned just like tables, and rebuilding indexes on specific filegroups limits the overall impact of blocking and locking on a database. NOT ALL INDEXES CAN BE REBUILT ONLINE; if contention occurs, an index will need to be rebuilt offline, and partitioning indexes limits the overall cost of rebuilding. SQL 2005 can perform a DBCC integrity check against specific filegroups (DBCC CHECKFILEGROUP); this is a heavy-cost operation but is essential to the health of servers.
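The per-partition maintenance described above can be sketched like this (index, table, and filegroup names invented; note that partition-level index rebuilds are offline-only operations in SQL 2005/2008):

```sql
-- Hypothetical per-partition maintenance sketch (names invented).

-- Rebuild only the current partition of a partitioned index instead of
-- the whole 200M-row index.
ALTER INDEX PK_OrderFact ON dbo.OrderFact
    REBUILD PARTITION = 8;

-- Integrity-check one filegroup at a time to spread out the I/O cost
-- instead of running DBCC CHECKDB against the whole database at once.
DBCC CHECKFILEGROUP ('FG_Orders_2009') WITH NO_INFOMSGS;
```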
There was an error that prevented DBCC CHECKFILEGROUP in SQL 2008 prior to CU5; in SQL 2005 this was supposed to be fixed in SP3. Forgive me, I'm not working with filegroups on SQL 2005 and I'm not sure if there is a CU that was required after SP3 to address this; it would be worth checking.
I think with a 600GB DB feeding a data warehouse with 1GB growth per day, if you can get ahead of that train and utilize partitioning, why wouldn't you want to?
Keep in mind that as the data grows or needs to be archived, you will need to be familiar with maintaining partitioning. If you have a test environment you should test your scripts and make sure you are familiar with the possible growth patterns.
As far as perfmon, open it up, add your individual drive queues, and watch and see what your I/O is. If it is high, partitioning and placing filegroups on different drives will allow the read process to be more efficient.
But there are more reasons than just I/O to consider partitioning: maintenance, recovery, and possibly performance. Your current I/O will tell you if that is a bottleneck.
Keep gathering performance metrics on your prod server and it will help you to size servers better in the future.
The 70-443 training kit, Chapter 1, Lessons 1 & 3, will help you with this. It is a good read and I thought it had a lot of great info.
Twitter: @SQLBalls
Blog: http://www.SQLBalls.com
Channel: https://www.youtube.com/@Tales-from-the-Field