December 6, 2011 at 3:00 pm
Hey everyone, i'm considering doing a partitioning structure. The database is a data mart that holds historical customer profile snapshots including the current one. Everynight a process will be ran to check to see if the customer's profile changed that day, and if so it will update this datamart setting the old profile as historical, and inserting the new profile into the table. Current data will be about 80-90% of the reads.
I'm having difficulty trying to figure out a good partitioning key. First obvious choice is some type of bit or char that has two distinct values - one for current record, and one for historical record. However that would create two disproportionate partitions. I don't think that would be the most effective use of the partition scheme. My next thought would be some type of value that has a unique value for current, then a set of numbers (e.g 0-9) that can be used as a mod function on the year. I want to make the partitioning key no larger than 1 byte to keep our indexes as small as possible - so that leaves either char(1), binary(1) or tinyint. So the partitioning set would look something like {Current, 0,1,2,3,4,5,6,7,8,9}.
Any thoughts? I would love to hear from anyone who's encountered something similar.
December 6, 2011 at 3:13 pm
Why are you trying to partition the data?
December 6, 2011 at 3:21 pm
The dba wants to use 2 different disks - 1 for current data and one for historical data. So I'd either have to partition or create two separate tables.
December 6, 2011 at 3:29 pm
This doesn't sound like a very good plan if you are just going to update the partition column, since it would force the data to move physically form one disk to the other every time you make an update to the partition column. What would be the advantage?
December 6, 2011 at 3:35 pm
Michael Valentine Jones (12/6/2011)
This doesn't sound like a very good plan if you are just going to update the partition column, since it would force the data to move physically form one disk to the other every time you make an update to the partition column. What would be the advantage?
One drive is a lot bigger than the other. The other is a lot faster. I believe they are using SSD.
One way or another data will have to physically be moved, so this isn't a dealbreaker.
December 6, 2011 at 3:39 pm
How big is the existing table? Are we talking about a table with billions of rows of data?
December 6, 2011 at 3:54 pm
Lynn Pettis (12/6/2011)
How big is the existing table? Are we talking about a table with billions of rows of data?
There's two tables, one is the main profile data (contains basic information, e.g. gender, dob, zipcode, etc), then there is a bridge table that stores the preferences made in the user profile. If you can imagine a webpage with about 60-70 checkboxes, this table stores those selected checkboxes for each user.
The Main Profile table - 17 million rows, and about 1 GB
The Preference bridge table - 270 million rows, and 8 GB of data
These are for the "current" tables. We estimate 2x-4x as much data to be generated per year in the historical tables. However this will be the first time we store historical data, so we have no real world data to go off of.
Viewing 7 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply