March 18, 2009 at 2:53 pm
I am creating a process that will insert records into a table if they don't exist, or update a date if they match. This table will initially hold around a billion records and hover around that amount. We obviously need to do some kind of partitioning on this table: either partitioning on a column value (the first digit of the zip code) or partitioning by some date. Any advice on performance, pros, cons, etc.?
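The logic is roughly the following; this is just a sketch with made-up table and column names (MERGE is available from SQL Server 2008):

-- Insert new records, update the date on existing ones.
-- dbo.TargetTable and dbo.StagingTable are placeholders.
MERGE dbo.TargetTable AS t
USING dbo.StagingTable AS s
    ON t.RecordKey = s.RecordKey
WHEN MATCHED THEN
    UPDATE SET t.LastSeenDate = s.LastSeenDate
WHEN NOT MATCHED BY TARGET THEN
    INSERT (RecordKey, LastSeenDate)
    VALUES (s.RecordKey, s.LastSeenDate);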
This will piggyback on a similar process that currently takes 20 to 30 minutes to run, and I don't want to add a considerable amount of time to it.
March 19, 2009 at 8:07 am
stevoid1970 (3/18/2009)
... either partitioning on a column value (the first digit of the zip code) or partitioning by some date. Any advice on performance, pros, cons, etc.?
This question cannot be answered without some knowledge of the data and how it will be queried.
Let's say you have an order table with 10 years of historical data.
If you query the data with date conditions, I would partition into months. If you had 100 years of data, you might want to partition into years to ease maintenance, or perhaps by financial year.
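For example, a monthly setup might look like this (a sketch only; the boundary dates, filegroup, and table definition are made up):

-- One boundary value per month; RANGE RIGHT means each boundary date
-- starts a new partition.
CREATE PARTITION FUNCTION pfOrdersMonthly (datetime)
AS RANGE RIGHT FOR VALUES ('20090101', '20090201', '20090301');

-- All partitions mapped to one filegroup here for simplicity.
CREATE PARTITION SCHEME psOrdersMonthly
AS PARTITION pfOrdersMonthly ALL TO ([PRIMARY]);

CREATE TABLE dbo.Orders
(
    OrderID   int      NOT NULL,
    OrderDate datetime NOT NULL
)
ON psOrdersMonthly (OrderDate);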
If you have a contact table and you query with geographical conditions (zip, phone, city, country), you will want to partition the table on one of those columns.
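To partition on the first digit of the zip, you can use a persisted computed column as the partitioning column; again, just a sketch with made-up names:

-- Ten partitions: one for each leading digit 0-9.
CREATE PARTITION FUNCTION pfZipFirstDigit (char(1))
AS RANGE RIGHT FOR VALUES ('1', '2', '3', '4', '5', '6', '7', '8', '9');

CREATE PARTITION SCHEME psZipFirstDigit
AS PARTITION pfZipFirstDigit ALL TO ([PRIMARY]);

CREATE TABLE dbo.Contact
(
    ContactID     int     NOT NULL,
    Zip           char(5) NOT NULL,
    -- Computed column must be PERSISTED to be used for partitioning.
    ZipFirstDigit AS CAST(LEFT(Zip, 1) AS char(1)) PERSISTED
)
ON psZipFirstDigit (ZipFirstDigit);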
The idea behind partitioning is to limit the size of your dataset right from the start, so we need to know which filters are most commonly applied.
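With the monthly scheme sketched above, a query that filters on the partitioning column only has to read the matching partition instead of the whole table:

-- Partition elimination: only the February 2009 partition is scanned.
SELECT COUNT(*)
FROM dbo.Orders
WHERE OrderDate >= '20090201'
  AND OrderDate <  '20090301';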
Let me know if you need any help setting this up.