June 10, 2004 at 1:16 pm
We are looking at building a datawarehouse using SQL Server 2000. Having never set up a data warehouse before, I am a little unsure as to how much storage space to initial ask for. I was wondering what a good rule of thumb would be in sizing the database given the data is currently in text files.
By are initial investigation, we are looking at around 2 terabytes of text data in fixed-width text files.
Currently, we are quessing the database will be about 2 to 2.5 times the size of the text data.
June 14, 2004 at 8:00 am
This was removed by the editor as SPAM
June 14, 2004 at 8:58 am
There are a lot of variables you will have to take into consideration when making a guestimate like this. Things like
-how much initial data do you have
-how long are you planning for before you increase your storage space
-realistically, how much extra data will you need to store over that period (growth rates per day/ month/ year?)
-will you be the sole keeper of any data (if its financial you may not want to archive it for at least 7 years)
-at what periods will you want to archive, to free up space (if at all)
-you will also have to allow a % for new projects (they will come along when people relies the benefits of your Datawarehouse)
-give yourself a moderate buffer
As most people will notice I haven’t even touched backing up/ mirroring and storage architecture. That’s another kettle of fish.
I’m sure there are lots of other variables, and if people want to add to the list I would like to see them so I can apply them in future DW birthing projects.
June 14, 2004 at 9:19 am
Thanks for the reply.
Are initial planning is for a rolling 2 years worth of data. Our users say that after two years, the data is meaningless for them to use in thier risk models. This is where the 2 terabytes of fixed-width test data comes from. The data would be loaded monthly. We are expecting an average of 15 to 20 % growth rate from year to year of the data.
We are not worried about backups and mirroring. We are planning to back up the monthly data just before it is to be loaded into the data warehouse. If the data warehouse crashes, we will rebuild the warehouse from the backups of the monthly load data.
We will be using SAS to mine and analyze the data. The SAS server will be on another server seperate from the data server which is seperate from the loading server. Each of these server will have a Gigabyte network connection between them to help move the data between the servers.
If you need any other info, please let me know.
Dave Novak
Viewing 4 posts - 1 through 3 (of 3 total)
You must be logged in to reply to this topic. Login to reply