April 3, 2008 at 3:21 am
Hi,
I'm working on a BI project to store surveys over the last 10 years. We've got a very large fact table. This fact table contains all question/answer combination per survey, per respondent. It's about 4,5 10^9 rows and the estimated size is about 150GB (without indexes). Has anyone any idea how we should 'organize' this? I guess we have to use partitioning, but are there other possibilities.
As for an example: 1 survey contains about 250 questions and over 250.000 people has responded. And there are dozens of surveys :ermm:
Thanks in advance
Jos
April 3, 2008 at 4:06 am
I'd use partitioning on a huge table, where there is a time or other criteria on which a big part of data is rarely used/modified as it speeds up a bit queries on new data, but slows queries on full data and backups are smaller.
Big tables are problem if you do a full backup, if you want delete significant part of it,...
April 5, 2008 at 6:56 am
Robert,
Thanks for your answer. I'm going to try to use partitioning with the help of a dba.
April 5, 2008 at 10:55 am
You might want to post the CREATE statement for your fact table and maybe 10 or so rows of data (see the URL in my signature line for the right way to do that so we can help you better). Although I agree that partionining may be the answer, it might not depending on what your queries will look like. Might wanna post one of those, as well.
--Jeff Moden
Change is inevitable... Change for the better is not.
April 6, 2008 at 7:50 am
Jeff,
I've posted a CREATE script for the facttable with a few records. I can't post any queries because there an no queries yet. In the future the customer wants to analyse the data by asking a question like: "Show me for a daterange (time not included) for a certain survey the answers.
I'm wondering if I should use partitioning by date or by survey?
Another question: Should I use a primary key?
Thanks in advance
Jos
April 6, 2008 at 9:30 am
jos.haemers (4/6/2008)
Jeff,I've posted a CREATE script for the facttable with a few records. I can't post any queries because there an no queries yet. In the future the customer wants to analyse the data by asking a question like: "Show me for a daterange (time not included) for a certain survey the answers.
I'm wondering if I should use partitioning by date or by survey?
Another question: Should I use a primary key?
Thanks in advance
Jos
Thanks for the file. Hmmm... if that the case, here's a couple of things that I might start with...
First, yes, you not only should use a Primary Key, I believe it will be necessary for you to successfully do partitioning. Identities are ok for this but when you split the table, you'll need to add a range constraint to the PK column in order to get the partioning to work. Spend some time looking at the partioning methods in Books Online. They'll guide you as to how to pull of partioning with no guess work.
Second, I think you have too many indexes for such a large collection of data... wait until you have a couple of the more commonly used queries designed and ready to go and the figure out what would be the most optimal composite indexes to support them. You might even be able to get away with just 1 or 2 depending, of course, on the queries.
--Jeff Moden
Change is inevitable... Change for the better is not.
April 6, 2008 at 1:05 pm
- seems like only fk-in the fact ??
- I've read this nice art regarding a technique often used in questionairs...
"Introduction to Bitmasking in SQL Server 2005"
http://www.sqlservercentral.com/articles/Miscellaneous/2748/
Maybe that can open some chanlenges ...
Johan
Learn to play, play to learn !
Dont drive faster than your guardian angel can fly ...
but keeping both feet on the ground wont get you anywhere :w00t:
- How to post Performance Problems
- How to post data/code to get the best help[/url]
- How to prevent a sore throat after hours of presenting ppt
press F1 for solution, press shift+F1 for urgent solution 😀
Need a bit of Powershell? How about this
Who am I ? Sometimes this is me but most of the time this is me
Viewing 7 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply