Hello SSC,
I have a data warehouse question that I would like some advice with...
Like most companies we get our data from flat files. These flat files are processed and then dumped into SQL staging tables. Then, an SSIS process transforms the data and pushes it out into our production DW environment. The staging tables consist of about 250 columns. We take about 50 of those columns into our production environment into a fact table. Our VP now wants all of the columns from the staging table pushed into our production data warehouse. This is most likely going to slow everything down and cause bottlenecks in processing.
What are some ways to approach this? Can a fact table with over 250 columns still perform well?
I was considering column store indexes, or perhaps splitting the fact table into 2 or 3, but the downstream affect could be costly. We have a lot of SSIS packages and processes that point to that particular fact table. I am not sure what the correct approach to this would be. I would welcome any advice or suggestions.
As always, thank you very much in advance SSC. I hope you are all well and staying safe!
Thanks,
Dave
The are no problems, only solutions. --John Lennon
May 8, 2021 at 5:54 am
This was removed by the editor as SPAM
May 8, 2021 at 8:06 pm
250 seems like a lot of facts! Are you happy that you've got the data modelling correct?
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel
250 is a lot for a fact table. I had a dimension table that had 573 columns. The problems encountered were during updates due to page splits. So much of a problem that the table had to be split vertically into 2 separate tables then joined by the primary key. There is another fact table that has 151 columns this has no problems.
The important thing to check when you have a lot of columns is the row width, i.e. the number of bytes in a row. For a fact table most of the columns should be integer foreign keys and things like datetime and measurement. You should not have large text columns. So 250 columns may well be ok.
May 10, 2021 at 1:07 pm
250 is a lot for a fact table. I had a dimension table that had 573 columns. The problems encountered were during updates due to page splits. So much of a problem that the table had to be split vertically into 2 separate tables then joined by the primary key. There is another fact table that has 151 columns this has no problems.
The important thing to check when you have a lot of columns is the row width, i.e. the number of bytes in a row. For a fact table most of the columns should be integer foreign keys and things like datetime and measurement. You should not have large text columns. So 250 columns may well be ok.
Thanks, this is helpful. How about partitioning the table by year, or whatever timestamp works?
The are no problems, only solutions. --John Lennon
May 10, 2021 at 2:11 pm
Phil Parkin wrote:250 seems like a lot of facts! Are you happy that you've got the data modelling correct?
Not sure I understand what you are saying Phil.
I'm used to seeing fairly wide dimension tables, but fact tables are usually just a bunch of FKs along with a few numbers. I was wondering if there was a possibility that you have a situation where you have got facts which should be dims.
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel
May 10, 2021 at 11:11 pm
Lord Slaagh wrote:Phil Parkin wrote:250 seems like a lot of facts! Are you happy that you've got the data modelling correct?
Not sure I understand what you are saying Phil.
I'm used to seeing fairly wide dimension tables, but fact tables are usually just a bunch of FKs along with a few numbers. I was wondering if there was a possibility that you have a situation where you have got facts which should be dims.
guess it all depends on what's in play - on my shop one of the actuarial cubes has 25 dimensions (all very short - 3 to 10 columns max ) and 1 fact table with 350+ measures (these are already aggregated values) plus a few more calculated values within the cube itself
Viewing 8 posts - 1 through 7 (of 7 total)
You must be logged in to reply to this topic. Login to reply