How to handle a fact table with over 250 columns

  • Hello SSC,

    I have a data warehouse question that I would like some advice with...

    Like most companies we get our data from flat files. These flat files are processed and then dumped into SQL staging tables. Then, an SSIS process transforms the data and pushes it out into our production DW environment. The staging tables consist of about 250 columns. We take about 50 of those columns into our production environment into a fact table. Our VP now wants all of the columns  from the staging table pushed into our production data warehouse. This is most likely going to slow everything down and cause bottlenecks in processing.

    What are some ways to approach this? Can a fact table with over 250 columns still perform well?

    I was considering column store indexes, or perhaps splitting the fact table into 2 or 3, but the downstream affect could be costly. We have a lot of SSIS packages and processes that point to that particular fact table. I am not sure what the correct approach to this would be. I would welcome any advice or suggestions.

    As always, thank you very much in advance SSC. I hope you are all well and staying safe!

    Thanks,

    Dave

    The are no problems, only solutions. --John Lennon

  • This was removed by the editor as SPAM

  • 250 seems like a lot of facts! Are you happy that you've got the data modelling correct?

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • 250 is a lot for a fact table. I had a dimension table that had 573 columns. The problems encountered were during updates due to page splits. So much of a problem that the table had to be split vertically into 2 separate tables then joined by the primary key. There is another fact table that has 151 columns this has no problems.

    The important thing to check when you have a lot of columns is the row width, i.e. the number of bytes in a row. For a fact table most of the columns should be integer foreign keys and things like datetime and measurement. You should not have large text columns. So 250 columns may well be ok.

  • Phil Parkin wrote:

    250 seems like a lot of facts! Are you happy that you've got the data modelling correct?

    Not sure I understand what you are saying Phil.

    The are no problems, only solutions. --John Lennon

  • Jonathan AC Roberts wrote:

    250 is a lot for a fact table. I had a dimension table that had 573 columns. The problems encountered were during updates due to page splits. So much of a problem that the table had to be split vertically into 2 separate tables then joined by the primary key. There is another fact table that has 151 columns this has no problems.

    The important thing to check when you have a lot of columns is the row width, i.e. the number of bytes in a row. For a fact table most of the columns should be integer foreign keys and things like datetime and measurement. You should not have large text columns. So 250 columns may well be ok.

    Thanks, this is helpful. How about partitioning the table by year, or whatever timestamp works?

    The are no problems, only solutions. --John Lennon

  • Lord Slaagh wrote:

    Phil Parkin wrote:

    250 seems like a lot of facts! Are you happy that you've got the data modelling correct?

    Not sure I understand what you are saying Phil.

    I'm used to seeing fairly wide dimension tables, but fact tables are usually just a bunch of FKs along with a few numbers. I was wondering if there was a possibility that you have a situation where you have got facts which should be dims.

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • Phil Parkin wrote:

    Lord Slaagh wrote:

    Phil Parkin wrote:

    250 seems like a lot of facts! Are you happy that you've got the data modelling correct?

    Not sure I understand what you are saying Phil.

    I'm used to seeing fairly wide dimension tables, but fact tables are usually just a bunch of FKs along with a few numbers. I was wondering if there was a possibility that you have a situation where you have got facts which should be dims.

    guess it all depends on what's in play - on my shop one of the actuarial cubes has 25 dimensions (all very short - 3 to 10 columns max ) and 1 fact table with 350+ measures (these are already aggregated values) plus a few more calculated values within the cube itself

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply