Is there a rule of thumb for number of rows in the partition for a very large table

  • Hi Team,

    We have a 1.4 billion record table which will grow to about 4 billion in the next year or so.

    It is a Fact table, mostly with Keys and metrics (both integer values)

    We are planning to partition it on Date_Key.

    The question is "How many rows should be in each partition"?

    Some say that you should have ~ 30 million records in each partition, regardless of whether it is Fact table with all the integers or a Flat table with mostly strings (which would be much much larger).

    My understanding of how partitioning is utilized is limited.

    Could someone please shed some light on whether 30 million row (and not how much storage differently sized 30 million rows are taking) is the rule of thumb and why that is the case?

    Or is there some other rule of thumb?

    Thank you in advance!

  • Why are you partitioning it? What is the goal?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Good question 🙂

    We are going to generate reports on a monthly basis by joining this Fact table to Dim_Date on a key.

    Each month of data has about 100 million records.

    The initial plan was to partition by month (so that when we generate a report only the month's data we need will be scanned), but it will mean each partition will have ~ 100 million records, and we are told that we should not exceed 30 million records per partition.

  • Ok, so what's the goal of the partitioning? Fast load? Parallel load? Index rebuilds at the partition level? Partition switching for archive/delete of old data?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • The main goal at the moment is to ensure that when a monthly report goes out we can get the data out fast.

    Instead of scanning 1.4 - 4 billion records, we'd like to be scanning only 100 million records and do so in parallel.

  • sql_er (11/14/2014)


    The main goal at the moment is to ensure that when a monthly report goes out we can get the data out fast.

    Then stop wasting your time looking at partitioning.

    https://www.simple-talk.com/sql/database-administration/gail-shaws-sql-server-howlers

    Instead of scanning 1.4 - 4 billion records, we'd like to be scanning only 100 million records and do so in parallel.

    If you're scanning the entire table, then either your indexing is inadequate or your queries can't use indexes, and unless you've got maxdop set to 1, a query against that many rows is going to be running in parallel.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (11/14/2014)


    sql_er (11/14/2014)


    The main goal at the moment is to ensure that when a monthly report goes out we can get the data out fast.

    Then stop wasting your time looking at partitioning.

    https://www.simple-talk.com/sql/database-administration/gail-shaws-sql-server-howlers

    Instead of scanning 1.4 - 4 billion records, we'd like to be scanning only 100 million records and do so in parallel.

    If you're scanning the entire table, then either your indexing is inadequate or your queries can't use indexes, and unless you've got maxdop set to 1, a query against that many rows is going to be running in parallel.

    +1000

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • We don't have any indexes on this Fact table.

    Sounds like we need to have a conversation with our DBA 🙂

  • sql_er (11/14/2014)


    We don't have any indexes on this Fact table.

    Sounds like we need to have a conversation with our DBA 🙂

    Oh my, YES! That's absolutely incredible! I don't believe I've ever heard of a table with no indexes that large before!

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply