Is there a rule of thumb for number of rows in the partition for a very large table

Question

Is there a rule of thumb for number of rows in the partition for a very large table

sql_er

SSCarpal Tunnel

Points: 4135
More actions
November 13, 2014 at 9:25 am

#313018

Hi Team,
We have a 1.4 billion record table which will grow to about 4 billion in the next year or so.
It is a Fact table, mostly with Keys and metrics (both integer values)
We are planning to partition it on Date_Key.
The question is "How many rows should be in each partition"?
Some say that you should have ~ 30 million records in each partition, regardless of whether it is Fact table with all the integers or a Flat table with mostly strings (which would be much much larger).
My understanding of how partitioning is utilized is limited.
Could someone please shed some light on whether 30 million row (and not how much storage differently sized 30 million rows are taking) is the rule of thumb and why that is the case?
Or is there some other rule of thumb?
Thank you in advance!

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 1

Why are you partitioning it? What is the goal?

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

sql_er SSCarpal Tunnel Points: 4135 More actions · Answer 2

Good question 🙂

We are going to generate reports on a monthly basis by joining this Fact table to Dim_Date on a key.

Each month of data has about 100 million records.

The initial plan was to partition by month (so that when we generate a report only the month's data we need will be scanned), but it will mean each partition will have ~ 100 million records, and we are told that we should not exceed 30 million records per partition.

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 3

Ok, so what's the goal of the partitioning? Fast load? Parallel load? Index rebuilds at the partition level? Partition switching for archive/delete of old data?

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

sql_er SSCarpal Tunnel Points: 4135 More actions · Answer 4

The main goal at the moment is to ensure that when a monthly report goes out we can get the data out fast.

Instead of scanning 1.4 - 4 billion records, we'd like to be scanning only 100 million records and do so in parallel.

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 5

sql_er (11/14/2014)
The main goal at the moment is to ensure that when a monthly report goes out we can get the data out fast.

Then stop wasting your time looking at partitioning.

https://www.simple-talk.com/sql/database-administration/gail-shaws-sql-server-howlers

Instead of scanning 1.4 - 4 billion records, we'd like to be scanning only 100 million records and do so in parallel.

If you're scanning the entire table, then either your indexing is inadequate or your queries can't use indexes, and unless you've got maxdop set to 1, a query against that many rows is going to be running in parallel.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Jeff Moden SSC Guru Points: 1004405 More actions · Answer 6

GilaMonster (11/14/2014)
sql_er (11/14/2014)
The main goal at the moment is to ensure that when a monthly report goes out we can get the data out fast.
Then stop wasting your time looking at partitioning.
https://www.simple-talk.com/sql/database-administration/gail-shaws-sql-server-howlers
Instead of scanning 1.4 - 4 billion records, we'd like to be scanning only 100 million records and do so in parallel.
If you're scanning the entire table, then either your indexing is inadequate or your queries can't use indexes, and unless you've got maxdop set to 1, a query against that many rows is going to be running in parallel.

+1000

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

sql_er SSCarpal Tunnel Points: 4135 More actions · Answer 7

We don't have any indexes on this Fact table.

Sounds like we need to have a conversation with our DBA 🙂

Jeff Moden SSC Guru Points: 1004405 More actions · Answer 8

sql_er (11/14/2014)
We don't have any indexes on this Fact table.
Sounds like we need to have a conversation with our DBA 🙂

Oh my, YES! That's absolutely incredible! I don't believe I've ever heard of a table with no indexes that large before!

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)