July 30, 2014 at 1:40 am
Hi All
We have a large data warehouse, around 50TB in size. The tables are placed in filegroups by schema: facts, dimensions, and raw data each sit on separate filegroups. I am wondering whether it makes sense to move the large fact tables, which have billions of rows, onto filegroups of their own. I would like to know if this is sensible, and the pros and cons.
Regards
Senthil
July 30, 2014 at 7:49 am
It will make sense if the files belonging to these FGs are placed on separate physical drives and you have a lot of queries that join between fact tables. In that case, you will alleviate the large I/O reads by exploiting parallelism.
However, if most of your queries select data from within the same fact table, the better solution would be to partition it, again spreading the partition files across different physical drives.
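A minimal sketch of what that partitioning could look like, assuming a date-keyed fact table (all table, column, and filegroup names below are hypothetical):

-- Range-partition a large fact table by month; each partition maps to a
-- filegroup whose files can sit on a separate physical drive.
CREATE PARTITION FUNCTION pf_FactDate (date)
AS RANGE RIGHT FOR VALUES ('2014-01-01', '2014-02-01', '2014-03-01');

-- Three boundary values produce four partitions, so four filegroups.
CREATE PARTITION SCHEME ps_FactDate
AS PARTITION pf_FactDate
TO (FG_Fact_Old, FG_Fact_201401, FG_Fact_201402, FG_Fact_201403);

-- Creating the clustered index on the scheme places the table on it.
CREATE CLUSTERED INDEX CIX_FactSales_SaleDate
ON dbo.FactSales (SaleDate)
ON ps_FactDate (SaleDate);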
July 30, 2014 at 8:15 am
I am in favor of separating the large fact tables into their own filegroups - based on the data and some analysis.
I don't necessarily think that each fact table needs its own filegroup. But if you can split the fact tables out into some configuration of new filegroups, it would prove beneficial.
Where is the benefit of doing that?
With data this size, it provides a more robust opportunity for recovery. Sure, you may see some performance gain (whether on the same disks or not). But the big benefit is being able to take a filegroup backup and restore a filegroup backup when/if necessary.
Maybe somebody wants/needs a 2TB group of tables restored to a dev or QA environment. With FG backups, you can achieve that without having to concoct a process to move the tables via bcp or SSIS, or worse, via backup/restore of the entire 50TB database.
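As a rough sketch, backing up just one fact filegroup rather than the whole database could look like this (database, filegroup, and path names are hypothetical):

-- Back up a single filegroup instead of the entire 50TB database.
BACKUP DATABASE BigDW
    FILEGROUP = 'FG_Fact_2014'
    TO DISK = 'G:\Backups\BigDW_FG_Fact_2014.bak'
    WITH COMPRESSION;

That backup can then be restored on its own via a piecemeal restore, as Jeff notes below.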
Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw
Learn Extended Events
July 30, 2014 at 8:38 am
SQLRNNR (7/30/2014)
I am in favor of separating the large fact tables into their own filegroups - based on the data and some analysis. I don't necessarily think that each fact table needs its own filegroup. But if you can split the fact tables out into some configuration of new filegroups, it would prove beneficial.
Where is the benefit of doing that?
With data this size, it provides a more robust opportunity for recovery. Sure, you may see some performance gain (whether on the same disks or not). But the big benefit is being able to take a filegroup backup and restore a filegroup backup when/if necessary.
Maybe somebody wants/needs a 2TB group of tables restored to a dev or QA environment. With FG backups, you can achieve that without having to concoct a process to move the tables via bcp or SSIS, or worse, via backup/restore of the entire 50TB database.
+1000
For anyone interested in such a process, it's called a "piecemeal" restore in Books Online, and such restores can frequently be done online with little or no interruption in service.
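A rough outline of that sequence, assuming the FULL recovery model and using hypothetical names throughout:

-- Step 1: restore the primary filegroup; WITH PARTIAL starts the
-- piecemeal sequence, and unrestored filegroups simply stay offline.
RESTORE DATABASE BigDW
    FILEGROUP = 'PRIMARY'
    FROM DISK = 'G:\Backups\BigDW_Primary.bak'
    WITH PARTIAL, NORECOVERY;
-- ...apply the necessary log backups, then bring the DB online:
RESTORE DATABASE BigDW WITH RECOVERY;

-- Step 2 (later, and online on Enterprise Edition): restore an
-- individual filegroup and roll it forward with later log backups.
RESTORE DATABASE BigDW
    FILEGROUP = 'FG_Fact_2014'
    FROM DISK = 'G:\Backups\BigDW_FG_Fact_2014.bak'
    WITH NORECOVERY;
-- ...restore subsequent log backups, ending WITH RECOVERY.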
--Jeff Moden
Change is inevitable... Change for the better is not.
August 8, 2014 at 3:24 am
Thanks Jason.
Your suggestion is one of the main benefits we want to achieve: the ability to exclude tables we will not be developing against in the next few months from the backups we restore to the development environment. But the downside is that the DBA informed me that we are using the SIMPLE recovery model, and filegroup-based backups are not possible with that setting.
August 8, 2014 at 3:29 am
senthil kumar d (8/8/2014)
But the downside is that the DBA informed me that we are using the SIMPLE recovery model, and filegroup-based backups are not possible with that setting.
He's correct. Well, it's possible to take the backups afaik, but they're useless for restoring the DB.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
August 8, 2014 at 4:52 pm
senthil kumar d (7/30/2014)
Hi All. We have a large data warehouse, around 50TB in size. The tables are placed in filegroups by schema: facts, dimensions, and raw data each sit on separate filegroups. I am wondering whether it makes sense to move the large fact tables, which have billions of rows, onto filegroups of their own. I would like to know if this is sensible, and the pros and cons.
Regards
Senthil
Adding to what has been said, and given how common SANs are now...
If all your data files reside on the same LUN (regardless of how many drive letters you see in the OS), you won't get any I/O improvement anyway, even if you use FGs. This is because, in most cases, the whole LUN that contains all the logical drives is a single storage tier with the same RAID level.
By the way, why are you using the SIMPLE recovery model? Is your company aware of the possible data loss in case of a system crash/restore while using SIMPLE?
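A quick check (the database name is hypothetical), plus the change itself if the business signs off; note that log backups only become meaningful after the next full or differential backup restarts the log chain:

-- What recovery model is the database actually in?
SELECT name, recovery_model_desc
FROM sys.databases
WHERE name = 'BigDW';

-- Switch to FULL (take a full or differential backup afterwards).
ALTER DATABASE BigDW SET RECOVERY FULL;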
August 13, 2014 at 3:22 am
Sorry for the late response. I have asked for a discussion with my DBA on why they are using SIMPLE recovery instead of FULL recovery.
But we have split our filegroups so that they are spread across 16 different drives, so that should help distribute the I/O.
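For illustration, each such filegroup was built along these lines, with its files on different physical drives (names, drive letters, and sizes here are made up):

ALTER DATABASE BigDW ADD FILEGROUP FG_Fact_New;

-- Two files on two drives; SQL Server proportionally fills (in effect,
-- stripes) across all files in a filegroup.
ALTER DATABASE BigDW
ADD FILE
    (NAME = 'FactNew_1', FILENAME = 'H:\Data\FactNew_1.ndf', SIZE = 100GB),
    (NAME = 'FactNew_2', FILENAME = 'I:\Data\FactNew_2.ndf', SIZE = 100GB)
TO FILEGROUP FG_Fact_New;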
August 15, 2014 at 6:51 pm
senthil kumar d (8/13/2014)
Sorry for the late response. I have asked for a discussion with my DBA on why they are using SIMPLE recovery instead of FULL recovery. But we have split our filegroups so that they are spread across 16 different drives, so that should help distribute the I/O.
If it's on a SAN, you have to ask the question... "Are they really 16 separate physical drives?"
--Jeff Moden
Change is inevitable... Change for the better is not.
August 18, 2014 at 1:00 am
Hi Jeff
Yes, there are 16 separate physical drives across which the files are distributed. And the reason for the SIMPLE recovery model is that the transaction logs are really huge; they do not want to back up all these huge logs, and hence did not want the FULL recovery model.
August 18, 2014 at 2:38 am
senthil kumar d (8/18/2014)
Yes, there are 16 separate physical drives across which the files are distributed.
I do hope they're not 16 individual drives, each with one filegroup on it. If they are, you have no redundancy, and probably lower throughput than if they were shared on a SAN, due to each filegroup having only one spindle.
And the reason for the SIMPLE recovery model is that the transaction logs are really huge; they do not want to back up all these huge logs, and hence did not want the FULL recovery model.
That's a rather silly reason. Does the DB need point-in-time recovery? If a drive fails at 15h34, is restoring to the last full backup acceptable?
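For context, under FULL recovery, frequent log backups keep the log from growing huge and make point-in-time restores possible. A sketch, with hypothetical names, paths, and times:

-- Frequent log backups truncate the log, keeping it small:
BACKUP LOG BigDW TO DISK = 'G:\Backups\BigDW_Log_1530.trn';

-- After a 15h34 failure, restore to just before the failure:
RESTORE DATABASE BigDW
    FROM DISK = 'G:\Backups\BigDW_Full.bak' WITH NORECOVERY;
RESTORE LOG BigDW
    FROM DISK = 'G:\Backups\BigDW_Log_1530.trn'
    WITH STOPAT = '2014-08-18 15:33:00', RECOVERY;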
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
August 18, 2014 at 7:51 am
GilaMonster (8/18/2014)
senthil kumar d (8/18/2014)
And the reason for the SIMPLE recovery model is that the transaction logs are really huge; they do not want to back up all these huge logs, and hence did not want the FULL recovery model.
That's a rather silly reason. Does the DB need point-in-time recovery? If a drive fails at 15h34, is restoring to the last full backup acceptable?
"Silly" would be a really polite way of putting it.
I do agree that it appears to be a decision that has already been made. But what are the recovery requirements from the business? Generally, the business has no idea about the size of the logs or the database; they consume and use the data, and they have requirements for the recovery of that data to maintain continuity.
That decision, with that reason, does not seem sane to me.
Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw
Learn Extended Events