July 30, 2014 at 1:40 am
Hi All
We have a large data warehouse, around 50TB in size. The tables are placed in filegroups by schema: facts, dimensions, and raw data each sit on separate filegroups. I am wondering whether it makes sense to move the large fact tables, which have billions of rows, onto filegroups of their own. I would like to know if this is sensible, and the pros and cons.
Regards
Senthil
July 30, 2014 at 7:49 am
It will make sense if the files belonging to these FGs are placed on separate physical drives and you have a lot of queries that join between fact tables. In that case, you will alleviate the large I/O reads by exploiting parallelism.
However, if most of your queries select data from within the same fact table, the better solution would be to partition it, again spreading the partition files across different physical drives.
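A minimal sketch of what that partitioning could look like, assuming a date-keyed fact table (all table, column, and filegroup names below are hypothetical):

-- Range-partition a large fact table by month; each partition maps to a
-- filegroup whose files can sit on a separate physical drive.
CREATE PARTITION FUNCTION pf_FactDate (date)
AS RANGE RIGHT FOR VALUES ('2014-01-01', '2014-02-01', '2014-03-01');

-- Three boundary values produce four partitions, so four filegroups.
CREATE PARTITION SCHEME ps_FactDate
AS PARTITION pf_FactDate
TO (FG_Fact_Old, FG_Fact_201401, FG_Fact_201402, FG_Fact_201403);

-- Creating the clustered index on the scheme places the table on it.
CREATE CLUSTERED INDEX CIX_FactSales_SaleDate
ON dbo.FactSales (SaleDate)
ON ps_FactDate (SaleDate);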
July 30, 2014 at 8:15 am
I am in favor of separating the large fact tables into their own filegroups - based on the data and some analysis.
I don't necessarily think that each fact table needs its own filegroup. But if you can split the fact tables out into some configuration of new filegroups, it would prove beneficial.
Where is the benefit of doing that?
With data this size, it provides a more robust opportunity for recovery. Sure, you may see some performance gain (whether on the same disks or not). But the big benefit is being able to take a filegroup backup and restore a filegroup backup when/if necessary.
Maybe somebody wants/needs a 2TB group of tables restored to a dev or QA environment. With FG backups, you can achieve that without having to concoct a process to move the tables via bcp or SSIS, or worse, via backup/restore of the entire 50TB database.
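As a rough sketch, backing up just one fact filegroup rather than the whole database could look like this (database, filegroup, and path names are hypothetical):

-- Back up a single filegroup instead of the entire 50TB database.
BACKUP DATABASE BigDW
    FILEGROUP = 'FG_Fact_2014'
    TO DISK = 'G:\Backups\BigDW_FG_Fact_2014.bak'
    WITH COMPRESSION;

That backup can then be restored on its own via a piecemeal restore, as Jeff notes below.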
Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw
Learn Extended Events
July 30, 2014 at 8:38 am
SQLRNNR (7/30/2014)
I am in favor of separating the large fact tables into their own filegroups - based on the data and some analysis. I don't necessarily think that each fact table needs its own filegroup. But if you can split the fact tables out into some configuration of new filegroups, it would prove beneficial.
Where is the benefit of doing that?
With data this size, it provides a more robust opportunity for recovery. Sure, you may see some performance gain (whether on the same disks or not). But the big benefit is being able to take a filegroup backup and restore a filegroup backup when/if necessary.
Maybe somebody wants/needs a 2TB group of tables restored to a dev or QA environment. With FG backups, you can achieve that without having to concoct a process to move the tables via bcp or SSIS, or worse, via backup/restore of the entire 50TB database.
+1000
For anyone interested in such a process, it's called a "piecemeal" restore in Books Online, and such restores can frequently be done online with little or no interruption in service.
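A rough outline of that sequence, assuming the FULL recovery model and using hypothetical names throughout:

-- Step 1: restore the primary filegroup; WITH PARTIAL starts the
-- piecemeal sequence, and unrestored filegroups simply stay offline.
RESTORE DATABASE BigDW
    FILEGROUP = 'PRIMARY'
    FROM DISK = 'G:\Backups\BigDW_Primary.bak'
    WITH PARTIAL, NORECOVERY;
-- ...apply the necessary log backups, then bring the DB online:
RESTORE DATABASE BigDW WITH RECOVERY;

-- Step 2 (later, and online on Enterprise Edition): restore an
-- individual filegroup and roll it forward with later log backups.
RESTORE DATABASE BigDW
    FILEGROUP = 'FG_Fact_2014'
    FROM DISK = 'G:\Backups\BigDW_FG_Fact_2014.bak'
    WITH NORECOVERY;
-- ...restore subsequent log backups, ending WITH RECOVERY.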
--Jeff Moden
Change is inevitable... Change for the better is not.
August 8, 2014 at 3:24 am
Thanks Jason.
Your suggestion is one of the main benefits we want to achieve: the ability to exclude tables we will not be developing against in the next few months from the backups we restore to the development environment. But the downside is that the DBA informed me that we are using the SIMPLE recovery model, and filegroup-based backups are not possible with that setting.
August 8, 2014 at 3:29 am
senthil kumar d (8/8/2014)
But the downside is that the DBA informed me that we are using the SIMPLE recovery model, and filegroup-based backups are not possible with that setting.
He's correct. Well, it's possible to take the backups afaik, but they're useless for restoring the DB.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
August 8, 2014 at 4:52 pm
senthil kumar d (7/30/2014)
Hi All. We have a large data warehouse, around 50TB in size. The tables are placed in filegroups by schema: facts, dimensions, and raw data each sit on separate filegroups. I am wondering whether it makes sense to move the large fact tables, which have billions of rows, onto filegroups of their own. I would like to know if this is sensible, and the pros and cons.
Regards
Senthil
Adding to what has been said, and given how common SANs are now...
If all your data files reside on the same LUN (regardless of how many drive letters you see in the OS), you won't get any I/O improvement anyway, even if you use FGs. This is because, in most cases, the whole LUN that contains all the logical drives is a single storage tier with the same RAID level.
By the way, why are you using the SIMPLE recovery model? Is your company aware of the possible data loss in case of a system crash/restore while using SIMPLE?
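A quick check (the database name is hypothetical), plus the change itself if the business signs off; note that log backups only become meaningful after the next full or differential backup restarts the log chain:

-- What recovery model is the database actually in?
SELECT name, recovery_model_desc
FROM sys.databases
WHERE name = 'BigDW';

-- Switch to FULL (take a full or differential backup afterwards).
ALTER DATABASE BigDW SET RECOVERY FULL;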
August 13, 2014 at 3:22 am
Sorry for the late response. I have asked for a discussion with my DBA on why they are using SIMPLE recovery instead of FULL recovery.
But we have split our filegroups so that they are spread across 16 different drives, so that should help distribute the I/O.
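For illustration, each such filegroup was built along these lines, with its files on different physical drives (names, drive letters, and sizes here are made up):

ALTER DATABASE BigDW ADD FILEGROUP FG_Fact_New;

-- Two files on two drives; SQL Server proportionally fills (in effect,
-- stripes) across all files in a filegroup.
ALTER DATABASE BigDW
ADD FILE
    (NAME = 'FactNew_1', FILENAME = 'H:\Data\FactNew_1.ndf', SIZE = 100GB),
    (NAME = 'FactNew_2', FILENAME = 'I:\Data\FactNew_2.ndf', SIZE = 100GB)
TO FILEGROUP FG_Fact_New;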
August 15, 2014 at 6:51 pm
senthil kumar d (8/13/2014)
Sorry for the late response. I have asked for a discussion with my DBA on why they are using SIMPLE recovery instead of FULL recovery. But we have split our filegroups so that they are spread across 16 different drives, so that should help distribute the I/O.
If it's on a SAN, you have to ask the question... "Are they really 16 separate physical drives?"
--Jeff Moden
Change is inevitable... Change for the better is not.
August 18, 2014 at 1:00 am
Hi Jeff
Yes, there are 16 separate physical drives across which the files are distributed. And the reason for the SIMPLE recovery model is that the transaction logs are really huge; they do not want to back up all these huge logs, and hence did not want the FULL recovery model.
August 18, 2014 at 2:38 am
senthil kumar d (8/18/2014)
Yes, there are 16 separate physical drives across which the files are distributed.
I do hope they're not 16 individual drives, each with one filegroup on it. If they are, you have no redundancy, and probably lower throughput than if they were shared on a SAN, due to each filegroup having only one spindle.
And the reason for the SIMPLE recovery model is that the transaction logs are really huge; they do not want to back up all these huge logs, and hence did not want the FULL recovery model.
That's a rather silly reason. Does the DB need point-in-time recovery? If a drive fails at 15h34, is restoring to the last full backup acceptable?
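For context, under FULL recovery, frequent log backups keep the log from growing huge and make point-in-time restores possible. A sketch, with hypothetical names, paths, and times:

-- Frequent log backups truncate the log, keeping it small:
BACKUP LOG BigDW TO DISK = 'G:\Backups\BigDW_Log_1530.trn';

-- After a 15h34 failure, restore to just before the failure:
RESTORE DATABASE BigDW
    FROM DISK = 'G:\Backups\BigDW_Full.bak' WITH NORECOVERY;
RESTORE LOG BigDW
    FROM DISK = 'G:\Backups\BigDW_Log_1530.trn'
    WITH STOPAT = '2014-08-18 15:33:00', RECOVERY;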
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
August 18, 2014 at 7:51 am
GilaMonster (8/18/2014)
senthil kumar d (8/18/2014)
And the reason for the SIMPLE recovery model is that the transaction logs are really huge; they do not want to back up all these huge logs, and hence did not want the FULL recovery model.
That's a rather silly reason. Does the DB need point-in-time recovery? If a drive fails at 15h34, is restoring to the last full backup acceptable?
"Silly" would be a really polite way of putting it.
I do agree that it appears to be a decision that has already been made. But what are the recovery requirements from the business? Generally, the business has no idea about the size of the logs or the database; they consume and use the data, and they have requirements for the recovery of that data to maintain continuity.
That decision, with that reason, does not seem sane to me.
Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw
Learn Extended Events