July 3, 2006 at 8:18 pm
Hi All,
I don’t quite understand what happens when we put the data in different data files. Eg: Primary and other data files. Does the user find out what’s where? Or is it seamless to them? Is it that the DBA then only needs to worry about backing up and restoring? Or is there something else involved with it? Is having multiple data files better than one?
We have a data warehouse with 3 data files - A primary, a Dimension, and Fact data files. The Fact data file is 198GB with 128GB worth of indexes. Is it worth putting the Indexes on a new Data file? Will the already created stored procedures have to change in anyway?
Please help.
Thanks.
TK
July 4, 2006 at 12:37 am
It's seamless - you can specify multiple files for SQL Server to use for storage. When creating tables & indices you can specify on which file the table/index is created.
It helps to spread the load out amongst multiple disk subsystems - the more spindles the better. I'll admit most of my experience is with DBs <6GB in size so I am not sure how long it takes to rearrange 350GB of data Your logs should also be on a separate set of drives RAID 0 preferably.
July 4, 2006 at 6:53 pm
Thanks Ian. That was very helpful.
Does any one else have any ideas on splitting the indexes on a bigger database?
July 4, 2006 at 10:17 pm
A couple of thoughts. First, you can get a performance boost by putting tables on their own disk subsystem. The obvious move is the log file, which should still be mirrored or on its own Raid-5 as it too needs the redundancy. The data files need to be on Raid-5 as well.
If you have queries that are dependent on multiple tables, and the tables are separated, then you will lose the gain you would expect or even wind up with a slower access time.
Per BOL if the index is clustered, when created it will effectivelu move the table to the filegroup the index is created on. I am not sure what effectively means in this situation, just that there seems to be no performance advantage to trying to separate clustered indexes and their related tables.
July 4, 2006 at 10:25 pm
Sorry, one other note, in SQL 2005 you can split clustered indexes on partioned tables. If there is a set of older data or unused data to partition to a separate table that also lets you gain a boost from splitting the associated clustered index.
July 5, 2006 at 10:08 am
Multiple files and filegroups are primarily the DBA's concern, they should be transparent to the user. Filegroups only appear in DDL statements (CREATE TABLE, CREATE INDEX), and BACKUP/RESTORE statements. There is no way to specify filegroup in DML statements, queries and stored procedures are not generally affected by filegroup strategy. Creating another filegroup for indexes in your case could have some performance benefit, at the expense of more DBA effort.
Multiple files can improve performance, especially if they are on separate physical disks to spread out the I/O. A separate thread is created for each file to manage I/O, and page allocations are handled separately for each file. This means that using multiple files can relieve I/O bottlenecks.
It is possible to have a single filegroup (PRIMARY) with multiple files. For example, it is a recommended practice to create as many data files for tempdb (all in the PRIMARY filegroup) as you have processors on the server. Having multiple files in a filegroup is transparent to the user, data is spread among the files by SQL Server in a round-robin order. The DBA doesn't have to do anything to distribute data among files in the same filegroup, but they do have the option of backing up & restoring files individually. (In SQL 2000 all files must be backed up & restored at the same time, but SQL 2005 has some other options).
Creating multiple filegroups, each with one or more data files, can have more performance benefits but requires a lot more DBA effort. The greatest benefit is when a godlike DBA knows in advance how every table and index will be used. Ideally whenever table A and table B will be joined using secondary index C, each object is in a separate filegroup (hopefully on separate physical disks). This is easy for the first few tables, but gets very complicated very quickly. Usually the best you can hope for is to locate the largest tables and indexes in filegroups that optimize the most common queries on those tables.
SQL 2005 has more uses for multiple filegroups. The partitioning feature can split large tables across multiple filegroups. Static or historical data can be moved into a read-only filegroup which can be eliminated from the regular backups. Data in a read-only filegroup can be online and available even though other filegroups in the same database are in recovery.
July 5, 2006 at 9:36 pm
Naughty me - I had meant to say that logs are on RAID 1. TempDB can be on RAID 0. Thanks for pointing out mistake!
July 5, 2006 at 11:38 pm
Thankyou all for helping me understand the concept. I don't know what I'd do without you.
Thanks heaps again,
TK
Viewing 8 posts - 1 through 7 (of 7 total)
You must be logged in to reply to this topic. Login to reply