June 20, 2007 at 8:05 am
In general, what are the recommendations, from a sizing perspective, for adding additional data files to a filegroup? In a new version of our DW I'm seeing the possibility of having some very large data files, in excess of 200 GB.
June 20, 2007 at 9:12 am
I'm not sure I've seen sizing recommendations tied to a particular file size; usually extra files get added because of space constraints. Filegroups are also a nice tool if you have sections of read-only data: you can move them to their own filegroups and then back them up far less often, saving time and resources while still having recoverability.
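A rough sketch of what I mean, with made-up database, filegroup, and path names:
-- Sketch only: [Archive2006] is a hypothetical read-only filegroup.
-- Mark the filegroup read-only once its data stops changing...
ALTER DATABASE MyDW MODIFY FILEGROUP [Archive2006] READ_ONLY;
-- ...back it up once after it goes read-only...
BACKUP DATABASE MyDW
FILEGROUP = 'Archive2006'
TO DISK = 'X:\Backups\MyDW_Archive2006.bak';
-- ...then the routine backups only need to cover the read/write filegroups.
BACKUP DATABASE MyDW
READ_WRITE_FILEGROUPS
TO DISK = 'X:\Backups\MyDW_rw.bak';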
June 20, 2007 at 10:26 am
Storage may have upper limits. Also, in our DW I've split things out into logical filegroups to aid with performance and recovery (there's a rough sketch of the DDL after the list). I've split out the filegroups as follows:
1. A staging data and a staging index filegroup. If we lose these it's no big deal, since they're used strictly to gather data from our subject areas prior to analytical processing and data aggregation. I can easily recreate the empty filegroups and then create the empty staging tables from a DDL extract produced by the Database Publishing Wizard backup that runs every night.
2. A data and an index filegroup for each subject area of the DW.
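Roughly what the filegroup setup looks like; the database name, file paths, and sizes below are only placeholders:
-- Sketch only: names and paths are illustrative.
ALTER DATABASE MyDW ADD FILEGROUP [StagingData];
ALTER DATABASE MyDW ADD FILEGROUP [StagingIndex];
ALTER DATABASE MyDW ADD FILE
(NAME = 'StagingData1', FILENAME = 'E:\Data\MyDW_StagingData1.ndf', SIZE = 10GB)
TO FILEGROUP [StagingData];
-- staging tables live on their own filegroup, so losing it only costs empty tables
CREATE TABLE dbo.StageOrders (OrderID INT NOT NULL, LoadDate DATETIME NOT NULL)
ON [StagingData];
The subject-area filegroups follow the same pattern, e.g. a [SalesData] and [SalesIndex] pair per subject area.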
June 20, 2007 at 10:53 am
Thanks for the info, guys. Tim, do you have any sort of general rule as to when you add another file to your data or index filegroups, for example when the last file reaches a certain size? Or I should say, how are you deciding the number of files to include in your data and index filegroups?
June 20, 2007 at 12:58 pm
It depends on what you want. You can use filegroups for performance, e.g. placing read-intensive data on its own RAID 5 set of physical disks and write-intensive data on a RAID 10 set.
You can also use them to organise data for easier management, i.e. placing data on separate filegroups by function. You will need to know the size of the data in question; that will give you an idea of how much space to allocate to each filegroup.
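For example (the drive letter, filegroup, and table names here are only illustrative; assume R:\ sits on its own array):
-- Sketch only: put a filegroup's file on the array that suits its access pattern
ALTER DATABASE MyDW ADD FILEGROUP [ReadHeavy];
ALTER DATABASE MyDW ADD FILE
(NAME = 'ReadHeavy1', FILENAME = 'R:\Data\MyDW_ReadHeavy1.ndf', SIZE = 50GB)
TO FILEGROUP [ReadHeavy];
-- to size the allocation, check how much space the tables you plan to move actually use
EXEC sp_spaceused 'dbo.FactSales';   -- table name is a placeholder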
June 22, 2007 at 8:07 am
The factor that is usually the biggest limit for me is restore time. If you lose a single file, what is the maximum time you're willing to spend restoring it? Compare that to the maximum data you'll have and any upper limits on the number of data files, and find a happy medium.
At one time I considered 8 GB to be a big file. If 200 GB is the size of your db, just splitting it up to spread the load across multiple drives (yes, I know RAID takes care of that) can help; maybe shoot for 32 GB files. And even over a gigabit network, a 200 GB file can take a while to restore. If I recall some of my tests, 200 GB could take upwards of a couple of hours (of course filesystem performance, CPU, etc. play into that).
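To put some shape on the single-file restore, a sketch with made-up database, file, and backup names:
-- restore only the damaged file from the last full (or filegroup) backup...
RESTORE DATABASE MyDW
FILE = 'MyDW_Data3'
FROM DISK = 'X:\Backups\MyDW_full.bak'
WITH NORECOVERY;
-- ...then roll forward the log backups taken since, finishing with RECOVERY
RESTORE LOG MyDW
FROM DISK = 'X:\Backups\MyDW_log1.trn'
WITH RECOVERY;
As a rough sanity check on the timing: gigabit tops out around 100 MB/sec in practice, so 200 GB is on the order of 2,000 seconds (about half an hour) of raw transfer before verification, file initialization, and everything else stretch it out. Smaller files keep that single-file restore window short.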
June 22, 2007 at 9:15 am
Thanks for the info, Bob. Yes, I remember the 8 GB data file days. I'm looking at a 3 TB warehouse, so I'll probably try to keep files under 200 GB.
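For what it's worth, the kind of thing I have in mind for capping file sizes (names and numbers are only illustrative):
-- cap each data file so it can't grow past 200 GB...
ALTER DATABASE MyDW
MODIFY FILE (NAME = 'SalesData1', MAXSIZE = 200GB, FILEGROWTH = 1GB);
-- ...and add another file to the filegroup as the existing ones approach the cap
ALTER DATABASE MyDW ADD FILE
(NAME = 'SalesData2', FILENAME = 'F:\Data\MyDW_SalesData2.ndf',
SIZE = 10GB, MAXSIZE = 200GB, FILEGROWTH = 1GB)
TO FILEGROUP [SalesData];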
June 22, 2007 at 10:09 am
Hi Rob,
Sorry for my delay. Here is a stored procedure that I use to notify me when individual data files are > 70% full. I call it from an Agent job first thing in the morning and send the results to myself via Database Mail (there's a sketch of the job step after the procedure).
USE [DBMonitor]
GO
/****** Object: StoredProcedure [dbo].[spDBSizerptV1] Script Date: 06/22/2007 10:07:13 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER OFF
GO
CREATE PROCEDURE [dbo].[spDBSizerptV1]
AS
BEGIN
SET QUOTED_IDENTIFIER OFF
SET NOCOUNT ON
DECLARE @myCmd VARCHAR(2000)
CREATE TABLE #T2 (
_dbname SYSNAME,        -- database name (can be up to 128 characters)
_TypeDesc VARCHAR(10),
_CurrentSize FLOAT,
_MaxSize FLOAT,
_data_space_id INT,
_name SYSNAME           -- filegroup name
)
-- populate temp table #T2 for each database
SELECT @myCmd = 'INSERT INTO #T2 SELECT ''?'', LEFT(a.type_desc,10), a.size*8/1024, a.max_size*8/1024, a.data_space_id, b.name FROM [?].sys.database_files a JOIN [?].sys.filegroups b ON a.data_space_id = b.data_space_id WHERE a.type = 0'
exec sp_MSforeachdb @command1=@myCmd
-- show filegroups whose data files are on average more than 70% full; files with unrestricted growth (max_size <= 0) are excluded
SELECT _dbname AS [Database Name],
_name AS [File Group],
SUM(_CurrentSize) AS [Current File Size (MB)],
SUM(_MaxSize) AS [Max File Size (MB)],
ROUND(SUM(_CurrentSize)/SUM(_MaxSize),2)*100 AS [Pct Full]
FROM #T2
WHERE _MaxSize > 0
AND _TypeDesc = 'ROWS'
GROUP BY _dbname, _name
HAVING ROUND(SUM(_CurrentSize)/SUM(_MaxSize),2)*100 > 70
ORDER BY _dbname, _name ASC
-- clean up
DROP TABLE #T2
END
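And the Agent job step is basically just a Database Mail call; the profile name and address below are placeholders:
-- run the report through Database Mail so the >70% list lands in my inbox each morning
EXEC msdb.dbo.sp_send_dbmail
@profile_name = 'DBA Mail',
@recipients = 'dba@example.com',
@subject = 'Data files over 70% full',
@query = 'EXEC DBMonitor.dbo.spDBSizerptV1';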