Filegroups, data files best practices?

  • Hello fellows,

    I have a couple of questions regarding best practices.

    We have a VLDB with a weird distributed data files. one datafile have 1TB another might have 5TB another might have 2TB and we are planning to reorganize the whole database.

    So when we create the new one to migrate all the data and i have been wondering the next thing, the new one would also have a new feature like FileStream, which we don't actually use.

    When you first create a database it automatically creates a default primary filegroup which is the default, if im not mistaken the primary filegroup have critical tables, pages used to start the database once the instance starts.

    We don't have all the data in the primary, we have different FG, but we still have data in the primary, production data.

    I have the next question:

    1- Shall we create another filegroup and set it as default and leave the PRIMARY alone, like empty alone and leave it be for critical pages like the boot page etc?

    2- Would i get more performance by doing this?

    3- is this a good practice?

    I have been thinking, i don't know where the idea came from, it might be dumb, i don't know.

    Thanks for taking your time to read this 🙂

    Regards,

    1. Some people do this. Filegroups in general are a very good idea. They allow for all sorts of management like piecemeal restores that would be much more difficult without them. Ensuring that you can split the data up within the database, moving things out of the primary file group, can assist in this. It's a good idea.
    2. Nope. A file is a file. A page is a page. A write is a write. You can get more performance by splitting all this up between disks and disk controllers (with multiple CPUs to manage the whole thing), but that would be the same if you had all your files in a single primary group or you split the groups apart. What matters is the lower level storage here, not the filegroup that it belongs to.
    3. Yes. But not for performance.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Thanks a lot for your reply in the post before.

    I have been wondering if you had the chance to reply to the next question, in the example before in the post we manage binaries without FileStream, we are using VARBINARY(MAX), and we want to change that, we have the next issue.

    We have a VLDB with 9TB, all belong to our core and all the binaries are in this database, the 9TB one, and all the binaries are in the same table, yes the same table, the whole data is in one table or maybe 90% of the whole database, we don't have performance issues thanks to indexes.

    We have been seeing FileStream since it’s better to use the NTFS File system with SQL SERVER FILESTREAM to manage binary data than SQL Server using varbinary(max), and we want to make the next changes.

     

    We have the next architecture planned.

    1 – One primary data file (we are planning to leave this one alone, hence the questions in the first post)

    2 – Another data file for data that isn’t the binaries one, like application configuration data.

    2 – Multiple FileStream filegroups to hold the binaries per year, 2010, 2011, 2012, etc.

     

    Our plan would be to lock the old FileStream filegroups so data can’t be written, it would only hold the data of these years and instead of taking full database backups we would be taking backups of the current year, we would have older backups of the 2010, 2011, 2012 data but it wouldn’t change at all, so the backup would be smaller, the datafiles that change and the FileStream filegroup of the current year and it would be smaller and faster, in the end if we need to recover from everything we would be restoring the backups by files per year.

    What do you think?, how would you tackle this issue?

    • This reply was modified 5 years, 5 months ago by  Alejandro Santana. Reason: Excessive space between paragraphs
  • I made a reply, but it said "This was removed by the editor as SPAM", don't know how it happened, i'll try to post it again.

    so here it goes:

    Thanks a lot for your reply in the post before.

    I have been wondering if you had the chance to reply to the next question, in the example before in the post we manage binaries without FileStream, we are using VARBINARY(MAX), and we want to change that, we have the next issue.

    We have a VLDB with 9TB, all belong to our core and all the binaries are in this database, the 9TB one, and all the binaries are in the same table, yes the same table, the whole data is in one table or maybe 90% of the whole database, we don't have performance issues thanks to indexes.

    We have been seeing FileStream since it’s better to use the NTFS File system with SQL SERVER FILESTREAM to manage binary data than SQL Server using varbinary(max), and we want to make the next changes.

     

    We have the next architecture planned.

    1 – One primary data file (we are planning to leave this one alone, hence the questions in the first post)

    2 – Another data file for data that isn’t the binaries one, like application configuration data.

    2 – Multiple FileStream filegroups to hold the binaries per year, 2010, 2011, 2012, etc.

     

    Our plan would be to lock the old FileStream filegroups so data can’t be written, it would only hold the data of these years and instead of taking full database backups we would be taking backups of the current year, we would have older backups of the 2010, 2011, 2012 data but it wouldn’t change at all, so the backup would be smaller, the datafiles that change and the FileStream filegroup of the current year would change and it would be smaller and faster, in the end if we need to recover from everything we would be restoring the backups by files per year.

    What do you think?, how would you tackle this issue?

  • My experience with, and knowledge, filestream is somewhat limited, so I'm hesitant to pronounce too strongly on this approach. I do know that files within databases leads to all sorts of issues, so storing them externally is superior. Beyond that though, I don't feel qualified to answer.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • That's cool, i'll keep going using and doing everything i can with FileStream and keep the post updated to see if i find anything weird.

     

    Thanks 😀

    Regards,

  • Grant Fritchey wrote:

    My experience with, and knowledge, filestream is somewhat limited, so I'm hesitant to pronounce too strongly on this approach. I do know that files within databases leads to all sorts of issues, so storing them externally is superior. Beyond that though, I don't feel qualified to answer.

    I've not played with FileStream, yet.  What happens if the volume the files are stored on rolls a 7 and you have to do a restore?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Never mind... I looked it up.  It would appear that if you allow it, the files become a part of the backups... basically, using SQL Server to backup part of the "file system".  You can prevent it, as well.

    Still reading on other things that I suddenly have questions about concerning FileStream.  Heh... I also do so with a Jaundiced Eye because of things like when I drank the Partitioned Table Kool-Aid and then found out what the unwritten limitations and problems are (and still putting up with today).

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Lordy.  I, unfortunately, found what I was looking for at item #12 at the following URL...

    https://www.mssqltips.com/sqlservertip/1875/best-practices-when-using-filestream-feature-of-sql-server/

    ... which states...

    Disable antivirus software from scanning FILESTREAM volumes if it's not mandatory within your organization. If antivirus scanning is mandatory, then set the policies that will not automatically delete offending files. If the files are deleted it will corrupt your FILESTREAM enabled database.

    I'm not so concerned about the antivirus thing.  I'm concerned that something can delete files and corrupt the database.

    I've also watched a couple of YouTubes on the subject where they show the directory structure and the file names used.  It's not  human friendly at all.

    I'm also looking at point #4 from the same article I cited above and it says...

    It is a good practice to periodically perform disk defragmentation of the volumes which store FILESTREAM data.

    BWAAA-HAAAA!!!!  Oh goody!  Even more things for the DBA to worry about with fragmentation!

    And, last but not least and, since I'm a bit new to this, it looks like it makes a full size copy of the files.  If you don't get rid of the originals, it would appear that you're actually doubling your storage requirement.

    Being a total neophyte to the wiles of FileStream, I could certainly be missing a whole lot but I'm seeing no advantages and a whole lot of disadvantages here.  I'd love it if someone would provide some additional information on FileStream that would make it seem useful to me because, right now, I'm not seeing the draw for anything about it except that it can handle files larger than 2GB IF you need such a thing.

    BTW... I have a 1.2TB table (the only table in a particular database) with (currently) 118 (yeah... I know... crazy but they want to keep it all forever) File Groups with 1 monthly file per file group.  I only have to backup "current month" and "next month" (which is empty until it becomes "current month" because I've set all of the older static stuff (call recordings in the form of "compressed" WAV files) to READ_ONLY.

    When I first inherited the table with only 36 or so File Groups, I was much like everyone else and was totally appalled that they'd be storing some (rather long) WAV files in my database especially since the files were available elsewhere.  When I started research on the files (they had also recorded the file path), I found that about 10% of the files were missing and an additional 10% had actually been corrupted.

    That's when I decided that it's OK to store such things in the database because NO ONE will guard data more fiercely than a DBA.

    My recommendation is that all that glitters is not Gold and you really need to do some serious research and deep-dive testing before you drink the purple Kool-Aid.  I learned this the hard way and ended up Table Partitioning that table.  I'm currently doing the research and testing to convert it to a nice, simple, partitioned view, which has most of the advantages of a partitioned table, a lot less of the disadvantages, and some advantages that you just can't do with a partitioned table.

    Make sure you do the same with FileStream because, from my first blush look at it, there are going to be some hidden "gotcha's".  Compare it carefully with storing your binaries in SQL Server, which might be the better choice.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks for joining us Jeff 😀

    Yeah, we gotta do a TON of testing before even trying to put everything in production, i have been reading the white paper made by Paul Randal.

    Gonna put it on the attachment, i've already read it twice, looking to read another couple of times also.

    Jeff Moden wrote:

    Lordy.  I, unfortunately, found what I was looking for at item #12 at the following URL... https://www.mssqltips.com/sqlservertip/1875/best-practices-when-using-filestream-feature-of-sql-server/ ... which states...

    Disable antivirus software from scanning FILESTREAM volumes if it's not mandatory within your organization. If antivirus scanning is mandatory, then set the policies that will not automatically delete offending files. If the files are deleted it will corrupt your FILESTREAM enabled database.

    I'm not so concerned about the antivirus thing.  I'm concerned that something can delete files and corrupt the database. I've also watched a couple of YouTubes on the subject where they show the directory structure and the file names used.  It's not  human friendly at all. I'm also looking at point #4 from the same article I cited above and it says...

    Actually, I’m really afraid of this, we would talk to IT Sec to make the antivirus don’t even look to the drives of FileStream, we even have a weird service that reads all the drives and replicates the data and when I was testing in our environment, I tried dropping the FileStream database… you know what happened? EVERYTHING was gone but the FS Folder, its supposed to drop everything, including the files and you know what? the thousands of files were still there.. and the database error log presented this.

    The operating system returned the error '32(The process cannot access the file because it is being used by another process.)' while attempting 'StreamFileMgr::DoDeleteFsFile' on 'G:\FileStream\FS1' at 'stmfilemgr.cpp'(5460).

    The operating system returned the error '32(The process cannot access the file because it is being used by another process.)' while attempting 'RemoveDirectory' on 'G:\FileStream\FS1' at 'storage.cpp'(865).

    The only person with access to these drives should be the service user, not even us DBAs should even have access to it, I don’t know if the uppers would like to take the risks.

    Jeff Moden wrote:

    It is a good practice to periodically perform disk defragmentation of the volumes which store FILESTREAM data.

    Reading the whitepaper I might have missed this, but I didn’t saw anything related to disk fragmentation but it actually make sense, this even makes it more scarier.. we should dig more into this one.

    Jeff Moden wrote:

    ... I have a 1.2TB table (the only table in a particular database) with (currently) 118 (yeah... I know... crazy but they want to keep it all forever) File Groups with 1 monthly file per file group.  I only have to backup "current month" and "next month" (which is empty until it becomes "current month" because I've set all of the older static stuff (call recordings in the form of "compressed" WAV files) to READ_ONLY. When I first inherited the table with only 36 or so File Groups, I was much like everyone else and was totally appalled that they'd be storing some (rather long) WAV files in my database especially since the files were available elsewhere.  When I started research on the files (they had also recorded the file path), I found that about 10% of the files were missing and an additional 10% had actually been corrupted. That's when I decided that it's OK to store such things in the database because NO ONE will guard data more fiercely than a DBA.

    It's true, no one will guard data more fiercely than a DBA.

    After reading 118 filegroups I was like: dude, these beasts handle hundreds of filegroups per database, we don’t even have 20 for our largest, we are still momma kids and have a wayyy long to go!

    Jeff Moden wrote:

    My recommendation is that all that glitters is not Gold and you really need to do some serious research and deep-dive testing before you drink the purple Kool-Aid.  I learned this the hard way and ended up Table Partitioning that table.  I'm currently doing the research and testing to convert it to a nice, simple, partitioned view, which has most of the advantages of a partitioned table, a lot less of the disadvantages, and some advantages that you just can't do with a partitioned table. Make sure you do the same with FileStream because, from my first blush look at it, there are going to be some hidden "gotcha's".  Compare it carefully with storing your binaries in SQL Server, which might be the better choice.

    Thanks a lot for this, I’ll try to make something like a document with all the tests I’ve done, I mean I’m not that GOOD taking performance counter data or what to look out for when using FileStream, im gonna still try to read lots and lots of documents and making different tests.I’ll try my best to make something useful out of it and post it back here.

    Attachments:
    You must be logged in to view attached files.

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply