Drawbacks of having hundreds of data files per database

  • Hi All,

    Wondering if the collective wisdom of this community can help provide some pointers. We have a development team that has developed an App, in their infinite wisdom, to create table partitions and new files/filegroups every night. In my experience, partitioning and creating different files to host the partitions has some value when done yearly or at most monthly. While I know instinctively that it's not good practice and that it will become a maintenance nightmare, I'm unable to articulate specifics.

    Playing this scenario out - there is no plan in the design to remove older partitions at some point. Which means day1 - 1 data file, day 2 - 2 data files....... day 300 will have 300 data files for that one database. The max number of files possible, theoretically, is 32,767, so they're unlikely to run into that for up to 10 years.

    Anyone have any thoughts on what specifically the pitfalls of such a design approach could be ? We manage hundreds/thousands of SQL Servers in the enterprise, and would like to maintain some standards. This App team insists they want to persist with this design, and we see no value add whatsoever. Here's a good(short) treatise on this subject by Gail Shaw: https://www.red-gate.com/simple-talk/databases/sql-server/database-administration-sql-server/gail-shaws-sql-server-howlers/

    We see the upside when done, say on a monthly or yearly basis where we will see improved maintenance times since they are only needed on the active partition. Daily just does not seem to make sense.

     

  • Yes... there is a drawback.  A fair bit of code will actually run slower because there's an extra lay of structure that you have to go through.  Partitioning is NOT a query performance feature even with so called "Partition Elimination".  They'll also take more storage because the clustered index needs to have the partitioning column added to it and that means it will also be added to at least every "partition aligned" index.

    Another issue is, how much freespace ends up in each partition?  I could be that you're wasting a huge amount of space that can be used my nothing else.  "It Depends" on many factors including the initial size of the partition, file growth, and low page densities caused by page splits.

    The good part about it is that it will make index maintenance a whole lot faster than a large monolithic structure and it can make backups a whole lot faster if you do the "seal-a-meal" thing on each partition that becomes static and making it "Read Only" and then doing backups of only writable partitions.

    How many rows per day are you generating for this table?  And how many pages does that entail?  I don't have a number in mind but if the answer is less than a million rows per day, my temptation would be to shift to monthly partitions.

    The bottom line is to not make any hasty decisions one way or another until you can prove one way or the other and if the proof is a draw, then only sensibilities may be offended.  I say that but I'm pretty sure you could get much better performance with the correct and appropriate indexes on monolithic structures.  Again, I'm thinking that someone is using partitioning  because they think it improves performance and it does not.

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks Jeff. Appreciate the detailed insights regarding performance implications. The product team has so far not engaged with regards to a design discussion, but trying nonetheless. I agree that performance as a reason for implementing table partitioning is a myth that we as DBA's should help dispel. I also agree regarding the space management and subsequent(potential) wastage of disk space. I was wondering what other negative outcomes there could be from proliferating hundreds of data files per database. I have to imagine there is some downside - I feel it in my bones :-).

  • I'd think there is some admin headaches, someone might move a file or delete something thinking it's not important. There might be corruption that goes undetected, but those are really minor.

    Really, I think lots of confusion for new staff that might come on, in addition to the perf stuff Jeff mentions.

    My argument would be I don't think this buys you anything. For that reason, I wouldn't do anything. I'd make the app team prove this helps in some way in order to make a change from how existing systems are set up.

  • TheMaskedDBA wrote:

    Thanks Jeff. Appreciate the detailed insights regarding performance implications. The product team has so far not engaged with regards to a design discussion, but trying nonetheless. I agree that performance as a reason for implementing table partitioning is a myth that we as DBA's should help dispel. I also agree regarding the space management and subsequent(potential) wastage of disk space. I was wondering what other negative outcomes there could be from proliferating hundreds of data files per database. I have to imagine there is some downside - I feel it in my bones :-).

    I have one table that I drank the Purple "Flavor-Aid" on... it has 1 monthly partition for every month in the past since May of 2010 (I'm still trying to get them to trim that down to 2 years 🙁 ).

    Now, I have it very well managed and totally automated but it's a spooky looking list of things when you call up the files and filegroups.  The next DBA is going to leave a serious hashmark in their britches when they first see it.  145 files as of this month not including the nearly empty file already online for next month.  If someone told me they wanted daily files, then I'd need to say the "DBA Prayer" about 20 times a day....

    "Dear Lord, I'm a production DBA... please give me patience because, if you give me strength, I'm going to need bail money to go with it". 😀

    .

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks Jeff & Steve. Will keep this thread posted with updates and/or outcome. First meeting with product/dev team did not achieve much.

    EDIT : I like Jeff's DBA prayer 😀

    • This reply was modified 2 years, 6 months ago by  TheMaskedDBA.
  • Actually, the development team should have nothing to do with how the data in the db is physically stored.  That's the DBA's job.  (Sorry, just had to say that first.)

    I'll assume you're not going to waste space on each daily partitioning, i.e. you'll shrink its disk size to only what it actually consumes at the close of day.

    Btw, a single data file is not best for performance (it might be OK if you have very fast SAN/use SSDs).  Typically for large amounts of data, you'd want at least two data files.  Of course doubling the data files would just add to your issues here.

    How often do you query older data?  How often do you query across different days?  If you tried from all / most of, say, 100+ data files, I would think you would see some delay just from that.  If you only typically read the most recent month or so of data, it probably won't be a big issue.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • There isn't anything that states you must have a separate filegroup for each partition - in fact, you could have all partitions in the same filegroup or split across 2 or more.  Unless you have a specific requirement for creating separate filegroups there really isn't a downside that I am aware of to placing all partitions in the same filegroup.

     

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Jeff Moden wrote:

    "Dear Lord, I'm a production DBA... please give me patience because, if you give me strength, I'm going to need bail money to go with it". 😀

    .

    This made my day...I'm definitely going to keep this one in the back pocket!


    SELECT quote FROM brain WHERE original = 1
    0 rows returned

  • ScottPletcher wrote:

    Actually, the development team should have nothing to do with how the data in the db is physically stored.  That's the DBA's job.  (Sorry, just had to say that first.)

    Agree with this - developers shouldn't be designing storage solutions for SQL Server.

    Btw, a single data file is not best for performance (it might be OK if you have very fast SAN/use SSDs).  Typically for large amounts of data, you'd want at least two data files.  Of course doubling the data files would just add to your issues here.

    From the original post - it sounds like they are wanting to create a new filegroup and file for each day.  Unless they are creating those on separate volumes (mount points) then there isn't going to be any benefits to multiple files.  In fact, it is likely they will be creating the files with a minimal size and growing them out - which is going to cause file fragmentation which could cause even worse performance.

    I am curious why you state a single file is not best for performance - whether you have one file or several files you won't see any difference in performance unless you spread those files across different volumes that are also on separate spindles.  And if using an Enterprise SAN - then multiple files on the same volume only makes maintenance of the system harder.  Separate files across volumes on an Enterprise SAN can improve performance - maybe.  Depends on the HBA, the fabric, switches and SAN configuration and data access patterns.

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Jeffrey Williams wrote:

    ScottPletcher wrote:

    Actually, the development team should have nothing to do with how the data in the db is physically stored.  That's the DBA's job.  (Sorry, just had to say that first.)

    Agree with this - developers shouldn't be designing storage solutions for SQL Server.

    Btw, a single data file is not best for performance (it might be OK if you have very fast SAN/use SSDs).  Typically for large amounts of data, you'd want at least two data files.  Of course doubling the data files would just add to your issues here.

    From the original post - it sounds like they are wanting to create a new filegroup and file for each day.  Unless they are creating those on separate volumes (mount points) then there isn't going to be any benefits to multiple files.  In fact, it is likely they will be creating the files with a minimal size and growing them out - which is going to cause file fragmentation which could cause even worse performance.

    I am curious why you state a single file is not best for performance - whether you have one file or several files you won't see any difference in performance unless you spread those files across different volumes that are also on separate spindles.  And if using an Enterprise SAN - then multiple files on the same volume only makes maintenance of the system harder.  Separate files across volumes on an Enterprise SAN can improve performance - maybe.  Depends on the HBA, the fabric, switches and SAN configuration and data access patterns.

    I have to say that I totally disagree with that.  Stop treating Developers as if they shouldn't know even the basics of SQL.  You're just making the problem worse and you're turning the DBA into a major bottleneck, which the Developers and others rightfully complain about because of statements like Developers not needing be involved with datatypes.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden wrote:

    Jeffrey Williams wrote:

    ScottPletcher wrote:

    Actually, the development team should have nothing to do with how the data in the db is physically stored.  That's the DBA's job.  (Sorry, just had to say that first.)

    Agree with this - developers shouldn't be designing storage solutions for SQL Server.

    Btw, a single data file is not best for performance (it might be OK if you have very fast SAN/use SSDs).  Typically for large amounts of data, you'd want at least two data files.  Of course doubling the data files would just add to your issues here.

    From the original post - it sounds like they are wanting to create a new filegroup and file for each day.  Unless they are creating those on separate volumes (mount points) then there isn't going to be any benefits to multiple files.  In fact, it is likely they will be creating the files with a minimal size and growing them out - which is going to cause file fragmentation which could cause even worse performance.

    I am curious why you state a single file is not best for performance - whether you have one file or several files you won't see any difference in performance unless you spread those files across different volumes that are also on separate spindles.  And if using an Enterprise SAN - then multiple files on the same volume only makes maintenance of the system harder.  Separate files across volumes on an Enterprise SAN can improve performance - maybe.  Depends on the HBA, the fabric, switches and SAN configuration and data access patterns.

    I have to say that I totally disagree with that.  Stop treating Developers as if they shouldn't know even the basics of SQL.  You're just making the problem worse and you're turning the DBA into a major bottleneck, which the Developers and others rightfully complain about because of statements like Developers not needing be involved with datatypes.

    This was not about data types at all, in any way, it was about partitions and filegroups.  And I will stand by what I said: developers should never dictate to the DBA what filegroups / files / partitions are used in the db.  The DBA and developers will collaborate on partitions, of course, but the ultimate decision is the DBA's.  (Until management overrides the DBA to satisfy the developers, as too often happens.)

    And I'm never a bottleneck to developers, as I prioritize their requests.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • Jeff Moden wrote:

    I have to say that I totally disagree with that.  Stop treating Developers as if they shouldn't know even the basics of SQL.  You're just making the problem worse and you're turning the DBA into a major bottleneck, which the Developers and others rightfully complain about because of statements like Developers not needing be involved with datatypes.

    Agree Jeff.  When I came onboard, the developers had been treated like idiots in every way by my predecessor.   The biggest thing was his absolute insistence that they use NOLOCK in every query. Without it, there was massive blocking and deadlocks occurring.

    It was painful, there were a lot of "holdouts", but by educating, providing proofs, and many times re-writing the code myself we moved forward.  There is now a partnership between the developers and me.  I am a resource for them.  For a long time, I had to review every line of code they produced.  Now, I have to spend little time reviewing code.  They come to me first whenever they are unsure of something.

    Is there a lot of absolute garbage in the systems? Yes, in the old legacy applications.  But for any new development, there is seldom things that get "sent back".

    Michael L John
    If you assassinate a DBA, would you pull a trigger?
    To properly post on a forum:
    http://www.sqlservercentral.com/articles/61537/

  • ScottPletcher wrote:

    Jeff Moden wrote:

    Jeffrey Williams wrote:

    ScottPletcher wrote:

    Actually, the development team should have nothing to do with how the data in the db is physically stored.  That's the DBA's job.  (Sorry, just had to say that first.)

    Agree with this - developers shouldn't be designing storage solutions for SQL Server.

    Btw, a single data file is not best for performance (it might be OK if you have very fast SAN/use SSDs).  Typically for large amounts of data, you'd want at least two data files.  Of course doubling the data files would just add to your issues here.

    From the original post - it sounds like they are wanting to create a new filegroup and file for each day.  Unless they are creating those on separate volumes (mount points) then there isn't going to be any benefits to multiple files.  In fact, it is likely they will be creating the files with a minimal size and growing them out - which is going to cause file fragmentation which could cause even worse performance.

    I am curious why you state a single file is not best for performance - whether you have one file or several files you won't see any difference in performance unless you spread those files across different volumes that are also on separate spindles.  And if using an Enterprise SAN - then multiple files on the same volume only makes maintenance of the system harder.  Separate files across volumes on an Enterprise SAN can improve performance - maybe.  Depends on the HBA, the fabric, switches and SAN configuration and data access patterns.

    I have to say that I totally disagree with that.  Stop treating Developers as if they shouldn't know even the basics of SQL.  You're just making the problem worse and you're turning the DBA into a major bottleneck, which the Developers and others rightfully complain about because of statements like Developers not needing be involved with datatypes.

    This was not about data types at all, in any way, it was about partitions and filegroups.  And I will stand by what I said: developers should never dictate to the DBA what filegroups / files / partitions are used in the db.  The DBA and developers will collaborate on partitions, of course, but the ultimate decision is the DBA's.  (Until management overrides the DBA to satisfy the developers, as too often happens.)

    And I'm never a bottleneck to developers, as I prioritize their requests.

    If I didn't read it that way, then others may not either.  Datatypes actually ARE a part of "how the data is stored".  I know what you mean now that you've clarified but only because you clarified.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden wrote:

    Jeffrey Williams wrote:

    ScottPletcher wrote:

    Actually, the development team should have nothing to do with how the data in the db is physically stored.  That's the DBA's job.  (Sorry, just had to say that first.)

    Agree with this - developers shouldn't be designing storage solutions for SQL Server.

    Btw, a single data file is not best for performance (it might be OK if you have very fast SAN/use SSDs).  Typically for large amounts of data, you'd want at least two data files.  Of course doubling the data files would just add to your issues here.

    From the original post - it sounds like they are wanting to create a new filegroup and file for each day.  Unless they are creating those on separate volumes (mount points) then there isn't going to be any benefits to multiple files.  In fact, it is likely they will be creating the files with a minimal size and growing them out - which is going to cause file fragmentation which could cause even worse performance.

    I am curious why you state a single file is not best for performance - whether you have one file or several files you won't see any difference in performance unless you spread those files across different volumes that are also on separate spindles.  And if using an Enterprise SAN - then multiple files on the same volume only makes maintenance of the system harder.  Separate files across volumes on an Enterprise SAN can improve performance - maybe.  Depends on the HBA, the fabric, switches and SAN configuration and data access patterns.

    I have to say that I totally disagree with that.  Stop treating Developers as if they shouldn't know even the basics of SQL.  You're just making the problem worse and you're turning the DBA into a major bottleneck, which the Developers and others rightfully complain about because of statements like Developers not needing be involved with datatypes.

    And I have to say that is incredibly rude - I don't treat developers that way.  I do not expect any developer to understand mount points or how the SAN is configured - or what drive letters we are using for each database file, or even how many files are included in each file group.

    If a developer shows interest in the subject I am more than happy to sit with them and show them, but I don't expect them to know anything about these subjects.  Their job doesn't require them to know that and they shouldn't be responsible for laying out the storage for SQL Server.  That is the DBA's responsibility...

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

Viewing 15 posts - 1 through 15 (of 20 total)

You must be logged in to reply to this topic. Login to reply