SQL data files split across 2 LUNs: moving to 4 LUNs

  • My database has two data files with two logical names, and each of the data files is on a separate SAN partition (LUN 1 and LUN 2).

    Our vendor recommends 4 LUNs with 4 data files.

    The data is already on LUN 1 and LUN 2, and each LUN is set to 500 GB (total space 1 TB). Used data is 750 GB.

    Can I get 4 LUNs at 250 GB each and somehow move the SQL data that is currently on the 2 LUNs into the 4 LUNs from within SQL? If so, how?

     

  • Well, as a first step you could get 2 more LUNs with 250 GB each and then, after redistributing all the data, shrink the files on the first 2 LUNs to 250 GB each.

    The redistribution would actually be easier if you had 4 volumes with 500 GB each and shrank all 4 files by 250 GB each, but assuming you won't have the time/space for that, you basically do the following (a T-SQL sketch follows the list):

    1. Rebuild all clustered indexes on all tables, rebuild all heap tables, and rebuild all non-clustered indexes.
    2. Shrink the first 2 data files by the amount of free space you can reclaim in them, working toward your 250 / 250 / 250 / 250 GB setup.
    3. Repeat steps 1 and 2 as often as necessary (keep an eye on your transaction logs meanwhile; they might grow) until you reach 250 GB x 4.
    4. Repeat step 1 until you're very close to the same data distribution across all data files.
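
    To make steps 1 and 2 concrete, here's a minimal sketch of one round, assuming the logical file names datafile1/datafile2 and placeholder table names (adjust the intermediate shrink target to however much space the rebuilds freed up):

    -- One round of rebuild-then-shrink (object and file names are placeholders).
    ALTER INDEX ALL ON dbo.SomeLargeTable REBUILD;   -- step 1: clustered + non-clustered indexes
    ALTER TABLE dbo.SomeLargeHeap REBUILD;           -- step 1: heaps (tables with no clustered index)

    -- Step 2: shrink the first two files toward 250 GB in chunks; targets are in MB.
    DBCC SHRINKFILE (N'datafile1', 450000);
    DBCC SHRINKFILE (N'datafile2', 450000);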

    Expect this to be very time-consuming, but every time I've done it so far it has paid off very well in reduced execution times (this last part might still require additional effort for full effect).

  • Thanks for the details.

    What do you mean by redistribution of all data? Is that a command I have to run, or does SQL Server redistribute the data when you add the new data files?

  • I think I have a plan; does this seem like the correct method to use?

    Start:

    datafile1 500 GB

    datafile2 500 GB

    Add datafile3 of 250 GB and set filegrowth to zero.

    Add datafile4 of 250 GB and set filegrowth to zero.

    Disable autogrowth on datafile3 and datafile4.
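
    As a sketch of the two "add file" steps (database name, drive paths, and sizes here are assumptions, not your actual values):

    ALTER DATABASE MyDatabase
        ADD FILE (NAME = N'datafile3', FILENAME = N'E:\SQLData\datafile3.ndf',
                  SIZE = 250GB, FILEGROWTH = 0);   -- FILEGROWTH = 0 keeps autogrowth off
    ALTER DATABASE MyDatabase
        ADD FILE (NAME = N'datafile4', FILENAME = N'F:\SQLData\datafile4.ndf',
                  SIZE = 250GB, FILEGROWTH = 0);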

    Empty the original data files (rebalance). DBCC SHRINKFILE with the EMPTYFILE option moves the data out of the original file into the other files in the filegroup; the proportional fill algorithm will distribute that data across the other three files based on their free space.

     

    DBCC SHRINKFILE (N'datafile1', EMPTYFILE);
    DBCC SHRINKFILE (N'datafile2', EMPTYFILE);

    Reset all data files (datafile1, datafile2, datafile3, datafile4) to 250 GB and re-enable autogrowth (see the sketch below).
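
    A sketch of that reset step (database name and growth increment are assumptions). Note that ALTER DATABASE ... MODIFY FILE can only grow a file, so the actual resize down to 250 GB is done by the DBCC SHRINKFILE step that follows:

    ALTER DATABASE MyDatabase MODIFY FILE (NAME = N'datafile1', FILEGROWTH = 512MB);
    ALTER DATABASE MyDatabase MODIFY FILE (NAME = N'datafile2', FILEGROWTH = 512MB);
    ALTER DATABASE MyDatabase MODIFY FILE (NAME = N'datafile3', FILEGROWTH = 512MB);
    ALTER DATABASE MyDatabase MODIFY FILE (NAME = N'datafile4', FILEGROWTH = 512MB);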

    Shrink the original files down to the 250 GB target:

     

    DBCC SHRINKFILE (N'datafile1', 256000);   -- target size in MB (~250 GB)
    DBCC SHRINKFILE (N'datafile2', 256000);
  • Let me give you a small overview. If you have a filegroup, SQL Server tries to write to all files in the filegroup. I assume you have 1 filegroup now with two files. If you add two files, SQL Server will try to write new information across the files, but this will not change existing distribution.

    There isn't a great way to do this, but what I'd do is look to rebuild each index, which should distribute the data across all 4 files, as you'll be rewriting things.

     

    If you have two filegroups, each with one file, and you want to add new filegroups, you will have to manually move data by rebuilding the clustered index into the new filegroup.
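
    For what it's worth, a generic query (nothing specific to your setup) to watch how the data ends up spread across the files of the current database:

    SELECT name AS logical_name,
           size / 128 AS size_mb,                            -- size is reported in 8 KB pages
           FILEPROPERTY(name, 'SpaceUsed') / 128 AS used_mb  -- pages actually in use
    FROM sys.database_files
    WHERE type_desc = 'ROWS';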

  • Single filegroup. Thanks Steve, a rebuild of the indexes is far easier than DBCC SHRINKFILE with EMPTYFILE and resizing.

     

  • This is actually fascinating. I need to test, but a rebuild (done as CREATE INDEX ... WITH DROP_EXISTING) should spread the index across all four files. Note that if you have space, shrink the 2 original files first. If not, the writes will still be distributed by proportional fill.

    What I really might do here to balance out sizes is create another filegroup, move most of the large tables over, shrink, then move them back.

  • Well, the thing is that rebalancing of data is (I think) based on the percentage of free space in each file of the filegroup. So if you have something like 10-20% free space inside the existing files, and if you have large heap tables, you will not get around ALTER TABLE ... REBUILD either (in that case I first do ALTER TABLE ... REBUILD and then rebuild the indexes with DROP_EXISTING = ON). DROP_EXISTING helps, but you'll see for yourself that there isn't going to be any one-shot shrink from 2 x 500 GB to 2 x 250 GB files. Of course, feel free to disagree and disprove me. If there's less than 5% free space in the first 2 files, I actually wouldn't bother shrinking those first; it will take much more time to squeeze out the last few free gigabytes right away than after doing the first round of rebuilds. You'll spend enough time shrinking the last few gigabytes after redistribution anyway 😉

    Edit: Just to clarify: perfect distribution is simply for the best possible performance. If you have distributed most of the data and rebuilt all heap tables together with all indexes at least once, you can leave the rest to your index maintenance jobs if you have time constraints.
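
    If it helps, a plain catalog-view query (no assumptions about your schema) to list the heaps that would need ALTER TABLE ... REBUILD:

    SELECT s.name AS schema_name, t.name AS table_name
    FROM sys.tables AS t
    JOIN sys.schemas AS s ON s.schema_id = t.schema_id
    JOIN sys.indexes AS i ON i.object_id = t.object_id
    WHERE i.index_id = 0;   -- index_id 0 means the table is a heap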

  • TRACEY-320982 wrote:

    Our vendor recommends 4 LUNs with 4 data files.

    To what end?  WHY does the vendor recommend that?

    In almost every case that I've seen, people are splitting databases up for the wrong reasons.  Your case is a little bit different because you have taken the time to put the files on known separate LUNs and, yes, that CAN help with performance a bit but, as with all else in SQL Server, "It Depends" on a whole lot of different things.

    For example, if your data is accessed fairly evenly across your larger tables then, yes, files on separate LUNs can help with performance but only if you take the time to identify how your larger tables will be split across the multiple files.  If those large tables live on only 1 file, there will be zero performance improvement for that table.

    For that same table, if you put all of the non-clustered indexes on one file and the clustered indexes on another, there could also be a bit of a performance improvement.
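
    For example (table, index, and filegroup names below are made up for illustration), moving a non-clustered index onto a filegroup whose file sits on a different LUN might look like:

    CREATE NONCLUSTERED INDEX IX_BigTable_CustomerID
        ON dbo.BigTable (CustomerID)
        WITH (DROP_EXISTING = ON)   -- rebuilds the existing index of the same name and definition
        ON NCI_FileGroup;           -- filegroup backed by a file on a separate LUN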

    If the data in your large table is based on an ever-increasing index key and only the latest data is accessed to any great extent, breaking the table up won't do a bloody thing for performance because you'll probably end up with all of the latest data on one LUN.

    IMHO, juggling data and data files in an attempt to improve performance almost always turns out to be a total waste of time unless you do some form of partitioning of the data where the most active sections of the tables themselves are physically split across multiple LUNs.

    It would be much more effective to put the MDF on one physical LUN and the LDF on another, as an example.  Splitting TempDB into a number of different files and having (depending on how many files you've split it into) 1 or 2 files on each of 4 LUNs can help performance, but only if you're having performance problems with TempDB to begin with.

    To be honest, it sounds like you may not know much about this type of thing.  And, no, I'm not bashing you.  I'm trying to help keep you from doing a lot of work for very little ROI.  Unless you have a carefully laid out plan, splitting your current 2 files to 4 isn't going to buy you a thing.

    If you ARE having performance problems, get some help.  There's a more than 90% chance that your time would be better spent in finding performance challenged code and fixing it.

    --Jeff Moden



  • I'm actually fascinated to see such recommendations at all. I've seen enough software that, by the expected volume of data alone (> 16 TB, eDiscovery), would require more than one filegroup. Any word from the vendor on that, or in the install script? Nope.

    One big file, and when the time comes for more you'll notice it yourself; not to mention that writes > reads for this particular jewel of engineering.

    And yep, once you approach that 16 TB file, the last thing I would do is add a single file to it. I'd rather go with DB file count = CPU cores (parallelism not only applies to queries but can be leveraged for I/O the very same way), and preferably right from the beginning, i.e. in the install script. If I have to do it later (and I had to once; it took me 72+ hours of rebalancing data), I'd do it once and do it right.

    And I did the same thing last year on a 500 GB DWH DB. Before the change, all jobs together took around 8 hours to complete; afterwards it was 3 hours without touching any queries or SSIS packages. BUT, as Jeff said, this was only possible to achieve because I knew exactly what the queries were accessing, which helped me reduce file contention during execution. Checkpointing will also be faster (due to smaller files), so you can expect some degree of performance increase during INSERTs. Don't misjudge the ~8% increase in Paul's tests: the number of disks per LUN decreased (but not the total number of disks), yet there was still a performance increase. And yes, you will increase the number of context switches/sec with more files, and yes, virtualization could cripple performance if CPU time is shared with other VMs, but don't expect that kind of crippling with 4/8 cores and 4/8 data files unless the VM host is really badly configured. Running something like 64 cores with 96 files, however, might show some different Windows Perfmon counters popping up (for the interested: that would be cores / 2, since for each of the E, T, and L steps I would count on one filegroup, for a total of 3, so at any time during processing I should stay within 64 files). 🙂

  • This is good information you are all providing.

    When you get to high TB of data, then it does take a little more planning and structure.

    Our logs show I/O saturation, long wait times, and FlushCache messages, hence the review of the current structure of the data files and LUNs.
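
    Before re-laying out the files, it may be worth confirming that the top waits really are I/O related; a generic wait-stats query (not specific to your environment) such as this one shows the heaviest waits since the counters were last cleared:

    SELECT TOP (10) wait_type,
           waiting_tasks_count,
           wait_time_ms / 1000.0 AS wait_time_s
    FROM sys.dm_os_wait_stats
    WHERE wait_type NOT LIKE N'SLEEP%'   -- skip a few idle/background waits
    ORDER BY wait_time_ms DESC;
    -- PAGEIOLATCH_* (data file reads) and WRITELOG (log writes) near the top point at I/O.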

     

     

  • TRACEY-320982 wrote:

    This is good information you are all providing. When you get to high TB of data, then it does take a little more planning and structure. Our logs show I/O saturation and long wait times and flushcache, hence the reason for reviewing the current structure of the datafiles and LUN    

    That's likely NOT something that will be fixed simply by splitting files.  THAT's a nasty set of code problems that you have.

    You also have "flushcache"... how much memory do you have on the machine in question, how much of it is allocated to SQL Server, do you have anything else running on the instance, and is it a VM on a box shared by other things/applications, etc?
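
    A quick way to answer the memory questions (generic queries; the column name below is the SQL Server 2012+ one):

    SELECT physical_memory_kb / 1024 AS server_memory_mb   -- total RAM visible to SQL Server
    FROM sys.dm_os_sys_info;

    EXEC sys.sp_configure N'show advanced options', 1;
    RECONFIGURE;
    EXEC sys.sp_configure N'max server memory (MB)';        -- how much SQL Server is allowed to use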

    --Jeff Moden



  • Thanks Jeff, we are doing a review of all the details.

    Memory: the server has 109 GB.

     

  • TRACEY-320982 wrote:

    I think I have a plan, does this seem like the correct method to use. Start datafile1 500 datafile2 500 Add datafile3 of 250 GB and set filegrowth to zero. Add datafile4 of 250 GB and set filegrowth to zero. Disable autogrowth on datafile3 and datafile4. Empty the original data files (rebalance): takes data from the original files and moves it to the other files; the proportional fill algorithm will evenly distribute the data from the original file into the other three files.

    DBCC SHRINKFILE (N'datafile1', EMPTYFILE);
    DBCC SHRINKFILE (N'datafile2', EMPTYFILE);

    Reset all data files (datafile1, datafile2, datafile3, datafile4) to 250 GB and re-enable autogrowth, then shrink the original files.

    DBCC SHRINKFILE (N'datafile1');
    DBCC SHRINKFILE (N'datafile2');

    Your plan is incomplete.  You MUST REBUILD the indexes that SHRINKFILE will invert and frequently drive to 99% logical fragmentation.  The problem with that is that anything over just 128 extents (that's only 8 MB!) means the new defragmented index will be created first and the original index will NOT be dropped until the new one completes.  That might just undo everything you hope SHRINKFILE was going to do for you if you have a large clustered index go through some inversion.

    There is a trick to get around that if you can get some temporary space happening.  Create a new filegroup on the temporary storage, CREATE the existing index using the WITH DROP_EXISTING option, and point the new index to the new temporary filegroup.  THEN do the SHRINKFILE (it should also run faster).  Once that's complete, do the same trick with the CREATE, but point it back at the original file that you just shrank.
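
    A rough sketch of that trick (database, filegroup, path, table, and index names are all placeholders; for DROP_EXISTING the CREATE must match the existing index definition):

    -- 1. Create the temporary filegroup on the spare storage.
    ALTER DATABASE MyDatabase ADD FILEGROUP DBA_Temp;
    ALTER DATABASE MyDatabase
        ADD FILE (NAME = N'dba_temp1', FILENAME = N'T:\DBAOnly\dba_temp1.ndf', SIZE = 100GB)
        TO FILEGROUP DBA_Temp;

    -- 2. Move the clustered index (and with it the table) onto the temporary filegroup.
    CREATE UNIQUE CLUSTERED INDEX CIX_BigTable
        ON dbo.BigTable (BigTableID)
        WITH (DROP_EXISTING = ON)
        ON DBA_Temp;

    -- 3. Shrink the original file (target in MB), then move the index back.
    DBCC SHRINKFILE (N'datafile1', 256000);

    CREATE UNIQUE CLUSTERED INDEX CIX_BigTable
        ON dbo.BigTable (BigTableID)
        WITH (DROP_EXISTING = ON)
        ON [PRIMARY];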

    I have a very understanding Infrastructure Team.  I have some pretty large tables in a lot of databases, and they've allocated a permanent "DBA Only" 1 TB drive for me to do all manner of things like this.  A lot of impossibilities have become possible because of the "DBA Only" drive, and the only rule that I have to follow is that NOTHING permanent will ever live there.

    You don't have to have a drive THAT size but you should have a "DBA Only" Drive about twice the size of your largest clustered index.  It'll save you a wad of aggravation in many areas and that saves the company money.  The ROI on such a thing is quite high.

     

    --Jeff Moden



  • TRACEY-320982 wrote:

    Thanks Jeff, we are doing a review of all the details. Memory: the server has 109 GB.

    That's on the edge of not being enough for a 750 GB database, although it'll be good enough for good code.  Consider doubling the amount of memory, and then consider using something like sp_WhoIsActive to find the resource-intensive, performance-challenged code that's driving stuff out of the buffers.
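
    Alongside sp_WhoIsActive (which shows what is running right now), a generic plan-cache query like this sketch can surface the heaviest readers:

    SELECT TOP (10)
           qs.total_logical_reads,
           qs.execution_count,
           SUBSTRING(st.text, qs.statement_start_offset / 2 + 1,
                     (CASE WHEN qs.statement_end_offset = -1
                           THEN DATALENGTH(st.text)
                           ELSE qs.statement_end_offset
                      END - qs.statement_start_offset) / 2 + 1) AS statement_text
    FROM sys.dm_exec_query_stats AS qs
    CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
    ORDER BY qs.total_logical_reads DESC;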

    --Jeff Moden


