July 16, 2019 at 3:34 pm
My database has two data files, each with its own logical name, and each file sits on a separate SAN LUN (LUN 1 and LUN 2).
Our vendor recommends 4 LUNs with 4 data files.
The data is already on LUN1 and LUN2, and each LUN is 500 GB, for 1 TB of total space. Used data is 750 GB.
Can I get 4 LUNs at 250 GB each and somehow move the SQL data that is on the 2 LUNs onto the 4 LUNs from within SQL Server? If so, how?
July 16, 2019 at 4:37 pm
Well, as a first step you could get 2 more LUNs with 250 GB each, redistribute all the data, and then shrink the files on the first 2 LUNs to 250 GB each.
The redistribution would actually be easier if you had 4 volumes with 500 GB each and shrank all 4 files by 250 GB each, but assuming you won't have the time/space for that, you basically end up doing the same thing: add the new files, redistribute the data, then shrink the originals.
Expect this to be a very time-consuming thing to do, but every time I've done it so far it has paid off very well in reduced execution times (though that last part may still require additional effort for full effect).
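For what it's worth, a minimal T-SQL sketch of the "add two more files" step, assuming a hypothetical database name, drive letters, and sizes (adjust everything to your environment):
-- Hypothetical database name, file names, paths, and sizes
ALTER DATABASE [YourDb]
    ADD FILE (NAME = N'datafile3', FILENAME = N'M:\Data\datafile3.ndf', SIZE = 250GB, FILEGROWTH = 0)
    TO FILEGROUP [PRIMARY];
ALTER DATABASE [YourDb]
    ADD FILE (NAME = N'datafile4', FILENAME = N'N:\Data\datafile4.ndf', SIZE = 250GB, FILEGROWTH = 0)
    TO FILEGROUP [PRIMARY];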
July 16, 2019 at 4:49 pm
Thanks for the details.
What do you mean by "redistribution of all data"? Is that a command I have to run, or does SQL Server redistribute the data when you add the new data files?
July 16, 2019 at 6:32 pm
I think I have a plan; does this seem like the correct method to use?
Start:
datafile1: 500 GB
datafile2: 500 GB
Add datafile3 of 250 GB and set filegrowth to zero.
Add datafile4 of 250 GB and set filegrowth to zero.
(That is, autogrowth disabled on datafile3 and datafile4.)
Empty the original data files (rebalance). This takes data from the end of each original file and moves it to the other data files; the proportional fill algorithm evenly distributes the data from the emptied file across the remaining files.
DBCC Shrinkfile ('datafile1',emptyfile)
DBCC Shrinkfile ('datafile2',emptyfile)
Reset all data files (datafile1, datafile2, datafile3, datafile4) to 250 GB and re-enable autogrowth.
Shrink the original files:
DBCC Shrinkfile('datafile1')
DBCC Shrinkfile('datafile2')
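For the "reset to 250 GB and re-enable autogrowth" step, a rough sketch with hypothetical names and an example growth increment (DBCC SHRINKFILE takes its target size in MB, so 250 GB = 256000):
-- Hypothetical names; the growth increment is an example only
ALTER DATABASE [YourDb] MODIFY FILE (NAME = N'datafile1', FILEGROWTH = 512MB);
ALTER DATABASE [YourDb] MODIFY FILE (NAME = N'datafile2', FILEGROWTH = 512MB);
ALTER DATABASE [YourDb] MODIFY FILE (NAME = N'datafile3', FILEGROWTH = 512MB);
ALTER DATABASE [YourDb] MODIFY FILE (NAME = N'datafile4', FILEGROWTH = 512MB);
DBCC SHRINKFILE (N'datafile1', 256000);  -- shrink back down to ~250 GB (target in MB)
DBCC SHRINKFILE (N'datafile2', 256000);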
July 16, 2019 at 7:32 pm
Let me give you a small overview. If you have a filegroup, SQL Server tries to write to all files in the filegroup. I assume you have 1 filegroup now with two files. If you add two files, SQL Server will try to write new information across the files, but this will not change existing distribution.
There isn't a great way to do this, but what I'd do is look to rebuild each index, which should distribute the data across all 4 files, as you'll be rewriting things.
If you have two filegroups, each with one file, and you want to add new filegroups, you will have to manually move data by rebuilding the clustered index into the new filegroup.
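A minimal sketch of both options, with hypothetical table and index names (an in-place rebuild spreads the pages across the files of the index's current filegroup; the ON clause moves it to a different filegroup):
-- In-place rebuild: reallocates the index's pages across all files in its filegroup
ALTER INDEX [CIX_BigTable] ON dbo.BigTable REBUILD;
-- Recreate with DROP_EXISTING to move the clustered index (and the table data) to another filegroup
CREATE UNIQUE CLUSTERED INDEX [CIX_BigTable]
    ON dbo.BigTable (BigTableID)
    WITH (DROP_EXISTING = ON)
    ON [FG_New];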
July 16, 2019 at 8:24 pm
Single filegroup. Thanks Steve, it's far easier to do an index rebuild than DBCC SHRINKFILE with EMPTYFILE and resizing.
July 16, 2019 at 9:42 pm
This is actually fascinating. I need to test, but a rebuild (with DROP_EXISTING) should spread the index across all four files. Note that if you have space, shrink the 2 original files first. If not, the writes will be distributed proportionally.
What I really might do here to balance out the sizes is create another filegroup, move the largest tables over, shrink, then move them back.
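To see how balanced the files are before and after, a quick per-file size/used-space check (a sketch; run it in the database itself):
-- Size and used space per data file, in MB (sys.database_files reports 8 KB pages; /128 = MB)
SELECT name                                   AS logical_file_name,
       size / 128                             AS size_mb,
       FILEPROPERTY(name, 'SpaceUsed') / 128  AS used_mb
FROM sys.database_files
WHERE type_desc = N'ROWS';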
July 17, 2019 at 8:01 am
Well, the thing is that rebalancing of data is (I think) based on the percentage of free space in each file, so if you have something like 10-20% free space inside the existing files, and if you have large heap tables, you will not get around ALTER TABLE ... REBUILD either (in such a case I first do ALTER TABLE ... REBUILD and then rebuild the indexes with DROP_EXISTING = ON). DROP_EXISTING helps, but you'll see for yourself that there is not going to be a one-shot shrink from 2x500 GB to 2x250 GB files. Of course, feel free to disagree and disprove me. If there is less than 5% free space in the first 2 files, I actually wouldn't bother shrinking them first; it will take much more time to squeeze out the last few free gigs right away than after doing the first round of rebuilds. You'll spend enough time shrinking out the last few gigs after redistribution 😉
Edit: Just to clarify: perfect distribution is only about getting the best possible performance. If you have redistributed most of the data and rebuilt all heap tables, together with all of their indexes, at least once, you could leave the rest to your index maintenance jobs if you have time constraints.
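For completeness, the heap rebuild mentioned above is a one-liner, assuming a hypothetical heap table name (REBUILD rewrites all of the heap's pages, which also lets them be reallocated across the files):
-- Hypothetical heap table name
ALTER TABLE dbo.BigHeapTable REBUILD;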
July 17, 2019 at 12:19 pm
Our vendor recommends 4 LUNs with 4 data files.
To what end? WHY does the vendor recommend that?
In almost every case that I've seen, people are splitting databases up for the wrong reasons. Your case is a little bit different because you have taken the time to put the files on known separate LUNs and, yes, that CAN help with performance a bit but, as with all else in SQL Server, "It Depends" on a whole lot of different things.
For example, if your data is accessed fairly evenly across your larger tables then, yes, files on separate LUNs can help with performance but only if you take the time to identify how your larger tables will be split across the multiple files. If those large tables live on only 1 file, there will be zero performance improvement for that table.
For that same table, if you put all of the non-clustered indexes on one file and the clustered indexes on another, there could also be a bit of a performance improvement.
If the data in your large table is based on an ever increasing index key and then only the latest data is accessed to any great extent, breaking the table up won't do a bloody thing for performance because you'll probably end up with all of the latest data on one LUN.
IMHO, juggling data and data files in an attempt to improve performance almost always turns out to be a total waste of time unless you do some form of partitioning of the data, where the most active sections of the tables themselves are physically split across multiple LUNs.
It would be much more effective to put the MDF on one physical LUN and the LDF on another, as an example. Splitting TempDB into a number of different files and having (depending on how many files you've split it into) 1 or 2 files on each of 4 LUNs can help performance, but only if you're having performance problems with TempDB to begin with.
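If TempDB ever does turn out to be the bottleneck, adding data files is straightforward; a sketch with a hypothetical path and example size/growth (the usual guidance is equally sized files with identical growth settings):
-- Hypothetical path; size and growth are examples only
ALTER DATABASE tempdb
    ADD FILE (NAME = N'tempdev2', FILENAME = N'T:\TempDB\tempdev2.ndf', SIZE = 8GB, FILEGROWTH = 512MB);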
To be honest, it sounds like you may not know much about this type of thing. And, no, I'm not bashing you. I'm trying to help keep you from doing a lot of work for very little ROI. Unless you have a carefully laid out plan, splitting your current 2 files to 4 isn't going to buy you a thing.
If you ARE having performance problems, get some help. There's a more than 90% chance that your time would be better spent in finding performance challenged code and fixing it.
--Jeff Moden
Change is inevitable... Change for the better is not.
July 17, 2019 at 3:38 pm
I'm actually fascinated to see such recommendations at all. I've seen enough software that, just by the expected volume of data (> 16 TB, eDiscovery), would require more than one filegroup; any word from the vendor on that, or in the install script? Nope.
One big file, and when the time comes for more you'll notice it yourself; not to mention that writes > reads for this particular jewel of engineering.
And yep, once you approach that 16 TB file, the last thing I would do is add a single file to it. Rather, DB file count = CPU cores (parallelism not only applies to queries but can be leveraged for I/O in the very same way), and preferably right from the god damn beginning, aka the install script. If I have to do it later (and I had to once; it took me 72+ hours of rebalancing data), I'd do it once and do it right.
And I did the same thing last year on a 500 GB DWH database. Before the change, all jobs together took around 8 hours to complete; afterwards it was 3 hours, without touching any queries or SSIS packages. BUT, as Jeff said, this was only possible because I knew exactly what the queries were accessing, which helped me reduce file contention during execution. However, checkpointing will be faster (due to smaller files), so you could expect some degree of performance increase during INSERTs. Don't misjudge the ~8% increase in Paul's tests: the number of disks per LUN decreased (but not the total number of disks), yet there was still a performance increase. And yes, you will increase context switches/sec with more files, and yes, virtualization could cripple performance if CPU time is shared with other VMs, but don't expect that kind of crippling with 4/8 cores and 4/8 data files unless the VM host is really badly configured. Running something like 64 cores with 96 files, though (for the interested: that would be cores / 2, as for each of the E, T, and L steps I would count in one filegroup, for a total of 3, so at any time during processing I should stay within 64 files), might show some different Windows Perfmon counters popping up. 🙂
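As a quick sanity check against that file-count-vs-cores rule of thumb, a small query sketch (run it in the database in question):
-- Compare visible schedulers (roughly, usable CPU cores) to the number of data files
SELECT
    (SELECT COUNT(*) FROM sys.dm_os_schedulers WHERE status = N'VISIBLE ONLINE') AS cpu_cores,
    (SELECT COUNT(*) FROM sys.database_files   WHERE type_desc = N'ROWS')        AS data_files;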
July 17, 2019 at 5:18 pm
This is good information you are all providing.
When you get to high TB of data, then it does take a little more planning and structure.
Our logs show I/O saturation, long wait times, and FlushCache messages, hence the reason for reviewing the current structure of the data files and LUNs.
July 17, 2019 at 5:37 pm
This is good information you are all providing. When you get to high TB of data, then it does take a little more planning and structure. Our logs show I/O saturation, long wait times, and FlushCache messages, hence the reason for reviewing the current structure of the data files and LUNs.
That's NOT something that will be fixed simply by splitting files. THAT's a nasty set of code problems that you have.
You also have "FlushCache" messages... how much memory do you have on the machine in question, how much of it is allocated to SQL Server, do you have anything else running on the instance, and is it a VM on a box shared by other things/applications, etc.?
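While you're pulling those details together, per-file I/O latency numbers will show which files/LUNs are actually suffering; a sketch using the standard DMVs:
-- Average read/write stall per file, in ms
SELECT DB_NAME(vfs.database_id)                             AS database_name,
       mf.name                                              AS logical_file_name,
       vfs.io_stall_read_ms  / NULLIF(vfs.num_of_reads, 0)  AS avg_read_ms,
       vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0) AS avg_write_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
  ON mf.database_id = vfs.database_id
 AND mf.file_id     = vfs.file_id
ORDER BY avg_write_ms DESC;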
--Jeff Moden
Change is inevitable... Change for the better is not.
July 17, 2019 at 8:14 pm
Thanks Jeff, we're doing a review of all the details.
Memory: the server has 109 GB.
July 17, 2019 at 10:51 pm
I think I have a plan; does this seem like the correct method to use? Start: datafile1: 500 GB, datafile2: 500 GB. Add datafile3 of 250 GB and set filegrowth to zero. Add datafile4 of 250 GB and set filegrowth to zero (i.e., autogrowth disabled on datafile3 and datafile4). Empty the original data files (rebalance). This takes data from the end of each original file and moves it to the other data files; the proportional fill algorithm evenly distributes the data from the emptied file across the remaining files.
DBCC Shrinkfile ('datafile1',emptyfile)
DBCC Shrinkfile ('datafile2',emptyfile)
Reset all data files (datafile1, datafile2, datafile3, datafile4) to 250 GB and re-enable autogrowth. Shrink the original files:
DBCC Shrinkfile('datafile1')
DBCC Shrinkfile('datafile2')
Your plan is incomplete. You MUST REBUILD the indexes, because ShrinkFile will invert them and frequently drive them to 99% logical fragmentation. The problem with that is that anything over just 128 extents (that's only 8 MB!) means the new defragmented index will be created first and the original index will NOT be dropped until the new one completes. That might just undo everything you hoped ShrinkFile was going to do for you if a large clustered index goes through such an inversion.
There is a trick to get around that if you can get some temporary space. Create a new filegroup on the temporary storage, CREATE the existing index using the WITH DROP_EXISTING option, and point the new index at the new temporary filegroup. THEN do the ShrinkFile (it should also run faster). Once that's complete, do the same trick with the CREATE but point it back at the original filegroup you just shrank.
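A rough sketch of that trick, with hypothetical names, paths, and sizes (the index definition in the CREATE must match the existing index exactly for DROP_EXISTING to work):
-- 1) Temporary filegroup on the spare storage
ALTER DATABASE [YourDb] ADD FILEGROUP [FG_Work];
ALTER DATABASE [YourDb]
    ADD FILE (NAME = N'work_file', FILENAME = N'X:\DBAOnly\work_file.ndf', SIZE = 100GB)
    TO FILEGROUP [FG_Work];
-- 2) Move the big clustered index (and therefore the table data) to the work filegroup
CREATE UNIQUE CLUSTERED INDEX [CIX_BigTable]
    ON dbo.BigTable (BigTableID)
    WITH (DROP_EXISTING = ON)
    ON [FG_Work];
-- 3) Shrink the original file(s), then move the index back (target size is in MB)
DBCC SHRINKFILE (N'datafile1', 256000);
CREATE UNIQUE CLUSTERED INDEX [CIX_BigTable]
    ON dbo.BigTable (BigTableID)
    WITH (DROP_EXISTING = ON)
    ON [PRIMARY];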
I have a very understanding Infrastructure Team. I have some pretty large tables in a lot of databases, and they've allocated a permanent "DBA Only" 1 TB drive for me to do all manner of things like this. A lot of impossibilities have become possible because of the "DBA Only" drive, and the only rule I have to follow is that NOTHING permanent will ever live there.
You don't have to have a drive THAT size but you should have a "DBA Only" Drive about twice the size of your largest clustered index. It'll save you a wad of aggravation in many areas and that saves the company money. The ROI on such a thing is quite high.
--Jeff Moden
Change is inevitable... Change for the better is not.
July 17, 2019 at 10:53 pm
Thanks Jeff, we're doing a review of all the details. Memory: the server has 109 GB.
That's on the edge of not being enough for a 750 GB database, although it'll be good enough for good code. Consider doubling the amount of memory, and then consider using something like sp_WhoIsActive to find the resource-intensive, performance-challenged code that's driving stuff out of the buffers.
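A couple of quick checks that can back that up, sketched with the standard DMVs (treat the numbers as indicators for your workload, not hard rules):
-- Page Life Expectancy: low or erratic values suggest buffer pool memory pressure
SELECT object_name, cntr_value AS ple_seconds
FROM sys.dm_os_performance_counters
WHERE counter_name = N'Page life expectancy'
  AND object_name LIKE N'%Buffer Manager%';
-- Buffer pool usage per database (8 KB pages converted to MB)
SELECT DB_NAME(database_id) AS database_name,
       COUNT(*) * 8 / 1024  AS buffer_mb
FROM sys.dm_os_buffer_descriptors
GROUP BY database_id
ORDER BY buffer_mb DESC;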
--Jeff Moden
Change is inevitable... Change for the better is not.