November 12, 2022 at 5:13 pm
November 13, 2022 at 8:39 pm
With the understanding that I'm NOT an AWS EC2 ninja by any means, I cannot imagine doing anything different for drives (by letter) on EC2 than you would for your normal ON PREMISE instances.
I WILL say, however, that I don't install SQL Server on the C: drive. I typically give it its own drive with a simplified directory structure for the data and log files for master, model, and MSDB. I DO also put TempDB on its own drive (T: to be specific).
I also see no harm in putting the MDF/NDF and LDF files for your user databases on a single drive on AWS EC2 except that it offends my personal sensibilities only because that would be a lot different than what I do. And, I don't know if there's any cost advantage to any of this.
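A minimal T-SQL sketch of the TempDB-on-its-own-drive layout described above, assuming a T:\TempDB folder already exists and the default two tempdb files (real instances usually have several data files, so repeat MODIFY FILE for each):

-- Point tempdb at a dedicated T: drive (takes effect after the next service restart).
-- T:\TempDB is an assumed path; the SQL Server service account must be able to write to it.
ALTER DATABASE tempdb
    MODIFY FILE (NAME = tempdev, FILENAME = 'T:\TempDB\tempdb.mdf');

ALTER DATABASE tempdb
    MODIFY FILE (NAME = templog, FILENAME = 'T:\TempDB\templog.ldf');

-- Verify the new paths; the old files can be deleted after the restart.
SELECT name, physical_name FROM tempdb.sys.database_files;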
--Jeff Moden
Change is inevitable... Change for the better is not.
November 14, 2022 at 12:31 am
Cloud is slightly different from on-prem: on-prem, IOPS and max throughput are normally not limited per VM and/or per disk type/size, while in the cloud they are.
So depending on the sizes of the servers and disks (which are by default all SSD, although some are faster than others: premium disks are faster and ultra disks faster still), IOPS vs size vs cost is a significant consideration.
On one of my project cloud servers (Azure, very heavily data-driven) I tested the setup with 2 disks vs 1 disk for logs/data. For the logs we picked a 1 TB disk (Premium P30, IOPS/throughput = 5,000/200MB/s) and there were lots of "I/O taking more than 15 seconds" messages on the log. Data was on a P60 8TB disk (IOPS/throughput = 16,000/500MB/s).
When we put both data and log on the P60 the messages did go away (as expected) and overall performance increased.
So cloud design needs to take other factors into consideration, and on-prem SSDs have in many cases also removed the need to have multiple disks.
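For anyone comparing 1-disk vs 2-disk layouts the same way, a sketch of the kind of check that helps: sys.dm_io_virtual_file_stats shows cumulative per-file latency since the last restart, and high average write latency on the log files is usually what accompanies those 15-second I/O warnings.

-- Per-file IO latency since instance startup; useful when comparing a combined
-- data+log disk against separate disks.
SELECT DB_NAME(vfs.database_id)                          AS database_name,
       mf.physical_name,
       vfs.num_of_reads,
       vfs.num_of_writes,
       vfs.io_stall_read_ms  * 1.0 / NULLIF(vfs.num_of_reads, 0)  AS avg_read_latency_ms,
       vfs.io_stall_write_ms * 1.0 / NULLIF(vfs.num_of_writes, 0) AS avg_write_latency_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
  ON mf.database_id = vfs.database_id
 AND mf.file_id     = vfs.file_id
ORDER BY avg_write_latency_ms DESC;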
November 14, 2022 at 1:54 am
But that doesn't have anything to do with how many drives you have or whether or not you mix MDFs with LDFs. It only has to do with the speed of the disks you selected.
If the number of drives costs extra then, sure, combine the drives... I'd still use separate folders for MDFs and LDFs.
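A sketch of what that looks like in practice when data and log share a drive but sit in separate folders (the database name, paths, and sizes here are purely illustrative):

-- One drive (D:), but data and log kept in separate folders so they can be
-- split onto separate disks later without changing naming conventions.
CREATE DATABASE SalesDB
ON PRIMARY
(
    NAME = SalesDB_data,
    FILENAME = 'D:\SQLData\SalesDB.mdf',
    SIZE = 10GB, FILEGROWTH = 1GB
)
LOG ON
(
    NAME = SalesDB_log,
    FILENAME = 'D:\SQLLogs\SalesDB.ldf',
    SIZE = 2GB, FILEGROWTH = 512MB
);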
Also looking at the drive premiums you're looking at, I'm liking On-Prem more and more. 😀
--Jeff Moden
Change is inevitable... Change for the better is not.
November 14, 2022 at 8:35 am
Jeff Moden wrote: But that doesn't have anything to do with how many drives you have or whether or not you mix MDFs with LDFs. It only has to do with the speed of the disks you selected.
If the number of drives costs extra then, sure, combine the drives... I'd still use separate folders for MDFs and LDFs.
Also looking at the drive premiums you're looking at, I'm liking On-Prem more and more. 😀
Extra drives do cost more. https://azure.microsoft.com/en-us/pricing/details/managed-disks/
So one P60 is $860 per month; in order to split user data/log and tempdb data/log onto their own disks you are talking about 4 times that ($3,440).
And in many cases that is for space that would never be used by the application (32 TB across the above 4 disks).
So yes, everything needs to be taken into consideration, including the very important fact that in order to increase the space allocated to a disk you NEED to SHUT DOWN the server (which on-prem may not be required).
November 14, 2022 at 9:19 am
We are going slightly off on a tangent here as the OP mentioned AWS, not Azure, but I do appreciate that both clouds will have their similarities and differences.
While I'm not an AWS expert by any means, when you deploy an RDS instance of SQL it looks like the back end just puts the MDF/NDF/LDF/BAK/TRN files all on the same volume, so there may be some merit to doing that on AWS EC2 (as essentially RDS is just an EC2 machine anyway), but for me it would always be MDF/NDF on one set of disks, LDF on another, backups on another, etc.
Going back to Azure and disks: a P60 isn't recommended by Microsoft for housing data files. You should stop at a P40 (or at a push a P50, but set the disk size to 4095GB, not 4096GB) so you can take advantage of disk caching. The recommendation on Azure is to have host read caching enabled for your data disks, and no caching on your log disks.
Only disks < 4096GB can have caching enabled.
If you then need more capacity, you need to look at building storage pools of multiple P50/P40/P30 disks, depending on how much IO you need alongside that capacity.
Logs, however, you can put on any P30-P80 disk, as you don't want to cache your logs.
You then also need to look at your machine sizes and what that particular machine size can handle in terms of throughput; as you mitigate the bottleneck at the disk level, the machine level then becomes your bottleneck.
So for an 8TB P60 (16,000 IOPS, 500MB/s throughput): if you did that over 4 x 2TB P40s you would get 30,000 IOPS and 1,000 MB/s throughput, or if you did it over 8 x 1TB P30s you would get 40,000 IOPS and 1,600 MB/s throughput. Storage capacity is linear but storage performance is far from linear in Azure, and I guess the same applies in AWS/GCP. Pooling these together, the OS thinks it is just one disk, but you gain the added benefit of using multiple spindles, so you don't need to go and chunk up your databases into smaller MDF/NDF files; present the pool and you can easily migrate from a P60 to the pool without changing any DB setup.
Had a customer once with a huge SharePoint farm on a P60 and they were having serious bottlenecks when backups were happening, because they were hitting the 500MB/s throughput and 15,000 IOPS while the backups ran; switching to a storage pool of 4 P40s solved that right out for them.
You may end up paying a little bit extra for multiple smaller disks, but for the additional few $$$ you get a lot more in performance.
https://blog.coeo.com/make-the-most-out-of-your-azure-disks-using-storage-pools
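As a rough way of spotting the kind of backup throughput ceiling described above, the backup history in msdb can be turned into an average MB/s per backup (a sketch only; it shows averages over each backup, not the instantaneous peaks that actually hit the disk cap):

-- Approximate throughput of recent backups; sustained figures near the disk's
-- advertised MB/s cap (e.g. 500MB/s on a single P60) suggest the disk, not
-- SQL Server, is the bottleneck during backup windows.
SELECT TOP (20)
       bs.database_name,
       bs.backup_start_date,
       bs.backup_finish_date,
       bs.backup_size / 1048576.0 AS backup_size_mb,
       (bs.backup_size / 1048576.0)
         / NULLIF(DATEDIFF(SECOND, bs.backup_start_date, bs.backup_finish_date), 0)
         AS avg_throughput_mb_per_sec
FROM msdb.dbo.backupset AS bs
ORDER BY bs.backup_finish_date DESC;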
November 14, 2022 at 2:35 pm
Ant-Green wrote: While I'm not an AWS expert by any means, when you deploy an RDS instance of SQL it looks like the back end just puts the MDF/NDF/LDF/BAK/TRN files all on the same volume, so there may be some merit to doing that on AWS EC2 (as essentially RDS is just an EC2 machine anyway), but for me it would always be MDF/NDF on one set of disks, LDF on another, backups on another, etc.
AWS will hire you an RDS instance, but RDS is very different from an EC2 server with SQL Server installed.
Also, some types of RDS server have the tempdb drive on a different letter.
November 14, 2022 at 2:45 pm
Ant-Green wrote: While I'm not an AWS expert by any means, when you deploy an RDS instance of SQL it looks like the back end just puts the MDF/NDF/LDF/BAK/TRN files all on the same volume, so there may be some merit to doing that on AWS EC2 (as essentially RDS is just an EC2 machine anyway), but for me it would always be MDF/NDF on one set of disks, LDF on another, backups on another, etc.
AWS will hire you an RDS instance, but RDS is very different from an EC2 server with SQL Server installed.
Also, some types of RDS server have the tempdb drive on a different letter.
Thank you. As I say, AWS is not my forte; I've only played with it a little, and from what I saw on what we deployed, all the properties showed data/log/backups pointing to the same underlying storage volume.
AWS is something I need to pick up more of, but most clients I have worked with are Azure-based.
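For what it's worth, a quick way to see that from T-SQL alone on either RDS or EC2 is to look at where each data and log file actually sits (a sketch; on RDS the underlying paths are managed for you, but the catalog view is still readable):

-- Drive/path layout of every data and log file on the instance.
SELECT DB_NAME(mf.database_id)  AS database_name,
       mf.type_desc,                          -- ROWS (mdf/ndf) or LOG (ldf)
       LEFT(mf.physical_name, 3) AS volume,   -- drive letter / mount root
       mf.physical_name
FROM sys.master_files AS mf
ORDER BY volume, database_name, mf.type_desc;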
November 17, 2022 at 4:34 pm
Thank you all for the comments. Much appreciated.