August 23, 2012 at 2:32 pm
michael.berry 22479 (8/23/2012)
WaitType                       Wait_S     Resource_S  Signal_S  WaitCount  Percentage  AvgWait_S  AvgRes_S  AvgSig_S
PAGEIOLATCH_SH                 93277.24   93264.14    13.10     3999995    16.97       0.0233     0.0233    0.0000
MSQL_XP                        68382.95   68382.95    0.00      317797     12.44       0.2152     0.2152    0.0000
PREEMPTIVE_OS_GETPROCADDRESS   68380.77   68380.77    0.00      317797     12.44       0.2152     0.2152    0.0000
BACKUPIO                       64827.20   64814.35    12.85     1156271    11.80       0.0561     0.0561    0.0000
ASYNC_IO_COMPLETION            62775.38   62775.37    0.01      188        11.42       333.9116   333.9116  0.0000
BACKUPBUFFER                   61542.37   61414.42    127.95    2368895    11.20       0.0260     0.0259    0.0001
PAGEIOLATCH_EX                 43050.09   43045.39    4.70      1856576    7.83        0.0232     0.0232    0.0000
LCK_M_S                        36037.16   36036.83    0.33      2871       6.56        12.5521    12.5520   0.0001
LCK_M_U                        17102.27   17102.27    0.01      55         3.11        310.9504   310.9503  0.0001
WRITELOG                       12215.12   12188.00    27.13     646740     2.22        0.0189     0.0188    0.0000
Okay let's see.
The above wait stats show you what SQL Server has been waiting for since its last restart, or since you last cleared the wait stats. At the top of the list is PAGEIOLATCH_SH, which is associated with reading data pages from the database files. I can't say too much about that right now; your average wait is 23 ms per read, which, without knowing more about your system, is neither good nor bad at this point.
Then MSQL_XP and PREEMPTIVE_OS_GETPROCADDRESS are associated with using extended stored procedures. Since those wait times are so similar to the backup wait times, I'm making the assumption that you are backing up with a 3rd-party tool that hooks into XPs.
BACKUPIO, ASYNC_IO_COMPLETION (during backups) and BACKUPBUFFER are all related to creating backups. Together with the assumption that the 3rd-party tool accounts for the XP waits, we can see that roughly 55% of the time SQL Server has been waiting is for backups.
Now, I don't know what your normal load has been on this server. If you haven't done much apart from the backups, it could be perfectly normal that 55% of SQL Server's wait time is backup related.
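For reference, numbers like the ones above typically come from the sys.dm_os_wait_stats DMV; a minimal query along these lines (the column aliases are my own, and the percentage and average columns can be derived from these) would produce similar output:

SELECT wait_type AS WaitType,
       wait_time_ms / 1000.0 AS Wait_S,
       (wait_time_ms - signal_wait_time_ms) / 1000.0 AS Resource_S,
       signal_wait_time_ms / 1000.0 AS Signal_S,
       waiting_tasks_count AS WaitCount
FROM sys.dm_os_wait_stats
ORDER BY wait_time_ms DESC;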
I would suggest that you do the following:
1) run the wait stats query on the dev server and store the results for later reference
2) clear the wait stats on the dev server
3) start the backup on the dev server and wait till it finishes
4) run the wait stats query again and post the results
Do the same 4 steps on production.
That way we can see the wait stats associated with just the backup process and have a good comparison between the dev and prod waits.
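In case it helps, step 2 (clearing the wait stats) can be done with this one-liner; note that it resets the counters for the whole instance:

DBCC SQLPERF (N'sys.dm_os_wait_stats', CLEAR);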
Also, ask if there are any differences between the LUN for dev and the LUN for production:
a different number of disk spindles? different SANs? different RAID levels? different fabric? different paths to the SAN?
regards,
Edward
August 24, 2012 at 9:36 am
working with SAN guys now to dig deeper.
Michael B
Data Architect
MCT, MCITP(SQL DBA 2005/2008),MCTS, MCP,MCDBA
August 24, 2012 at 10:12 am
You should run a copy_only backup to the nul device to see if the problem is reading the database data or writing the backup file to disk.
BACKUP DATABASE [MyDatabase] TO DISK = N'NUL' WITH COPY_ONLY, STATS = 10
This will tell you how long it takes to read the data from the disk and if that is the problem.
If it is fast, then the problem is more than likely writing the backup to your backup file.
Make sure that you use the COPY_ONLY option so that SQL Server does not see that as part of the backup chain.
August 24, 2012 at 10:17 am
Michael Valentine Jones (8/24/2012)
You should run a copy_only backup to the nul device to see if the problem is reading the database data or writing the backup file to disk.
backup database [MyDatabase] to disk = N'NUL' with copy_only, stats = 10
This will tell you how long it takes to read the data from the disk and if that is the problem.
If it is fast, then the problem is more than likely writing the backup to your backup file.
Make sure that you use the COPY_ONLY option so that SQL Server does not see that as part of the backup chain.
Now that is a really good suggestion!!
I will do so. The test I mentioned at the beginning uses COPY_ONLY, so I am good with that.
Michael B
Data Architect
MCT, MCITP(SQL DBA 2005/2008),MCTS, MCP,MCDBA
August 24, 2012 at 10:42 am
OK, it is taking a long time on prod...
Here are the wait stats while it is running:
type          wait        totalwait
BACKUPIO      988.943242  5746.281352
BACKUPBUFFER  990.484487  5674.775883
BACKUPTHREAD  0           8.825387371
BACKUP        0           0
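For what it's worth, a snapshot like the above can be taken with a filtered query along these lines (the aliases are my own and may not match the exact columns shown above):

SELECT wait_type,
       signal_wait_time_ms / 1000.0 AS wait_s,
       wait_time_ms / 1000.0 AS totalwait_s
FROM sys.dm_os_wait_stats
WHERE wait_type LIKE N'BACKUP%';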
Michael B
Data Architect
MCT, MCITP(SQL DBA 2005/2008),MCTS, MCP,MCDBA
August 24, 2012 at 4:39 pm
I may have missed it so I have to ask... are both using the same recovery model? Are both pointed to the SAN using the same type of network cards? Are the backup areas for both servers configured the same way (i.e. RAID level, etc.)? Is the "path" from the two servers to the backup area the same? Is the size of the data the same?
--Jeff Moden
Change is inevitable... Change for the better is not.
August 28, 2012 at 7:40 am
Jeff Moden (8/24/2012)
I may have missed it so I have to ask... are both using the same recovery model? Are both pointed to the SAN using the same type of network cards? Are the backup areas for both servers configured the same way (IE raid level, etc)? Is the "path" from the two servers to the backup area the same? Is the size of the data the same?
Both are using the same recovery model.
I found out the VM uses local storage and prod uses the SAN.
I tried running a COPY_ONLY on both and even tried local storage on prod, and it's still the same.
The database is a nightly copy from prod to dev, so it is the same size.
Michael B
Data Architect
MCT, MCITP(SQL DBA 2005/2008),MCTS, MCP,MCDBA
August 28, 2012 at 7:45 am
michael.berry 22479 (8/24/2012)
Michael Valentine Jones (8/24/2012)
You should run a copy_only backup to the nul device to see if the problem is reading the database data or writing the backup file to disk.
backup database [MyDatabase] to disk = N'NUL' with copy_only, stats = 10
This will tell you how long it takes to read the data from the disk and if that is the problem.
If it is fast, then the problem is more than likely writing the backup to your backup file.
Make sure that you use the COPY_ONLY option so that SQL Server does not see that as part of the backup chain.
It is slow on prod and fast on dev.....
Even with no writing, so I believe we can assume (always scared to) that it is not the write but the read.
I had the network guys monitor the SAN, and before I kicked off the backup on prod the disk was at 2500 bits/read per second; when the backup occurred we jumped to 22 million bits per sec (22 GB per sec), so it seems the reads of the database are working... but the backup is still very, very slow to complete.
Michael B
Data Architect
MCT, MCITP(SQL DBA 2005/2008),MCTS, MCP,MCDBA
August 28, 2012 at 8:55 am
I think a key element in your reply is: the VM uses local storage and your prod db uses the SAN.
So how is your server connected to your SAN? 1 Gbit iSCSI? 10 Gb fibre?
Are you backing up when everybody else is backing up?
How many disks are underneath your database data LUN?
What is the queue depth setting of the HBA in your server?
Just a couple of questions to keep you busy 😉
Have you got enough memory on your physical server? And have you set max server memory?
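If max server memory turns out not to be set, it can be capped like this (the 28672 MB value is just an example; size it for your own box):

EXEC sys.sp_configure N'show advanced options', 1;
RECONFIGURE;
EXEC sys.sp_configure N'max server memory (MB)', 28672;
RECONFIGURE;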
Edward
August 28, 2012 at 9:27 am
Edward Dortland (8/28/2012)
I think a key element in your reply is: vm uses local storage and your prod db uses SAN. So how is your Server connected to your SAN? 1gbit iscsi? 10gb fiber?
Are you backing up when everybody is backing up?
How many disks are underneath your database data LUN?
What is the queuedepth setting of the HBA of your server?
Just a couple of questions to keep you busy 😉
Have you got enough memory on your physical server? And have you set max server memory?
Edward
I appreciate the response
Above I said, "I tried running a COPY_ONLY on both and even tried local storage on prod and it's still the same."
So I believe I can stop looking at the SAN for the write; the issue now apparently is reading from the database for the backup.
Michael B
Data Architect
MCT, MCITP(SQL DBA 2005/2008),MCTS, MCP,MCDBA
August 28, 2012 at 9:32 am
I'm talking about your read. Your database file is still on the SAN LUN, right? Or did you also try moving your mdf to local disks?
Regarding your backup software: are you using exactly the same versions on both servers?
Regards,
Edward
August 28, 2012 at 9:36 am
Edward Dortland (8/28/2012)
I'm talking about your read. Your database file is still on the SAN LUN, right? Or did you also try moving your mdf to local disks? Regarding your backup software: are you using exactly the same versions on both servers?
Regards,
Edward
I have sent these questions to my server & storage admins and will get back to you. Thanks!
Michael B
Data Architect
MCT, MCITP(SQL DBA 2005/2008),MCTS, MCP,MCDBA
August 28, 2012 at 11:38 am
micber (8/28/2012)
Edward Dortland (8/28/2012)
I think a key element in your reply is: vm uses local storage and your prod db uses SAN. So how is your Server connected to your SAN? 1gbit iscsi? 10gb fiber?
Are you backing up when everybody is backing up?
How many disks are underneath your database data LUN?
What is the queuedepth setting of the HBA of your server?
Just a couple of questions to keep you busy 😉
Have you got enough memory on your physical server? And have you set max server memory?
Edward
I appreciate the response
Above I said, "I tried running a COPY_ONLY on both and even tried local storage on prod and it's still the same."
So I believe I can stop looking at the SAN for the write; the issue now apparently is reading from the database for the backup.
The answers from my admin
The SQL data LUN (E: drive) is connected to OURSERVERNAME over 8 Gb fibre channel. The LUN lives on our 500 GB SATA aggregate, which has 37 disks in it. The SQL data LUN shares the disk I/O of that aggregate between 5 total volumes. I don't know for sure what the queue depth is set to on that HBA, but I believe the default is 16.
Michael B
Data Architect
MCT, MCITP(SQL DBA 2005/2008),MCTS, MCP,MCDBA
August 28, 2012 at 11:44 am
Since you are sharing spindles with other servers, it looks like you need to have the SAN guys monitor what is happening during the backups. It looks like contention between several servers trying to use the same set of disks at the same time.
August 28, 2012 at 11:52 am
I agree with Lynn; have your SAN admin look for IOPS overloading on that aggregate. Is it NetApp?
Meanwhile, to see if it is truly the SAN, try restoring a backup of the database to local disks and then make a backup to NUL from that local db. That would verify our theory.
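Something along these lines, where the database name, file paths, and logical file names are all placeholders you'd replace with your own (RESTORE FILELISTONLY against the backup shows the real logical names):

RESTORE DATABASE [MyDatabase_LocalTest]
FROM DISK = N'D:\LocalBackup\MyDatabase.bak'
WITH MOVE N'MyDatabase_Data' TO N'D:\LocalData\MyDatabase_LocalTest.mdf',
     MOVE N'MyDatabase_Log'  TO N'D:\LocalData\MyDatabase_LocalTest.ldf',
     STATS = 10;

BACKUP DATABASE [MyDatabase_LocalTest] TO DISK = N'NUL'
WITH COPY_ONLY, STATS = 10;

If the backup to NUL is fast from local disk but slow when the data lives on the SAN LUN, that points at the SAN read path.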