August 23, 2012 at 2:32 pm
michael.berry 22479 (8/23/2012)
WaitType                       Wait_S     Resource_S  Signal_S  WaitCount  Percentage  AvgWait_S  AvgRes_S  AvgSig_S
PAGEIOLATCH_SH                 93277.24   93264.14    13.10     3999995    16.97       0.0233     0.0233    0.0000
MSQL_XP                        68382.95   68382.95    0.00      317797     12.44       0.2152     0.2152    0.0000
PREEMPTIVE_OS_GETPROCADDRESS   68380.77   68380.77    0.00      317797     12.44       0.2152     0.2152    0.0000
BACKUPIO                       64827.20   64814.35    12.85     1156271    11.80       0.0561     0.0561    0.0000
ASYNC_IO_COMPLETION            62775.38   62775.37    0.01      188        11.42       333.9116   333.9116  0.0000
BACKUPBUFFER                   61542.37   61414.42    127.95    2368895    11.20       0.0260     0.0259    0.0001
PAGEIOLATCH_EX                 43050.09   43045.39    4.70      1856576    7.83        0.0232     0.0232    0.0000
LCK_M_S                        36037.16   36036.83    0.33      2871       6.56        12.5521    12.5520   0.0001
LCK_M_U                        17102.27   17102.27    0.01      55         3.11        310.9504   310.9503  0.0001
WRITELOG                       12215.12   12188.00    27.13     646740     2.22        0.0189     0.0188    0.0000
Okay let's see.
The above wait stats show you what SQL Server has been waiting for since its last restart, or since you last cleared the wait stats. At the top of the list is PAGEIOLATCH_SH, which is associated with reading data pages from the database files. I can't say too much about that right now; your average wait is 23 ms per read, which, without knowing more about your system, is neither good nor bad at this point.
Then MSQL_XP and PREEMPTIVE_OS_GETPROCADDRESS are associated with using extended stored procedures. Since those wait times are so similar to the backup wait times, I'm making the assumption that you are backing up with a 3rd-party tool that hooks into XPs.
BACKUPIO, ASYNC_IO_COMPLETION (during backups) and BACKUPBUFFER are all related to creating backups. Together with the assumption that the 3rd-party tool accounts for the XP waits, we can see that roughly 55% of the time SQL Server has been waiting is for backups.
Now, I don't know what your normal load has been on this server. If you haven't done much apart from the backups, it could be perfectly normal that 55% of SQL Server's wait time is backup related.
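For reference, numbers like the ones above typically come from the sys.dm_os_wait_stats DMV; a minimal query along these lines (the column aliases are my own, and the percentage and average columns can be derived from these) would produce similar output:

SELECT wait_type AS WaitType,
       wait_time_ms / 1000.0 AS Wait_S,
       (wait_time_ms - signal_wait_time_ms) / 1000.0 AS Resource_S,
       signal_wait_time_ms / 1000.0 AS Signal_S,
       waiting_tasks_count AS WaitCount
FROM sys.dm_os_wait_stats
ORDER BY wait_time_ms DESC;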
I would suggest that you do the following:
1) run the wait stats query on the dev server and store the results for later reference
2) clear the wait stats on the dev server
3) start the backup on the dev server and wait till it finishes
4) run the wait stats query again and post the results
Do the same 4 steps on production.
That way we can see the wait stats associated with just the backup process and have a good comparison between the dev and prod waits.
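In case it helps, step 2 (clearing the wait stats) can be done with this one-liner; note that it resets the counters for the whole instance:

DBCC SQLPERF (N'sys.dm_os_wait_stats', CLEAR);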
Also, ask if there are any differences between the LUN for dev and the LUN for production:
a different number of disk spindles? different SANs? different RAID levels? different fabric? different paths to the SAN?
regards,
Edward
August 24, 2012 at 9:36 am
working with SAN guys now to dig deeper.
Michael B
Data Architect
MCT, MCITP(SQL DBA 2005/2008),MCTS, MCP,MCDBA
August 24, 2012 at 10:12 am
You should run a copy_only backup to the nul device to see if the problem is reading the database data or writing the backup file to disk.
BACKUP DATABASE [MyDatabase] TO DISK = N'NUL' WITH COPY_ONLY, STATS = 10
This will tell you how long it takes to read the data from the disk and if that is the problem.
If it is fast, then the problem is more than likely writing the backup to your backup file.
Make sure that you use the COPY_ONLY option so that SQL Server does not see that as part of the backup chain.
August 24, 2012 at 10:17 am
Michael Valentine Jones (8/24/2012)
You should run a copy_only backup to the nul device to see if the problem is reading the database data or writing the backup file to disk.
backup database [MyDatabase] to disk = N'NUL' with copy_only, stats = 10
This will tell you how long it takes to read the data from the disk and if that is the problem.
If it is fast, then the problem is more than likely writing the backup to your backup file.
Make sure that you use the COPY_ONLY option so that SQL Server does not see that as part of the backup chain.
Now that is a really good suggestion!!
I will do so. The test I mentioned at the beginning uses COPY_ONLY, so I am good with that.
Michael B
Data Architect
MCT, MCITP(SQL DBA 2005/2008),MCTS, MCP,MCDBA
August 24, 2012 at 10:42 am
OK, it is taking a long time on prod...
Here are the wait stats while it is running:
type          wait        totalwait
BACKUPIO      988.943242  5746.281352
BACKUPBUFFER  990.484487  5674.775883
BACKUPTHREAD  0           8.825387371
BACKUP        0           0
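For what it's worth, a snapshot like the above can be taken with a filtered query along these lines (the aliases are my own and may not match the exact columns shown above):

SELECT wait_type,
       signal_wait_time_ms / 1000.0 AS wait_s,
       wait_time_ms / 1000.0 AS totalwait_s
FROM sys.dm_os_wait_stats
WHERE wait_type LIKE N'BACKUP%';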
Michael B
Data Architect
MCT, MCITP(SQL DBA 2005/2008),MCTS, MCP,MCDBA
August 24, 2012 at 4:39 pm
I may have missed it so I have to ask... are both using the same recovery model? Are both pointed to the SAN using the same type of network cards? Are the backup areas for both servers configured the same way (i.e. RAID level, etc.)? Is the "path" from the two servers to the backup area the same? Is the size of the data the same?
--Jeff Moden
Change is inevitable... Change for the better is not.
August 28, 2012 at 7:40 am
Jeff Moden (8/24/2012)
I may have missed it so I have to ask... are both using the same recovery model? Are both pointed to the SAN using the same type of network cards? Are the backup areas for both servers configured the same way (IE raid level, etc)? Is the "path" from the two servers to the backup area the same? Is the size of the data the same?
Both are using the same recovery model.
I found out the VM uses local storage and prod uses the SAN.
I tried running a COPY_ONLY on both and even tried local storage on prod, and it's still the same.
The database is a nightly copy from prod to dev, so it is the same size.
Michael B
Data Architect
MCT, MCITP(SQL DBA 2005/2008),MCTS, MCP,MCDBA
August 28, 2012 at 7:45 am
michael.berry 22479 (8/24/2012)
Michael Valentine Jones (8/24/2012)
You should run a copy_only backup to the nul device to see if the problem is reading the database data or writing the backup file to disk.
backup database [MyDatabase] to disk = N'NUL' with copy_only, stats = 10
This will tell you how long it takes to read the data from the disk and if that is the problem.
If it is fast, then the problem is more than likely writing the backup to your backup file.
Make sure that you use the COPY_ONLY option so that SQL Server does not see that as part of the backup chain.
It is slow on prod and fast on dev.....
Even with no writing, so I believe we can assume (always scared to) that it is not the write but the read.
I had the network guys monitor the SAN, and before I kicked off the backup on prod the disk was at 2500 bits/read per second; when the backup occurred we jumped to 22 million bits per sec (22 GB per sec), so it seems the reads of the database are working... but the backup is still very, very slow to complete.
Michael B
Data Architect
MCT, MCITP(SQL DBA 2005/2008),MCTS, MCP,MCDBA
August 28, 2012 at 8:55 am
I think a key element in your reply is: the VM uses local storage and your prod db uses the SAN.
So how is your server connected to your SAN? 1 Gbit iSCSI? 10 Gb fibre?
Are you backing up when everybody else is backing up?
How many disks are underneath your database data LUN?
What is the queue depth setting of the HBA in your server?
Just a couple of questions to keep you busy 😉
Have you got enough memory on your physical server? And have you set max server memory?
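If max server memory turns out not to be set, it can be capped like this (the 28672 MB value is just an example; size it for your own box):

EXEC sys.sp_configure N'show advanced options', 1;
RECONFIGURE;
EXEC sys.sp_configure N'max server memory (MB)', 28672;
RECONFIGURE;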
Edward
August 28, 2012 at 9:27 am
Edward Dortland (8/28/2012)
I think a key element in your reply is: vm uses local storage and your prod db uses SAN. So how is your Server connected to your SAN? 1gbit iscsi? 10gb fiber?
Are you backing up when everybody is backing up?
How many disks are underneath your database data LUN?
What is the queuedepth setting of the HBA of your server?
Just a couple of questions to keep you busy 😉
Have you got enough memory on your physical server? And have you set max server memory?
Edward
I appreciate the response
Above I said, "I tried running a COPY_ONLY on both and even tried local storage on prod and it's still the same."
So I believe I can stop looking at the SAN for the write; the issue now apparently is reading from the database for the backup.
Michael B
Data Architect
MCT, MCITP(SQL DBA 2005/2008),MCTS, MCP,MCDBA
August 28, 2012 at 9:32 am
I'm talking about your read. Your database file is still on the SAN LUN, right? Or did you also try moving your mdf to local disks?
Regarding your backup software: are you using exactly the same versions on both servers?
Regards,
Edward
August 28, 2012 at 9:36 am
Edward Dortland (8/28/2012)
I'm talking about your read. Your database file is still on the SAN LUN, right? Or did you also try moving your mdf to local disks? Regarding your backup software: are you using exactly the same versions on both servers?
Regards,
Edward
I have sent these questions to my server & storage admins and will get back to you. Thanks!
Michael B
Data Architect
MCT, MCITP(SQL DBA 2005/2008),MCTS, MCP,MCDBA
August 28, 2012 at 11:38 am
micber (8/28/2012)
Edward Dortland (8/28/2012)
I think a key element in your reply is: vm uses local storage and your prod db uses SAN. So how is your Server connected to your SAN? 1gbit iscsi? 10gb fiber?
Are you backing up when everybody is backing up?
How many disks are underneath your database data LUN?
What is the queuedepth setting of the HBA of your server?
Just a couple of questions to keep you busy 😉
Have you got enough memory on your physical server? And have you set max server memory?
Edward
I appreciate the response
Above I said, "I tried running a COPY_ONLY on both and even tried local storage on prod and it's still the same."
So I believe I can stop looking at the SAN for the write; the issue now apparently is reading from the database for the backup.
The answers from my admin
The SQL data LUN (E: drive) is connected to OURSERVERNAME over 8 Gb fibre channel. The LUN lives on our 500 GB SATA aggregate, which has 37 disks in it. The SQL data LUN shares the disk I/O of that aggregate between 5 total volumes. I don't know for sure what the queue depth is set to on that HBA, but I believe the default is 16.
Michael B
Data Architect
MCT, MCITP(SQL DBA 2005/2008),MCTS, MCP,MCDBA
August 28, 2012 at 11:44 am
Since you are sharing spindles with other servers, it looks like you need to have the SAN guys monitor what is happening during the backups. It looks like contention between several servers trying to use the same set of disks at the same time.
August 28, 2012 at 11:52 am
I agree with Lynn; have your SAN admin look for IOPS overloading on that aggregate. Is it NetApp?
Meanwhile, to see if it is truly the SAN, try restoring a backup of the database to local disks and then make a backup to NUL from that local db. That would verify our theory.
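Something along these lines, where the database name, file paths, and logical file names are all placeholders you'd replace with your own (RESTORE FILELISTONLY against the backup shows the real logical names):

RESTORE DATABASE [MyDatabase_LocalTest]
FROM DISK = N'D:\LocalBackup\MyDatabase.bak'
WITH MOVE N'MyDatabase_Data' TO N'D:\LocalData\MyDatabase_LocalTest.mdf',
     MOVE N'MyDatabase_Log'  TO N'D:\LocalData\MyDatabase_LocalTest.ldf',
     STATS = 10;

BACKUP DATABASE [MyDatabase_LocalTest] TO DISK = N'NUL'
WITH COPY_ONLY, STATS = 10;

If the backup to NUL is fast from local disk but slow when the data lives on the SAN LUN, that points at the SAN read path.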