June 10, 2008 at 9:45 am
Hi, I was having difficulty getting snapshots done. Some publications that only have a few small articles and that used to finish snapshotting in a few seconds just hang. I got 'The process is running and is waiting for a response from the server'. I got every other runnable processes disconnected, thinking it was because of the contention on the server, but did not work. Btw, our publisher and distributor are on the same server. Which I guess should not matter, because the snapshotting worked fine two days ago.
Thank you for any insight on this.
Kathleen
June 10, 2008 at 10:09 am
Kathy,
Please check sql server memory and processor utilization(for sqlservr.exe as well as for os level processes) counters under perfmon.
Manu
June 10, 2008 at 10:15 am
there are several replication agents
LogReader Agentfrom LDF -> distribution db
Snapshot Agentgenerate the flat-files (e.g. *.bcp)
Distribution Agentdeploy initial snapshots, then subsequent changes ex distribution db
where you are actually commenting on speed of the DA and not the Snapshot ??
it is possible that you have hit a resource limit (check db+tlog sizes against AutoGrow max's)
anyway, the agents are *.EXE that run at the Distributor and connect to the various db's
- the DA can be quick to do a Bulk Insert to sub db, but then take ages to create PKs etc
I suggest that you use sp_who2 or activity monitor to check again for blocking
- failing that identify the DA's SPID and use DBCC inputbuffer(spid) to find what its doing
BTW there is a dedicated form for replication matters so I suggest you post questions there to maximise received wisdom !
HTH
Dick
June 10, 2008 at 10:16 am
Thanks, Manu,
I will do that then.
But since it is a dedicated SQL box, with no other active connections to it, I would think the CPU should be low and memory for sql should stay where they were before ( I give 29 GB out of total 32GB to SQL, and I believe SQL memory stays at the max they are allowed).
Thanks for the input.
June 10, 2008 at 10:31 am
Dick, thanks for the reply.
I am pretty sure its the snapshot agent being slow, because viewing the agent detail, it says
....
[0%] bulk copied snapshot data for article...
...
[15%] bulk copied...
..
[99%] the process is running is waiting for a response from the server
[99%] the process is running is waiting for a response from the server
[99%] the process is running is waiting for a response from the server
or some other publication it might say
[0%] the process is running is waiting for a response from the server
and it will re-try and re-try.
For those that eventually went thru, it took much more time than it usually is.
I forgot to run my blocker process to check on blocking processes. Thank you for reminding me.
I will post my message on the right forum. Thank you!
June 10, 2008 at 10:59 am
Hi Dick, I have a question about the database autogrowing. I understand that when snapshots happen, a bulk copy happen on the publisher, then the scripts get posted to snapshot folders, does the distribution agent read from the folders, in order to distribute the commands to the subscribers? If so, the distribution database should not grow much, since the scripts are stored in the file system, not the distribution database?
Thanks,
Kathleen
June 10, 2008 at 11:13 am
that is correct, the Snapshot writes flat-files on the Distributor(?) which the DA's then read
- there are doubtless very few rows written to tables in distribution db
worth checking that you have enough diskspace to accommodate all the .bcp files
- you should also be running a respectable patch level (eg SP2 is 9.0.3159) and not RTM !
best of luck checking blocking, space etc
Dick
June 10, 2008 at 11:26 am
Thanks again, Dick. I just checked-- we do have enough space for the distribution database.
I have one more question ( I am asking for too much). So the whole snapshot process include basically two steps, one to bulk copythe tables (run by snapshot agent), second to write the commands to the snapshot folder. If the whole process is slow, it could be either bulk copying is slow ( so memory is in demand?) or could be writing to the folder is slow ( could be disk io problem). So these two resources are potential bottlenecks?
June 11, 2008 at 2:07 am
Thanks again, Dick. I just checked-- we do have enough space for the distribution database.
I have one more question ( I am asking for too much). So the whole snapshot process include basically two steps, one to bulk copythe tables (run by snapshot agent), second to write the commands to the snapshot folder. If the whole process is slow, it could be either bulk copying is slow ( so memory is in demand?) or could be writing to the folder is slow ( could be disk io problem). So these two resources are potential bottlenecks?
It is the Snapshot process that reads the [entire] table(s) from published database and writes the flat-files containing
.prepreparation (drop constraints, tables)
.schtable definition and repl sprocs
.bcpdata
.idxindex
and you should find this in folder like (here's one from my system as example)
D:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\repldata\unc\UATDB03$ENV3_BOND_BOND FOR DB03\20080611063794
and then will write related entries in distribution db
it is the DA that actually
reads those files
delivers to each sub
deletes the files [usually]
and using replication monitor you would see [double-click the entries and set auto-refresh] what each is doing
- DA probably says "waiting for Snapshot" or somesuch
so in your case it does look like the Snapshot Agent is the one having trouble completing so DAs have nothing to do [yet]
I suggest you look at flat-files and see what most recent file is (sort by modified date) as a clue where it's stuck.
Check all NTFS permissions and that agent has permissions to write to distribution
- I recommend making the agent's domain login to be sysadmin as TEMPORARY measure
HTH
Dick
June 13, 2008 at 9:16 am
Dick, thanks for all your help.
I was able to get the snapshots done. I found the snapshot agent process got a waittype 'sos_scheduler_yield'. I figured the contention is on sql server agent scheduler threads, because the CPU was low between 10%-15%, and we have tons of jobs running (including logreader and distribution agents for other databases), so I disabled those running ones and snapshots went really fast.
Thanks again. I learned a lot from you.
Viewing 10 posts - 1 through 9 (of 9 total)
You must be logged in to reply to this topic. Login to reply