June 24, 2009 at 4:28 am
Hi,
We have an issue with a publisher behaving a bit odd...
The setup:
Publisher: Win2003 with sql server 2005 (8 cores, 16 GB ram)
Subscriber: Win2008 with sql server2005 (16 cores, 128 GB ram)
Distributor: Same server as subscriber (push).
The published database is some 600 GB in size, with 267 articles (tables + 1 view), the only changes made to the data occur in batch jobs (once per day, deletes and inserts, some 10 million rows affected in total).
The problem we have is that one of the two batch jobs have gone up from approx. 2hrs to 12hrs execution time after we started replicating. This job deletes and inserts on four tables (from a Sun Solaris server, connects via Easysoft ODBC-ODBC Bridge).
The other job performs much the same actions, but is a local sql job, and on other tables. And on this job we see no difference in execution time.
Another odd thing is that all of a sudden differential database backups have slowed from 3 minutes to several hours... But only on the published database, others back up as normal.
And if this wasn't enough, I even removed the articles affected by the problem-job from subscription and publication , with no difference in performance... And I see no blocks anywhere, perfomance counters not showing any bottlenecks. Oh, and while the job is running, I can't open activity monitor (timeout).
I see some intermittent latency on the repl. monitor, but nothing out of the ordinary.
What on earth can cause this behaviour?
Regards,
Dagfinn Övstrud
June 25, 2009 at 9:18 am
June 26, 2009 at 12:58 am
Kendal Van Dyke (6/25/2009)
What recovery model was the DB in before you started replicating? And do your transaction logs sit on dedicated drives or are they on shared drives with your data files?
Simple mode. And the data/log files are on separate san luns (fibre channel). The disks are not likely the problem (especially when you consider the fact that the local job runs ok, and the actual data load on that is in fact greater than on the other job). Anywho, we are forced to stop the replication for now and go with ssis/data flow instead. Will probably do some more testing after the holidays, and try and replicate (no pun intended) the problem on the testservers. If all else fails, we'll open a case at Microsoft.
As I said, this is really bizarre. There is absolutely no sign of a bottleneck anywhere (monitored everything), apart from the server just going to a near halt when this particular job is running. I have no real insight to the specifics of the job (third party proc and servers, both the sun and the sql). It's either something with the job itself or some shady hardware issue, I can't think of anything else.
When I spoke to a person at this 3rd party co, he said something about truncate and insert. What would happen if you issue a truncate on a replicated table in sql 2005? Error raised? He has to be mistaken, right?
June 26, 2009 at 7:27 am
Truncate is not allowed on a table that is being replicated. It will just error out
-Roy
June 26, 2009 at 8:06 am
One difference that you'll see in the transaction logs with replication in play is that transactions are retained in the log until the log reader agent has picked up the changes. If you're doing massive amounts of change all of it has to be logged, and you may be seeing I/O problems as a result of the logging and potential log file growth. Do you shrink your transaction logs on the replicated database?
Just to clarify, you said your distributor is NOT the same as your publisher, right?
You cannot truncate replicated tables.
June 30, 2009 at 2:32 am
Kendal Van Dyke (6/26/2009)
One difference that you'll see in the transaction logs with replication in play is that transactions are retained in the log until the log reader agent has picked up the changes. If you're doing massive amounts of change all of it has to be logged, and you may be seeing I/O problems as a result of the logging and potential log file growth. Do you shrink your transaction logs on the replicated database?Just to clarify, you said your distributor is NOT the same as your publisher, right?
You cannot truncate replicated tables.
No IO problems, I assure you. SAN max throughput is 4Gb/sec, the problems lie elsewhere. The transaction log is not shrinked, it is set to a start size much larger than necessary for the logging (40GB).
I have tried shutting down the subscriber while the "good" job is runninng to create a back log of transactions. The job itself is not affected, and when the subscriber is back up it catches up within some 20 minutes.
And yes, the distributor is on the subscriber.
Viewing 6 posts - 1 through 5 (of 5 total)
You must be logged in to reply to this topic. Login to reply