Rather critical perfomance issue

Question

Post reply

Rather critical perfomance issue

Randy-734603

SSC Enthusiast

Points: 111
More actions
February 4, 2009 at 5:47 am

#199898

First our configuration:
Windows server 2003 R2 fully updated (64-bit) 32 GB of RAM and 4 Intel Xeon MP 3.1 GHZ processors. (Cluster set up .. 2 nodes, equal machines) (lots of disk space on a high-speed SAN)
SQL 2000 (32-bit) with SP4
Recently due to hardware limitations, we had to move from a 32-bit OS and 32-bit SQL 2000 to our new cluster (mentioned above) ... now performance is terribly slow and the server's CPU's are constantly pegged at 90%.
We've failed over the cluster nodes to attempt to see if it was the node we were on failing, but that didn't help. We've updated statistics and watched the current activiy but we're at a loss.
I'm a realtively newbie DBA and hope someone has an idea on where we can look next.
Thanks.

Viewing 15 posts - 1 through 15 (of 23 total)

You must be logged in to reply to this topic. Login to reply

Carolyn Richardson SSCrazy Eights Points: 8357 More actions · Answer 1

How did you do the migration? have the databases been reindexed after the migration?

Facts are stubborn things, but statistics are more pliable - Mark Twain
Carolyn
SQLServerSpecialists[/url]

Randy-734603 SSC Enthusiast Points: 111 More actions · Answer 2

we did a detach, copy and re-attach essentially.

we have updated statistics and re-indexed.

I am not opposed to doing it again, but my question is can that be done without locking out the user? To do it now would take a VERY long time and I can't shut the server down for any amount of time (we run 3 shifts against it).

Luke L SSC-Dedicated Points: 35547 More actions · Answer 3

Is it truly SQL Server that's pegging the CPU's or some other process?

Have you identified certain queries that may be taking a long time with a trace or profiler?

How does your IO look? Just cause you have a "High Speed SAN" doesn't mean your SAN Admin set it up properly.

-Luke.

To help us help you read this[/url]For better help with performance problems please read this[/url]

Randy-734603 SSC Enthusiast Points: 111 More actions · Answer 4

We have about 36 databases on that server (and thats all that runs on it) and all of them are slow. Also I don't have the numbers in front of me but I do know the disk I/O is low.

I've seen other posts about Hyperthreading and parrallelism, but the old server this was running on was hyperthreaded (albeit 32-bit) as well with no issue.

Luke L SSC-Dedicated Points: 35547 More actions · Answer 5

If there's even 1 query that's being repeated often enough which sucks up all the resources available to it, It will take all it can thus affecting your other databases as well. The only exception would be that you have them all separated into different instances with all kinds of resource governors in place.

Heh.. timely delivery of the simple-talk newsletter... Check out this article on SQL profiler and finding slow queries...

http://www.simple-talk.com/sql/performance/how-to-identify-slow-running-queries-with-sql-profiler/

-Luke.

To help us help you read this[/url]For better help with performance problems please read this[/url]

Randy-734603 SSC Enthusiast Points: 111 More actions · Answer 6

Thanks all for the replies. So far I've been going through every possible setting and what not.

Right now here is where I am at:

Set the parallelism to 4 (default was 0 .. which used the 8 processors)

Had one query that was not used much with 4 million rows that wasn't using any indexes get corrected and this morning my CPU usage is averaging (per perfmon) 46% and the average latch watis/sec are averaging right around 300 (down from 3000) but for what seems are shorter periods of time.

My disk I/0 is good and the average lock timeouts/sec is right around 3 for an average.

I haven't heard anything from my end users about changes in performance yet .. but based off of the information from perfmon ... are those averages in "acceptable" ranges?

Luke L SSC-Dedicated Points: 35547 More actions · Answer 7

My understanding is that average lock timeouts/sec should be 0, and if it's not that means you have queries that are timeing out while waiting for locks... These should be evident in your trace as exceptions...

Have a gander at http://www.sqlservercentral.com/articles/Administering/performancemonitoringbasiccounters/1348/. It's mostly for SQL 2000 but includes links to other Monitoring articles that may help put some perspective on certain things...

Mostly though they all talk about counters being something that should be in a general range, and that you need to have a baseline first so that you have something to compare it to.

-Luke.

To help us help you read this[/url]For better help with performance problems please read this[/url]

Randy-734603 SSC Enthusiast Points: 111 More actions · Answer 8

Thanks for the info ... and actually this is on SQL 2000 so that article is appropriate.

I would have thought the locks should be at or close to zero .. and I'm fairly certain we have queries that just aren't right ... but the hardware it is on I would think should be adequate to handle those .. but not sure what I really should be seeing with lock/waits.

Luke L SSC-Dedicated Points: 35547 More actions · Answer 9

latch waits are fine and to be expected because of the transactional nature of your or really any database. But you need to have a base line for which to compare them to. If they are normally at 300/sec and all of a sudden jump to 30,000/sec for an extended period of time then there's a problem somewhere.

Do you have performance metrics from your old server prior to the migration? You might want to look at how your current numbers compare to those?

What does your buffer hit ratio look like, are you seeing a lot of recompiles etc?

To help us help you read this[/url]For better help with performance problems please read this[/url]

Randy-734603 SSC Enthusiast Points: 111 More actions · Answer 10

I don't have any benchmarks from the old server however, I had a query adjusted last night tht dropped the latch waits from 3000 to 160 today. But there are still performance issues going on.

My buffer hit ration right now is holding rather steady at 99.8

I am seeing a lot of recompiles .. is there something that can be done with that?

Luke L SSC-Dedicated Points: 35547 More actions · Answer 11

Also silly question but did you apply the post SP4 hotfix that fixes the way AWE uses memory? I can't remember the KB number off the top of my head...

To help us help you read this[/url]For better help with performance problems please read this[/url]

Randy-734603 SSC Enthusiast Points: 111 More actions · Answer 12

Hmmm .. I'm not certain ... (I wasn't the one who set this up) ......

I'll have to see if I can find this.

Luke L SSC-Dedicated Points: 35547 More actions · Answer 13

here's the KB Article...

http://support.microsoft.com/kb/899761

To help us help you read this[/url]For better help with performance problems please read this[/url]

Randy-734603 SSC Enthusiast Points: 111 More actions · Answer 14

I just checked and that hotfix is on.

Right now I'm at a loss .. I just don't see any reason why the performance should be so bad.