July 27, 2009 at 7:22 am
SQL Server 2005 sp 3 running on windows 2003 sp2 with 4gb ram and 4 dual-core opteron 2.4ghz processors.
After about 1 week of server uptime, CPU rises to nearly 100% (around 95% constantly until it locks up completely) and comes to a crawl. This is a dedicated database server, there are no other services/applications running (only sql server). perfmon shows resource monitor consuming most of cpu time. The number of connections to the server stays relatively the same and doesn't change whether CPU usage is 5% or 95%. Running a profiler against the server shows a high ratio of sp:cachemiss to sp:cachehit events ( about 10:1) on one particular database. The sql statements from the database are not parameterized and don't use schema names. I'm working with the developer of the app on changing this (missing schema names + parameters).
One thing that puzzles me, though, is that exact same queries generate cachehit after service reboot under exact same server load and there's barely any cpu usage at all for several days.
Any pointers on where (what) I should be looking at?
July 28, 2009 at 4:37 am
It could be many things, but assuming you haven't forgotten to mention anything really important, one possibility is that the problem relates to one or more caches getting a bit out of control.
Run the following query at regular intervals (say once every five minutes) and save the results to a table. This will show if there is a problem with one or more of the caches growing too large.
It would be interesting to see whether you see a pattern emerging.
SELECTsnapshot_time = CURRENT_TIMESTAMP,
name,
cache_size_mb = SUM(single_pages_kb + multi_pages_kb) / 1024
FROMsys.dm_os_memory_clerks
WHEREname IN (N'TokenAndPermUserStore', N'SQL Plans', N'Object Plans')
GROUPBY name
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
July 28, 2009 at 6:35 am
Thank you! I'll post with the results and/or any findings in a couple of days once i have more information.
July 28, 2009 at 7:20 am
32 bit or x64 bit?
Did you limit the memory SQL server can use? Could be the OS and SQL fighting for memory.
Greg E
July 28, 2009 at 7:40 am
Cool. Just in case it turns out to be useful, have a read of the following article by the CSS SQL Server Team if you get a moment:
It would probably be worthwhile monitoring things like buffer cache page life expectancy too, as the server does seem a little light on memory overall too.
Paul
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
July 28, 2009 at 7:52 am
Greg Edwards (7/28/2009)
32 bit or x64 bit?Did you limit the memory SQL server can use? Could be the OS and SQL fighting for memory.
Greg E
Just read your post Greg. Could be that, but I suspect it less since the server is dedicated to SQL Server. Of course there may be things we haven't been told about going on 🙂
Sure does sound like memory pressure, given how hard the Resource Monitor (Lazy Writer) is working. Of course it could simply be a case of having too little memory in the box (1GB per core isn't much), but it won't hurt to eliminate other common causes - like the TokenAndPerm store problem.
Binko could help us a lot by providing much more detail about the server and how it is currently configured!
@binko: if you read this (and at the risk of posting links to the entire internet), there is a pretty cool article on TechNet which covers this area pretty well: http://technet.microsoft.com/en-us/magazine/2007.10.sqlcpu.aspx
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
July 28, 2009 at 8:45 am
Good link Paul.
We run both SQL and SSAS on the same box. It was a real balancing act - SSAS, SQL, and the OS can be battle of the titans.
Doesn't the default install put 80% of memory for SQL?
80% of 4GB is 3.2GB - which might be a bit short for the OS.
Just a thought.
Greg E
July 28, 2009 at 11:20 am
Greg Edwards (7/28/2009)
32 bit or x64 bit?Did you limit the memory SQL server can use? Could be the OS and SQL fighting for memory.
Greg E
32 bit with PAE/AWE enabled;
3Gbs ram max for sql server;
running on a virual server.
typical payload on that server is simple insert/update/select statements, nothing too complicated. all of the databases have all of the needed indexes, and there are no unused indexes on tables. I looked through queries that are ran against the server, and there are several that are executed awefully often. I'm checking to see if that frequency is really necessary.
Paul,
I scheduled your query to run every 5 minutes. we'll see what happens once the server reaches that critical point (happens every 10 days or so; the service was restarted last friday).
July 28, 2009 at 5:09 pm
Good stuff Greg and binko - look forward to seeing the results.
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
July 30, 2009 at 9:28 am
here're the results:
July 30, 2009 at 9:42 am
I found the root of the problem. One of the applications that was using the database server ( a ticketing system ) has ticket as an object. the problem with that is that in order to refresh datagrid showing tickets and related information, a separate query is issued for every ticket on a regular basis. Every couple of seconds for every user, to be more specific. None of the queries are parametarized and don't use schema names. the server would end up spending most of it's time recompiling exact same queries over and over again.
we reduced refresh rate to several minutes, and the server has been running at under 10% cpu usage for the past couple of days. the developers will revise the ticketing application so that it doesn't issue a separate sql statement for every ticket in a datagrid.
July 30, 2009 at 11:58 am
Looks like you found something that 'worked great in test' but failed to scale to the real world.
The assumption of unlimited resources never seems to work out too well.
Great job, and good explaination of the root problem.
Greg E
Viewing 12 posts - 1 through 11 (of 11 total)
You must be logged in to reply to this topic. Login to reply