April 1, 2008 at 12:15 pm
This is a problem that has been going on for a couple of months now, on and off.
Symptom: Windows 2003 Server hangs for 5-10 minutes. Check the screen via RDP and there is an 'out of virtual memory' dialog on the screen.
Server Specs:
Windows Server 2003, SP2, x64 Edition
16gb Physical RAM
SQL Enterprise x64, SP2
C: Drive has 8gb Page file
D: Drive on SAN has 20gb Page file
SQL configured to use 8gb RAM
This server is a datawarehouse/reporting server, so is not particularly busy. Maybe 20-30 users on it at once, running various reports.
I have tried the following:
- Increase size of the page file, splitting it across two drives
- Explicitly setting the max SQL memory to 8192 mb
- defrag the c drive once a day @ 10pm (low to no users on)
- Stop and Restart SQL once a dat @ 5:30am after all ETL processing is complete
- Stop SQL and defrag D: drive with database files on it once a week.
What happens is that the server stops responding to query requests, and usually when I check SQL agent has also quit working somtime a few minutes before the failure. RDP won't work, and a remote 'shutdown' command also fails. RDP will eventually come up (~10 minutes later?) and using RDP with the /CONSOLE command shows the 'out of virtual memory' dialog still on the screen.
What should I be checking for? Other troubleshooting ideas?
April 1, 2008 at 8:04 pm
Any kind of file copy going on during this time? How often is the data warehouse being loaded and how is the load done?
You can check out this thread for some information about memory issues with 64-bit http://www.sqlservercentral.com/Forums/Topic463445-146-1.aspx
Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question
April 2, 2008 at 7:22 am
No file copying going on that I am aware of, no. The DW gets loaded once a day, between 3 and 5 am. I do a schedule SQL restart before and after the ETL load, so everything is done by 5am.
I will also check the other thread, thanks.
April 2, 2008 at 8:46 am
Wow, I think the defragging is overkill....
Enough with the editorializing though, does the service account for SQL have the right to Lock Pages in Memory? If not, grant it with grpedit.msc and reboot the server. Even though it is 64-bit and you don't need to enable AWE you still need to perform this step to access the full amount of memory.
What does your paging, processor, and queue length counters look like on the server?
- Tim Ford, SQL Server MVPhttp://www.sqlcruise.comhttp://www.thesqlagentman.com http://www.linkedin.com/in/timothyford
April 2, 2008 at 9:15 am
Tim has misspelled the Group Policy edit it should be gpedit.msc. Here is a link to a an MS articel about this: http://support.microsoft.com/kb/918483
Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question
April 2, 2008 at 9:28 am
Ah, my bad!
- Tim Ford, SQL Server MVPhttp://www.sqlcruise.comhttp://www.thesqlagentman.com http://www.linkedin.com/in/timothyford
April 2, 2008 at 9:32 am
Timothy Ford (4/2/2008)
Wow, I think the defragging is overkill....Enough with the editorializing though, does the service account for SQL have the right to Lock Pages in Memory? If not, grant it with grpedit.msc and reboot the server. Even though it is 64-bit and you don't need to enable AWE you still need to perform this step to access the full amount of memory.
Yes, the service account has those rights.
What does your paging, processor, and queue length counters look like on the server?
Not sure where I would check this, sorry. Can you be more specific?
April 2, 2008 at 9:36 am
Hi,
Any other applications installed on server? This sounds like a possible driver issue.
In addition, you state the D:\ drive is SAN based and contains another paging file. This is not an ideal configuration, as if the SAN load is I/O heavy, you may experience latency in moving data in and out of paging file. Your symptom of "Windows 2003 server hangs" seems to fit perfectly into the type of issue. I would suggest you try to use a local paging file only and see if the problem still persist.
HP note:
The memory paging system requires fast access to the pagefile.sys file. In SAN environments with heavy I/O loads, the access request for moving information to or from the paging file may be delayed. This can cause the operating system to halt, requiring a reboot to recover. Microsoft recommends that if paging errors occur, page files should be relocated to disks that are built into the server using its internal data paths.
Thanks,
Phillip Cox
April 2, 2008 at 1:36 pm
I will try expanding the page file on the local C: drive and removing it from the SAN drive to see if that helps. I just experienced another hang a few minutes ago.
The next step is to open a support case with Microsoft, and they want $259. 🙁
April 2, 2008 at 1:43 pm
Here is an error from the event application log just about the same time as the hang:
Event Type:Error
Event Source:MSSQLSERVER
Event Category:(2)
Event ID:17052
Date:4/2/2008
Time:3:25:41 PM
User:N/A
Computer:HTEDW01
Description:
The description for Event ID ( 17052 ) in Source ( MSSQLSERVER ) cannot be found. The local computer may not have the necessary registry information or message DLL files to display messages from a remote computer. You may be able to use the /AUXSOURCE= flag to retrieve this description; see Help and Support for details. The following information is part of the event: SQLServerAgent Monitor: SQLServerAgent has terminated unexpectedly..
Viewing 10 posts - 1 through 9 (of 9 total)
You must be logged in to reply to this topic. Login to reply