x64 SQL 2k5 server running out of Virtual Memory

  • This is a problem that has been going on for a couple of months now, on and off.

    Symptom: Windows 2003 Server hangs for 5-10 minutes. Check the screen via RDP and there is an 'out of virtual memory' dialog on the screen.

    Server Specs:

    Windows Server 2003, SP2, x64 Edition

    16gb Physical RAM

    SQL Enterprise x64, SP2

    C: Drive has 8gb Page file

    D: Drive on SAN has 20gb Page file

    SQL configured to use 8gb RAM

    This server is a datawarehouse/reporting server, so is not particularly busy. Maybe 20-30 users on it at once, running various reports.

    I have tried the following:

    - Increase size of the page file, splitting it across two drives

    - Explicitly setting the max SQL memory to 8192 mb

    - defrag the c drive once a day @ 10pm (low to no users on)

    - Stop and Restart SQL once a dat @ 5:30am after all ETL processing is complete

    - Stop SQL and defrag D: drive with database files on it once a week.

    What happens is that the server stops responding to query requests, and usually when I check SQL agent has also quit working somtime a few minutes before the failure. RDP won't work, and a remote 'shutdown' command also fails. RDP will eventually come up (~10 minutes later?) and using RDP with the /CONSOLE command shows the 'out of virtual memory' dialog still on the screen.

    What should I be checking for? Other troubleshooting ideas?

  • Any kind of file copy going on during this time? How often is the data warehouse being loaded and how is the load done?

    You can check out this thread for some information about memory issues with 64-bit http://www.sqlservercentral.com/Forums/Topic463445-146-1.aspx

  • No file copying going on that I am aware of, no. The DW gets loaded once a day, between 3 and 5 am. I do a schedule SQL restart before and after the ETL load, so everything is done by 5am.

    I will also check the other thread, thanks.

  • Wow, I think the defragging is overkill....

    Enough with the editorializing though, does the service account for SQL have the right to Lock Pages in Memory? If not, grant it with grpedit.msc and reboot the server. Even though it is 64-bit and you don't need to enable AWE you still need to perform this step to access the full amount of memory.

    What does your paging, processor, and queue length counters look like on the server?

    - Tim Ford, SQL Server MVPhttp://www.sqlcruise.comhttp://www.thesqlagentman.com http://www.linkedin.com/in/timothyford

  • Tim has misspelled the Group Policy edit it should be gpedit.msc. Here is a link to a an MS articel about this: http://support.microsoft.com/kb/918483

  • Ah, my bad!

    - Tim Ford, SQL Server MVPhttp://www.sqlcruise.comhttp://www.thesqlagentman.com http://www.linkedin.com/in/timothyford

  • Timothy Ford (4/2/2008)


    Wow, I think the defragging is overkill....

    Enough with the editorializing though, does the service account for SQL have the right to Lock Pages in Memory? If not, grant it with grpedit.msc and reboot the server. Even though it is 64-bit and you don't need to enable AWE you still need to perform this step to access the full amount of memory.

    Yes, the service account has those rights.

    What does your paging, processor, and queue length counters look like on the server?

    Not sure where I would check this, sorry. Can you be more specific?

  • Hi,

    Any other applications installed on server? This sounds like a possible driver issue.

    In addition, you state the D:\ drive is SAN based and contains another paging file. This is not an ideal configuration, as if the SAN load is I/O heavy, you may experience latency in moving data in and out of paging file. Your symptom of "Windows 2003 server hangs" seems to fit perfectly into the type of issue. I would suggest you try to use a local paging file only and see if the problem still persist.

    HP note:

    The memory paging system requires fast access to the pagefile.sys file. In SAN environments with heavy I/O loads, the access request for moving information to or from the paging file may be delayed. This can cause the operating system to halt, requiring a reboot to recover. Microsoft recommends that if paging errors occur, page files should be relocated to disks that are built into the server using its internal data paths.

    Thanks,

    Phillip Cox

  • I will try expanding the page file on the local C: drive and removing it from the SAN drive to see if that helps. I just experienced another hang a few minutes ago.

    The next step is to open a support case with Microsoft, and they want $259. 🙁

  • Here is an error from the event application log just about the same time as the hang:

    Event Type:Error

    Event Source:MSSQLSERVER

    Event Category:(2)

    Event ID:17052

    Date:4/2/2008

    Time:3:25:41 PM

    User:N/A

    Computer:HTEDW01

    Description:

    The description for Event ID ( 17052 ) in Source ( MSSQLSERVER ) cannot be found. The local computer may not have the necessary registry information or message DLL files to display messages from a remote computer. You may be able to use the /AUXSOURCE= flag to retrieve this description; see Help and Support for details. The following information is part of the event: SQLServerAgent Monitor: SQLServerAgent has terminated unexpectedly..

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply