System shutdown

  • Our production server shut down unexpectedly last Saturday, and the database was down for 3 minutes.

    Has anybody faced this ASR event issue? If so, can you please let me know the chances that something like this could happen again, and whether there are any actions we can take to prevent it?

    I looked at the HP ProLiant Management Viewer, and it shows that the server was rebooted by HP ProLiant's ASR (Automatic Server Recovery) feature.

    See log below:

    The system has rebooted from a Automatic Server Recovery (ASR) event.

    User Action

    Determine the nature of the Automatic Server Recovery (ASR) event, and take corrective action.

    WBEM Indication Properties

    AlertingElementFormat: 0 0 (Unknown)

    AlertType: 5 0x5 (Device Alert)

    Description: "The system has rebooted from a Automatic Server Recovery (ASR) event."

    EventCategory: 16 0x10 (System Power)

    EventID: "1"

    ImpactedDomain: 4 0x4 (System)

    IndicationTime: "20110319160206.810000-240"

    OSType: 69 0x45 (Microsoft Windows Server 2003)

    OSVersion: "5.2.3790"

    PerceivedSeverity: 5 0x5 (Major)

    ProbableCause: 111 0x6f (Timeout)

    ProbableCauseDescription: "ASR Reboot Occurred"

    ProviderName: "HP Recovery"

    ProviderVersion:

    RecommendedActions[0]: "Determine the nature of the Automatic Server Recovery (ASR) event, and take corrective action."

    Summary: "ASR reboot occurred"

    SystemCreationClassName: "HP_WinComputerSystem"

    For more information, please contact HP Support.
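
    For anyone trying to line this event up with other logs: the IndicationTime value above appears to be in the CIM/WBEM datetime layout (yyyymmddHHMMSS.mmmmmm followed by a signed UTC offset in minutes). The snippet below is a minimal sketch under that assumption; parse_cim_datetime is a hypothetical helper name, not part of any HP tool.

```python
from datetime import datetime, timedelta, timezone


def parse_cim_datetime(value: str) -> datetime:
    """Parse a timestamp assumed to be yyyymmddHHMMSS.mmmmmm plus a signed UTC offset in minutes."""
    stamp = value[:21]                   # date, time, and microseconds
    sign = value[21]                     # '+' or '-'
    offset_minutes = int(value[22:25])   # offset from UTC, in minutes
    if sign == "-":
        offset_minutes = -offset_minutes
    naive = datetime.strptime(stamp, "%Y%m%d%H%M%S.%f")
    return naive.replace(tzinfo=timezone(timedelta(minutes=offset_minutes)))


# The value copied from the IndicationTime property in the log above.
event = parse_cim_datetime("20110319160206.810000-240")
print(event.isoformat())
print(event.astimezone(timezone.utc).isoformat())
```

    Having the time in UTC makes it easier to compare against the Windows event log and the SQL Server error log around the moment of the reboot.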

    ----------------------

    The ASR feature is a hardware-based timer.

    The ASR Timeout option sets a timeout limit for resetting a server that is not responding. When the server has not responded in the selected amount of time, the server automatically resets.
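
    The timer behaviour described above can be pictured with a small simulation. This is only a toy model, not HP's implementation: WatchdogTimer, pet, and simulate are hypothetical names, and the 600-second timeout is an illustrative value rather than the setting on this server.

```python
from typing import Optional


class WatchdogTimer:
    """Toy model of a watchdog: it expires unless it is reset in time."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s
        self.last_reset = 0.0

    def pet(self, now: float) -> None:
        """Called by the OS-side agent to signal the system is still responsive."""
        self.last_reset = now

    def expired(self, now: float) -> bool:
        """True once more than timeout_s has passed since the last reset."""
        return now - self.last_reset > self.timeout_s


def simulate(timeout_s: float, pet_times: list, end: float, step: float = 1.0) -> Optional[float]:
    """Return the simulated time at which the server would be reset, or None."""
    dog = WatchdogTimer(timeout_s)
    pending = sorted(pet_times)
    now = 0.0
    while now <= end:
        # Apply every reset that was due at or before the current time.
        while pending and pending[0] <= now:
            dog.pet(pending.pop(0))
        if dog.expired(now):
            return now
        now += step
    return None


# Healthy system: the agent resets the timer every 60 seconds.
healthy = simulate(600, list(range(0, 1801, 60)), end=1800)

# Hung system: the last reset happens at t=120, then the OS stops responding.
hung = simulate(600, [0, 60, 120], end=1800)

print(healthy, hung)
```

    In the hung case the simulated reset fires shortly after the timeout has elapsed from the last successful reset, which is the pattern an ASR reboot would follow if the operating system stopped servicing the timer.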

    Events which may contribute to the operating system locking up include:

    A peripheral device − such as a Peripheral Component Interconnect Specification (PCI) adapter − that generates numerous spurious interrupts when it fails.

    A high priority software application consumes all the available central processing unit (CPU) cycles and does not allow the operating system scheduler to run the ASR timer reset process.

    A software or kernel application consumes all available memory, including the virtual memory space (for example, swap). This may cause the operating system scheduler to cease functioning.

    A critical operating system component, such as a file system, fails and causes the operating system scheduler to cease functioning.

    Any other event besides an ASR timeout that causes a Non-Maskable Interrupt (NMI) to be generated.

    http://h20000.www2.hp.com/bizsupport/TechSupport/document.jsp?objectID=c01158873&lang=en&cc=us&taskId=101&prodSeriesId=428936

    Thank you for your help.

  • Based on the event that caused the ASR, I would say you have a hardware issue. There is nothing you can do to prevent this; hardware failure will always be a reality of IT.

    Dan

    If only I could snap my fingers and have all the correct indexes appear and the buffer clean and.... Start day dream here.
