Q834628 doesn't seem to work

  • I am running SQL Server 2000 Enterprise Edition SP3a on Windows Server 2003 (W2K3), with 8 CPUs, 8 GB RAM (max SQL Server memory 6 GB), and the /3GB switch. I applied hotfix Q834628, referenced here:

    http://support.microsoft.com/default.aspx?scid=kb;EN-US;838765

    It doesn't seem to help. I am still getting these sorts of errors (listed below, separated by dashed lines). Any ideas?

    -----------

    2004-11-09 21:55:00.26 server Error: 17883, Severity: 1, State: 0

    2004-11-09 21:55:00.26 server The Scheduler 4 appears to be hung. SPID 210, ECID 0, UMS Context 0x06B73F38.

    -----------

    2004-11-10 21:34:35.16 spid1 Error: 823, Severity: 24, State: 4

    2004-11-10 21:34:35.16 spid1 I/O error 1450(Insufficient system resources exist to complete the requested service.) detected during write at offset 0x00000042530000 in file '<MDF file>'..

    -----------

    2004-11-13 17:05:25.17 spid2 LogWriter: Operating system error 1450(Insufficient system resources exist to complete the requested service.) encountered.

    2004-11-13 17:05:25.26 spid2 Write error during log flush. Shutting down server

    2004-11-13 17:05:25.55 spid167 Error: 9001, Severity: 21, State: 4

    2004-11-13 17:05:25.55 spid167 The log for database '<database>' is not available..

    2004-11-13 17:05:25.62 spid167 Error: 9001, Severity: 21, State: 1

    2004-11-13 17:05:25.62 spid167 The log for database '<database>' is not available..

    2004-11-13 17:05:25.68 spid167 Error: 9001, Severity: 21, State: 1

    2004-11-13 17:05:25.68 spid167 The log for database '<database>' is not available..

    2004-11-13 17:05:25.77 spid167 Error: 3314, Severity: 21, State: 4

    2004-11-13 17:05:25.77 spid167 Error while undoing logged operation in database '<database>'. Error at log record ID (32283:38320:3)..

    2004-11-13 17:05:25.79 spid167 Error: 9001, Severity: 21, State: 1

    2004-11-13 17:05:25.79 spid167 The log for database '<database>' is not available..

    2004-11-13 17:05:25.85 spid167 Error: 3314, Severity: 21, State: 5

    2004-11-13 17:05:25.85 spid167 Error while undoing logged operation in database '<database>'. Error at log record ID (32283:38320:1)..

    2004-11-13 17:05:26.63 spid22 Database '<database>' cannot be opened. It has been marked SUSPECT by recovery. See the SQL Server errorlog for more information.

    2004-11-13 17:05:26.70 spid22 Database '<database>' cannot be opened. It has been marked SUSPECT by recovery. See the SQL Server errorlog for more information.

    2004-11-13 17:05:27.23 spid22 Starting up database '<database>'.

    2004-11-13 17:05:34.53 spid22 Analysis of database '<database>' (8) is 100% complete (approximately 0 more seconds)

    2004-11-13 17:05:34.54 spid22 Recovery of database '<database>' (8) is 0% complete (approximately 8 more seconds) (Phase 2 of 3).

    2004-11-13 17:05:34.81 spid22 Recovery of database '<database>' (8) is 100% complete (approximately 0 more seconds) (Phase 2 of 3).

    -----------

    *Dump thread - spid = 106, PSS = 0x57a61220, EC = 0x57a61548

    *Stack Dump being sent to <location>\SQLDump0001.txt

    * *******************************************************************************

    *

    * BEGIN STACK DUMP:

    * 11/13/04 19:24:30 spid 106

    *

    *

    *

    * Input Buffer 140 bytes -

    * dbcc dbreindex(N'[dbo].[<tablename>]', N'', 90, sorted_data_reorg)

    *

    ....

    2004-11-13 19:24:32.55 spid106 Stack Signature for the dump is 0x6E979C26

    2004-11-13 19:24:32.57 spid106 SQL Server Assertion: File: <buffer.c>, line=3745

    Failed Assertion = '!(bp->bdbid == dbid && ALL_ON (BUF_HASHED | BUF_CHECKWRITE | BUF_DIRTY, bufstat) && IS_OFF (BUF_IO, bufstat) && bp->bpage->GetXdesId () == xdesId)'.

    2004-11-13 19:24:32.64 spid106 Error: 3624, Severity: 20, State: 1.
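When triaging an errorlog like the one above, it can help to tally which SQL Server error numbers and which operating system error codes show up. The following is a minimal sketch, assuming the line layout seen in this excerpt ("Error: N, Severity: S, State: T" and "I/O error N(" or "Operating system error N("). The names `tally_errors` and `SAMPLE` are hypothetical, and the regular expressions are inferred from these few lines only, not from any documented log format.

```python
import re
from collections import Counter

# Patterns inferred from the errorlog excerpt in this thread (an assumption,
# not a documented format).
SQL_ERROR = re.compile(r"Error: (\d+), Severity: (\d+), State: (\d+)")
OS_ERROR = re.compile(r"(?:I/O|Operating system) error (\d+)\(")


def tally_errors(lines):
    """Count SQL Server error numbers and OS error codes in errorlog lines."""
    sql_errors = Counter()
    os_errors = Counter()
    for line in lines:
        sql_match = SQL_ERROR.search(line)
        if sql_match:
            sql_errors[int(sql_match.group(1))] += 1
        os_match = OS_ERROR.search(line)
        if os_match:
            os_errors[int(os_match.group(1))] += 1
    return sql_errors, os_errors


# A few lines copied from the errorlog posted above.
SAMPLE = [
    "2004-11-09 21:55:00.26 server Error: 17883, Severity: 1, State: 0",
    "2004-11-10 21:34:35.16 spid1 Error: 823, Severity: 24, State: 4",
    "2004-11-10 21:34:35.16 spid1 I/O error 1450(Insufficient system resources "
    "exist to complete the requested service.) detected during write at offset "
    "0x00000042530000 in file '<MDF file>'..",
    "2004-11-13 17:05:25.17 spid2 LogWriter: Operating system error "
    "1450(Insufficient system resources exist to complete the requested "
    "service.) encountered.",
    "2004-11-13 17:05:25.55 spid167 Error: 9001, Severity: 21, State: 4",
]

if __name__ == "__main__":
    sql_errors, os_errors = tally_errors(SAMPLE)
    print("SQL Server errors:", dict(sql_errors))
    print("OS errors:", dict(os_errors))
```

In this sample, OS error 1450 appears behind both the 823 write failure and the log flush failure, which suggests a single underlying resource problem rather than several unrelated ones.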

  • mak,

    1) Check your torn page detection and your page chaining (probably incorrect pointers).

    2) Problems related to high-end disk subsystems: http://support.microsoft.com/default.aspx?scid=kb;EN-US;810885

    3) Info on rollback of text operations under READ UNCOMMITTED: http://support.microsoft.com/default.aspx?scid=KB;EN-US;Q319851

    Hope this moves you forward.

    GKramer

    The Netherlands

  • Although I never had any problems with this patch, I have had a few problems with the /3GB switch enabled. I'd try using /PAE without /3GB and see if it solves the problem (probably not!). It resolved some issues on a couple of servers for me.

    I have seen the unyielding scheduler error; as far as I remember, we eventually restarted the entire cluster.

    Are you connected to a SAN?

    The GrumpyOldDBA
    www.grumpyolddba.co.uk
    http://sqlblogcasts.com/blogs/grumpyolddba/

  • Are you still getting error 17883, and is the database still marked suspect?

  • For Colin Leversuch-Roberts: we are not on a SAN. We use a network-attached storage (NAS) device.

    For Jimmy Jen: we still see error 17883 intermittently. The database is not marked suspect; as you can see in the errorlog, it fully recovered about 9 seconds after going bad.

    Thanks for the /3GB option idea. Ironically, everything I've read in the Microsoft documentation suggests using this switch, but the OS probably needs 2 GB rather than 1 GB of address space to manage the AWE memory. By the way, /PAE is not needed on Windows Server 2003 Enterprise Edition if you have it configured for hot-add memory. We do not, so we use /PAE.

    http://support.microsoft.com/default.aspx?scid=kb;EN-US;838765

     

    Thanks!

     

  • So is tempdb local or on the NAS? I have system databases on a SAN, but I'm not too sure about NAS; no experience, sorry.

    If you have the system databases on the NAS, maybe that is the problem.

     

    The GrumpyOldDBA
    www.grumpyolddba.co.uk
    http://sqlblogcasts.com/blogs/grumpyolddba/

  • We are removing the /3GB option. Here are some articles we found. We also found that with /3GB off, the number of available page table entries (PTEs) under normal conditions was about 180000. With /3GB on, it dropped to about 10000. So it is pretty clear that under load the OS kernel was starved of PTEs, which eventually led to all the operating system 1450 errors.

    http://groups.google.com/groups?q=pae%20pte%20sql%20server%20problem&hl=en&lr=&sa=N&tab=wg

    http://www.microsoft.com/whdc/system/platform/server/PAE/pae_os.mspx
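To put the two PTE readings above in proportion, here is a minimal sketch that computes the drop and flags a low reading. The readings (about 180000 and about 10000) come from the post; they presumably correspond to the Performance Monitor counter Memory\Free System Page Table Entries. The function names and the 10000/5000 thresholds are assumptions based on a commonly quoted rule of thumb, not values taken from this thread, so adjust them to your own baseline.

```python
def pte_drop_percent(baseline, current):
    """Percentage drop in free system PTEs relative to a baseline reading."""
    if baseline <= 0:
        raise ValueError("baseline must be a positive PTE count")
    return 100.0 * (baseline - current) / baseline


def classify_free_ptes(free_ptes, warn_at=10_000, critical_at=5_000):
    """Classify a free-PTE reading.

    The default thresholds are assumed rule-of-thumb values, not figures
    from the thread or from any guaranteed limit.
    """
    if free_ptes <= critical_at:
        return "critical"
    if free_ptes <= warn_at:
        return "warning"
    return "ok"


if __name__ == "__main__":
    without_3gb = 180_000  # approximate reading from the post, /3GB off
    with_3gb = 10_000      # approximate reading from the post, /3GB on

    drop = pte_drop_percent(without_3gb, with_3gb)
    print(f"Free PTEs dropped by {drop:.1f}% with /3GB enabled")
    print("Without /3GB:", classify_free_ptes(without_3gb))
    print("With /3GB:", classify_free_ptes(with_3gb))
```

With the figures from the post, the drop is roughly 94%, which leaves very little headroom for I/O-heavy work such as large writes and log flushes.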

     
