Help with 'ACCESS_METHODS_BULK_ALLOC'

  • During scheduled reindexing this morning on a 64-bit SQL Server 2005 SP2 Enterprise Edition server with 24 CPUs, we began receiving numerous timeout errors like the following:

    Timeout occurred while waiting for latch: class 'ACCESS_METHODS_BULK_ALLOC', id 0000000364B0FB90, type 2, Task 0x0000000003C6CB08 : 11, waittime 300, flags 0x1a, owning task 0x0000000003CEFD68. Continuing to wait.

    This is a new one for us and we cannot find much on the internet to explain what's happening. One of the three reindexing jobs failed and produced a short stack dump, yet the process itself is still running despite the job failure. Strange. It must be orphaned and stuck.

    We will send this info to Microsoft on Monday, but can anyone help us understand the timeout? It may be due to parallelism, given all of the CXPACKET and EXECSYNC waits we are seeing, but it could also be disk-related since we may be having SAN problems.

    Thanks, Dave

  • We have encountered a similar problem. Could you kindly share your findings?

  • Microsoft didn't have much information on ACCESS_METHODS_BULK_ALLOC. They said it is a class used to perform bulk extent allocations.

    With regard to EXECSYNC, they said it is another wait type related to parallelism. CXPACKET pertains to a parallel thread going into a sleep state while waiting on other parallel threads to complete their work. EXECSYNC is experienced when one of the parallel threads performs thread-level synchronization with other parallel threads. An example of thread-level synchronization would be the acquisition of a mutex or any other routine synchronization. When a unit of work or operation is divided among the parallel threads, they need to share and sync information while they go about their assigned work. EXECSYNC is a wait during this type of synchronization.

    Hope this helps.

  • Thanks for your prompt response and knowledge sharing, Dave.

    Actually, I was particularly interested in finding out more about the timeout for class 'ACCESS_METHODS_BULK_ALLOC'. On several occasions we have seen the following errors intertwined with each other, blocking a series of subsequent events in our SharePoint environment.

    1) Timeout occurred while waiting for latch: class 'ACCESS_METHODS_BULK_ALLOC',

    2) Timeout occurred while waiting for latch: class 'FGCB_ADD_REMOVE',

    Looking at the SQL Server log, following the very first 'ACCESS_METHODS_BULK_ALLOC' timeout error, a short stack dump was produced, yet the process itself was still running despite the job failure. The job continued to run and wait, and subsequently blocked other processes. This eventually led to too many open sessions in the pool and generated application errors. We're still trying to find a fix for this.
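
    To see how the two latch timeouts interleave in the log, something like the following might help. It is only a rough Python sketch, not anything from our environment: it assumes the log lines follow the message format quoted in the first post, and the function names (parse_latch_timeouts, count_by_class) are made up for illustration.

```python
import re
from collections import Counter

# Patterns are based on the message format quoted in the first post (an assumption
# about what your error log looks like; adjust if your lines differ).
LATCH_TIMEOUT = re.compile(r"Timeout occurred while waiting for latch: class '([^']+)'")
WAITTIME = re.compile(r"waittime (\d+)")
OWNING_TASK = re.compile(r"owning task (0x[0-9A-Fa-f]+)")


def parse_latch_timeouts(lines):
    """Return one dict per latch-timeout message found in the given log lines."""
    events = []
    for line in lines:
        match = LATCH_TIMEOUT.search(line)
        if match is None:
            continue  # not a latch-timeout message
        wait = WAITTIME.search(line)
        owner = OWNING_TASK.search(line)
        events.append({
            "latch_class": match.group(1),
            # waittime is kept exactly as reported in the message, when present
            "waittime": int(wait.group(1)) if wait else None,
            "owning_task": owner.group(1) if owner else None,
        })
    return events


def count_by_class(events):
    """Count timeout messages per latch class."""
    return Counter(event["latch_class"] for event in events)
```

    Feeding it the lines of an exported error log and printing count_by_class(...) gives a quick tally per latch class, and grouping the events by owning_task may show whether the same task is holding things up each time.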

  • We, including Microsoft, suspected our problem was related to disk IO issues. It may be worth running perfmon to check your disks if you haven't already done so. That said, there may be other issues unrelated to disk bottlenecks that could lead to this type of problem. The Microsoft engineer we worked with couldn't find much information on ACCESS_METHODS_BULK_ALLOC.

    Dave

  • Books Online: sys.dm_os_latch_stats


    ACCESS_METHODS_BULK_ALLOC : Used to synchronize access within bulk allocators.

    That's about it for the official documentation!

    Two possible causes stand out:

    (1) A bug in SQL Server related to synchronization during high-parallelism bulk allocations

    (2) A problem with the storage sub-system

    That may be stating the obvious; however I will press on 🙂

    Fixes and workarounds for bugs which only occur with high degrees of parallelism are reasonably common in SQL Server patches - historically speaking. I would be tempted to retry the index operations with a much lower degree of parallelism. MAXDOP = 4 is often favoured for various reasons.

    If the problem still occurs, try MAXDOP = 1, just to eliminate parallelism as the cause of the problem. It is often possible to design a workable index maintenance strategy without parallelism - especially since you have Enterprise Edition and can take advantage of partition-level rebuilding (though that does remove the option to rebuild online).
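
    As a rough illustration of retrying at a lower degree of parallelism, here is a small Python sketch that only generates the T-SQL text for such rebuilds; it does not connect to a server. The helper names and the table names in the usage note are hypothetical, and the generated statements are plain offline ALTER INDEX ... REBUILD commands with a MAXDOP option.

```python
def quote_name(name):
    """Bracket-quote an identifier, doubling any closing bracket inside it."""
    return "[" + name.replace("]", "]]") + "]"


def rebuild_statement(schema, table, maxdop):
    """Build an ALTER INDEX ALL ... REBUILD statement limited to the given MAXDOP."""
    if maxdop < 1:
        # This sketch is about capping parallelism explicitly, so reject 0 and negatives.
        raise ValueError("maxdop must be 1 or greater")
    return (
        f"ALTER INDEX ALL ON {quote_name(schema)}.{quote_name(table)} "
        f"REBUILD WITH (MAXDOP = {maxdop});"
    )


def rebuild_script(tables, maxdop):
    """Build one statement per (schema, table) pair, in the order given."""
    return [rebuild_statement(schema, table, maxdop) for schema, table in tables]
```

    For example, rebuild_script([("dbo", "Orders"), ("dbo", "OrderLines")], 4) produces two statements capped at MAXDOP = 4; if the latch timeouts persist, generating the same script with maxdop=1 takes parallelism out of the picture.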

    More often though, the cause will be rather more mundane: a physical storage sub-system problem, an out-dated driver, or a subtle mis-configuration of the server, fabric, or SAN. The very heavy demands placed on the wider system by a highly parallel index rebuild are great for exposing subtleties like this.

    After all, SQL Server is just as likely to be the victim here as it is the perpetrator. Things like odd latch waits and stack dumps are frequently indicative of an unusual problem elsewhere in the system which SQL Server cannot handle gracefully.

    If it's not a physical problem in the SAN, very often an updated storport or other driver makes these sorts of problems go away. If you are running a certified system, check with the vendor to see if an updated driver pack is available. I can't stress enough how important it is to get the server build and all the associated bits and bobs running the latest stable software and drivers.

    Paul
