If you are dealing with a SQL Server which got a lot of memory, you may encounter the out of memory (OOM) error because of low “Available Virtual Memory”.
Recently, I assigned more than 820GB memory to one of the SQL Server instances. The instance was dealing with a significant reporting workload. After a couple of days, the SQL Server instance got hung with error “Insufficient System Memory”.
Here is the physical machine details – IBM (Lenovo) x3850 X6, 64 cores, 1TB RAM.
Problem
After pumping a lot of the memory to a SQL Server, you may end up by seeing “Error 701” in the error log of SQL Server. The error can be generated by many reasons. For example;
- The SQL Server instance really needs more physical memory because it got very aggressive and heavy workload.
- If you do ten of thousands of inserts in a single batch (Applies to SQL Server 2005 & 2008, but you need to check with Microsoft for other versions)
- SQL Server virtual address space got exhausted
In my scenario, I was not sure what caused SQL Server to throw the error that’s where the finding started.
Findings
To figure out the cause of the error 701, I started analyzing the SQL Server error log file.
- Here is the complete error which I found in the error log
2016-11-30 04:27:59.42 spid30284 Failed allocate pages : FAIL_PAGE_ALLOCATION 207
2016-11-30 04:28:19.42 spid30284 Error: 701, Severity: 17, State: 123.
2016-11-30 04:29:00.12 spid30284 There is insufficient system memory in resource pool ‘default’ to run this query.
- When I further looked up at the error log, the above error was logged multiple times with the output of DBCC MEMORYSTATUS that is automatically logged to the error log on 701 error messages
Process/System Counts Value ---------------------------------------- ---------- Available Physical Memory 135581184896 ---->126GB Available Virtual Memory 17179169280------->15GB Available Paging File 337114435584 Working Set 44043255808 Percent of Committed Memory in WS 100 Page Faults 26218401 System physical memory high 0 System physical memory low 0 Process physical memory low 0 Process virtual memory low 0 2016-02-16 04:28:00.10 spid2284 Memory Manager KB
- It also generated a SQL Server memory dump for “Non-yielding” process.
2016-11-30 04:32:42.12 Server **Dump thread – spid = 0, EC = 0x0000000000000000
2016-11-30 04:32:42.12 Server ***Stack Dump being sent to XXXXXX\MSSQL\XXXX\SQLDump0001.txt
2016-11-30 04:32:42.12 Server * *******************************************************************************
2016-11-30 04:32:42.12 Server *
2016-11-30 04:32:42.12 Server * BEGIN STACK DUMP:
2016-11-30 04:32:42.12 Server * 2016-11-30 04:32:42 spid 31092
2016-11-30 04:32:42.12 Server *
2016-11-30 04:32:42.12 Server * Non-yielding IOCP Listener
- The wait type SOS_MEMORY_TOPLEVELBLOCKALLOCATOR was part of the top three wait types on the SQL Server.
Observation
- The output of DBCC MEMORYSTATUS was logged 8 times in the error log and each time the “Available Physical Memory” was showing greater than 100GB but the “Available Virtual Memory” was almost equal to 15GB. Kindly refer the below details;
Note:
The counter “Available Virtual Memory” tells how much virtual memory is available at OS Level. By default, the X64 OS supports 8TB of the virtual memory. The Windows OS runs every process, including the SQL Server process, in its own dedicated area of virtual memory, known as the Virtual Address Space (VAS).
- To get insight from the generated SQL memory dump, I further analyzed the non-yielding scheduler dump using Public symbol;
.cxr sqlmin!g_copiedStackInfo+0X20
*** WARNING: Unable to verify timestamp for sqlmin.dll
rax=0000000000000000 rbx=00000000f105eb48 rcx=0000000000000000
rdx=0000000000000000 rsi=0000000077661190 rdi=0000000000000002
rip=00000000774ebd6a rsp=00000000f105ea40 rbp=00000000f105eac8
r8=00000000f105f4c8 r9=00000000ffffffff r10=0000000000554a90
r11=00000000f105f2e8 r12=00000000f105eb48 r13=0000000000145000
r14=00000000f105eaa8 r15=0000000000000020
iopl=0 nv up ei pl zr na po nc
cs=0033 ss=002b ds=0000 es=0000 fs=0000 gs=0000 efl=00000246
ntdll!ZwQueryVirtualMemory+0xa:
00000000774ebd6a c3 ret
0:045> k100
# Child-SP RetAddr Call Site
00 00000000f105ea40 00000000`772bfcb1 ntdll!ZwQueryVirtualMemory+0xa
Note:
The dump output also indicates that it was calling virtual memory allocator “ZwAllocateVirtualMemory”. To know, the “ZwQueryVirtualMemory” exists for what kindly refer the link
https://msdn.microsoft.com/en-us/library/windows/hardware/ff566416(v=vs.85).aspx
Conclusion
The Windows Server was continuously experiencing low “Available Virtual Memory” (AVM was 15GB instead of 8TB). As the available virtual address space was very low, queries had to wait longer to perform memory allocation. Sometimes SQL Server could not perform memory allocation at all and ended up by throwing the error 701.
The repeated allocation pattern under such condition might have caused fragmentation of the memory blocks and consumption of virtual address space. As the SQL Server reported the error many times, it means the processes got repeated many times which exhausted the virtual address space of the server. Therefore, even we had enough physical memory available on the server but could not login into the server because of the virtual memory crunch.
It confirms that the issue happened because of the “Available Virtual Memory”
Solution
- As per KB3074434, Microsoft recommends to apply Cumulative Update for SQL Server;
– Cumulative Update 4 for SQL Server 2014 SP1
– Cumulative Update 10 for SQL Server 2014
– Cumulative Update 8 for SQL Server 2012 SP2
- Once you apply the Cumulative Update, you need to enable the trace flag T8075.
Additional information
What is the purpose of the trace flag 8075?
It helps eliminate VAS fragmentation issue by caching large memory blocks to minimize the VAS fragmentation.
Is the trace flag only available after SQL Server 2012 SP2 CU8 or Can I apply on the server which is running only SQL Server 2012 RTM or an older version?
The trace flag wasn’t available before versions specified in the KB3074434 article. For example, it was not available in the SQL Server 2012 SP1 CU7.
Do I still need to enable the trace flag 8075 even we have applied SQL Server 2012 SP3 CU8?
Yes, it applies to 2012/2014 versions specified in the KB3074434 article
Do I need to apply on all the server irrespective we see high SOS_MEMORY_TOPLEVELBLOCKALLOCATOR waits or not?
Yeah, it would make sense to apply in particular the bigger machines.
Do I need to apply this trace flag in SQL Server 2016 also?
It is enabled by default in SQL 2016 so you won’t require it in SQL Server 2016.
Is there performance impact of the trace flag?
There is no negative impact of trace flag 8075
Reference Link
- How to play with SQL Server mdmp dump refer the link – https://mssqlwiki.com/tag/sqlserver-mdmp/
- MSDN Forum Link – https://social.msdn.microsoft.com/Forums/sqlserver/en-US/0ae3395f-4517-4838-82b7-9b411faef8e0/what-is-use-of-trace-flag-t8075?forum=sqldisasterrecovery
Special thanks to Microsoft Program Managers – Denzil Ribeiro and Parikshit Savjani for guiding on this!
Thanks for reading!!
The post Out of memory error because of the very low virtual address space – Error 701 – Trace Flag 8075 appeared first on .