High WRITELOG waits on an OLTP system with fast SSD storage; log flushes are slow

  • We recently upgraded from SQL Server 2008 to new servers running SQL Server 2016 (SP1 CU3).

    New server hardware: HP ProLiant 580 G8, 4-socket Intel Xeon E7, 60 cores total, 640 GB RAM, 8 local Intel S3700 SSDs in RAID 10.

    The environment is a busy OLTP system with around 3K batch requests/second, mostly inserts and, to a lesser degree, updates. As an OLTP system there is mostly 1 insert per transaction, so there are a lot of transaction commits. There are some reporting queries for system purposes (queue readers, monitoring, etc.).

    The problem is that log flush performance is slow and we get WRITELOG waits.
    I have tested the I/O and it can manage over 100K IOPS in a 4K random write test, and sequential writes of up to 2,500 MB/sec.

    Databases:
    compatibility level = 130 (SQL Server 2016)
    indirect checkpoint = on
    VLF count = 80-120
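
    (For anyone wanting to verify the VLF count, a quick check: DBCC LOGINFO returns one row per VLF for the current database. sys.dm_db_log_info offers the same detail, but only from SQL Server 2016 SP2 onward, which is newer than the SP1 CU3 build here.)

    -- Run in the database whose log you want to inspect;
    -- the number of rows returned is the VLF count.
    DBCC LOGINFO;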

    Processor:
    CPU usage of 3%-8%, average of 5%, maxed under 10% when log backups run (compressed backups)
    signal wait time is about 6% of total wait time; the remaining 94% is resource wait time
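
    (For reference, a minimal sketch of how that signal-vs-resource split can be computed from sys.dm_os_wait_stats:)

    -- Signal wait = time spent waiting for a scheduler after the
    -- resource became available; the remainder is resource wait.
    SELECT CAST(100.0 * SUM(signal_wait_time_ms) / SUM(wait_time_ms) AS NUMERIC(5, 1)) AS [signal_wait_pct],
           CAST(100.0 * SUM(wait_time_ms - signal_wait_time_ms) / SUM(wait_time_ms) AS NUMERIC(5, 1)) AS [resource_wait_pct]
    FROM sys.dm_os_wait_stats;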

    Memory:
    PLE ticks up every second and currently stands at 584,000 seconds
    there is very little paging observable (Pages/sec)

    This is the real problem I found:
    "Log Flushes/sec" = 1,000-2,000/sec
    "Log Flush Wait Time" = 1,000-4,000 ms/sec
    "Log Flush Write Time (ms)" = 0-1 ms, with spikes of 10-15 ms
    "Avg. Disk sec/Write" = 0
    "Current Disk Queue Length" = 0 (with spikes of 2 every few minutes)
    "% Idle Time" = 90%-100%

    So it seems something is wrong: SQL Server is unable to exploit the ultra-low latency that the SSD array provides...

    I tried removing the log writer threads from the CPU affinity mask, as Chris Adkin described on his blog, with no success. I also used Extended Events to monitor log flushes and saw multiple waits occur for a single log flush.
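
    (A minimal sketch of such an Extended Events session, assuming the Debug-channel events log_flush_start and log_flush_complete; the database_id of 5 is a placeholder:)

    CREATE EVENT SESSION [log_flush_trace] ON SERVER
    ADD EVENT sqlserver.log_flush_start    (WHERE (database_id = 5)),  -- 5 = placeholder database_id
    ADD EVENT sqlserver.log_flush_complete (WHERE (database_id = 5))
    ADD TARGET package0.event_file (SET filename = N'log_flush_trace')
    WITH (MAX_DISPATCH_LATENCY = 5 SECONDS);

    ALTER EVENT SESSION [log_flush_trace] ON SERVER STATE = START;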

    I don't know whether this issue existed on the previous servers, because on that platform I/O was something of a problem and "log flush write time" was a lot higher...

    So, have you ever seen this issue in the field, and what would you suggest as a next step?
    Thanks

  • The problem, I suspect, is that log flushes are synchronous. The log has to be written in order, so with 1000 flushes a second, even if you had a 1ms latency on the disk, you'd be seeing contention.

    Can you reduce the log flushes? That's usually done by wrapping multiple small inserts/updates into transactions, so that you have fewer, larger flushes (the log flushes when a transaction commits or when the log buffer is full).
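
    (A minimal sketch of that pattern, added for illustration, using a hypothetical dbo.OrdersStaging table: one commit, and so roughly one log flush, covers the whole batch instead of one per row.)

    BEGIN TRANSACTION;

    -- hypothetical table and values; the point is the single commit
    INSERT INTO dbo.OrdersStaging (OrderId, Amount)
    VALUES (1, 10.00), (2, 12.50), (3, 9.99);

    COMMIT TRANSACTION;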

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • msm_dev - Sunday, August 27, 2017 7:57 AM

    Please check this in conjunction with the I/O latch waits: PAGEIOLATCH_XX, LATCH_XX, etc. Also look at the HBA queue-depth setting on your SAN, which you can get from the SAN administrator. It needs to be increased if the disk bottlenecks/I/O ticks are high (with io_pending = 1, meaning the request is waiting to be written to storage) and repeat for the same files, especially for the duration of the performance bottlenecks. How are your physical files distributed on the storage? Are the data and log files stored separately? What about the tempdb file layout?
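
    (The file-distribution question can be answered from the instance itself; a sketch using the standard catalog view:)

    -- Where every data, log and tempdb file physically lives.
    SELECT DB_NAME(database_id) AS [database],
           type_desc,        -- ROWS = data file, LOG = log file
           physical_name
    FROM sys.master_files
    ORDER BY [database], type_desc;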

    Please use the queries below to check for disk bottlenecks:

    -- Calculates average stalls per read, per write, and per total I/O
    -- for each database file.
    SELECT DB_NAME(database_id) AS [Database Name],
           file_id,
           io_stall_read_ms,
           num_of_reads,
           CAST(io_stall_read_ms / (1.0 + num_of_reads) AS NUMERIC(10, 1)) AS [avg_read_stall_ms],
           io_stall_write_ms,
           num_of_writes,
           CAST(io_stall_write_ms / (1.0 + num_of_writes) AS NUMERIC(10, 1)) AS [avg_write_stall_ms],
           io_stall_read_ms + io_stall_write_ms AS [io_stalls],
           num_of_reads + num_of_writes AS [total_io],
           CAST((io_stall_read_ms + io_stall_write_ms) / (1.0 + num_of_reads + num_of_writes) AS NUMERIC(10, 1)) AS [avg_io_stall_ms]
    FROM sys.dm_io_virtual_file_stats(NULL, NULL)
    ORDER BY [avg_io_stall_ms];

    -- Look at pending I/O requests by file
    SELECT DB_NAME(mf.database_id) AS [Database],
           mf.physical_name,
           r.io_pending,
           r.io_pending_ms_ticks,
           r.io_type,
           fs.num_of_reads,
           fs.num_of_writes
    FROM sys.dm_io_pending_io_requests AS r
    INNER JOIN sys.dm_io_virtual_file_stats(NULL, NULL) AS fs
        ON r.io_handle = fs.file_handle
    INNER JOIN sys.master_files AS mf
        ON fs.database_id = mf.database_id
       AND fs.file_id = mf.file_id
    ORDER BY r.io_pending, r.io_pending_ms_ticks DESC;

  • Gail,
    First, thanks for your answer.
    Log flushes may be synchronous, but I don't think that alone explains it: on a similar (test) server with a 2-disk RAID 1 SSD I can reach 7K flushes/sec (insert only, restored database, same stored procedures on 5 tables) with very little wait (log flush wait time of 0-50 ms/sec).

    Question 1:
    There is only one data volume on the server (log and data files on the same volume, tempdb too). Given the nature of SSDs, I decided not to separate the disks at all. With disk times this fast, can that be the culprit in our setup?

    Question 2:
    There are 6 databases on the server: 1 very active, 2 semi-active, and 3 with low activity.
    Can those 3 active databases cause log write contention?

    And no, it's not possible to put multiple small inserts into 1 transaction. This is a financial system; every transaction is issued separately and must be processed separately too.

    Something that may shed some light on the problem: when I alter the databases to delayed durability, the waits are gone, as expected. When I turn it back to full durability, for a minute or so there are very low waits (log flush wait time < 200 ms/sec), and it takes some time to grow back to the regular rates.
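
    (For reference, the durability toggle used in that test; [MyDatabase] is a placeholder name:)

    -- Commits return before the log block is hardened; waits disappear.
    ALTER DATABASE [MyDatabase] SET DELAYED_DURABILITY = FORCED;

    -- Back to full durability; every commit waits for its log flush again.
    ALTER DATABASE [MyDatabase] SET DELAYED_DURABILITY = DISABLED;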

  • SSC-Addicted,
    There is no SAN here, no HBA, etc.; everything is local: 8 SSDs in a RAID 10 array and 1 volume.
    As I stated in my first post, this is a fast I/O subsystem. I ran your queries and nothing came back after multiple runs. I don't have I/O latch wait types in my top 20 waits, but WRITELOG is my number 1.
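
    (For anyone wanting to reproduce the top-20 list, a minimal sketch with no filtering of benign wait types:)

    SELECT TOP (20) wait_type,
           wait_time_ms,
           waiting_tasks_count,
           wait_time_ms / NULLIF(waiting_tasks_count, 0) AS [avg_wait_ms]
    FROM sys.dm_os_wait_stats
    ORDER BY wait_time_ms DESC;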

    I uploaded my wait stats and my I/O stalls (log files). From the images it appears my average wait time for WRITELOG is 800 microseconds, but my average I/O stall time for the most active log file is 71 microseconds.

    The system consists of 3 similar servers in an Always On availability group with asynchronous commit. We wanted to bring 2 servers into synchronous-commit mode, but with this amount of waits that's not possible in our mission-critical system.

  • msm_dev - Sunday, August 27, 2017 11:05 AM

    Your Question 1 seems to me to be the main culprit. I can imagine your tempdb is getting bashed quite hard, as is the transaction log.

    You would definitely benefit from separating the three components (.mdf, .ldf and tempdb) not only onto separate volumes but, more importantly, by dedicating spindles to each one. This is a prime candidate for the I/O contention I think may be happening here. Consider as well how many files you need for tempdb. If you don't know whether tempdb contention is occurring, a quick check in the log will confirm (or deny) it.
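
    (Tempdb allocation contention can also be caught in the act; a sketch looking for PAGELATCH waits on tempdb pages, database_id 2:)

    -- Waits on PFS/GAM/SGAM pages in tempdb suggest allocation contention.
    SELECT session_id,
           wait_type,
           wait_duration_ms,
           resource_description   -- format: dbid:fileid:pageid
    FROM sys.dm_os_waiting_tasks
    WHERE wait_type LIKE 'PAGELATCH_%'
      AND resource_description LIKE '2:%';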

    Don't forget, with a 2-disk RAID 1 only one spindle services the entire instance. You won't be doing your SSDs any favours...

    The databases themselves won't be the problem in their own right. This is, IMHO, a problem in the configuration of the instance.
