Blog Post

SQL MCM Training – Day 1

,

Today was my first day of SQL MCM training here in Bellevue. We started at 08:30am

and now (05:30pm) Kimberly is currently talking about how to use the provided VPC J.

The main theme for today's day was about learning the fundamental about performance

tuning. We have covered the following 3 modules:

  • SQL Server IO
  • Waits & Queues
  • IO – Core Concepts

In the morning Paul and Kimberly provided a foundation about how SQL Server is doing

IO and the inner architecture of SQL Server. There was a big overlap with the first

day of the first week (which I have attended in London), but this was "by design" J.

Afterwards Paul jump into Waits & Queues and he has shown a lot of different demos

about the various "well-known" wait types like CXPACKET, ASYNC_NETWORK_IO, WRITELOG,

etc. Paul also walked down to Latches and Spinlocks, which was really great. Afterwards

Jonathan kicked in and told us a lot of IO Core Concepts like the difference between

traditional hard disks and SSDs.

I've written down a lot of personal notes for the first day, and the great thing is

that I want to share all those notes (and all notes that are coming within the next

3 weeks) with you, so that you can get a picture about the content that we covered.

Also a big thanks to Paul, that he allows that! J

Module 1: SQL Server IO

  • File Header Page in the LDF is a mirror of the Boot Page of the MDF
    • SQL Server can write the File Header Page in parallel but NOT the VLFs
    • SQL Server has to read all File Header Pages during startup, therefore it is not recommended

      to have too much databases on a SQL Server instance

  • DEFAULT file group should be changed so that new objects go automatically to another

    file group instead of PRIMARY file group

  • When IFI (Instant File Initialization) is enabled you can have a larger Auto Growth

    setting instead of when IFI is not enabled

  • When Auto Growth for the Log File kicks in, NO additional log records can be

    flushed to disk and therefore have to wait

    • But only for the database where Auto Growth kicked in, and NOT for the WHOLE instance
  • FGCB_ADD_REMOVE Latch (File Group Control Block)
    • Needed for add/remove files, and Auto Growth
    • Can be checked if Auto Growth is a performance issue
  • The IO subsystem can intersect the partition alignment of Windows Server 2008
    • The partition can be therefore ALSO misaligned!!!
  • File System Level Defragmentation
    • Shutdown SQL Server first - corruption can occur!
    • Gains maybe 1 - 2% performance (when doing large range scans)
  • Reads from the TxLog can be sequential or random
    • Sequential
      • Log Backup
      • Log Reader Agent for Replication
    • Random
      • Crash Recovery
      • Rollback
    • Log Records are chained together backwards through their LSN
    • Rollback always goes backward, therefore Random IO
    • Redo goes forward
    • Rollback goes backward
  • Minimally Logged Operation
    • Changed Data Pages are flushed to disk during the COMMIT
    • The flush is Extent based
  • Allocation Weightings
    • Weightings are per filegroup
    • One file always has a weighting of 1
    • Weightings are recalculated
      • Grown, Shrunk, Added, Removed or 8192 allocations took place
    • During an allocation the weighting is reduced by 1, e.g. 8192 to 8191
    • FGCB_ADD_REMOVE Latch when weightings are recalculated
    • Proportional Fill doesn't work on TempDb
    • Can be tracked through Extended Events - see Jonathans Blog
  • TF 1117
  • Backups/Restores
    • Backups are doing sequential IO
    • Backups are not going through the Buffer Pool
      • They are going directly to the data files
    • Fast Recovery
      • Only available on Enterprise Edition
      • Database is accessible after the REDO phase, and before the UNDO phase begins
    • Separate threads when you do backup striping
  • TempDb
  • Perfmon counter "Disk Queue Length" is not relevant
    • Outstanding IOs are ok, just look at the latency times from DMV sys.dm_io_virtual_file_stats

Module 2: Waits & Queues

  • Waits are just symthoms
  • LCK_M_S: Lock Mode Shared
  • APPEND_ONLY_STORAGE_FIRST_ALLOC
    • Latch on the Version Store
  • FGCB_PRP_FILL: proportional fill latch
  • Dead Latches are possible
    • Bugs inside SQL Server
  • Dead Spins are not possible
    • You can't acquire a Spin, when it is already acquired
  • When a Spinlock is acquired, nothing else can't be latched or locked
  • SUSPENDED threads are not a ordered list, anyone can be signaled to go into the RUNNABLE

    state

  • RUNNABLE is a FIFO queue, expect when Resoure Governor is used
    • Workload starvation could be theoretical possible for low priority workload
  • Propagated Page Split
    • Scans from the Virtual Root down to ensure that the Shared Latches are present
  • CXPACKET
    • Parallel Index Rebuild uses the statistics to determine how to divide the work
    • Resource Governor can be used to set MAXDOP for resource pools
      • DBCC CHECKDB can't use MAXDOP query hint, so Resource Governor can be used with a

        dedicated resource pool

  • OLEDB
    • DBCC CHECKDB uses OLEDB internally
    • Microsoft Office (Access, Excel)
  • WRITELOG
    • SSD: 1 - 2ms
    • Traditional Hard Disk: < 10ms
  • Reasons for CXPACKET
    • Physical Cores are a little bit faster than logical cores, therefore CXPACKET

      waits also can occur

      • Therefore Hyper Threading is not recommend for SQL Server
    • Scan of partitioned tables, when one partition is on a faster storage system than

      another partition

    • Out-Of-State Statistics
  • Service Broker Wait Types
  • Preemptive Wait Types of Win32 IO are indicating that FILESTREAM is used
  • System table contentation can be removed if you use multiple insertion points

    (like in TempDb)

    • Create a dedicated table for each SPID in TempDb (not a temp table)

Module 3: IO Core Concepts

  • "General" SQL Server Latency recommendation
    • < 8ms: Excellent
    • < 12ms: Good
    • < 20ms: Fair
    • > 20ms: Poor
  • Parallel disk head positions are not possible
    • All disk heads are positioned on the same position
  • Short Stroking
    • Storing data in the outer tracks because of the higher data density and lower seek

      times

    • Disks are storing data from the outer side to the inner side
  • SSD Seek Times
    • From 0.0ms to 0.16ms
  • RAID1 can improve read performance by reading from each disk
  • RAID10 and RAID01 provides the same performance, but a different redundancy
  • RAID01
    • 1 drive failure fails the whole RAID 0 array

Thanks for reading

-Klaus

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating