As part of this Transaction Log internals series, in this article we are going to revisit some basic properties of the transaction log.
Though the architecture, purpose and properties of data and transaction log files are not similar, I have tried to compare some similarities
and differences between data and T-Log files, and to cover some common myths around them.
We are going to confirm some of the T-Log properties below with simple examples in the coming days.
Slno | Property | Data Files | Transaction Log Files |
---- | -------- | ---------- | --------------------- |
1 | File | One primary (.MDF) and optional secondary (.NDF) data files | Ideally one T-Log file (.LDF). |
2 | Access type | Random | Sequential and circular; will grow if required |
3 | Logically divided as | Pages | Virtual Log Files (VLF) |
4 | Size | Page size is always 8 KB | Initial VLF count when a database is created: i) log file less than 1 MB - 2 VLFs; ii) between 1 MB and 64 MB - 4 VLFs; iii) between 64 MB and 1 GB - 8 VLFs; iv) greater than 1 GB - 16 VLFs. When the log file grows, VLFs are added for each growth increment. |
5 | States | Page states: Free or Allocated in mixed/uniform extents (GAM, SGAM, IAM and PFS are used for this) | VLF states: Active, Recoverable, Reusable and Unused |
6 | Unique Identifier | Two-part number (file_no:page_no) | Three-part Log Sequence Number (LSN), in hex: VLFSeqNo:OffsetToLogBlock:SlotNoInsideLogBlock |
7 | Link | It depends. i) Heaps - pages are not logically connected (tracked via IAM) ii) Clustered/Non-clustered index - pages are linked (B-Tree) | Within a transaction, each log record has a unique LSN, and all records share one common transaction ID. |
8 | Purpose | i) System pages (GAM, SGAM, PFS, BCM etc.) ii) Data/Index pages - to store actual user data | Multi-purpose: 1. Crash recovery & restore recovery 2. Point-in-time recovery (log backup) 3. Replication 4. Log Shipping 5. Mirroring 6. Database snapshot creation 7. CDC and CT |
9 | In Memory | Buffer cache | Log Cache |
10 | Memory to Disk | IO to data files is asynchronous | IO to the T-Log file is also asynchronous, but a COMMIT does not complete until its log records are hardened to disk |
11 | Delay in applying changes from memory to disk | A data page in memory is dirtied many times. DML changes in a page are not immediately written to data files. CHECKPOINT or the lazy writer moves dirty pages to data files irrespective of whether the change is committed or in progress. Key: Reduce IO & improve performance | Changes are first recorded in the T-Log file (Write-Ahead Logging, WAL). Key: Record all DDL and DML changes on disk to fulfill ACID requirements. |
12 | Accuracy of data in disk | Not necessarily 100% accurate. Checkpoint or the lazy writer may even flush dirty pages impacted by uncommitted transactions. See points 11 and 18 | The purpose of the T-Log file is to record all DDL and DML activity immediately. |
13 | Recovery phases | No recovery phase for data files (?) | Recovery phases: 1. Analysis - the dirty page table & active transaction table are created. 2. Redo - committed changes in the T-Log are replayed against data pages whose page LSN is older than the log record's LSN. 3. Undo - for incomplete transactions, compensation log records are created |
14 | Some frequently used commands/fn's | DBCC EXTENTINFO, DBCC IND, DBCC PAGE | CHECKPOINT, DBCC LOGINFO, DBCC LOG, fn_dblog(), fn_dump_dblog() |
15 | Behaviour change in Recovery Models | No major behaviour change? | i) Simple - checkpoint truncates inactive VLFs ii) Bulk-logged - bulk-logged operations are minimally logged, and only a T-Log backup can truncate inactive VLFs iii) Full - only a T-Log backup can truncate inactive VLFs. Funny fact: a database in Full recovery will act as if in Simple recovery until you initiate the log chain by taking the first full backup. |
16 | Shrink | It is not a good practice to shrink data files. This leads to fragmentation and poor performance | We can shrink unused VLF portions if required. *Read more on this |
17 | Friendliness | Two frequently accessed data files are not friends. If possible, keep them on separate disk drives 🙂 Key: Performance | Data and log files are not friends either. If possible, separate them onto different disk drives. Key: Performance (random vs sequential IO) |
18 | Delete | Deleted user records are marked as ghost records and are not physically removed immediately. Ghost cleanup happens at frequent intervals | T-Log records are not deleted. During rollbacks, compensation records are created to undo the changes made. |
19 | Documentation | The anatomy of data and index pages is documented | Log record anatomy is undocumented. You may get some clues if you are good at page anatomy. |
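Rows 3, 6 and 14 above can be confirmed directly. A minimal sketch, assuming SQL Server 2016 SP2 or later (older versions can use the undocumented DBCC LOGINFO in place of sys.dm_db_log_info); fn_dblog() is undocumented and best kept to test databases:

```sql
-- List the VLFs of the current database and their status
SELECT  vlf_sequence_number,
        vlf_size_mb,
        vlf_active,
        vlf_status
FROM    sys.dm_db_log_info(DB_ID());

-- Peek at individual log records; [Current LSN] is the three-part
-- VLFSeqNo:OffsetToLogBlock:SlotNoInsideLogBlock value (in hex)
SELECT TOP (10)
        [Current LSN],
        Operation,
        [Transaction ID]
FROM    fn_dblog(NULL, NULL);
```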
Out of scope:
One interesting point here: in a busy OLTP environment, at any point in time, there is no guarantee that pages in the data files, the buffer cache (and even T-Log records) are accurate or in a consistent state. Reasons:
i) Data file pages - checkpoint may flush a dirty page that is not yet committed (and ghost records are yet to be cleaned up)
ii) Buffer cache - one data page might be dirtied many times (and not every change in memory is immediately flushed to disk)
iii) T-Log records - the log records many DML changes with no guarantee that the transaction will be committed
But when you query a frequently updated table, the RDBMS still returns consistent records (within the scope of transaction isolation levels).
It is interesting to realize the complexity of the RDBMS architecture components.
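Point iii (and row 18 above) can be observed with fn_dblog(). A minimal sketch, assuming a test database containing a hypothetical table dbo.Demo; a rollback does not erase the earlier log records, it appends compensation records:

```sql
BEGIN TRAN;
    DELETE FROM dbo.Demo WHERE Id = 1;  -- logged immediately (WAL), even though
                                        -- the transaction's outcome is unknown
ROLLBACK;                               -- logs a compensating insert instead of
                                        -- removing the delete record

-- Both the original delete and its compensation record remain in the log
SELECT  [Current LSN], Operation, [Transaction ID]
FROM    fn_dblog(NULL, NULL)
WHERE   Operation IN ('LOP_DELETE_ROWS', 'LOP_INSERT_ROWS', 'LOP_ABORT_XACT');
```

Avoid running fn_dblog() against busy production systems; it scans the active log.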
Reference:
Paul Randal and Kalen Delaney