Introduction
In this article we will look at one of the new technologies introduced by Microsoft in SQL Server 2014: In-Memory OLTP. We will look at what this technology is, how to implement it and when it can be used. In my next article we will look at how to manage memory when using this technology.
Reducing Bottlenecks
SQL Server is one of the more robust database technologies available, but it has its performance bottlenecks. A performance bottleneck usually implies contention for a resource. Let us take a look at the usual bottlenecks in SQL Server.
- CPU: One of the main performance bottlenecks is the CPU. SQL Server can require a high amount of CPU. For each T-SQL statement to be executed, SQL Server has to execute a large number of instructions.
- Disk IO: Disk IO, in my opinion, is the biggest culprit for performance bottlenecks. Even with the introduction of SSDs, you will still see performance bottlenecks due to IO contention.
- Network: SQL Server uses a very chatty protocol for communication between clients and the database server. This, along with a high amount of data transfer, could lead to a Network IO bottleneck.
- Memory: When you have low memory in your database server, the system starts paging to disk. This causes high IO and can cause performance issues.
- Locks, latches and spinlocks: When the load on the SQL Server is high, there is a chance of resource bottlenecks due to a high amount of locking.
With the introduction of In-Memory OLTP, there is a chance to get rid of some performance bottlenecks. Let us look at each item from the list above and see what In-Memory OLTP can do to reduce the bottleneck.
- CPU: When we are accessing In-Memory objects, CPU usage is greatly reduced, because the In-Memory OLTP engine does not need as many instructions to execute the same T-SQL statement. This helps reduce contention for the CPU.
- Disk IO: If the performance of your application is degrading due to an IO bottleneck, using In-memory technology will definitely help reduce that. In SQL Server 2014, the index keys are not duplicated, so less disk IO is needed. Another reason why In-Memory OLTP helps with reducing IO bottleneck is that it requires the table to reside in memory. This reduces trips to the disk to retrieve or update data.
- Memory: If your bottleneck is memory, then using this technology could make your problems worse. The In-Memory OLTP engine uses more memory, since it requires the data to reside in memory.
- Network: SQL 2014 does not help reduce the network bottleneck since it still needs to send data to the application layer.
- Locks, latches and spinlocks: When accessing data in In-Memory objects, latches are not used. Row versioning is also implemented differently; one of the main differences is that it does not use TempDB to store row versions for transactions.
A Look at In-Memory OLTP
SQL Server was originally designed on the assumption that memory is very costly, which made it necessary for the data to reside on disk. Since a disk has moving parts, accessing data from it is slow, and even with the introduction of faster disks, access is still slow compared to memory.
It is common knowledge that memory has become very cheap. With the introduction of 64-bit architectures, we can have a server with one TB of memory for a low price. This means that most OLTP databases out there can reside in memory at a reasonable cost.
Based on this, Microsoft introduced the In-Memory OLTP engine, which holds a table in memory and performs all operations on that table in memory. This reduces IO bottlenecks drastically. Microsoft introduced two types of memory optimized tables.
- Schema only: In this case only the schema is stored permanently. As soon as a failover on a cluster or a shutdown happens, all data in the table is lost; only the schema is recreated at restart. This is useful when you are planning to cache data in memory.
- Schema and Data: As the name implies, both the schema and the data are stored permanently and are persisted across restarts. Everything is a fully logged operation, which means that every data change is logged.
Microsoft also changed the internal behavior of the relational engine to support memory based tables and operations. They introduced natively compiled stored procedures: T-SQL stored procedures compiled to native code that access memory optimized tables. The compiled stored procedures and tables are loaded into the SQL Server process as DLLs, which allows much more efficient execution of queries. The DLLs are recreated every time SQL Server is restarted, and the information needed to recreate them is stored in the database metadata.
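To give a feel for the syntax, here is a minimal sketch of a natively compiled stored procedure (the procedure name is made up, and it targets the dbo.tbPersistedTable table we create later in this article). Note the NATIVE_COMPILATION, SCHEMABINDING, EXECUTE AS and BEGIN ATOMIC clauses, all of which are required:

CREATE PROCEDURE [dbo].[usp_InsertPersistedRow]
    @Col1 int,
    @Col2 varchar(250)
WITH NATIVE_COMPILATION, SCHEMABINDING, EXECUTE AS OWNER
AS
BEGIN ATOMIC WITH (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = N'us_english')
    -- A simple insert into a memory optimized table; compiled to native code.
    INSERT INTO [dbo].[tbPersistedTable] ([PersistedCol1], [PersistedCol2], [PersistedCol3])
    VALUES (@Col1, @Col2, NULL);
END
GO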
Even though In-Memory optimized tables were introduced because memory is now cheap, the basic query optimizer has not changed. The query optimizer still assumes that the data is stored on disk and has to be read from disk.
One of the disadvantages of natively compiled stored procedures and memory optimized tables is that once the object is created, it cannot be altered. You will have to drop and recreate the object, or create it with a new name. This applies even when we have to change just the bucket_count of an existing index. We will look at how to alter these objects in detail later in this article.
Natively compiled stored procedures cannot access disk based tables. To join a memory optimized table with a non-memory resident table, we have to use interpreted T-SQL, and that T-SQL code will not perform as well as a natively compiled stored procedure.
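For example, an interpreted T-SQL query can join the two storage types directly. A minimal sketch, assuming the memory optimized dbo.tbPersistedTable we create later in this article and a hypothetical disk based table dbo.tbDiskOrders:

-- Interop T-SQL: joins a memory optimized table with a disk based table.
-- dbo.tbDiskOrders and its columns are made-up names for illustration.
SELECT p.[PersistedCol1], o.[OrderDate]
FROM [dbo].[tbPersistedTable] AS p
INNER JOIN [dbo].[tbDiskOrders] AS o
    ON o.[OrderID] = p.[PersistedCol1];
GO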
Database mirroring and replication are also not supported on databases that contain memory optimized objects. These databases can, however, be backed up and restored, log shipping can be set up, and the database can be part of an AlwaysOn availability group.
For a persisted memory optimized table, all operations are fully logged. You might wonder why the writes to the transaction log do not affect the performance of the persisted table. To understand this, let us look at how a disk table writes data changes to the transaction log. For every insert or update, a record is written to the transaction log. If there are one hundred data changes, there will be one hundred writes to the transaction log.
In the case of a memory optimized table, the writes to the transaction log are far fewer. The writes are done in blocks: a single record is written to the transaction log containing all the information needed for many changes. This means that there are less frequent writes to the transaction log.
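You can see this for yourself on a test system with the undocumented sys.fn_dblog function (shown here only as a sketch; it is not something to rely on in production). After making the same set of changes to a disk based table and a memory optimized table, compare the log records each produced:

-- Group the log records of the current database by operation type.
-- Changes to memory optimized tables typically appear as a small number
-- of LOP_HK records, while disk based changes appear as one
-- LOP_INSERT_ROWS / LOP_MODIFY_ROW record per row.
SELECT [Operation], COUNT(*) AS LogRecordCount
FROM sys.fn_dblog(NULL, NULL)
GROUP BY [Operation]
ORDER BY LogRecordCount DESC;
GO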
Creating In-Memory Objects
When installing SQL Server 2014, the In-Memory OLTP components are installed by the installation wizard. One thing to keep in mind: the In-Memory OLTP components will not work on a 32-bit server. In fact, they will not even be installed unless the server is 64-bit.
To create any memory optimized table, there has to be at least one MEMORY_OPTIMIZED_DATA filegroup, and to create a MEMORY_OPTIMIZED_DATA filegroup you must use the keyword CONTAINS. Now you might ask why there is a need for a special filegroup: this filegroup will contain the checkpoint and delta files needed by SQL Server to recover data after a shutdown. The filegroup can either be specified at the time of creating the database, or we can add it to an existing database. We will look at an example of both cases.
Let us first look at the case where we create a new database which supports In-Memory objects.
CREATE DATABASE InLineDB
ON PRIMARY
    (NAME = [InLineDB_data], FILENAME = 'D:\data\InLineDB_data.mdf', SIZE = 2048MB),
FILEGROUP [InLineDB_Mem] CONTAINS MEMORY_OPTIMIZED_DATA
    (NAME = [InLineDB_File1], FILENAME = 'D:\data\InLineDB_File1'),
    (NAME = [InLineDB_File2], FILENAME = 'E:\data\InLineDB_File2')
LOG ON
    (NAME = [InLineDB_log], FILENAME = 'L:\log\InLineDB_log.ldf', SIZE = 512MB)
COLLATE Latin1_General_100_BIN2;
GO
As you can see, the syntax is almost the same as for creating any user database, except that this one has the new keyword CONTAINS and specifies that the filegroup contains MEMORY_OPTIMIZED_DATA. The memory optimized objects will be stored in their own filegroup. The database can have only one filegroup for memory optimized objects, but that filegroup can have multiple files. You might also have noticed that the collation specified is binary: the indexes on memory optimized tables can only be on columns using a Windows BIN2 collation.
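Jumping slightly ahead to table syntax: if you would rather not give the whole database a binary collation, the BIN2 collation can instead be applied at the column level. A minimal sketch, with made-up table and column names:

-- The BIN2 collation is applied only to the index key column.
CREATE TABLE [dbo].[tbCollationDemo]
(
    [DemoKey] varchar(50) COLLATE Latin1_General_100_BIN2 NOT NULL
        PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1024)
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA)
GO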
With the previous example we have seen how to create a new database that supports memory optimized objects. We can also alter an existing database so that it supports them: add another filegroup with the keyword CONTAINS and then add files to that new filegroup. Once this is done, we can create memory optimized objects in the existing database.
ALTER DATABASE ExistingDB
ADD FILEGROUP InLine_FG CONTAINS MEMORY_OPTIMIZED_DATA
GO
ALTER DATABASE ExistingDB
ADD FILE (NAME = 'InLine_FG', FILENAME = 'D:\data\InLine_FG')
TO FILEGROUP InLine_FG
GO
Now let us look at how we can create a table that resides just in memory and another one that resides in memory but is also persisted.
First let us look at some of the prerequisites for creating tables in memory. The first thing to note is that In-Memory OLTP does not support all datatypes. The supported datatypes are:
- Bit
- Tinyint, smallint, int, bigint
- Money, smallmoney
- Float, real
- Datetime, smalldatetime, datetime2, date, time
- Numeric and decimal types
- Char, varchar, nchar, nvarchar
- Binary, varbinary
- Uniqueidentifier
One thing to keep in mind is that the record size is limited to 8060 bytes. The space used by a persisted memory optimized table also cannot exceed 512 GB.
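As a sketch of the row-size limit, a table like the following (made-up name) should fail at create time, because the declared varchar columns alone could exceed 8060 bytes in a single row:

-- Expected to fail: the maximum row size (8000 + 4000 bytes of varchar
-- data plus the fixed columns) exceeds the 8060 byte limit.
CREATE TABLE [dbo].[tbTooWide]
(
    [WideId] int NOT NULL PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1024),
    [WideCol1] varchar(8000) NOT NULL,
    [WideCol2] varchar(4000) NOT NULL
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA)
GO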
In addition to the datatype limitations, a memory optimized table cannot have any of the following:
- DML triggers
- Foreign keys
- Identity columns
- Unique indexes
The tables can also have a maximum of eight indexes.
When creating a memory optimized table we have to use two keywords. The first keyword is MEMORY_OPTIMIZED=ON, which specifies that the table is a memory optimized table. The second keyword is DURABILITY, which specifies whether it is a persisted or a non-persisted table. If we want the table persisted, the value for DURABILITY must be SCHEMA_AND_DATA; if we want it non-persisted, the value must be SCHEMA_ONLY.
All memory optimized tables must have a primary key if the data is going to be persisted, and there must be at least one index on the table. There is no support for clustered indexes; the indexes used throughout this article are hash indexes. Since the default for a primary key is clustered, we have to specify NONCLUSTERED when defining the PK. Now let us create a table that is persisted.
CREATE TABLE [dbo].[tbPersistedTable]
(
    [PersistedCol1] int NOT NULL,
    [PersistedCol2] varchar(250) NOT NULL,
    [PersistedCol3] Datetime NULL,
    CONSTRAINT [pk_tbPersistedTable__PersistedCol1]
        PRIMARY KEY NONCLUSTERED HASH ([PersistedCol1]) WITH (BUCKET_COUNT = 1024)
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA)
GO
We can also create a nonclustered composite index, after identifying the columns, as shown below. Please keep in mind that all indexes have to be created when the table is created; you cannot add an index with a separate T-SQL statement.
CREATE TABLE [dbo].[tbPersistedCompositeIndex]
(
    [PersistedIndex1] int NOT NULL
        PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1024),
    [PersistedIndex2] varchar(250) NOT NULL,
    [PersistedIndex3] Datetime NOT NULL,
    INDEX [idx_tbPersistedCompositeIndex__PersistedIndex2__PersistedIndex3]
        NONCLUSTERED HASH ([PersistedIndex2], [PersistedIndex3]) WITH (BUCKET_COUNT = 1024)
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA)
GO
We can see the details of an index by querying a new catalog view named sys.hash_indexes. Here is a sample query to get the details of all hash indexes on the memory optimized tables in the database.
SELECT object_name([object_id]) AS 'NameOfTable',
       [object_id],
       [name] AS 'NameOfIndex',
       [type_desc],
       [bucket_count]
FROM sys.hash_indexes
The main difference between a non-clustered index on a disk based table and one on a memory optimized table is that the latter is not stored on disk; it exists only in memory. So there is no need to write a transaction log entry when index data changes on a memory optimized table. The only drawback is that restarting the server takes more time, since the indexes are rebuilt on the fly when the database is initialized.
Now let us look at how to create a non-persisted memory optimized table. The syntax is the same as in the previous two examples; the only difference is the value for the keyword DURABILITY. For a non-persisted table, the value for DURABILITY should be SCHEMA_ONLY.
CREATE TABLE [dbo].[tbNonPersistedTable]
(
    [NonPersistedCol1] int NOT NULL,
    [NonPersistedCol2] varchar(250) NOT NULL,
    [NonPersistedCol3] Datetime NOT NULL,
    CONSTRAINT [pk_tbNonPersistedTable__NonPersistedCol1]
        PRIMARY KEY NONCLUSTERED HASH ([NonPersistedCol1]) WITH (BUCKET_COUNT = 1024)
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_ONLY)
GO
As you can see from the script above, the only difference is that the value for DURABILITY is SCHEMA_ONLY.
In the index portion of the scripts, you might have noticed the keyword BUCKET_COUNT. What is BUCKET_COUNT? It is the number of buckets to be created in the hash table, and it has no default value.
One of the easiest rules of thumb for the bucket count is one to two times the cardinality (the number of unique index key values) of the table. The value is always rounded up to the next power of two. Please note that if the bucket count is too high, the extra memory allocated for the hash table is wasted; if it is too low, the hash table will not have enough buckets, and that hurts performance.
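As a sketch, if you already have data (for example in a disk based staging copy of the table), you can base the bucket count on a simple distinct count of the index key:

-- Count the unique index key values to size the hash index.
SELECT COUNT(DISTINCT [PersistedCol1]) AS UniqueKeyValues
FROM [dbo].[tbPersistedTable];
GO
-- If this returns roughly 100,000, a BUCKET_COUNT between 100,000 and
-- 200,000 is reasonable; SQL Server rounds the value up to the next
-- power of two (131,072 or 262,144 in this case).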
Modifying an existing memory optimized object
As stated previously, memory optimized objects cannot be altered. Let us consider a scenario where we might want to add an index on an existing table that has data in it. Let us see how we can achieve this.
Here are the steps we need to take to add an index for the table tbPersistedTable.
- Script all stored procedures that reference the table.
- Create a table with the same structure (tbPersistedTable_New).
- Copy the data from the old table tbPersistedTable to the new table tbPersistedTable_New.
- Drop the old table tbPersistedTable.
- Recreate the table tbPersistedTable with the right index.
- Copy the data from tbPersistedTable_New back to the recreated table.
- Recreate all the stored procedures.
The above steps need to be done in order to make any alteration to an existing memory optimized table.
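A hedged sketch of steps 2 through 6 for dbo.tbPersistedTable follows. Here the recreated table adds a hash index on PersistedCol2, which assumes the database uses a BIN2 collation (as in our CREATE DATABASE example), since hash index keys on character columns require it:

-- Step 2: create a table with the same structure.
CREATE TABLE [dbo].[tbPersistedTable_New]
(
    [PersistedCol1] int NOT NULL,
    [PersistedCol2] varchar(250) NOT NULL,
    [PersistedCol3] Datetime NULL,
    CONSTRAINT [pk_tbPersistedTable_New__PersistedCol1]
        PRIMARY KEY NONCLUSTERED HASH ([PersistedCol1]) WITH (BUCKET_COUNT = 1024)
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA)
GO
-- Step 3: copy the data out.
INSERT INTO [dbo].[tbPersistedTable_New]
SELECT [PersistedCol1], [PersistedCol2], [PersistedCol3]
FROM [dbo].[tbPersistedTable];
GO
-- Step 4: drop the old table.
DROP TABLE [dbo].[tbPersistedTable];
GO
-- Step 5: recreate the table, now with the additional index.
CREATE TABLE [dbo].[tbPersistedTable]
(
    [PersistedCol1] int NOT NULL,
    [PersistedCol2] varchar(250) NOT NULL,
    [PersistedCol3] Datetime NULL,
    CONSTRAINT [pk_tbPersistedTable__PersistedCol1]
        PRIMARY KEY NONCLUSTERED HASH ([PersistedCol1]) WITH (BUCKET_COUNT = 1024),
    INDEX [idx_tbPersistedTable__PersistedCol2]
        NONCLUSTERED HASH ([PersistedCol2]) WITH (BUCKET_COUNT = 1024)
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA)
GO
-- Step 6: copy the data back and drop the interim table.
INSERT INTO [dbo].[tbPersistedTable]
SELECT [PersistedCol1], [PersistedCol2], [PersistedCol3]
FROM [dbo].[tbPersistedTable_New];
GO
DROP TABLE [dbo].[tbPersistedTable_New];
GO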
In a nutshell
SQL Server 2014 introduced the In-Memory OLTP technology, which is a boon for scenarios with a high number of transactions per second. The idea is to keep the tables that are accessed heavily in memory, so that there are no disk IO waits. Another reason why it is fast is that there are no latches on these objects. SQL Server 2014 uses optimistic multi-version concurrency control when accessing memory optimized tables, and a memory optimized table does not acquire any locks during data changes. Since there are no locks, there is no blocking, which in turn gives better performance, since taking locks is a costly operation.
Like any new technology, it has drawbacks as well. It is still in its infancy. Not all datatypes are supported. One of the main drawbacks I have seen so far is that there is no way to alter an existing table, which makes maintenance of these memory based objects difficult. Another drawback is that memory optimized tables do not support automatic updating of statistics; we have to update the statistics of these tables manually. And if the database server is starved for memory, the database becomes inoperable, since there is not enough memory to do any write operations.