Today I came across a very simple blog posting where someone was inserting 1 million rows into a Hekaton Hash Index with 2048 buckets. That prompted me to run some tests on how Hash Collisions in the Hash Buckets affect the throughput of your Hekaton workload – and the results are very, very interesting. But as a first step, let me briefly describe what a Hash Collision actually is.
As you might know (hopefully), Hekaton tables in SQL Server 2014 can be indexed through Hash Indexes. Wikipedia has a very nice page about the basics of Hash Tables in general, which are the foundation of Hash Indexes. In a nutshell, SQL Server applies a Hash Function to the key value of your Hash Index, and the result of that Hash Function determines in which Hash Bucket your row will finally be stored. If multiple key values hash to the same value, SQL Server stores multiple rows in that specific Hash Bucket and simply chains the entries within it. See the following picture (taken from Wikipedia):
As you can see from this picture, the keys “John Smith” and “Sandra Dee” are hashed to the same bucket – in this case bucket 152. This means that both rows are stored in the same Hash Bucket, which affects the INSERT performance as well as the SELECT performance of your queries: during an INSERT, SQL Server has to maintain the linked list, and during SELECT queries SQL Server has to scan it.
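If you want to get a feeling for how different keys end up in the same bucket, the following little script illustrates the idea with a deliberately simplistic hash function (a plain modulo). This is only a conceptual sketch – the hash function that Hekaton uses internally is of course a different, much more sophisticated one:

-- Conceptual illustration only: a simplistic "hash function" (key modulo bucket count).
-- Keys 1, 9, and 17 all map to bucket 1, so their rows would be chained
-- together within that single bucket.
DECLARE @BucketCount INT = 8

SELECT
   KeyValue,
   KeyValue % @BucketCount AS BucketNumber
FROM (VALUES (1), (9), (17), (4), (12)) AS Keys(KeyValue)
GO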
After that introduction, let’s demonstrate the performance impact of Hash Collisions with a simple example. Let’s create a database with a simple Hekaton table:
-- Create a new database
CREATE DATABASE HashCollisions
GO

-- Add a MEMORY_OPTIMIZED_DATA filegroup to the database
ALTER DATABASE HashCollisions
ADD FILEGROUP HekatonFileGroup CONTAINS MEMORY_OPTIMIZED_DATA
GO

USE HashCollisions
GO

-- Add a new file to the previously created filegroup
ALTER DATABASE HashCollisions
ADD FILE
(
   NAME = N'HekatonContainer',
   FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL12.MSSQLSERVER\MSSQL\DATA\HashCollisionsContainer'
)
TO FILEGROUP [HekatonFileGroup]
GO

-- Create a simple table with a Hash Index of 1024 buckets
CREATE TABLE TestTable
(
   Col1 INT NOT NULL PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1024),
   Col2 INT NOT NULL,
   Col3 INT NOT NULL
)
WITH
(
   MEMORY_OPTIMIZED = ON,
   DURABILITY = SCHEMA_ONLY
)
GO
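By the way, if you want to double check which bucket count your Hash Index actually got (SQL Server rounds the BUCKET_COUNT you specify up to the next power of 2), you can query the catalog view sys.hash_indexes – something along these lines:

-- Show the Hash Indexes in the current database, including their bucket count
SELECT
   OBJECT_NAME(object_id) AS TableName,
   name AS IndexName,
   bucket_count
FROM sys.hash_indexes
GO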
As you can see from the source code, I’m using here just a Hash Bucket count of 1024 – not a lot of buckets when we afterwards insert 1 million records. In the next step I’m also creating a natively compiled Stored Procedure, so that I can go wicked fast with Hekaton:
-- Create a natively compiled Stored Procedure
CREATE PROCEDURE InsertTestData
WITH NATIVE_COMPILATION, SCHEMABINDING, EXECUTE AS OWNER
AS
BEGIN ATOMIC WITH
(
   TRANSACTION ISOLATION LEVEL = SNAPSHOT,
   LANGUAGE = 'us_english'
)
   DECLARE @i INT = 0

   WHILE @i < 1000000
   BEGIN
      INSERT INTO dbo.TestTable (Col1, Col2, Col3) VALUES (@i, @i, @i)
      SET @i += 1
   END
END
GO
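To run the test yourself, you just execute the Stored Procedure and take a simple timing around it – for example like the following (I’m just using SYSDATETIME here, any other timing approach works as well):

-- Execute the natively compiled Stored Procedure and measure the elapsed time
DECLARE @StartTime DATETIME2 = SYSDATETIME()

EXEC dbo.InsertTestData

SELECT DATEDIFF(MILLISECOND, @StartTime, SYSDATETIME()) AS ElapsedMilliseconds
GO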
As you can see, I’m just inserting 1 million records in a simple loop. When I run this Stored Procedure in a VM with 1 CPU and 2 GB RAM, the execution time is around 24 seconds – awfully slow. When you double the Hash Bucket count over and over again until you have 1048576 buckets, you can see very nicely that the performance increases as you create more and more Hash Buckets. With 1048576 buckets, the execution of the same Stored Procedure takes only around 600ms! That is a huge difference compared to the first test run with just 1024 buckets. You can also check with the DMV sys.dm_db_xtp_hash_index_stats how many buckets you have in your Hash Index.
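With just 1024 buckets you can also do the math very easily: 1 million rows hashed into 1024 buckets means on average around 1000000 / 1024 ≈ 977 rows chained within every single bucket – and every INSERT has to maintain that chain. The DMV sys.dm_db_xtp_hash_index_stats shows you exactly that (the chain length columns are the interesting ones here):

-- Check the bucket counts and hash chain lengths of your Hash Indexes
SELECT
   OBJECT_NAME(his.object_id) AS TableName,
   his.total_bucket_count,
   his.empty_bucket_count,
   his.avg_chain_length,
   his.max_chain_length
FROM sys.dm_db_xtp_hash_index_stats his
GO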
So the moral of this story? Choose the Hash Bucket count for your Hekaton Hash Indexes *very, very* wisely, because it really impacts the performance of your workload! The optimal number of buckets is simply the number of distinct values that you are storing in the Hash Index – with some additional free space, just to be on the safe side. You also have to make sure not to choose the bucket count too high, because otherwise you are just wasting memory. So again – as with almost everything in SQL Server – it depends on your workload (besides Database Shrinks).
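If you want a simple starting point for that sizing, you can take the number of distinct key values that you expect and round it up to the next power of 2 (which is what SQL Server does with the BUCKET_COUNT you specify anyway). A quick sketch – the 1000000 below is of course just the expected number of distinct values from this example, and the LOG based rounding is only a rough approximation:

-- Derive a Hash Bucket count from the expected number of distinct key values
DECLARE @DistinctKeyValues BIGINT = 1000000

SELECT POWER(CAST(2 AS BIGINT), CEILING(LOG(@DistinctKeyValues) / LOG(2))) AS SuggestedBucketCount
GO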
Thanks for reading
-Klaus