Hekaton Part 8 - Hash Collisions - Hash_Index_Stats DMV

Continuing from the previous post, this post discusses a few scenarios involving hash collisions.

The following in-memory table with a hash index is created. Note that the hash bucket count is set to 1024. With 1024 buckets, hash collisions become unavoidable once the row count exceeds 1024, and they can occur earlier.

CREATE TABLE dbo.hash_collision
(
[ID] Int identity(1,1) PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1024) NOT NULL ,
[Data] uniqueidentifier DEFAULT newsequentialid() ,
[dt] datetime NOT NULL
) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);

Step 1: Load 500 rows

Insert into hash_collision(dt) select getdate()

GO 500

The sys.dm_db_xtp_hash_index_stats DMV can be used to check statistics on hash indexes and hash collisions. At 500 rows, the hash index stats indicate that the buckets are only partially filled.

SELECT * FROM sys.dm_db_xtp_hash_index_stats
WHERE object_name(object_id) = 'hash_collision'

"Total Bucket count" (the total_bucket_count column) indicates the number of buckets in the hash index, which is fixed when the index is created.

"Empty Bucket count" (empty_bucket_count) indicates the number of buckets that are empty. In this case, close to 50% are empty as we have inserted only 500 rows.

"Average Chain Length" (avg_chain_length) indicates the average length of a hash chain. In other words, it is the average number of hops one may need to take to find a row.
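
The three values above can be read more easily by selecting the relevant columns and computing the empty-bucket percentage directly. The following is a minimal sketch; the join to sys.indexes and the empty_bucket_pct alias are additions for readability and are not part of the original query.

```sql
-- Sketch: hash index health for dbo.hash_collision.
-- empty_bucket_pct is a computed helper column, not a DMV column.
SELECT
    OBJECT_NAME(his.object_id) AS table_name,
    i.name                     AS index_name,
    his.total_bucket_count,
    his.empty_bucket_count,
    CAST(100.0 * his.empty_bucket_count / his.total_bucket_count
         AS decimal(5, 2))     AS empty_bucket_pct,
    his.avg_chain_length,
    his.max_chain_length
FROM sys.dm_db_xtp_hash_index_stats AS his
JOIN sys.indexes AS i
    ON  i.object_id = his.object_id
    AND i.index_id  = his.index_id
WHERE his.object_id = OBJECT_ID('dbo.hash_collision');
```

Filtering on object_id rather than on object_name(object_id) keeps the match scoped to the dbo schema.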

Step 2: Add another 500 rows

Let us add another 500 rows and check the index status from dm_db_xtp_hash_index_stats.

Insert into hash_collision(dt) select getdate()

go 500

select * FROM sys.dm_db_xtp_hash_index_stats WHERE object_name(object_id) = 'hash_collision'

dm_db_xtp_hash_index_stats indicates that roughly 80% of the buckets are now in use, but the average hash chain length is still 1 because some hash buckets are still empty.

Step 3: Add 9000 rows

Let us add a few thousand more rows, say 9000. The table now contains 10,000 rows but only 1024 hash buckets.

Insert into hash_collision(dt) select getdate()
go 9000

SELECT * FROM sys.dm_db_xtp_hash_index_stats
WHERE object_name(object_id) = 'hash_collision'

Notice the sharp increase in "Average Chain Length", as there are now more values than buckets. The number of empty buckets is, as expected, zero.
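
The increase follows from simple arithmetic: 10,000 rows spread over 1024 buckets gives roughly 9.77 rows per bucket. As a rough sketch, the query below puts the row count next to the bucket count so the ratio can be compared with what the DMV reports. The rows_per_bucket alias is a helper introduced here, and the reported avg_chain_length may not match it exactly, since the DMV value is an integer and chains can also include row versions that have not yet been cleaned up.

```sql
-- Sketch: compare rows per bucket with the chain length reported by the DMV.
SELECT
    r.row_count,
    his.total_bucket_count,
    CAST(r.row_count AS float) / his.total_bucket_count AS rows_per_bucket,
    his.avg_chain_length,
    his.max_chain_length
FROM sys.dm_db_xtp_hash_index_stats AS his
CROSS JOIN (SELECT COUNT(*) AS row_count FROM dbo.hash_collision) AS r
WHERE his.object_id = OBJECT_ID('dbo.hash_collision');
```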

So do longer hash chains affect performance? Yes. With longer chains, more rows have to be walked to find a match, so index seeks and scans become slower. It is therefore important to pick the hash bucket count carefully. The general recommendation is to set it to about one to two times the number of distinct values in the index key. When in doubt, it is better to oversize the hash bucket count than to undersize it.
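
If the bucket count turns out to be too small, it can be changed after the fact. The following is a sketch under two assumptions: it requires SQL Server 2016 or later (earlier versions need the table to be dropped and recreated), and PK_hash_collision is a hypothetical index name, because the primary key in this post was created without an explicit name and will have a system-generated one.

```sql
-- Step 1: count the distinct key values to size the index against.
DECLARE @distinct_keys bigint =
    (SELECT COUNT(DISTINCT [ID]) FROM dbo.hash_collision);

SELECT @distinct_keys     AS distinct_keys,
       2 * @distinct_keys AS suggested_bucket_count;

-- Step 2: rebuild the hash index with a larger bucket count.
-- Assumption: SQL Server 2016+; PK_hash_collision is a placeholder name,
-- look up the real one in sys.indexes before running this.
-- 20000 is 2 x the 10,000 rows loaded in this post; as far as I know,
-- SQL Server rounds the value up to the next power of two.
ALTER TABLE dbo.hash_collision
    ALTER INDEX PK_hash_collision
    REBUILD WITH (BUCKET_COUNT = 20000);
```

After the rebuild, rerunning the query against sys.dm_db_xtp_hash_index_stats should show empty buckets again and a much shorter average chain.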
