October 6, 2014 at 10:52 pm
Hi,
I am dealing with a multi-tenant database design on Azure. It will also be receiving data from other databases by way of sync, hence the primary key has been kept to a GUID. Because this would have lead to a lot of fragmentation, (NEWSEQUENTIALID is not supported in Azure), I selected the cluster index not on the primary key but on the field called RecordCreatedDtTime.
On doing this I realized that every query started doing key lookups as RecordCreatedDtTime attribute will rarely ever form part of the where clause in any query. So I am thinking along the lines that the better cluster index would have been the TenantID as TenantID is part of almost every query. But If I choose TenantID, it will definitely not be unique across rows, hence there will be the extra overhead of SQL Azure adding the extra identifier against each Tenant ID cluster index column to make it unique.
Has anyone faced similar situations in selecting the primary key and cluster index key in multi-tenant Azure for databases that need to support syncing with other offline databases too. If so, I would love to hear your advice, should I use TenantID or RecordCreatedDtTime as my cluster index?
November 25, 2014 at 8:42 am
Hi mate,
Yeah, I don't think you're going to find a perfect solution to this. You're just going to have to try to find the solution that fits your scenario best. I guess for me, it would depend on the size of the table and what it's being used for, throughput / inserts / reads etc. How many non-clustered indexes etc you're planning on creating.
I tend to try to keep the primary key as thin as possible, especially where I'll be creating subsequent non-clustered indexes. I might be thinking along the lines of an INT / BIGINT as the PK, then possibly adding NC indexes on the guids where appropriate (although as you say, these are going to get pretty fragmented, so might take a bit of maintenance).
Hope this helps.
November 25, 2014 at 9:01 am
I did some Azure stuff a couple years ago so forgive me if I've forgotten something important. Last year I switched a whole database from int to guid primary keys and gained an immediate 20% performance increase. The trick is to use heaps (no clustered indexes) and let clients generate guids. OLTP rarely needs clustered indexes anyway. Performance comes from sql caching mostly just the active portion of tables where inserts are happening. When switching to heaps be aware that it uses RID lookups instead of key lookups. Because of that, consider switching some nonclustered indexes to be covering indexes for even more performance gains.
If Azure requires a clustered index then you're probably OK "wasting" an index on something that is never used like your datetime column. You'll still have performance benefits from guids.
Viewing 3 posts - 1 through 2 (of 2 total)
You must be logged in to reply to this topic. Login to reply