Question on Index

  • Hello,

    In my Datamart having FactTable like below and planing to create a index for hashkey column with fixed length, Please advice which index is good CLUSTER or NONCLUSTER.

    CREATE TABLE [dbo].[FactClusterTest] (

    [Col_Hashkey] CHAR (40) NOT NULL,

    [Col1] MONEY NULL

    .

    .

    .

    )

  • OK< a few generalisations.  A clustered index usually works best on a narrow key (int, for example), and for a column that is monotonically ascending column (a timestamp or identity, for example) and frequently used for range searches.  It's usually better to have a clustered index on a table than not to have one.  Beyond that, you haven't provided enough information for us to advise you.  Probably you never could - indexing is as much trial and error as it is science.

    John

  • Heh... "42".

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Ok... rethinking this a little bit and even though the question was devoid of most information, the fact is that there isn't much people do with hashkey columns except to do compares during "Upserts" and the like.

    With that thought in mind, I probably wouldn't waste the Clustered Index on such a thing but, as with all else in SQL Server, I'll say "It Depends" on things like how the "Upserts" are being done and whether or not someone is using the column for anything else.

    The other thing to consider is that such a hashkey is going to be almost random in nature.  If you use it as the leading column of a Clustered Index, it's going to be very much like have a Clustered Index on a Random GUID.  That means many things...

    1. Every Non-Clustered index will also have the 40 byte hash key appended to it, which will take up quite a bit of extra space, extra memory, and slow things down quite a bit.
    2. REALLY change your index maintenance strategy.  Most people get it horribly wrong for such columns.  For example, you must NOT use REORGANIZE on such columns.  To summarize the biggest reason for that is that REORGANIZE will NOT make new pages and so cannot "clean" the area between the Fill Factor and being 100% full.  That problem alone will perpetuate rampant page splits  and fragmentation every bloody day.  If you don't use REORGANIZE on such columns, they can actually work out quite well at seriously preventing fragmentation given the correct  Fill Factor and dwell time between REBUILDs and can go literally for months with absolutely zero page splits and the resulting fragmentation.  Such indexes are also wicked good for very high rate inserts because they avoid "hot spots" in the table.

    So, all that being said and to summarize my thoughts... I'd probably NOT make the column the Clustered Index BUT, and strongly depending on what else the column will be used  for, there can be extreme trade-off advantages to doing so.  Without additional information, though, it's not possible to make an accurate recommendation... only a general one.

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Based on the extremely limited info so far, I would say this is likely the (very) rare case where you should leave the main table as a heap and create just a nonclustered index on the hash key column.  Presumably the table is at least reasonably narrow (since it's a fact table), so you wouldn't want to have both a clus and nonclus index unless it was nearly unavoidable.

    I do suggest adding an identity column to the table, just so it's available if you should (very) rarely need to rebuild the heap.  The $IDENTITY gives you an easy to do that: drop the existing nonclus index; create a clustered index on $IDENTITY; drop it; then re-create the nonclus index.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • Thanks for your suggestions !!

    Added additional note on my post, Hashkey values are from source system, I am not generated hashkey while loading datamart.

    I am using  hashkey for UPSERT records into the DataMart, also the incremental load count is approximately 1000  records every hour.

    Please let me know if you need any addition information and update the same.

     

  • In that case, use a non-clustered index on the hashkey.  It will be more narrow than a clustered index and will run much more quickly.

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply