Clustered PK perf -

  • Hi Experts,

    I have a doubt reg. the creation of a clustered PK for one of our DWH tables.

    Background - There's a SP which builds a Fact table (say CT) with 30 odd columns, of which 12 are Dim Keys. Total records ~10M.
    Table CT is drop and create. If the SP has to be divided into few logical steps (say 10 steps), in the 3rd or 4th step there's a ADD Constraint (PK+clustered). The 1st and 2nd steps, insert records into CT. The 5th and subsequent steps, they update various columns (measures) in these records by reading and computing data from other tables/facts. These joins are done on all the Dim Keys all together at once. 

    The problem - The constraint/PK takes about ~20 mins to get created. Structure is something like this -
    ALTER TABLE CT ADD CONSTRAINT ct_pk PRIMARY KEY clustered 
    (
       dim_col1 asc, dim_col2 asc, dim_col3 asc, ... , dim_col12 asc
    )

    Is this PK really required on all the DIM keys? I understand that since CT table is joined with other tables using all the DIM Keys together, so they are all being included in the PK.
    CT JOIN some_other_table x ON (
    ct.dim_col1 = x.dim_col1 AND
    ct.dim_col2 = x.dim_col2 AND
    . . .
    ct.dim_col12= x.dim_col12 AND
    )

    Please advise. Let me know for any inputs from my side..

    I'm on - SS 2008 R2 (SP1) - Standard Edition

    Regards,
    RB

  • If you (almost) always join on all the keys, then yes, cluster on all of them.

    You really should strongly consider defining the clus index before loading the table if at all possible.  

    You should also consider doing all the computations of other columns as part of the original load if possible.  Otherwise, you're just creating additional overhead by having to insert a row and immediately update it one or more times.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • You really should strongly consider defining the clus index before loading the table if at all possible
    > do u mean I should create the index immediately after creating the table structure (before any data is loaded in it) ?

    Will the index creation be faster in that case? But I guess, data load will be slower right?

  • ranitb - Tuesday, June 19, 2018 12:58 PM

    You really should strongly consider defining the clus index before loading the table if at all possible
    > do u mean I should create the index immediately after creating the table structure (before any data is loaded in it) ?

    Will the index creation be faster in that case? But I guess, data load will be slower right?

    > do u mean ... <
    Yes, that's exactly what I mean.

    > Will the index creation be faster in that case? But I guess, data load will be slower right? <
    The initial data load will be slower because SQL has to sort the rows.  But it will be faster than loading the entire table first with no clus index, then creating a clus index which forces the same sort and re-load of the entire table.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • ranitb - Tuesday, June 19, 2018 8:27 AM

    ...
    Background - There's a SP which builds a Fact table (say CT) with 30 odd columns, of which 12 are Dim Keys. Total records ~10M.
    Table CT is drop and create. If the SP has to be divided into few logical steps (say 10 steps), in the 3rd or 4th step there's a ADD Constraint (PK+clustered). The 1st and 2nd steps, insert records into CT. The 5th and subsequent steps, they update various columns (measures) in these records by reading and computing data from other tables/facts. These joins are done on all the Dim Keys all together at once. 
    ...Is this PK really required on all the DIM keys? I understand that since CT table is joined with other tables using all the DIM Keys together, so they are all being included in the PK.
    ...
    I'm on - SS 2008 R2 (SP1) - Standard Edition

    Regards,
    RB

    There are a number of things odd about this setup.
    -  SS 2008 R2 Service Pack 3 has been out for quite a while, why are you still on SP1?
    -  It seems counter-intuitive that you would regularly query the fact table with ALL the dimension keys.  Some such as a date dimension would always be used but I could easily see others only being queried sometimes.  Index for how reports and users queries will be run against this table not for how you load them.
    -  I agree with Scott that adding the clustered index before the data is loaded would probably be better overall than loading the data, then essentially rebuilding the table by adding the clustered index after.

  • Chris Harshman - Tuesday, June 19, 2018 3:16 PM

    ranitb - Tuesday, June 19, 2018 8:27 AM

    ...
    Background - There's a SP which builds a Fact table (say CT) with 30 odd columns, of which 12 are Dim Keys. Total records ~10M.
    Table CT is drop and create. If the SP has to be divided into few logical steps (say 10 steps), in the 3rd or 4th step there's a ADD Constraint (PK+clustered). The 1st and 2nd steps, insert records into CT. The 5th and subsequent steps, they update various columns (measures) in these records by reading and computing data from other tables/facts. These joins are done on all the Dim Keys all together at once. 
    ...Is this PK really required on all the DIM keys? I understand that since CT table is joined with other tables using all the DIM Keys together, so they are all being included in the PK.
    ...
    I'm on - SS 2008 R2 (SP1) - Standard Edition

    Regards,
    RB

    There are a number of things odd about this setup.
    -  SS 2008 R2 Service Pack 3 has been out for quite a while, why are you still on SP1?
    -  It seems counter-intuitive that you would regularly query the fact table with ALL the dimension keys.  Some such as a date dimension would always be used but I could easily see others only being queried sometimes.  Index for how reports and users queries will be run against this table not for how you load them.
    -  I agree with Scott that adding the clustered index before the data is loaded would probably be better overall than loading the data, then essentially rebuilding the table by adding the clustered index after.

    The joins would always use all the dimension keys.  Having the tables identically clustered allows for a merge join possibility.  Merge joins are extraordinarily efficient when they're applicable.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • If there is a useable business key on the fact table, use it for the primary key, usually though without clustering it.  That's not always possible, but often the fact table grain is an individual something that has an identifiable business key, even if it's just an identity.  At that point, you're good.  Whether that's possible or not, consider creating a clustered index on only one of the fields, usually the date field.  Then non-clustered indexes on the other foreign key fields.  There are lots of options, just be sure you've analyzed options.

  • All - Thank you so much for those wonderful suggestions. Great learning!

    I will analyze all of them and come back with further doubts, if any.

    -RB.

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply