cluster index

  • what i understood is we can have only one cluster index for a table. by default primary key is set to be a cluster index. but why sql server 2005 allows to create more than one cluster index. it's so confusing. please advice. 

  • SQL does not allow more than one clustered index per table; a clustered index defines the physical sorting order of your data in the table. However, a clustered index can be defined on 1 OR many fields. It is not mandatory to define the clustered index on the PK field(s), instead you should monitor your load to see how data on that table is accessed. For example you have many INSERT ... WHERE fieldn=@filedn, UPDATE ... WHERE fieldn=@filedn, SELECT ...WHERE fieldn=@filedn AND WHERE fieldZ=@filedZ; in this case fieldn should have a clustered index, eventualy fieldn+fieldZ a non-clustered index (this is, again, just an example)

    PKs and indexes have different functionality, first ones for assuring referential integrity, the second for optimising the db transactions.

     

  • I have a table with one identity column ,user login and , ID Column in a table with few more columns.

    In this table identity column , user login , ID Column all have unique values and are unique fields.

    but we have set the identity column as a primary key and it is only used to join tables and not used for any other purpose.

    but user login is used as a field in the where condition.

    my question is, is this the right design. can we have identity columns in all the tables only for the primary key purpose and joining tables when we have other unique fields for each table. please guide me.

     

     

  • Please reply for this

  • It is fine to have primary keys with "no meaning". Mark the other unique fields with unique indexes/constraints and eventually a foreign key so later on you still will able to see what relations those tables have.

    A clustered index is not necessary. The best candidate for a clustered index is a (small) set of column(s) that are queried for a range. Like between 5000 and 5100. Small column because every other index references the clustered index if present.

  • I would disagree with the point that this design is OK. 

    As for all other points I agree.  And a UNIQUE index on Login is suggested at the least on your other fields if you do choose to keep the ID field.

    See the following example.  If you use your LOGIN is unique why not use it as your PK.

    In my example below, I have to query 2 tables just to get the data I need.  But I usually know the key, so why not make it my PK.  FYI - Relational Database Design Purists will NOT agree with me.  All Entities MUST have Code tables. 

    Why have an ID column for joins only?  What is easier and more efficient?

    SELECT

    *

    FROM tblApplicationSignal

    INNER JOIN tblApplication

    ON tblApplicationSignal.[Application] = tblApplication.ApplicationID

    WHERE tblApplication.AppName = 'APP'

     OR

    SELECT *

    FROM tblApplicationSignal

    WHERE AppName = 'APP'

     

  • Actually - you usually WANT your keys to have no "other meaning" to the user community.  A "primary key" that a user can change/affect is a relational integrity update problem.  This is not a hard-set rule (although I believe this is advocated in the various normalization forms), but it would save you a LOT of headaches to NOT allow your users anywhere near the PK's...

    Unique ID is Vastly different a concept than a primary key.  In my mind - a PK has no meaning to anyone but the developer (if you can help it).

    Just a thought....

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • I would agree with Matt.  Having primary keys with 'intelligence' can be a major headache - especially if the column value is subject to change, or the business changes and the value is no longer unique. Also, avoid having composite primary keys.  Consider if you have a table that has a 3 column primary key.  Any table referencing that table (via FK constraint) would also have to contain all three referencing primary key columns. If the primary key was an identity, only one column is required when referencing table.  In general, our practice here is to use identity columns almost always as primary keys/clustered.  It's true that the PK does not HAVE to be the clustered index, but remember that the data in the database is physically ordered by the clustered index.  If you create a clustered index on a column that is frequently updated or inserted out of order, this can cause data and/or index fragmentation and performance issues.  Good candidates for clustered indexes are those that are inserted in order, (such as an identity or date columns), and that are updated infrequently.

  • Didn't I say, there are several that will disagree with me.  The examples given are the standard ones.... Only one field to reference the column. ... What if I want to change the key...

    If you have to change your primary key, you didn't do your job correctly.

    Per the other point.  Why is referencing with one field a good thing?

    As for the various normal forms, yes in theory that is how you want to define a relational model. 

    I still stand with my statements, I have tested them at great length.  Back when I had the luxury of doing these types of tests. I was part of a team that developed the same application using two different table designs.

    One was developed out to true 3rd normal form, where every key has an identifier that is a non descriptive number. 

    The other normalized each entity out to a UNIQUE, but descriptive key.  Each Parent was part of the Key of the Children.  etc. 

    I forget the exact number, nor could I publish them if I had them, but the end result was that the 2nd design was signficantly more efficient.  SQL was simpler in nearly every case.  The number of joins was reduced by over 75%.

    Consider this fact.  If you use the design suggested by Anthony/Matt, to get a the records referenced by a table that is say 5 levels deep you will need to join 6 tables.

    If you use the design I am suggesting, the PK of the parent table is part of each and every child table.  (does this add space to your system?  YES!!!) but what you gain is performance.  So you don't do a single join.  You already have the field as part of the table. 

    On the reverse side, to get the details of a parent table, it is in most cases 1 and only 1 join. 

    Now this can almost be done with a pure design, but only if you make child tables keys include all parent keys. 

    Just a suggestion, but in this case, you will not get a concensus on this from this group.

  • Heh.. asking for a concensus on modeling?

    Would probably be easier to get a concensus on religion or politics

    As with most things, there's a theoretical aspect, and there's a practical aspect.

    In this business, the theory part goes by the name 'nth normal form', and the practical could be called 'denormalization'. Somewhere in between 'common sense' fits in pretty well.

    Nothing is for free. What you gain over here, you have to pay for over there.

    Normalize to the extreme - the 'practical' will pay the price.

    Be too practical - the model will suffer. If the model suffers, stuff becomes 'impractical'.

    'Good modelling' is like walking a tightrope while keeping balance with minimum hassle and effort.

    At least that's how I see it.

    /Kenneth

     

  • <quote>

    Didn't I say, there are several that will disagree with me.  The examples given are the standard ones.... Only one field to reference the column. ... What if I want to change the key...

    If you have to change your primary key, you didn't do your job correctly.

    </quote>

    I don't think any of us are advocating either circumstance to the exclusion of all others - I certainly wasn't.  Avoiding an update on the value of a primary key is precisely why I advocate making it something no user input can affect.  So - for example - I would not make SSN a key in an employee system, if the SSN is keyed in.  That will *invariably* lead to someone miskeying it somewhere, and lots of headaches for me to deal with.  Of course - that's when I have a choice (I get lots of external data sources, with little or no controls over the fields in them - so I play with what I got....).  It's a rule of thumb more than an unbreakable rule.

    I'm also a BIG fan of compound indexes keys in the right places, precisely to deal with certain scenarios like you mentioned (like being able to bypass 6 inner joins when 1 or 2 will do the trick).  It is a form of denormalization, so you have to be aware of the conflicts it might create, but some circumstances do justify it.

    Like Kenneth mentioned - Normalization is a theory, and it can be a good thing to strive towards, but reality rarely conforms to "neat" data models.  You have to make a call as to where the pain vs gain meter puts you, and that's application by application.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Even thou you may not want to use SSN as the PK (which btw is bad unless you have a composite key with date of death as SSNs can be reassigned once a person has passed, but there have been mistakes made where the number has been reassigned twice, which leads me to believe the Gov may not have a constraint on this, or they have on SSN and Birth Date which will allow for errors in the data.) but you should have a natural key where possible.

    The arguement that a composite key is bad because of the relationship to another table is also poor. In this case it is very common to create an Alternate (Surrogate or Artificial if you prefer to say) Key as well for relationships.

    As for the original question you cannot have two clustered indexes on a table as has been stated due to the data is sorted this way. But your Primary Key can be a non-clustered index, especially in the case where you may have an artificial key for relationships. It really depends on your systems design and needs. Clustered index also can be non-unique if you choose (but be carefull here as the filegroup/page/record slot indentifier will be used instead of the actuall data on non-clustered indexes). The key is cluster on the item as best suits your overall performance needs.

  • Sorry about making assumptions.  I am sure you all have been in those situations where you put a huge effort into something and you are really proud of what you come up with.  Then it gets shot down because it didn't fit the company standards/"rule of thumb" documentation.  This was one of those projects.  So sometimes I forget that I am not still trying to explain the concepts of database design to "designers" who don't believe in ever asking the question why.  This is the requirement, so we do it.

    FYI, below I am going to take a few literary freedoms below (fiction for a min).

    Which in some ways fits with the OP.  I am guessing that there was a requirement for an index and possibly even a requirement that it be clustered.  In this case of course, you can't accomplish it, but if you don't ask "Why do you want to do this?" you'll probably do the wrong thing. 

    If the real reason for the requirement is that they just want the fields indexed, a non-clustered index will work fine here.  But if you just fulfill the requirement of adding a clustered index then you would have to remove the current one.  (Which is possible, but could be a very bad thing to do). 

  • Pardon me for perhaps straying a bit OT, but I feel that a few things may need to be sorted out.

    We talk here about keys and indexes in a very haphazard way, almost like they were the same, though they aren't.

    A 'key' is a key, and only a key. (be it primary, foreign, alternate, super... all are 'keys')

    A 'key' is also an abstract concept. It's part of the logical model.

    An index is a physical structure, and it's a part of the physical model.

    A key is not an index.

    An index is not a key.

    These two distinctions are important to understand.

    Now, the thing is that we use an index to implement a key.

    But, this doesn't make the index to be the key.

    Am I making sense here...?

    Anyways, I feel that there's a lot of confusion around these concepts, perhaps making things more difficult than necessary for most people that yet don't fully understand the theories and differences between the 'logical' and the 'physical' aspects of our databases.

    /Kenneth

     

     

  • Interesting discussion. The reality of this in MY opinion is that there is no single answer as to which form is correct. I believe that you have to normalize down to at LEAST third normal during the design process to truly understand your data and how it will behave. Given that understanding, you can then CHOOSE to demormalize away from true third normal where it makes sense in your application. There truly is no single "this is the way it should be done".

    Bottom line - Normalize 'till it hurts, denormalize 'till it works. Take into account storage capacity (3 field PK of char(30) vs single int for 10 million rows = .25Gb), table joins, volatility of the data, maintenance requirements, and yes, just plain comon sense as you design your schema.

Viewing 15 posts - 1 through 15 (of 16 total)

You must be logged in to reply to this topic. Login to reply