Table Design

  • Jeff,

    I’m on a phone at the moment so I won’t try to make a full response right now.

    I’ll just say, in my environment, insert hot spots aren’t a “top 10 problem”... Certainly nothing compared to the massive log growth caused by excessive page splits.

    I have been tasked with coming up with a new, enterprise wide, indexing strategy.

    If it’s ok with you, I’d like to pick your brain on the matter. I’m especially interested in finding out more about your aversion to doing index reorganize... We do use AOAG/HADR, which means we can’t use Bulk mode for index maintenance... So I was able to convince the people who decide if I keep getting paychecks, that we should stop doing rebuilds on the larger indexes and switch to “multi-session” reorgs instead.

    It seems to stand in direct opposition to advise offered by Paul Randall...

    https://youtu.be/p3SXxclj_vg

  • Jason A. Long - Thursday, November 23, 2017 6:45 PM

    Jeff,I’m on a phone at the moment so I won’t try to make a full response right now. I’ll just say, in my environment, insert hot spots aren’t a “top 10 problemâ€... Certainly nothing compared to the massive log growth caused by excessive page splits. I have been tasked with coming up with a new, enterprise wide, indexing strategy. If it’s ok with you, I’d like to pick your brain on the matter. I’m especially interested in finding out more about your aversion to doing index reorganize... We do use AOAG/HADR, which means we can’t use Bulk mode for index maintenance... So I was able to convince the people who decide if I keep getting paychecks, that we should stop doing rebuilds on the larger indexes and switch to “multi-session†reorgs instead. It seems to stand in direct opposition to advise offered by Paul Randall...https://youtu.be/p3SXxclj_vg

    Actually, everything in Paul's great presentation (I've got it burned into my brain because I've studied it so many times) supports this "new thing" that I've found.  I still have (if I were asked to estimate) somewhere between 60 and 80 hours of testing to do because I need to cover a whole lot of permutations (even including memory size vs table size).  I don't use AOAG/HADR (we use SAN replication, instead) so I'm not sure what I can flush out there.

    As a bit of a teaser, I absolutely do agree that the "bad" type of page splits DO have a heck of a lot more impact on the log file than the "good" page splits on a well behaved ever-increasing keyed index.  BUT, I'm also finding out that they don't occur as much as you would think even in a no-index maintenance environment.

    As a bit more of a teaser, I'm finding out that people are actually rebuilding their indexes incorrectly and re-orgs are rather ineffective in preventing page splits of any kind.  Early indications are that they're (reorgs) actually tougher on the log file than not doing any index maintenance at all but, like I said, I still have more testing to do.

    Still the proof is only in the pudding and I'm not done making the pudding yet.  I have to provide repeatable tests because my almost 2 years of not doing index maintenance in prod doesn't count for anyone but me because, when I started not doing index maintenance, I hadn't anticipated it being successful and so didn't keep adequate records to qualify it as a proper case study.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I’m not postponing a reply... I’m stupid drunk w/ fiends & family. Like I said, I’m very interested in picking your brain on this topic. I’m not making an argument one way or another. I’m just trying to get my head wrapped around the topic in general. Your insights are genuinely appreciated. I just want to make sure I have everything in it’s proper context.

  • Jason A. Long - Thursday, November 23, 2017 10:32 PM

    I’m not postponing a reply... I’m stupid drunk w/ fiends & family. Like I said, I’m very interested in picking your brain on this topic. I’m not making an argument one way or another. I’m just trying to get my head wrapped around the topic in general. Your insights are genuinely appreciated. I just want to make sure I have everything in it’s proper context.

    BWAAA-HAAA!!!!  You shouldn't be posting with me... you should be having fun with your friends and family.  If you want, PM me when you get up later this morning.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I've never had much use for GUIDs, I much prefer INT identities.

    However, I once had a situation that required unique keys across distributed computers, potentially tens of thousands of individual databases (for a game application with a requirement that each locally installed user/database had to have absolutely unique keys for every record (object) in the database.

    Keeping the many failings of GUIDs in mind (fragmentation issues, humongous key sizes, etc.) I came up with a hybrid solution that minimized GUID impact. I'm not sure this solution would fit anyone else's needs, but I'll go ahead and share it, just in case.

    Since any single user/database would never approach 2 billion records I simply kept a table that translated the (truly universally unique) GUID into a local INT identity column. The GUID was generated when an object was created, but every other reference was to the local identity until the user decided to export the object, which then used its GUID. When imported to another user it stored the GUID in the translation table.

    Worked well in tests, until the project fell by the wayside, never completed, alas.

  • I too like these answers, learning all the time.  Have a few bits and pieces I follow:

    1. There is more ‘data integrity’ from foreign keyconstraints with GUIDs rather than simple IDENTITY columns.  Values in IDENTITY columns on differenttables will likely overlap.  (That said,I have rarely justified GUIDs.)

    2. Do identify all candidate keys, natural orotherwise and especially composite keys. Query optimisers love defineduniqueness.   For composite alternate keys, aim to eithermake all columns in the key not null or else define a table check constraintsuch that either all columns are not null or all columns are null. 

    3. Avoid using composite keys as primary keys orthe source/target of foreign key constraints. From experience, these are also ‘from Satan’.

    4. Do not use the same name for the primary keycolumn in all tables eg ‘ID’. 

    5. Take care if column is used identify a ‘default’data value.  For example, customer has multipledelivery addresses; one row is the ‘default’ delivery address, identified by IsDefaultbit field.  If multiple default deliveryaddress are added, queries return too many rows…

    6. Use newer ‘date’ datatype to hold pure datesrather than older ‘smalldatetime’ and ‘datetime’.  Its smaller and avoids time comparisonissues.

    7. Instead of older design patterns, where a validfromand validto date pair are added to support history, consider 2016 TemporalTables.

    8. Star schema database designs do not observetraditional normalisation.  If not anOLTP database, consider star schemas in preference to improvising wholesale denormalization.

    All the best.

    Bredon

  • Bredon Hill - Friday, November 24, 2017 7:16 AM

    I too like these answers, learning all the time.  Have a few bits and pieces I follow:

    1. There is more ‘data integrity’ from foreign keyconstraints with GUIDs rather than simple IDENTITY columns.  Values in IDENTITY columns on differenttables will likely overlap.  (That said,I have rarely justified GUIDs.)

    2. Do identify all candidate keys, natural orotherwise and especially composite keys. Query optimisers love defineduniqueness.   For composite alternate keys, aim to eithermake all columns in the key not null or else define a table check constraintsuch that either all columns are not null or all columns are null. 

    3. Avoid using composite keys as primary keys orthe source/target of foreign key constraints. From experience, these are also ‘from Satan’.

    4. Do not use the same name for the primary keycolumn in all tables eg ‘ID’. 

    5. Take care if column is used identify a ‘default’data value.  For example, customer has multipledelivery addresses; one row is the ‘default’ delivery address, identified by IsDefaultbit field.  If multiple default deliveryaddress are added, queries return too many rows…

    6. Use newer ‘date’ datatype to hold pure datesrather than older ‘smalldatetime’ and ‘datetime’.  Its smaller and avoids time comparisonissues.

    7. Instead of older design patterns, where a validfromand validto date pair are added to support history, consider 2016 TemporalTables.

    8. Star schema database designs do not observetraditional normalisation.  If not anOLTP database, consider star schemas in preference to improvising wholesale denormalization.

    All the best.

    Bredon

    Not sure I understand what you mean by point #1.

    Do you mean replication between databases with the same tables in them? Or different tables in the same database having the same PK value using identities? If the latter that makes no difference and wouldn't affect the degree of referential integrity at all, would it?

  • If you number all your identities from 1 then there will be duplicated values in different tables.  If an application developer gets hold of the wrong integer key and inserts it as, say, a foreign key value, then there is a high likelihood that database won't complain.   We all make mistakes.  Have positive experience of very appreciative developers from the few times I have used GUIDs, always with replication.

  • Bredon Hill - Friday, November 24, 2017 7:43 AM

    If you number all your identities from 1 then there will be duplicated values in different tables.  If an application developer gets hold of the wrong integer key and inserts it as, say, a foreign key value, then there is a high likelihood that database won't complain.   We all make mistakes.  Have positive experience of very appreciative developers from the few times I have used GUIDs, always with replication.

    Hmm. While that's possible I suppose I've never encountered it before.

    Live and learn, I guess. 🙂

  • John Mitchell-245523 - Thursday, November 23, 2017 2:10 AM

    Arsh - Wednesday, November 22, 2017 7:37 PM

    Thank you Jeff, John, Lynn, Grant and Bill for the points. How about column lengths ? There's a debate with dev team especially on Varchar. Where 20 is enough, they ask for length to be 500 sometimes because SQL anyways uses only the actual data-length. My aurgument is keep only what is required or may grow realistically.. and also since SQL estimates and holds half of the column length in anticipation of data in a column .. isn't it better to use what is required? I believe that keeping things that you don't require , comes with a cost . what do you think ?

    Yes, use only what's required.  This is for two reasons.  First, as you hinted at, the memory grant for a query may be based on the declared width of a varchar column.  Second, you're more likely to prevent bad data from getting into your database if you restrict its width to the absolute maximum of what it could be.

    John

    Agreed.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Adding my vote against the use of GUIDs except for certain circumstances.  Replication comes to mind.  With the sequence numbers, there's no longer any need to use them to maintain a database wide uniqueness.  I have to deal with them at a client site.  Pain in the neck.  I will always use a business key in the OLTP designs if available.  In my experience, these rarely change, though their descriptions might.  I use integers (small and regular) for the keys with the OLAP designs.  The 3NF is also very important in OLTP designs and too often not done correctly.  OLAP designs have their own rules, even then different for fact and dimension tables.  Just take the time to get it right.

  • RonKyle - Sunday, November 26, 2017 4:35 PM

    Adding my vote against the use of GUIDs except for certain circumstances.  Replication comes to mind.  With the sequence numbers, there's no longer any need to use them to maintain a database wide uniqueness.  I have to deal with them at a client site.  Pain in the neck.  I will always use a business key in the OLTP designs if available.  In my experience, these rarely change, though their descriptions might.  I use integers (small and regular) for the keys with the OLAP designs.  The 3NF is also very important in OLTP designs and too often not done correctly.  OLAP designs have their own rules, even then different for fact and dimension tables.  Just take the time to get it right.

    I have run into natural keys used as a primary key getting broken as a result of a change causing them to no longer being unique.  I agree it is good to have a natural key as an alternate key, and would encourage an SID for a primary key that is not changeable by any users.

  • Thank you everybody for putting in points for the design aspects. Would be great if you can also advise me on the design tools available in the market, and which ones are worth trying as per your experience. Thank you so much.
  • Arsh - Tuesday, November 28, 2017 3:24 AM

    Thank you everybody for putting in points for the design aspects. Would be great if you can also advise me on the design tools available in the market, and which ones are worth trying as per your experience. Thank you so much.

    Anything from Red Gate, also a database modeling tool called ModelRight.

  • roger.plowman - Tuesday, November 28, 2017 6:22 AM

    Anything from Red Gate

    Well, you get no argument from me on this point.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

Viewing 15 posts - 16 through 30 (of 42 total)

You must be logged in to reply to this topic. Login to reply