November 23, 2017 at 6:45 pm
Jeff,
I’m on a phone at the moment so I won’t try to make a full response right now.
I’ll just say, in my environment, insert hot spots aren’t a “top 10 problem”... Certainly nothing compared to the massive log growth caused by excessive page splits.
I have been tasked with coming up with a new, enterprise-wide indexing strategy.
If it’s ok with you, I’d like to pick your brain on the matter. I’m especially interested in finding out more about your aversion to doing index reorganizes... We do use AOAG/HADR, which means we can’t use the bulk-logged recovery model for index maintenance... So I was able to convince the people who decide if I keep getting paychecks that we should stop doing rebuilds on the larger indexes and switch to “multi-session” reorgs instead.
It seems to stand in direct opposition to advice offered by Paul Randal...
November 23, 2017 at 9:06 pm
Jason A. Long - Thursday, November 23, 2017 6:45 PM
Jeff, I’m on a phone at the moment so I won’t try to make a full response right now. I’ll just say, in my environment, insert hot spots aren’t a “top 10 problem”... Certainly nothing compared to the massive log growth caused by excessive page splits. I have been tasked with coming up with a new, enterprise-wide indexing strategy. If it’s ok with you, I’d like to pick your brain on the matter. I’m especially interested in finding out more about your aversion to doing index reorganizes... We do use AOAG/HADR, which means we can’t use the bulk-logged recovery model for index maintenance... So I was able to convince the people who decide if I keep getting paychecks that we should stop doing rebuilds on the larger indexes and switch to “multi-session” reorgs instead. It seems to stand in direct opposition to advice offered by Paul Randal...
https://youtu.be/p3SXxclj_vg
Actually, everything in Paul's great presentation (I've got it burned into my brain because I've studied it so many times) supports this "new thing" that I've found. I still have (if I were asked to estimate) somewhere between 60 and 80 hours of testing to do because I need to cover a whole lot of permutations (even including memory size vs table size). I don't use AOAG/HADR (we use SAN replication instead) so I'm not sure what I can flesh out there.
As a bit of a teaser, I absolutely do agree that the "bad" type of page splits DO have a heck of a lot more impact on the log file than the "good" page splits on a well behaved ever-increasing keyed index. BUT, I'm also finding out that they don't occur as much as you would think even in a no-index maintenance environment.
As a bit more of a teaser, I'm finding out that people are actually rebuilding their indexes incorrectly and that reorgs are rather ineffective in preventing page splits of any kind. Early indications are that reorgs are actually tougher on the log file than not doing any index maintenance at all but, like I said, I still have more testing to do.
Still, the proof is in the pudding and I'm not done making the pudding yet. I have to provide repeatable tests because my almost 2 years of not doing index maintenance in prod doesn't count for anyone but me: when I started not doing index maintenance, I hadn't anticipated it being successful and so didn't keep adequate records to qualify it as a proper case study.
--Jeff Moden
Change is inevitable... Change for the better is not.
November 23, 2017 at 10:32 pm
I’m not postponing a reply... I’m stupid drunk w/ friends & family. Like I said, I’m very interested in picking your brain on this topic. I’m not making an argument one way or another. I’m just trying to get my head wrapped around the topic in general. Your insights are genuinely appreciated. I just want to make sure I have everything in its proper context.
November 23, 2017 at 11:14 pm
Jason A. Long - Thursday, November 23, 2017 10:32 PM
I’m not postponing a reply... I’m stupid drunk w/ friends & family. Like I said, I’m very interested in picking your brain on this topic. I’m not making an argument one way or another. I’m just trying to get my head wrapped around the topic in general. Your insights are genuinely appreciated. I just want to make sure I have everything in its proper context.
BWAAA-HAAA!!!! You shouldn't be posting with me... you should be having fun with your friends and family. If you want, PM me when you get up later this morning.
--Jeff Moden
Change is inevitable... Change for the better is not.
November 24, 2017 at 6:47 am
I've never had much use for GUIDs, I much prefer INT identities.
However, I once had a situation that required unique keys across distributed computers: potentially tens of thousands of individual databases for a game application, with a requirement that each locally installed user/database had to have absolutely unique keys for every record (object) in the database.
Keeping the many failings of GUIDs in mind (fragmentation issues, humongous key sizes, etc.) I came up with a hybrid solution that minimized GUID impact. I'm not sure this solution would fit anyone else's needs, but I'll go ahead and share it, just in case.
Since any single user/database would never approach 2 billion records I simply kept a table that translated the (truly universally unique) GUID into a local INT identity column. The GUID was generated when an object was created, but every other reference was to the local identity until the user decided to export the object, which then used its GUID. When imported to another user it stored the GUID in the translation table.
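The translation-table scheme described above could be sketched roughly as follows. This is only an illustrative reconstruction, not the original code; the table, column, and variable names are all made up.

```sql
-- Hypothetical sketch of the GUID-to-local-identity translation table.
CREATE TABLE dbo.ObjectKeyMap
(
    LocalID    INT IDENTITY(1,1) NOT NULL PRIMARY KEY, -- used for all local references
    GlobalGUID UNIQUEIDENTIFIER  NOT NULL UNIQUE       -- used only on export/import
        DEFAULT NEWID()
);

-- On object creation: generate the GUID once; work with LocalID everywhere else.
INSERT INTO dbo.ObjectKeyMap DEFAULT VALUES;

-- On import from another user: register the incoming GUID and get a local key.
-- If the GUID is already known, reuse the existing LocalID instead of inserting.
DECLARE @IncomingGUID UNIQUEIDENTIFIER = NEWID(); -- value arriving from an export

INSERT INTO dbo.ObjectKeyMap (GlobalGUID)
SELECT @IncomingGUID
WHERE NOT EXISTS (SELECT 1 FROM dbo.ObjectKeyMap
                  WHERE GlobalGUID = @IncomingGUID);
```

The point of the design is that the wide, fragmentation-prone GUID appears in exactly one narrow table, while every foreign key elsewhere stays a 4-byte INT.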
Worked well in tests, until the project fell by the wayside, never completed, alas.
November 24, 2017 at 7:16 am
I too like these answers, learning all the time. Have a few bits and pieces I follow:
There is more ‘data integrity’ from foreign key constraints with GUIDs than with simple IDENTITY columns. Values in IDENTITY columns on different tables will likely overlap. (That said, I have rarely justified GUIDs.)
Do identify all candidate keys, natural or otherwise, and especially composite keys. Query optimisers love defined uniqueness. For composite alternate keys, aim to either make all columns in the key not null or else define a table check constraint such that either all columns are not null or all columns are null.
Avoid using composite keys as primary keys or as the source/target of foreign key constraints. From experience, these are also ‘from Satan’.
Do not use the same name for the primary key column in all tables, e.g. ‘ID’.
Take care if a column is used to identify a ‘default’ data value. For example, a customer has multiple delivery addresses; one row is the ‘default’ delivery address, identified by an IsDefault bit field. If multiple default delivery addresses are added, queries return too many rows…
Use the newer ‘date’ datatype to hold pure dates rather than the older ‘smalldatetime’ and ‘datetime’. It’s smaller and avoids time comparison issues.
Instead of older design patterns, where a validfrom and validto date pair are added to support history, consider SQL Server 2016 Temporal Tables.
Star schema database designs do not observe traditional normalisation. If not an OLTP database, consider star schemas in preference to improvising wholesale denormalisation.
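Two of the points above (all-or-nothing NULLs on a composite alternate key, and a single ‘default’ row) can be enforced declaratively. The following is a hedged sketch with made-up table and column names, not code from the thread.

```sql
-- All-or-nothing NULLs on a composite alternate key, via a check constraint:
CREATE TABLE dbo.Shipment
(
    ShipmentID  INT IDENTITY PRIMARY KEY,
    CarrierCode VARCHAR(10) NULL,
    TrackingRef VARCHAR(30) NULL,
    CONSTRAINT CK_Shipment_AltKeyAllOrNone CHECK
    (
        (CarrierCode IS NOT NULL AND TrackingRef IS NOT NULL)
     OR (CarrierCode IS NULL     AND TrackingRef IS NULL)
    )
);

-- At most one 'default' delivery address per customer, via a filtered unique
-- index, so the "multiple default rows" problem cannot occur:
CREATE TABLE dbo.DeliveryAddress
(
    AddressID  INT IDENTITY PRIMARY KEY,
    CustomerID INT NOT NULL,
    IsDefault  BIT NOT NULL DEFAULT 0
);

CREATE UNIQUE INDEX UX_DeliveryAddress_OneDefault
    ON dbo.DeliveryAddress (CustomerID)
    WHERE IsDefault = 1;  -- uniqueness applies only to the default rows
```

Filtered indexes (SQL Server 2008+) let the uniqueness rule apply to just the `IsDefault = 1` rows, so each customer can still have any number of non-default addresses.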
All the best.
Bredon
November 24, 2017 at 7:22 am
Bredon Hill - Friday, November 24, 2017 7:16 AM
I too like these answers, learning all the time. Have a few bits and pieces I follow:
There is more ‘data integrity’ from foreign key constraints with GUIDs than with simple IDENTITY columns. Values in IDENTITY columns on different tables will likely overlap. (That said, I have rarely justified GUIDs.)
Do identify all candidate keys, natural or otherwise, and especially composite keys. Query optimisers love defined uniqueness. For composite alternate keys, aim to either make all columns in the key not null or else define a table check constraint such that either all columns are not null or all columns are null.
Avoid using composite keys as primary keys or as the source/target of foreign key constraints. From experience, these are also ‘from Satan’.
Do not use the same name for the primary key column in all tables, e.g. ‘ID’.
Take care if a column is used to identify a ‘default’ data value. For example, a customer has multiple delivery addresses; one row is the ‘default’ delivery address, identified by an IsDefault bit field. If multiple default delivery addresses are added, queries return too many rows…
Use the newer ‘date’ datatype to hold pure dates rather than the older ‘smalldatetime’ and ‘datetime’. It’s smaller and avoids time comparison issues.
Instead of older design patterns, where a validfrom and validto date pair are added to support history, consider SQL Server 2016 Temporal Tables.
Star schema database designs do not observe traditional normalisation. If not an OLTP database, consider star schemas in preference to improvising wholesale denormalisation.
All the best.
Bredon
Not sure I understand what you mean by point #1.
Do you mean replication between databases with the same tables in them? Or different tables in the same database having the same PK value using identities? If the latter, that makes no difference and wouldn't affect the degree of referential integrity at all, would it?
November 24, 2017 at 7:43 am
If you number all your identities from 1, there will be duplicated values in different tables. If an application developer gets hold of the wrong integer key and inserts it as, say, a foreign key value, there is a high likelihood the database won't complain. We all make mistakes. I've had very appreciative developers the few times I have used GUIDs, always with replication.
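The failure mode described here can be shown concretely. This is a hypothetical illustration (the tables are invented): with IDENTITY keys all starting at 1, a key taken from the wrong table can silently satisfy a foreign key constraint.

```sql
CREATE TABLE dbo.Customer (CustomerID INT IDENTITY(1,1) PRIMARY KEY, Name VARCHAR(50));
CREATE TABLE dbo.Product  (ProductID  INT IDENTITY(1,1) PRIMARY KEY, Name VARCHAR(50));

CREATE TABLE dbo.CustomerNote
(
    NoteID     INT IDENTITY(1,1) PRIMARY KEY,
    CustomerID INT NOT NULL REFERENCES dbo.Customer (CustomerID),
    Note       VARCHAR(200)
);

INSERT INTO dbo.Customer (Name) VALUES ('Acme');    -- CustomerID = 1
INSERT INTO dbo.Product  (Name) VALUES ('Widget');  -- ProductID  = 1

-- A developer accidentally passes a ProductID (1) where a CustomerID belongs.
-- The FK is satisfied because Customer row 1 also happens to exist: no error.
INSERT INTO dbo.CustomerNote (CustomerID, Note)
VALUES (1, 'This was meant to reference the product!');
```

With UNIQUEIDENTIFIER keys, a Product GUID would almost certainly not exist in Customer, so the same mistake would fail the foreign key check instead of producing silently wrong data.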
November 24, 2017 at 7:49 am
Bredon Hill - Friday, November 24, 2017 7:43 AM
If you number all your identities from 1, there will be duplicated values in different tables. If an application developer gets hold of the wrong integer key and inserts it as, say, a foreign key value, there is a high likelihood the database won't complain. We all make mistakes. I've had very appreciative developers the few times I have used GUIDs, always with replication.
Hmm. While that's possible I suppose I've never encountered it before.
Live and learn, I guess. 🙂
November 24, 2017 at 1:09 pm
John Mitchell-245523 - Thursday, November 23, 2017 2:10 AM
Arsh - Wednesday, November 22, 2017 7:37 PM
Thank you Jeff, John, Lynn, Grant and Bill for the points. How about column lengths? There's a debate with the dev team, especially on varchar. Where 20 is enough, they sometimes ask for the length to be 500, because SQL anyway uses only the actual data length. My argument is to keep only what is required or may realistically grow... and also, since SQL estimates and holds half of the column length in anticipation of data in a column, isn't it better to use what is required? I believe that keeping things you don't require comes with a cost. What do you think?
Yes, use only what's required. This is for two reasons. First, as you hinted at, the memory grant for a query may be based on the declared width of a varchar column. Second, you're more likely to prevent bad data from getting into your database if you restrict its width to the absolute maximum of what it could be.
John
Agreed.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
November 26, 2017 at 4:35 pm
Adding my vote against the use of GUIDs except in certain circumstances. Replication comes to mind. With sequence numbers, there's no longer any need to use GUIDs to maintain database-wide uniqueness. I have to deal with them at a client site. Pain in the neck. I will always use a business key in OLTP designs if available. In my experience, these rarely change, though their descriptions might. I use integers (small and regular) for the keys in OLAP designs. 3NF is also very important in OLTP designs and too often not done correctly. OLAP designs have their own rules, even then different for fact and dimension tables. Just take the time to get it right.
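The sequence-number alternative mentioned here can be sketched with the SEQUENCE object (SQL Server 2012+). The names below are illustrative only: one sequence shared by several tables gives database-wide unique keys without resorting to GUIDs.

```sql
CREATE SEQUENCE dbo.GlobalKey AS BIGINT START WITH 1 INCREMENT BY 1;

CREATE TABLE dbo.Invoice
(
    InvoiceID BIGINT NOT NULL PRIMARY KEY
        DEFAULT (NEXT VALUE FOR dbo.GlobalKey),
    Total     DECIMAL(18,2) NOT NULL
);

CREATE TABLE dbo.Payment
(
    PaymentID BIGINT NOT NULL PRIMARY KEY
        DEFAULT (NEXT VALUE FOR dbo.GlobalKey),
    Amount    DECIMAL(18,2) NOT NULL
);
-- Because both defaults draw from the same sequence, an InvoiceID can never
-- collide with a PaymentID, yet the keys stay narrow 8-byte integers that
-- insert in ascending order (no GUID-style fragmentation).
```

Unlike IDENTITY, a sequence is a standalone object, so the same number stream can feed any number of tables, or be fetched explicitly with `NEXT VALUE FOR` in application code.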
November 27, 2017 at 9:13 am
RonKyle - Sunday, November 26, 2017 4:35 PM
Adding my vote against the use of GUIDs except in certain circumstances. Replication comes to mind. With sequence numbers, there's no longer any need to use GUIDs to maintain database-wide uniqueness. I have to deal with them at a client site. Pain in the neck. I will always use a business key in OLTP designs if available. In my experience, these rarely change, though their descriptions might. I use integers (small and regular) for the keys in OLAP designs. 3NF is also very important in OLTP designs and too often not done correctly. OLAP designs have their own rules, even then different for fact and dimension tables. Just take the time to get it right.
I have run into natural keys used as a primary key getting broken when a change caused them to no longer be unique. I agree it is good to have a natural key as an alternate key, and would encourage a surrogate ID for a primary key that is not changeable by any users.
November 28, 2017 at 3:24 am
Thank you everybody for putting in points for the design aspects. Would be great if you can also advise me on the design tools available in the market, and which ones are worth trying as per your experience. Thank you so much.
November 28, 2017 at 6:22 am
Arsh - Tuesday, November 28, 2017 3:24 AM
Thank you everybody for putting in points for the design aspects. Would be great if you can also advise me on the design tools available in the market, and which ones are worth trying as per your experience. Thank you so much.
Anything from Red Gate, also a database modeling tool called ModelRight.
November 28, 2017 at 7:02 am
roger.plowman - Tuesday, November 28, 2017 6:22 AM
Anything from Red Gate
Well, you get no argument from me on this point.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning