February 3, 2014 at 10:47 pm
Jeff Moden (2/3/2014)
This guy isn't the only person of such ilk when it comes to indexes. I also see that Brent Ozar published an article about what a waste of time it is to even REORGANIZE indexes. While I, again, see some merit in his proposal, I don't believe that enough testing has been done to make it a standard practice where I work.
That is one I completely disagree with, and I have good, solid reasons why. Tony tried arguing that one with me; I didn't budge.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
February 4, 2014 at 2:39 am
L' Eomot Inversé (2/3/2014)
If I believed that I would be leaping in with a better hash join that had a much greater probability of having clashes to deal with - it's pretty easy to handle clashes, and clashes at a much higher rate than likely GUID clashes cost very little to handle in a join, and getting the probability down to that of GUID clashes will cost more overall (including having a much longer hash to compare).
The hash equi-join can cheaply do a real equality check on the matching hash values, but I'm not sure the anti hash equi-join could do the same; otherwise you'd have to re-check the entire non-matching side against every row on the other side of the join. I'd assumed this used a hashing algorithm with a low, but non-zero, probability of collision (I may be wrong here!).
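For anyone who wants to see it for themselves, here's a throwaway sketch (table names made up) that forces the anti-join to be done as a hash match, so you can look at the operator in the actual plan:

-- throwaway tables, purely to get a hash anti-semi-join into the plan
CREATE TABLE #outer_side (id int NOT NULL);
CREATE TABLE #inner_side (id int NOT NULL);
INSERT INTO #outer_side VALUES (1), (2), (3);
INSERT INTO #inner_side VALUES (2);

SELECT o.id
FROM #outer_side AS o
WHERE NOT EXISTS (SELECT 1 FROM #inner_side AS i WHERE i.id = o.id)
OPTION (HASH JOIN); -- the NOT EXISTS appears as a Hash Match (Anti Semi Join) operator

DROP TABLE #outer_side, #inner_side;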
L' Eomot Inversé (2/3/2014)
CPUs, Memory and storage all work with similar odds of undetectable mutation/corruption.
I hope not. And I'm pretty sure for most hardware that it is not quite that bad (although some hardware is not designed to be particularly error-free, or even to detect errors reliably, and you would be right where that hardware is concerned).
I'm not saying that those odds are high, just that a GUID collision is so improbable that it's on the same order of magnitude. Say you have a decent algorithm to generate GUIDs (I've seen no evidence that the MS one isn't) and you had 68 million rows with a randomly generated GUID, the chance of a single duplicated row is 4 * 10^-16, or 1 in 2,500,000,000,000,000 - i.e. if every company in the world did it, there probably wouldn't be a collision even between different companies' data, let alone a collision within a linked system. ECC memory has a higher probability of undetectable corruption than that - let alone non-ECC memory, where it's laughably likely, given various studies (including Google's).
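For anyone who wants to check that kind of arithmetic, the usual back-of-envelope birthday approximation (assuming the 122 effectively random bits of a version 4 GUID) is:

P(at least one collision among n GUIDs) ≈ n * (n - 1) / (2 * 2^122)

so the probability only starts to become appreciable as n approaches 2^61 (roughly 2.3 * 10^18) GUIDs.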
There's got to be a level of probability that we just draw a line under and say it's not logical to code against.
February 4, 2014 at 3:26 am
Grant Fritchey (2/3/2014)
SQLRNNR (2/3/2014)
WayneS (2/3/2014)
SQLRNNR (2/3/2014)
rodjkidd (1/29/2014)
Talking of Grant, did anyone see the picture of him on the SQLCruise yesterday - almost as red as the livery of that company he works for!
Rodders...
Why yes, yes I did see him. Tho, in person he didn't seem as RED.
I'd guess that it was either a reflection of his shirt, or he heard of yet another production DBA without a backup.
I think it was actually when his SSD died leading into a demo.
Yeah, that hurt.
I'm surprised an SSD had the audacity to fail in front of Grant 😀
Rodders...
February 4, 2014 at 5:31 am
GilaMonster (2/3/2014)
Jeff Moden (2/3/2014)
This guy isn't the only person of such ilk when it comes to indexes. I also see that Brent Ozar published an article about what a waste of time it is to even REORGANIZE indexes. While I, again, see some merit in his proposal, I don't believe that enough testing has been done to make it a standard practice where I work.
That is one I completely disagree with, and I have good, solid reasons why. Tony tried arguing that one with me; I didn't budge.
I have to agree with Gail's disagreement. I've seen too many cases where reorganizing indexes made a significant difference in performance as tables grow. Seeing this makes me wish I'd saved some "before and after" benchmarking data.
February 4, 2014 at 5:37 am
Ed Wagner (2/4/2014)
GilaMonster (2/3/2014)
Jeff Moden (2/3/2014)
This guy isn't the only person of such ilk when it comes to indexes. I also see that Brent Ozar published an article about what a waste of time it is to even REORGANIZE indexes. While I, again, see some merit in his proposal, I don't believe that enough testing has been done to make it a standard practice where I work.
That is one I completely disagree with, and I have good, solid reasons why. Tony tried arguing that one with me; I didn't budge.
I have to agree with Gail's disagreement. I've seen too many cases where reorganizing indexes made a significant difference in performance as tables grow. Seeing this makes me wish I'd saved some "before and after" benchmarking data.
Quite often that performance improvement is due to statistics being updated (on a rebuild); it's less likely that the index rebuild itself will make a significant improvement in performance.
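A rough way to test which of the two is doing the work (object names made up - substitute your own) is to update the statistics on their own, with no rebuild, and see whether you get the same improvement:

-- 1) refresh the statistics only; REORGANIZE never touches statistics, REBUILD refreshes them with a full scan
UPDATE STATISTICS dbo.SomeBigTable IX_SomeBigTable_SomeColumn WITH FULLSCAN;

-- 2) compare against the full rebuild, which also rebuilds the index structure
-- ALTER INDEX IX_SomeBigTable_SomeColumn ON dbo.SomeBigTable REBUILD;

If step 1 alone gets the performance back, it was the statistics rather than the fragmentation.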
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
February 4, 2014 at 5:41 am
GilaMonster (2/4/2014)
Ed Wagner (2/4/2014)
GilaMonster (2/3/2014)
Jeff Moden (2/3/2014)
This guy isn't the only person of such ilk when it comes to indexes. I also see that Brent Ozar published an article about what a waste of time it is to even REORGANIZE indexes. While I, again, see some merit in his proposal, I don't believe that enough testing has been done to make it a standard practice where I work.
That is one I completely disagree with, and I have good, solid reasons why. Tony tried arguing that one with me; I didn't budge.
I have to agree with Gail's disagreement. I've seen too many cases where reorganizing indexes made a significant difference in performance as tables grow. Seeing this makes me wish I'd saved some "before and after" benchmarking data.
Quite often that performance improvement is due to statistics being updated (on a rebuild); it's less likely that the index rebuild itself will make a significant improvement in performance.
Alas, I only wish I could do a rebuild. Unfortunately, with Standard Edition I can only do reorganization if I want them to remain online. I check the fragmentation of each index and if it's gotten out of hand, I reorganize it.
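Roughly what that check looks like here (the 30% threshold and the names are just placeholders):

-- placeholder threshold and names - adjust for your own environment
SELECT OBJECT_NAME(ps.object_id) AS table_name,
       i.name AS index_name,
       ps.avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ps
JOIN sys.indexes AS i
  ON i.object_id = ps.object_id AND i.index_id = ps.index_id
WHERE ps.avg_fragmentation_in_percent > 30 -- the "out of hand" line, purely illustrative
  AND ps.index_id > 0; -- skip heaps

-- then, per index that's over the line (REORGANIZE is always an online operation, even on Standard):
-- ALTER INDEX IX_SomeIndex ON dbo.SomeTable REORGANIZE;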
February 4, 2014 at 6:18 am
Grant Fritchey (2/3/2014)
Please, please, please, no piling on, but I'm curious what you guys think about this: Unreasonable Defaults. I think most of the fundamental info is fine. I'm just... concerned?... with the statements around clustered indexes and the recommendations to go with a heap instead. It's basically 180 degrees out from what I recommend. I've been thinking about it, and I'm still comfortable with my recommendations, but it is a consideration. Curious what you guys think.
And again, this isn't an invite to beat on someone's head.
I don't much like it, because it uses selective bits of information to produce a "it's like this" answer where the only sensible answer is "it depends". In my experience, the balance has been in favour of having a clustered index; it's obvious that there are cases where it's better not to have one; it's not obvious that there are sufficient such cases to justify the recommendation to go with a heap; in fact the only good recommendation would be to look at your data and your workload and estimate (and preferably measure) whether clustered index or heap best meets your objectives.
Tom
February 4, 2014 at 6:33 am
Over my head. Could use some help here. I truly hate the hardware aspects of SQL Server, especially when combined with VMs.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
February 4, 2014 at 6:58 am
L' Eomot Inversé (2/4/2014)
Grant Fritchey (2/3/2014)
Please, please, please, no piling on, but I'm curious what you guys think about this: Unreasonable Defaults. I think most of the fundamental info is fine. I'm just... concerned?... with the statements around clustered indexes and the recommendations to go with a heap instead. It's basically 180 degrees out from what I recommend. I've been thinking about it, and I'm still comfortable with my recommendations, but it is a consideration. Curious what you guys think.
And again, this isn't an invite to beat on someone's head.
I don't much like it, because it uses selective bits of information to produce a "it's like this" answer where the only sensible answer is "it depends". In my experience, the balance has been in favour of having a clustered index; it's obvious that there are cases where it's better not to have one; it's not obvious that there are sufficient such cases to justify the recommendation to go with a heap; in fact the only good recommendation would be to look at your data and your workload and estimate (and preferably measure) whether clustered index or heap best meets your objectives.
My thoughts exactly. He's calling summer on one sighting of a swallow. I'm in the middle of a greenfield indexing exercise right now (typically 4 tables per query, tables roughly 10-20m rows, rows are quite wide, down to 7 rows per page) and I'm frequently amazed at how fast SQL Server can pull selected data off a heap - but I haven't yet found a case for leaving it that way, and don't expect to. Most real-world stuff is way more complex than a wee test harness.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
February 4, 2014 at 7:10 am
edit: delete - a reply with only the quote, not the actual reply - finger trouble of some bizarre key combination?
Tom
February 4, 2014 at 7:52 am
Background to the following rant:
My company deals with a vendor who serves our needs by allowing us to connect directly to its server and code our own stored procedures in a dedicated database to access the data. The data lives in another database, and because we buy only a subset of the data, the vendor exposes that data to us through views, which are the only objects for which we have any permissions in that other database.
<Rant>
I woke up this morning to complaints that our applications were timing out because of long response times from this vendor's server. Fortunately, I had insisted that the vendor give us SHOWPLAN permissions on both the dev and production servers. When I requested an estimated execution plan for the long-running proc, I found that the vendor has apparently added a WHERE clause to the definition of a view that the stored proc references. The condition?
[columnA] <> '[somestring]' AND LEN(columnA) = [someint] AND ISNUMERIC([columnA]) = 1
So, now we are waiting for index scans (of 21.5 million rows) and evaluation of these three conditions EVERY TIME we SELECT from this view. No wonder our procs that used to run in several seconds are now taking several minutes.
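For the curious, the pattern looks roughly like this (everything here is invented - names, string and length - but it's the shape of what the plan showed):

-- not the vendor's actual code - an invented illustration of the pattern
CREATE VIEW dbo.vw_OurSubset
AS
SELECT d.[columnA], d.[columnB]
FROM VendorDb.dbo.BigVendorTable AS d
WHERE d.[columnA] <> 'SOMESTRING'
  AND LEN(d.[columnA]) = 9
  AND ISNUMERIC(d.[columnA]) = 1; -- function-wrapped predicates like these can't seek, so every query against the view scans the base table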
The worst part (perhaps)? This was the vendor's response to our complaint that [columnA] included values outside the permissible domain ([columnA] is a standardized code established by a major quasi-governmental corporation) that were causing our app to return nonsensical results and/or throw errors.
</Rant>
Jason Wolfkill
February 4, 2014 at 8:50 am
HowardW (2/4/2014)
The hash equi-join can cheaply do a real equality check on the matching hash values, but I'm not sure the anti hash equi-join could do the same; otherwise you'd have to re-check the entire non-matching side against every row on the other side of the join. I'd assumed this used a hashing algorithm with a low, but non-zero, probability of collision (I may be wrong here!).
Not sure what you mean here. The issue in an anti-join that would have the effect you describe would surely be false matches in the hash test (ie things shown as matching that didn't actually match), and hash match doesn't suffer from those: rows that land in the same hash bucket still have their actual join keys compared, and because a hash function is deterministic no real match can be missed either.
L' Eomot Inversé (2/3/2014)
HowardW
CPUs, Memory and storage all work with similar odds of undetectable mutation/corruption.
I hope not. And I'm pretty sure for most hardware that it is not quite that bad (although some hardware is not designed to be particularly error-free, or even to detect errors reliably, and you would be right where that hardware is concerned).
HowardW (2/4/2014)
I'm not saying that those odds are high, just that a GUID collision is so improbable that it's on the same order of magnitude. Say you have a decent algorithm to generate GUIDs (I've seen no evidence that the MS one isn't) and you had 68 million rows with a randomly generated GUID, the chance of a single duplicated row is 4 * 10^-16, or 1 in 2,500,000,000,000,000 - i.e. if every company in the world did it, there probably wouldn't be a collision even between different companies' data, let alone a collision within a linked system.
I haven't checked the arithmetic for that, because I saw no point (it doesn't look outrageous at first sight) as you are comparing with a number from a rather flawed article - detail below.
ECC memory has a higher probability of undetectable corruption than that - let alone non-ECC memory, where it's laughably likely, given various studies (including Google's).
Unfortunately the arithmetic in that paper has a serious flaw. It looks at the probability that a word gets two single-bit hits without an intervening access, and assigns probability 1 to the event that a word, once hit, is not accessed again before it is hit a second time. To see the effect of this assumption you have to plug in some numbers for access frequency, which would also be assumptions, but the 0.5 billion words in the store described will be accessed millions if not billions of times per second (counting both CPU and peripheral access), so let's guess that the average time a word goes without being accessed is 10 seconds and that this is distributed exponentially. Using the article's figures of 72 bits per word (64 data bits + 8 ECC bits) and 1.3e-12 hits/bit/hour, the expected interval between two consecutive hits on a word is about 1.07e+10 hours, and 1 in 72 of those double hits is self-correcting, giving an interval of something over 1.08e+10 hours between double hits that generate an error. So how many of the words in store remain untouched for 1.08e+10 hours? That's 3.9e13 seconds, which in an exponential distribution with mean 10 seconds will occur with probability 1e-31 - rather different from the 1 used to do the calculations. I don't claim any particular validity for this 10^-31 probability, but it's a great deal more plausible than the 1 used in the article. Or at least it is if I haven't forgotten too much mathematics :blink:.
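Spelt out, the factor the paper sets to 1 is, under the exponential model assumed above:

P(word not accessed for time t) = e^(-t / mean), with mean = 10 seconds

and with t = 3.9e13 seconds that factor is vanishingly small rather than 1.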
Tom
February 4, 2014 at 9:14 am
WayneS (2/3/2014)
Jeff Moden (2/3/2014)
Personally, and just as a non-confrontational point of view, I think using GUIDs as a clustered index is a really bad thing to do, especially since the advent of SEQUENCE. One other problem that I have with them is that people think they're still "Globally Unique" and they haven't been since MS changed them from Type 1 GUIDs (machine sensitive) to Type 4 GUIDs (not much more than a really big random number). While the chances of duplicating a GUID between even several machines are astronomically small, it is a possibility that can happen, and a lot of people don't bother with unique constraints on supposed key columns (a basic fault in design knowledge, as well).
Are GUIDs always bad, IMHO? Not on your life, but I treat them much like Cursors and While loops. They have their uses but I generally avoid them.
On that same subject, I'll add to the list of favorite Kimberly Tripp links already posted above.
http://technet.microsoft.com/en-US/sqlserver/gg508879.aspx
To be sure, a good deal of the problem I have with GUIDs is that I'd rather read/troubleshoot a (say) 12 digit number than a 36 character hex representation. 😛
Jeff,
Do you happen to have any links describing the change from Type 1 to Type 4? Also, does part of the GUID identify which type of GUID it is? (If so, that would reduce the number of possible values a GUID could take.)
The 5 types are fully described here (RFC 4122). The change was from using a (somewhat complicated) timestamp plus the 48-bit MAC address of the system that generated the GUID to a system that uses random (or pseudo-random) numbers. Type 1 was flawed because (a) not all generators had a MAC address and (b) the rules for the timestamp couldn't work for systems that generated GUIDs too often (ie sometimes more than a few tens of millions per second).
The type field uses 4 of the 128 bits of the GUID. Two more bits (the variant bits) are also fixed in types 1, 3, 4 and 5 (and are the same in all of these types), so those types have only 122 bits that are variable. I can't remember what version 2 did, and it isn't detailed in the RFC.
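You can see those fixed bits in SQL Server's own output - a quick sketch:

-- NEWID() generates version 4 (random) GUIDs, so the fixed bits show up in the text form
DECLARE @g char(36) = CONVERT(char(36), NEWID());
SELECT @g AS guid_text,
       SUBSTRING(@g, 15, 1) AS version_nibble, -- always '4' for NEWID()
       SUBSTRING(@g, 20, 1) AS variant_nibble; -- always '8', '9', 'A' or 'B' (top two bits fixed at 10)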
Tom
February 4, 2014 at 9:17 am
L' Eomot Inversé (2/4/2014)
Type 1 was flawed because (a) not all generators had a MAC address and (b) the rules for the timestamp couldn't work for systems that generated GUIDs too often (ie sometimes more than a few tens of millions per second).
Weren't there also security implications of putting the MAC address of your machine into an easily-readable numeric format? I seem to recall some complaints about that.
February 4, 2014 at 9:26 am
paul.knibbs (2/4/2014)
L' Eomot Inversé (2/4/2014)
Type 1 was flawed because (a) not all generators had a MAC address and (b) the rules for the timestamp couldn't work for systems that generated GUIDs too often (ie sometimes more than a few tens of millions per second).
Weren't there also security implications of putting the MAC address of your machine into an easily-readable numeric format? I seem to recall some complaints about that.
I remember some discussion of that. Don't remember whether it actually influenced anything (probably I never knew).
Tom