February 3, 2014 at 4:54 am
Grant,
Welcome back - hope you had fun?
To tell you the truth, I think it was the way the picture had been tweeted. Can't remember who by, but I think there were some digital enhancements going on - but you looked RED!
Rodders...
February 3, 2014 at 5:06 am
Revenant (1/27/2014)
I wish that MSFT would introduce a T-SQL construct like this:
RUN ON THREAD a
(some T-SQL)
RUN ON THREAD b
(another T-SQL)
AWAIT a AND b
(runs after both a and b finish)
;
This is DEFAULT in Cosmos, for crying out loud! Why not in T-SQL, developed by the same division?
(You may replace 'on thread a' by 'parallel' or so.)
edits: posted before I finished (a key combination?)
I'm not sure it's a good idea. I've worked at both extremes: where parallelism and synchronisation are explicit, as you describe, and where it's completely behind the scenes. I've also worked where it's mixed - both thoroughly hybrid (it's behind the scenes but the user can explicitly split/join/prevent parallelism/etcetera) and slightly mixed (it's behind the scenes but the user can tweak it a bit, as in MS SQL Server). I've found the mixed scenarios tend to allow the programmer to do "useful" :angry: things which are the equivalents of using WITH(NOLOCK) to boost performance when absolute accuracy is needed, or using SERIALIZABLE isolation level and/or lots of WITH(TABLOCKX) hints for an OLTP system where all transactions are trivial and READ COMMITTED would be perfectly adequate, so I tend to deprecate them. I'm very happy doing simple parallel stuff completely by hand, and I'm equally happy leaving parallelism entirely to the system when the system is well designed, and I'm reasonably happy to use mixed systems too, but I hate having to cope with the mess the average developer makes of mixed system/programmer parallelism.
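(When I do the simple by-hand version against SQL Server today it usually ends up as Agent jobs plus a polling loop - a rough sketch only, assuming two jobs named N'Load A' and N'Load B' already exist and that checking the latest Agent session in msdb is good enough for the "await":)

-- fire both jobs asynchronously
EXEC msdb.dbo.sp_start_job @job_name = N'Load A';
EXEC msdb.dbo.sp_start_job @job_name = N'Load B';

WAITFOR DELAY '00:00:02';  -- give Agent a moment to register the starts before polling

-- crude AWAIT: poll until neither job is still executing in the current Agent session
WHILE EXISTS
(
    SELECT 1
    FROM msdb.dbo.sysjobactivity AS ja
    JOIN msdb.dbo.sysjobs AS j
      ON j.job_id = ja.job_id
    WHERE j.name IN (N'Load A', N'Load B')
      AND ja.session_id = (SELECT MAX(session_id) FROM msdb.dbo.syssessions)
      AND ja.start_execution_date IS NOT NULL
      AND ja.stop_execution_date IS NULL
)
    WAITFOR DELAY '00:00:05';

-- anything here runs after both jobs have finished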
Tom
February 3, 2014 at 5:31 am
Sean Lange (1/31/2014)
wolfkillj (1/31/2014)
Ed Wagner (1/31/2014)
Luis Cazares (1/30/2014)
Would this help? http://www.sqlskills.com/blogs/kimberly/guids-as-primary-keys-andor-the-clustering-key/
http://blogs.msdn.com/b/sqlserverfaq/archive/2010/05/27/guid-vs-int-debate.aspx
Maybe even a post from a GUID lover:
http://blogs.msdn.com/b/sqlazure/archive/2010/05/05/10007304.aspx
Or even BOL on Using uniqueidentifier Data:
http://technet.microsoft.com/en-us/library/ms190215(v=sql.105).aspx
I especially like Kimberly's article Luis posted here. Sean, this may be a good first step in illustrating your point. There's the index fragmentation problem and also the space requirements, which Kimberly illustrates well. Another point is that all NCIs will inherit the bytes of the primary key for each row, so you're going to pay the price for space more than once. I got the opportunity to hear Kimberly speak once about keys and she really drove the point home about making sure your primary keys are narrow, unique, never-changing and ever-increasing.
Being a fan of integers myself, I found myself wondering how the two data types would stack up against each other in a very simple scenario. So, I created a couple of 1M row test tables and ran a few simple tests.
Don't forget that since all those extra bytes of GUID primary keys embedded in non-clustered indexes result in fewer rows per page, it will also require more pages of the buffer cache (i.e., more RAM) to hold the same number of rows compared to a table with exactly the same data but an int or bigint primary key. GUID primary keys are chewing up both I/O and RAM, requiring an even stronger justification for incurring this cost. If I have time today, I'll look at the memory grants requested and granted for Ed's test data and queries - I expect we'll see some difference.
Thanks all. You have given me some direction to find some more solid evidence as to why this is a bad idea. Kimberly Tripp's article was one I remember reading but couldn't find yesterday. I knew that using a GUID as the clustered key would cause huge fragmentation but needed something to back it up.
Do write up an article on it. I've used GUIDs and they've worked well enough. Not perfect, mind you, but not the Götterdämmerung nightmare people usually ascribe to them. I'd love to see where you find variations on the common wisdom around them.
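If you want to quantify the fragmentation and page-density side of it for the article, something along these lines against the two test tables should do it (just a sketch - dbo.TestInt and dbo.TestGuid are placeholder names for whatever the tables are actually called):

SELECT  OBJECT_NAME(ips.object_id)       AS table_name,
        ips.index_type_desc,
        ips.avg_fragmentation_in_percent,
        ips.avg_page_space_used_in_percent,
        ips.page_count,
        ips.record_count
FROM    sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID(N'dbo.TestInt'),  NULL, NULL, 'DETAILED') AS ips
WHERE   ips.index_level = 0  -- leaf level of the clustered index
UNION ALL
SELECT  OBJECT_NAME(ips.object_id),
        ips.index_type_desc,
        ips.avg_fragmentation_in_percent,
        ips.avg_page_space_used_in_percent,
        ips.page_count,
        ips.record_count
FROM    sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID(N'dbo.TestGuid'), NULL, NULL, 'DETAILED') AS ips
WHERE   ips.index_level = 0;

The page_count difference between the two is also a reasonable proxy for the extra buffer-pool memory the GUID version will want.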
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
February 3, 2014 at 7:48 am
Grant Fritchey (2/3/2014)
wolfkillj (1/29/2014)
I created and used my first extended events session last week, and it was great - easy and extremely helpful. I just posted about it on my blog, if you're interested in reading about my experience:
WHOOPP!!! Extended Events.
Right back at ya, Grant - the segment on extended events in your PASS Summit 2012 performance tuning pre-con blew me away.
Since I have your ear on this topic for a minute, do you know of any good references that describe the significance of each of the 100+ extended events - things like exactly what happens in the database to trigger each event (that's not always clear from their names, especially to those of us with a more superficial understanding of the inner workings of SQL Server) and what the data collected actually represents? There doesn't seem to be any BOL documentation on this (that I could find, at least). I've found and learned much from resources like Jonathan Kehayias's 31 Days of Extended Events blog series, including his queries to view the metadata for extended events (which itself is fairly cryptic). It would be nice to have a single source that explains each extended event in plain terms, though.
If the thing does not yet exist and someone were to start putting one together, I'd be happy to contribute write-ups of the (very few) extended events that I understand well enough to explain to others.
Jason Wolfkill
February 3, 2014 at 7:54 am
wolfkillj (2/3/2014)
Grant Fritchey (2/3/2014)
wolfkillj (1/29/2014)
I created and used my first extended events session last week, and it was great - easy and extremely helpful. I just posted about it on my blog, if you're interested in reading about my experience:
WHOOPP!!! Extended Events.
Right back at ya, Grant - the segment on extended events in your PASS Summit 2012 performance tuning pre-con blew me away.
Since I have your ear on this topic for a minute, do you know of any good references that describe the significance of each of the 100+ extended events - things like exactly what happens in the database to trigger each event (that's not always clear from their names, especially to those of us with a more superficial understanding of the inner workings of SQL Server) and what the data collected actually represents? There doesn't seem to be any BOL documentation on this (that I could find, at least). I've found and learned much from resources like Jonathan Kehayias's 31 Days of Extended Events blog series, including his queries to view the metadata for extended events (which itself is fairly cryptic). It would be nice to have a single source that explains each extended event in plain terms, though.
If the thing does not yet exist and someone were to start putting one together, I'd be happy to contribute write-ups of the (very few) extended events that I understand well enough to explain to others.
Nope. You've just hit the two best sets of documentation, BOL and Jonathan. He is writing a book (and I'm helping tech edit it), but even that won't have a complete reference for what all of them are. We are stuck with Microsoft for that.
You can pull the explanations from the system tables that define the events. I don't have that precise table in my head, but a little google-fu ought to bring it up.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
February 3, 2014 at 7:58 am
wolfkillj (1/29/2014)
I created and used my first extended events session last week, and it was great - easy and extremely helpful. I just posted about it on my blog, if you're interested in reading about my experience:
And in other blog-related news, this noob blogger has been fascinated to watch my post on Data Types for Storing Latitude and Longitude Coordinates work its way into the top 10 results for Google searches that include some subset of the terms "SQL Server", "latitude" and/or "longitude", "data type(s)", and "storage/storing". I suppose most veteran bloggers wouldn't even pay much attention to this, but I'm getting a kick out of it, for the additional page views and exposure, sure, but more for the thought that I have shared some knowledge that may answer people's questions on this topic.
Jason Wolfkill
February 3, 2014 at 8:02 am
Grant Fritchey (2/3/2014)
wolfkillj (2/3/2014)
Grant Fritchey (2/3/2014)
wolfkillj (1/29/2014)
I created and used my first extended events session last week, and it was great - easy and extremely helpful. I just posted about it on my blog, if you're interested in reading about my experience:
WHOOPP!!! Extended Events.
Right back at ya, Grant - the segment on extended events in your PASS Summit 2012 performance tuning pre-con blew me away.
Since I have your ear on this topic for a minute, do you know of any good references that describe the significance of each of the 100+ extended events - things like exactly what happens in the database to trigger each event (that's not always clear from their names, especially to those of us with a more superficial understanding of the inner workings of SQL Server) and what the data collected actually represents? There doesn't seem to be any BOL documentation on this (that I could find, at least). I've found and learned much from resources like Jonathan Kehayias's 31 Days of Extended Events blog series, including his queries to view the metadata for extended events (which itself is fairly cryptic). It would be nice to have a single source that explains each extended event in plain terms, though.
If the thing does not yet exist and someone were to start putting one together, I'd be happy to contribute write-ups of the (very few) extended events that I understand well enough to explain to others.
Nope. You've just hit the two best sets of documentation, BOL and Jonathan. He is writing a book (and I'm helping tech edit it), but even that won't have a complete reference for what all of them are. We are stuck with Microsoft for that.
You can pull the explanations from the system tables that define the events. I don't have that precise table in my head, but a little google-fu ought to bring it up.
Jonathan's blog post An XEvent a Day (2 of 31) – Querying the Extended Events Metadata includes the code for queries that return the extended events metadata from the DMVs, for our listeners at home.
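The core of it looks roughly like this (my paraphrase against sys.dm_xe_objects and friends - see Jonathan's post for the full versions). The first query lists every event with the description Microsoft shipped for it; the second lists the data fields each event returns:

-- events and their descriptions
SELECT  p.name AS package_name,
        o.name AS event_name,
        o.description
FROM    sys.dm_xe_objects  AS o
JOIN    sys.dm_xe_packages AS p ON p.guid = o.package_guid
WHERE   o.object_type = N'event'
ORDER BY p.name, o.name;

-- the data fields each event exposes
SELECT  c.object_name AS event_name,
        c.name        AS column_name,
        c.description
FROM    sys.dm_xe_object_columns AS c
WHERE   c.column_type = N'data'
ORDER BY c.object_name, c.name;

Fair warning: some of the shipped descriptions are pretty terse, which is rather the point of my question above.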
EDIT: Any word on when Jonathan's book will hit the streets (or "drop", as the kids today seem to say)?
Jason Wolfkill
February 3, 2014 at 8:09 am
wolfkillj (2/3/2014)
Grant Fritchey (2/3/2014)
wolfkillj (2/3/2014)
Grant Fritchey (2/3/2014)
wolfkillj (1/29/2014)
I created and used my first extended events session last week, and it was great - easy and extremely helpful. I just posted about it on my blog, if you're interested in reading about my experience:
WHOOPP!!! Extended Events.
Right back at ya, Grant - the segment on extended events in your PASS Summit 2012 performance tuning pre-con blew me away.
Since I have your ear on this topic for a minute, do you know of any good references that describe the significance of each of the 100+ extended events - things like exactly what happens in the database to trigger each event (that's not always clear from their names, especially to those of us with a more superficial understanding of the inner workings of SQL Server) and what the data collected actually represents? There doesn't seem to be any BOL documentation on this (that I could find, at least). I've found and learned much from resources like Jonathan Kehayias's 31 Days of Extended Events blog series, including his queries to view the metadata for extended events (which itself is fairly cryptic). It would be nice to have a single source that explains each extended event in plain terms, though.
If the thing does not yet exist and someone were to start putting one together, I'd be happy to contribute write-ups of the (very few) extended events that I understand well enough to explain to others.
Nope. You've just hit the two best sets of documentation, BOL and Jonathan. He is writing a book (and I'm helping tech edit it), but even that won't have a complete reference for what all of them are. We are stuck with Microsoft for that.
You can pull the explanations from the system tables that define the events. I don't have that precise table in my head, but a little google-fu ought to bring it up.
Jonathan's blog post An XEvent a Day (2 of 31) – Querying the Extended Events Metadata includes the code for queries that return the extended events metadata from the DMVs, for our listeners at home.
EDIT: Any word on when Jonathan's book will hit the streets (or "drop", as the kids today seem to say)?
Not a clue. I've gone through three chapters. I know he has three more done, but I haven't seen them.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
February 3, 2014 at 8:11 am
L' Eomot Inversé (2/3/2014)
Revenant (1/27/2014)
I wish that MSFT would introduce a T-SQL construct like this:
RUN ON THREAD a
(some T-SQL)
RUN ON THREAD b
(another T-SQL)
AWAIT a AND b
(runs after both a and b finish)
;
This is DEFAULT in Cosmos, for crying out loud! Why not in T-SQL, developed by the same division?
(You may replace 'on thread a' by 'parallel' or so.)
edits: posted before I finished (a key combination?)
I'm not sure it's a good idea. I've worked at both extremes: where parallelism and synchronisation are explicit, as you describe, and where it's completely behind the scenes. I've also worked where it's mixed - both thoroughly hybrid (it's behind the scenes but the user can explicitly split/join/prevent parallelism/etcetera) and slightly mixed (it's behind the scenes but the user can tweak it a bit, as in MS SQL Server). I've found the mixed scenarios tend to allow the programmer to do "useful" :angry: things which are the equivalents of using WITH(NOLOCK) to boost performance when absolute accuracy is needed, or using SERIALIZABLE isolation level and/or lots of WITH(TABLOCKX) hints for an OLTP system where all transactions are trivial and READ COMMITTED would be perfectly adequate, so I tend to deprecate them. I'm very happy doing simple parallel stuff completely by hand, and I'm equally happy leaving parallelism entirely to the system when the system is well designed, and I'm reasonably happy to use mixed systems too, but I hate having to cope with the mess the average developer makes of mixed system/programmer parallelism.
While all of what you say is true, with thoughtful and careful use, I'd love that type of parallel ability for a lot of the stuff I do. It would also be one more nail I could drive into the coffin I currently have SSIS in. 😉
--Jeff Moden
Change is inevitable... Change for the better is not.
February 3, 2014 at 8:20 am
Jeff Moden (2/3/2014)
L' Eomot Inversé (2/3/2014)
Revenant (1/27/2014)
I wish that MSFT would introduce a T-SQL construct like this:
RUN ON THREAD a
(some T-SQL)
RUN ON THREAD b
(another T-SQL)
AWAIT a AND b
(runs after both a and b finish)
;
This is DEFAULT in Cosmos, for crying out loud! Why not in T-SQL, developed by the same division?
(You may replace 'on thread a' by 'parallel' or so.)
edits: posted before I finished (a key combination?)
I'm not sure it's a good idea. I've worked at both extremes: where parallelism and synchronisation are explicit, as you describe, and where it's completely behind the scenes. I've also worked where it's mixed - both thoroughly hybrid (it's behind the scenes but the user can explicitly split/join/prevent parallelism/etcetera) and slightly mixed (it's behind the scenes but the user can tweak it a bit, as in MS SQL Server). I've found the mixed scenarios tend to allow the programmer to do "useful" :angry: things which are the equivalents of using WITH(NOLOCK) to boost performance when absolute accuracy is needed, or using SERIALIZABLE isolation level and/or lots of WITH(TABLOCKX) hints for an OLTP system where all transactions are trivial and READ COMMITTED would be perfectly adequate, so I tend to deprecate them. I'm very happy doing simple parallel stuff completely by hand, and I'm equally happy leaving parallelism entirely to the system when the system is well designed, and I'm reasonably happy to use mixed systems too, but I hate having to cope with the mess the average developer makes of mixed system/programmer parallelism.
While all of what you say is true, with thoughtful and careful use, I'd love that type of parallel ability for a lot of the stuff I do. It would also be one more nail I could drive into the coffin I currently have SSIS in. 😉
Well, I tried to insert an image of a really big hammer.
February 3, 2014 at 8:25 am
Grant Fritchey (2/3/2014)
Sean Lange (1/31/2014)
wolfkillj (1/31/2014)
Ed Wagner (1/31/2014)
Luis Cazares (1/30/2014)
Would this help? http://www.sqlskills.com/blogs/kimberly/guids-as-primary-keys-andor-the-clustering-key/
http://blogs.msdn.com/b/sqlserverfaq/archive/2010/05/27/guid-vs-int-debate.aspx
Maybe even a post from a GUID lover:
http://blogs.msdn.com/b/sqlazure/archive/2010/05/05/10007304.aspx
Or even BOL on Using uniqueidentifier Data:
http://technet.microsoft.com/en-us/library/ms190215(v=sql.105).aspx
I especially like Kimberly's article Luis posted here. Sean, this may be a good first step in illustrating your point. There's the index fragmentation problem and also the space requirements, which Kimberly illustrates well. Another point is that all NCIs will inherit the bytes of the primary key for each row, so you're going to pay the price for space more than once. I got the opportunity to hear Kimberly speak once about keys and she really drove the point home about making sure your primary keys are narrow, unique, never-changing and ever-increasing.
Being a fan of integers myself, I found myself wondering how the two data types would stack up against each other in a very simple scenario. So, I created a couple of 1M row test tables and ran a few simple tests.
Don't forget that since all those extra bytes of GUID primary keys embedded in non-clustered indexes result in fewer rows per page, it will also require more pages of the buffer cache (i.e., more RAM) to hold the same number of rows compared to a table with exactly the same data but an int or bigint primary key. GUID primary keys are chewing up both I/O and RAM, requiring an even stronger justification for incurring this cost. If I have time today, I'll look at the memory grants requested and granted for Ed's test data and queries - I expect we'll see some difference.
Thanks all. You have given me some direction to find some more solid evidence as to why this is a bad idea. Kimberly Tripp's article was one I remember reading but couldn't find yesterday. I knew that using a GUID as the clustered key would cause huge fragmentation but needed something to back it up.
Do write up an article on it. I've used GUIDs and they've worked well enough. Not perfect, mind you, but not the Götterdämmerung nightmare people usually ascribe to them. I'd love to see where you find variations on the common wisdom around them.
Personally and just as a non-confrontational point of view, I think using GUIDs as a clustered index is a really bad thing to do, especially since the advent of SEQUENCE. One other problem that I have with them is that people think they're still "Globally Unique" and they haven't been since MS changed them from Type 1 GUIDs (machine sensitive) to Type 4 GUIDs (not much more than a really big random number). While the odds against duplicating a GUID between even several machines are astronomical, it is a possibility that can happen, and a lot of people don't bother with unique constraints on supposed key columns (a basic fault in design knowledge, as well).
Are GUIDs always bad, IMHO? Not on your life but I treat them much like Cursors and While loops. They have their uses but I generally avoid them.
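Just to show what I mean about SEQUENCE making the alternative easy (a sketch with made-up names): keep the clustered key narrow and ever-increasing, and if the outside world really does need a GUID, store it as a nonclustered column with a unique constraint instead of making it the clustering key.

CREATE SEQUENCE dbo.OrderKey AS bigint START WITH 1 INCREMENT BY 1 CACHE 1000;

CREATE TABLE dbo.SomeOrder
(
    OrderID   bigint NOT NULL
        CONSTRAINT DF_SomeOrder_OrderID DEFAULT (NEXT VALUE FOR dbo.OrderKey)
        CONSTRAINT PK_SomeOrder PRIMARY KEY CLUSTERED,          -- narrow, ever-increasing
    PublicID  uniqueidentifier NOT NULL
        CONSTRAINT DF_SomeOrder_PublicID DEFAULT (NEWID())
        CONSTRAINT UQ_SomeOrder_PublicID UNIQUE NONCLUSTERED,   -- enforce the "unique" in GUID
    OrderDate datetime2(0) NOT NULL
);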
On that same subject, I'll add to the list of favorite Kimberly Tripp links already posted above.
http://technet.microsoft.com/en-US/sqlserver/gg508879.aspx
To be sure, a good deal of the problem I have with GUIDs is that I'd rather read/troubleshoot a (say) 12 digit number than a 36 character hex representation. 😛
--Jeff Moden
Change is inevitable... Change for the better is not.
February 3, 2014 at 8:55 am
Jeff Moden (2/3/2014) One other problem that I have with them is that people think they're still "Globally Unique" and they haven't been since MS changed them from Type 1 GUIDs (machine sensitive) to Type 4 GUIDs (not much more than a really big random number). While the odds against duplicating a GUID between even several machines are astronomical, it is a possibility that can happen, and a lot of people don't bother with unique constraints on supposed key columns (a basic fault in design knowledge, as well).
Totally with everyone about avoiding GUIDs for clustered indexes, but the fact that they're (pseudo)randomly generated rather than being explicitly unique isn't a valid argument against their use. We can talk all we want about the odds, but frankly if people don't accept them, it's not logical to rely on SQL Server for anything (or really, computing in general).
Hash joins in execution plans work with similar odds of collision between different hashed values. CPUs, memory, and storage all work with similar odds of undetectable mutation/corruption. Let alone the error rates businesses have to deal with if a human's involved in anything (untold billions of times more likely).
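To put a rough number on it (back-of-envelope, using the 122 random bits of a version 4 GUID): by the birthday approximation, the probability of any collision among a billion randomly generated GUIDs is about (10^9)^2 / (2 x 2^122), i.e. roughly 10^-19.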
February 3, 2014 at 10:38 am
edit: accidentally quoting much and saying nothing. Fingers or some weird key combination, don't know which
Tom
February 3, 2014 at 10:49 am
Jeff Moden (2/3/2014)
While all of what you say is true, with thoughtful and careful use, I'd love that type of parallel ability for a lot of the stuff I do. It would also be one more nail I could drive into the coffin I currently have SSIS in. 😉
Oh, if only the reasonably competent used it, it would be great. Or if no one not reasonably competent used it where I had to repair the damage and/or carry the can. Or even if it were used only in shops where effective unit test, integration test, system test, and final in-house QA were done (instead of lip service) before delivering to a Beta program involving customer acceptance trials, leading to a final review of all feedback before general release. It would be a brilliant tool for the people who wouldn't misuse it. I'm definitely in favour of those people having it - I just hate the thought of coping with the people who will misuse it; but I guess I generally mind the misuse less than not having the tools, especially now that I won't end up dealing with the mess.
Tom
February 3, 2014 at 11:14 am
HowardW (2/3/2014)
Totally with everyone about avoiding GUIDs for clustered indexes, but the fact that they're (pseudo)randomly generated rather than being explicitly unique isn't a valid argument against their use. We can talk all we want about the odds, but frankly if people don't accept them, it's not logical to rely on SQL Server for anything (or really, computing in general). Hash joins in execution plans work with similar odds of collision between different hashed values.
If I believed that, I would be leaping in with a better hash join that had a much greater probability of having clashes to deal with - it's pretty easy to handle clashes, and clashes at a much higher rate than likely GUID clashes cost very little to handle in a join, while getting the probability down to that of GUID clashes would cost more overall (including having a much longer hash to compare).
CPUs, Memory and storage all work with similar odds of undetectable mutation/corruption.
I hope not. And I'm pretty sure for most hardware that it is not quite that bad (although some hardware is not designed to be particularly error-free, or even to detect errors reliably, and you would be right where that hardware is concerned).
Let alone the error rates businesses have to deal with if a human's involved in anything (untold billions of times more likely).
Yes. And of course the obvious thing that involves humans is application software design - that makes a very big contribution to error rates. Another is human malice - maybe not quite as big a contribution, but not altogether negligible. And perhaps deliberate use of software known not to be viable, which may be the largest contributor of all, or may not; some think it is, but they work on the theory that corporate management is too busy being dishonest to look after the bottom line; others believe it isn't, relying on the theory that even corporate management can't be that stupid.
I think the point about GUID clashes is simple: either you can deal with clashes or you can't. If you can, you can probably get by with something much simpler and cheaper than a GUID. If you can't, you have to trade off the cost of coping with those failures you can't deal with against the cost of reducing their frequency. It's quite possible that the GUID is positioned right for that trade-off in many practical cases. That's just common sense, and it doesn't rely on the rather weak analogies with hash joins or with hardware reliability that you seem to be basing the same conclusion on.
Tom