Are the posted questions getting worse?

  • Jeff Moden (2/3/2014)


    Grant Fritchey (2/3/2014)


    Please, please, please, no piling on, but I'm curious what you guys think about this: Unreasonable Defaults. I think most of the fundamental info is fine. I'm just... concerned?... with the statements around clustered indexes and the recommendations to go with a heap instead. It's basically 180 degrees out from what I recommend. I've been thinking about it, and I'm still comfortable with my recommendations, but it is a consideration. Curious what you guys think.

    And again, this isn't an invite to beat on someone's head.

    I saw that article via a LinkedIn post (to which I responded to sometime over the weekend, IIRC) that he'd created the article.

    In some aspects, I agree but he sure did take a hell of a long time to say what we already know... that RID lookups can be terribly expensive whether you're looking up a single row or a million rows. So, IMHO, nothing new there. He just took way too long to explain the RID lookup problem.

    The thing that really concerns me is his almost casual suggestion (see the ***BAM*** comment he made) that you use INCLUDE to make covering indexes. While that certainly avoids what he calls the "Clustered Index Penalty", I'm hoping that people don't read into that and start making coving indexes for every bloody query because of the data duplication caused by every non-Clustered Index. It doesn't take very many such indexes to turn a large but managable table into a monster that will eat huge amounts of disk space, especially during extent splits when adding new data, that will also easily double or triple already long index maintenance times not to mention giving the database a much larger footprint and related duration for backups and restores.

    The other thing that he didn't test was INSERTs. I use Clustered Indexes for IDENTITY PKs like a lot of folks do but not for the same reasons. I use them to control the order of inserts to prevent page splits. Ostensibly, having a heap prevents that but even heaps will split on UPDATEs. Contrary to belief, disk space isn't as cheap as a lot of people think it is. Correct... the disks themselves don't cost that much.

    A possible glaring fault with his testing is the order of the HEAP itself. What order were the rows inserted in? If, as typical with a heap, the rows were inserted in order by date (sales_date, in this case) and the only thing you ever looked up ranges of data by was the sales_date, then I agree... the HEAP will have less reads than if a clustered index is present even if the clustered index is on the date. However, what happens if you lookup the 200,000 rows based on criteria for a different indexed column? If the index is covering, we know it won't matter but if the index is not covering, what happens to the number of reads then? I'm thinking full table scan or, worse yet, a page read per row much like a wayward Index Seek can be.

    To wit, I very much appreciate his findings and concerns but I believe his testing is grossly inadequate to support the conclusion that he's apparently reached. More testing needs to be done before anyone uses terms such as "Unreasonable Defaults" or {gasp} new "Best Practice". And, yes, I've seen this phenomena in my own testing where not even having an index for a date range lookup on a heap was almost as fast as traditional indexing but I've also seen where the addition of an index, even one that does a RID lookup, can be make the difference between night and day, so I'm not bagging everything this guy is raking.

    This guy isn't the only person of such ilk when it comes to indexes. I also see that Brent Ozar published an article about what a waste of time it is to even REORGANIZE indexes. While I, again, see some merit in his proposal, I don't believe that enough testing has been done to make it a standard practice where I work.

    And, to be absolutely clear... I'm not saying that either is wrong. I'm just saying that I've not seen enough testing to form a standard opinion of either practice other than "It Depends". Further testing may very well prove both individuals right.

    At my last company I had a system where I removed the PK on the table before doing daily, bulk inserts and added it back when I was done. After doing some further reading and attending a SQL Server user's meeting I changed it to remove the index drop and re-add and the inserts sped up appreciably.

    --------------------------------------
    When you encounter a problem, if the solution isn't readily evident go back to the start and check your assumptions.
    --------------------------------------
    It’s unpleasantly like being drunk.
    What’s so unpleasant about being drunk?
    You ask a glass of water. -- Douglas Adams

  • wolfkillj (2/4/2014)


    Background to the following rant:

    My company deals with a vendor who serves our needs by allowing us to connect directly to its server and code our own stored procedures in a dedicated database to access the data. The data lives in another database, and because we buy only a subset of the data, the vendor exposes that data to us through views, which are the only objects for which we have any permissions in that other database.

    <Rant>

    I woke up this morning to complaints that our applications were timing out because of long response times from this vendor's server. Fortunately, I insisted that the vendor give us SHOWPLAN permissions on both the dev and production servers. When I requested an estimated execution plan for the long-running proc, I found that the vendor has apparently added a WHERE clause to the definition of a view that the stored proc reference. The condition?

    [columnA] <> '[somestring]' AND LEN(columnA) = [someint] AND ISNUMERIC([columnA]) = 1

    So, now we are waiting for index scans (of 21.5 million rows) and evaluation of these three conditions EVERY TIME we SELECT from this view. No wonder our procs that used to run in several seconds are now taking several minutes.

    The worst part (perhaps)? This was the vendor's response to our complaint that [columnA] included values outside the permissible domain ([columnA] is a standardized code established by a major quasi-governmental corporation) that were causing our app to return nonsensical results and/or throw errors.

    </Rant>

    Update: Once the vendor removed the conditions from the view definitions and performance did not return to previous levels, we discovered that the vendor had also dropped an index that our procedure requires. Now that the index is back in place, all is well except that my forehead is sore from banging it against my desk.

    Jason Wolfkill

  • Grant,

    My personal concern with that article is bits and pieces like this:

    Unlike heap tables, clustered indexes are “living creatures” that move rows around as needed to maintain their properties (i.e.: the row order and tree balance). Consequently the non-clustered index can’t use the physical address as reference anymore because it could change at any time.

    Now, correct me if I'm wrong, but SQL Server never uses the Physical Addressing, that's an Oracle conceit. RID <> Physical Address, it's a logical address. He's Oracle centric and using SQL Server as a cheap install to do his work on and prove his assumptions.

    As mentioned earlier, he's trying to make a standard call across too many simultaneous data engines.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Thanks for all the excellent replies about the clustered index/heap thing. You've all more or less reinforced my opinions of things through some great factual discussions. That's why I love the water cooler.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Evil Kraig F (2/4/2014)


    Grant,

    My personal concern with that article is bits and pieces like this:

    Unlike heap tables, clustered indexes are “living creatures” that move rows around as needed to maintain their properties (i.e.: the row order and tree balance). Consequently the non-clustered index can’t use the physical address as reference anymore because it could change at any time.

    Now, correct me if I'm wrong, but SQL Server never uses the Physical Addressing, that's an Oracle conceit. RID <> Physical Address, it's a logical address. He's Oracle centric and using SQL Server as a cheap install to do his work on and prove his assumptions.

    RID is pretty close to a physical address. It's File Number:Page Number:Slot index. Location within a database file, not location on a disk. Yes, with a clustered index you have a extra level of indirection, this is a good thing. As it is, if a row in a heap has to move, the RID changes, if a row in a clustered index moves from one page to another, the clustered index key doesn't change.

    Edit: Oh, and rows can and do move within a heap, so his starting premise there is incorrect. Since the starting premise is wrong, all the conclusions drawn are unfounded.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (2/4/2014)


    RID is pretty close to a physical address. It's File Number:Page Number:Slot index. Location within a database file, not location on a disk. Yes, with a clustered index you have a extra level of indirection, this is a good thing. As it is, if a row in a heap has to move, the RID changes, if a row in a clustered index moves from one page to another, the clustered index key doesn't change.

    Hopefully we can keep this short and sweet, or we can mosey to another thread, but help me understand something Gail (and others)?

    The Oracle mechanism of the physical addressing is a signficant improvement because it removes all translation structures for row recovery from the non-clustered lookup upon finding the correct values. With a RID, you still need to do a lookup form because you have to locate where the File/Page/Slot is on the drive, and then you can perform your I/O. Clustered index or RID, you still lose that optimization path, so it's moot... no?


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • What a crazy day! Lots of snow falling around Kansas City. Pretty much the whole city is shut down today and quite probably tomorrow. The snow got started in earnest around 10am and is supposed to snow continuously for around 24 hours. I live out of the West side of KC and the last check of the weather folks had us right in the bulls-eye; expecting 10-12" snow.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Sean Lange (2/4/2014)


    What a crazy day! Lots of snow falling around Kansas City. Pretty much the whole city is shut down today and quite probably tomorrow. The snow got started in earnest around 10am and is supposed to snow continuously for around 24 hours. I live out of the West side of KC and the last check of the weather folks had us right in the bulls-eye; expecting 10-12" snow.

    Glad we are just getting another shot of cold.

    Might be another late year for open water.

    Drive safe!

  • Evil Kraig F (2/4/2014)


    The Oracle mechanism of the physical addressing is a signficant improvement because it removes all translation structures for row recovery from the non-clustered lookup upon finding the correct values. With a RID, you still need to do a lookup form because you have to locate where the File/Page/Slot is on the drive, and then you can perform your I/O. Clustered index or RID, you still lose that optimization path, so it's moot... no?

    I have no idea how Oracle works.

    With a physical RID, SQL has to just ask the OS for an 8k read at offset 8192*PageNumber in the file identified by File ID (assuming it's not in memory)

    With a clustered index key, it has to query the metadata to locate the root page for that index, read that page, crack the page, read the rows to identify which page is needed in the level below. Fetch that page, Repeat until the leaf page is read.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (2/4/2014)


    I have no idea how Oracle works.

    Well, I only know enough to be dangerous to myself and others, so I'm right there with you.

    With a physical RID, SQL has to just ask the OS for an 8k read at offset 8192*PageNumber in the file identified by File ID (assuming it's not in memory)

    With a clustered index key, it has to query the metadata to locate the root page for that index, read that page, crack the page, read the rows to identify which page is needed in the level below. Fetch that page, Repeat until the leaf page is read.

    Interesting. I didn't realize SQL worked that closely with the OS. I know SANs and the like, particularly across RAIDs, don't typically let the OS get that close whereas Oracle can as far as I understand. At this point though I'm no longer just trying to swim in the deep end but I fear a 'foot in mouth' typhoon. Hopefully one of our dual-system folks can help me understand this further.

    Thanks for the information, as always, Gail. 🙂


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Evil Kraig F (2/4/2014)


    I didn't realize SQL worked that closely with the OS.

    Standard Windows API call that any app can use. http://msdn.microsoft.com/en-us/library/windows/desktop/aa365469%28v=vs.85%29.aspx

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • L' Eomot Inversé (2/4/2014)


    WayneS (2/3/2014)


    Jeff Moden (2/3/2014)


    Personally and just as a non-confontational point of view, I think GUIDs as a clustered index is a really bad thing to do especially since the advent of SEQUENCE. One other problem that I have with them is that people think they're still "Globally Unique" and they haven't been since MS changed them from Type 1 GUIDs (machine sensitive) to Type 4 GUIDs (not much more than a really big random number). While the chances of duplicating a GUID between even several machines is astonomical, it is a possibility that can happen and a lot of people don't bother with unique constraints on supposed key columns (a basic fault in design knowledge, as well).

    Are GUIDs always bad, IMHO? Not on your life but I treat them much like Cursors and While loops. They have their uses but I generally avoid them.

    On that same subject, I'll add to the list of favorite Kimberly Tripp links already posted above.

    http://technet.microsoft.com/en-US/sqlserver/gg508879.aspx

    To be sure, a good deal of the problem I have with GUIDs is that I'd rather read/troubleshoot a (say) 12 digit number than a 36 character hex representation. 😛

    Jeff,

    Do you happen to have any links describing the change from Type1 to Type4? Also, does part of the GUID identify the type of GUID it is? (If so, then this would reduce the number of possibilities that the GUID could produce.)

    The 5 types are fully described here (RFC4122)]. The change was from using a (somewhat complicated) timestamp plus the 48bit MAC address of the system that generated the GUID to a system that used random (or pseudo-random) numbers. Type1 was flawed because (a) not all generators had a MAC address and (b) the rules for the timestamp couldn't work for systems that generated GUIDs too often (ie sometimes generated for than a few tens of millions per second).

    The type field uses 4 of the 128 bits of the GUID. Other bits are fixed as well two more bits are fixed in types 1,3,4 and 5 (and are the same in all of these types), so these types have onlt 122 bits that are variable. I can't remember what version 2 did, and it isn't in the RFC

    Tom - this info is awesome and fits in great with a presentation that I've just started doing. Thanks!

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • GilaMonster (2/4/2014)


    Evil Kraig F (2/4/2014)


    I didn't realize SQL worked that closely with the OS.

    Standard Windows API call that any app can use. http://msdn.microsoft.com/en-us/library/windows/desktop/aa365469%28v=vs.85%29.aspx

    I find it so very nice that people still reference the Windows API. With today's world of .NET and server-side everything, many developers have forgotten about the API exposed by the OS. I used to use it all the time, but it has been a while.

  • paul.knibbs (2/4/2014)


    L' Eomot Inversé (2/4/2014)Type1 was flawed because (a) not all generators had a MAC address and (b) the rules for the timestamp couldn't work for systems that generated GUIDs too often (ie sometimes generated for than a few tens of millions per second).

    Weren't there also security implications of putting the MAC address of your machine into an easily-readable numeric format? I seem to recall some sort of complaints about that.

    Yes... that was one of the advertised concerns because the source of a row, in theory, could be traced to a given machine because you could get the MAC address for the machine. Personally, I thought that was a feature rather than a fault. Proper security in the right areas should prevent that from being any kind of a back door into your machine.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Sean Lange (2/4/2014)


    What a crazy day! Lots of snow falling around Kansas City. Pretty much the whole city is shut down today and quite probably tomorrow. The snow got started in earnest around 10am and is supposed to snow continuously for around 24 hours. I live out of the West side of KC and the last check of the weather folks had us right in the bulls-eye; expecting 10-12" snow.

    Sean,

    Snowed in? At least that ought to give you time to work on the UniqueIdentifier articles! :w00t::w00t:

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

Viewing 15 posts - 42,781 through 42,795 (of 66,688 total)

You must be logged in to reply to this topic. Login to reply