DBA vs. Developer Question / Issue

  • Eric M Russell (2/19/2015)


    ZZartin (2/19/2015)


    Eric M Russell (2/19/2015)It's unclear how a table containing only Title + Author, and duplicates allowed, would be used by an application. If someone wanted to know what titles are available for purchase, they want to see a distinct list, and once a specific book has been purchased, there needs to be a ID to distinguigh WHICH book was purchased. There either needs to be one record per distinct book (primary key on BookID) or one record per Title + Author (primary key) with an additional inventory count column that can be decremented.

    If we're talking about non-relational lists that have been dumped into SQL Server, and nobody cares about duplicates, then as far as I'm concerned it's not even a real table, and the DBA shouldn't fret over it. It's like debating the nutritional value of a ball game hot dog.

    /shrug this got way too involved for what I thought was a simple theoretical.

    Title + Author (primary key) with an additional inventory count column

    Functionally this is the same result as what I originally described with more complicated modification functions.

    Without a primary key, the table isn't functional for anything, not even display in a listbox.

    Hmm... either way allows you to see and maintain how many books you have in a bookshelf with different methods of viewing and modifying that information in some cases easier or harder for one way vs. the other. Anyways like I said this is becoming far to complicated and off topic.

  • Eric M Russell (2/19/2015)


    ZZartin (2/19/2015)


    Eric M Russell (2/19/2015)It's unclear how a table containing only Title + Author, and duplicates allowed, would be used by an application. If someone wanted to know what titles are available for purchase, they want to see a distinct list, and once a specific book has been purchased, there needs to be a ID to distinguigh WHICH book was purchased. There either needs to be one record per distinct book (primary key on BookID) or one record per Title + Author (primary key) with an additional inventory count column that can be decremented.

    If we're talking about non-relational lists that have been dumped into SQL Server, and nobody cares about duplicates, then as far as I'm concerned it's not even a real table, and the DBA shouldn't fret over it. It's like debating the nutritional value of a ball game hot dog.

    /shrug this got way too involved for what I thought was a simple theoretical.

    Title + Author (primary key) with an additional inventory count column

    Functionally this is the same result as what I originally described with more complicated modification functions.

    Without a primary key, the table isn't functional for anything, not even display in a listbox.

    They're actually equivalent to a uniquely identifyable row with a count column. Since you can transform from one to another unambiguously, thats essentially how you'd handle it. Sure, its not good in a relational sense but that stuff is out there, so you might as well formulate a theory about it, right?

    I know because I've had to handle the exact same situation, and I'm not in the position to change it, and when I code for it, I convert to a unique row with a count column.

    One advantage of a clustered key is that it will benefit non-clustered indexes, because the clustered key is used as the bookmark for faster lookups.

    Still looking for an explanation for that one LOL

  • patrickmcginnis59 10839 (2/19/2015)


    One advantage of a clustered key is that it will benefit non-clustered indexes, because the clustered key is used as the bookmark for faster lookups.

    Still looking for an explanation for that one LOL

    In a heap - all indexes are non-clustered. They point to the Internal rowID managed by the heap. Once you've found the original spot, you then need to follow all of the forwarding pointers through the heap to get the "latest iteration" of the record. If there are lots of updates on said,heap this is substantially less efficient than what happens with a clustered index.

    When a table has a clustered index, the table is organized in the clustered key order, so a given revord is kept in one "location" (if you have BLOB/CLOB fields, technically one location for the short fields and one for the LOB fields).. Non-clustered index point to the clustered index key to allow you to do the lookup in a single shot (it points you to the page where the record is). This is more efficient in general than the retrieval process of a heap. One additional bonus if the clustered key is a natural key is that your non-clustered includex in this scenario is more likely to be covering, since it has the "sorting columns" from the NCI + the columns in the CI.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • patrickmcginnis59 10839 (2/19/2015)Ok I'll bite, why is the row lookup slower on heaps? There is still an (invisible) key to the row.

    OK

    When a table is non-clustered heap, SQL Server adds an 8 byte sequential uniqueifier to each record, and this uniqueifier is used as the bookmark in non-clustered indexes.

    However, when a table contains a unique clustered key, that key is instead used as the bookmark, and this key can potentially be more narrow than an auto-generated 8 bytes rowid, so the lookups can be optimized in that perhaps marginal way.

    More compelling is that adding a clustered key on a table can potentially avoid bookmark lookups entirely for some queries, if the selection criteria can be fullfilled by the clustered index. For example, consider a clustered index on CustomerID versus a non-clustered key on CustomerID. Avoiding non-clustered index bookmarks is one way to optimize lookups.

    Also, when a non-clustered index is created on a table with a unique clustered key, the clustered columns are included in the non-clusterered index and can be used for covering queries.

    For example, let's assume there is a Customer table with a unique clustered index on CustomerID and a non-clustered index on (FirstName, LastName). If you run the following query:

    select CustomerID, FirstName, LastName from Customer where FirstName = 'John' and LastName = 'Smith';

    All (3) columns, CustomerID + FirstName + LastName, are fetched from the non-clustered index and there is no need for SQL Server to bookmark fetch to the Customer table.

    In constast, if Customer is heap with only a non-clustered index on (FirstName, LastName), then SQL Server must bookmark lookup into the Customer table using an auto-rowid just to get the additional CustomerID column.

    It really is too funny when you think about it.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • edited, I found a citation for Matt's description.

    Matt Miller (#4) (2/19/2015)


    patrickmcginnis59 10839 (2/19/2015)


    One advantage of a clustered key is that it will benefit non-clustered indexes, because the clustered key is used as the bookmark for faster lookups.

    Still looking for an explanation for that one LOL

    In a heap - all indexes are non-clustered. They point to the Internal rowID managed by the heap. Once you've found the original spot, you then need to follow all of the forwarding pointers through the heap to get the "latest iteration" of the record. If there are lots of updates on said,heap this is substantially less efficient than what happens with a clustered index.

    edit: Ok I have finally found a description of the link walking thanks to Matt's and Grant's prodding, they're called "forwarding stubs" in the page linked here:

    https://www.simple-talk.com/sql/database-administration/sql-server-storage-internals-101/

  • Eric M Russell (2/19/2015)


    patrickmcginnis59 10839 (2/19/2015)Ok I'll bite, why is the row lookup slower on heaps? There is still an (invisible) key to the row.

    OK

    When a table is non-clustered heap, SQL Server adds an 8 byte sequential uniqueifier to each record, and this uniqueifier is used as the bookmark in non-clustered indexes.

    Its my understanding that this is not the case. There is no need for a uniquefier unless you have a non unique key, and by definition heaps have no keys at all as without nonclustered indexes, there is no key to index, and heaps ALWAYS START WITHOUT AN INDEX. To prove me wrong, I'm going to need a citation 🙂

  • patrickmcginnis59 10839 (2/19/2015)


    Eric M Russell (2/19/2015)


    patrickmcginnis59 10839 (2/19/2015)Ok I'll bite, why is the row lookup slower on heaps? There is still an (invisible) key to the row.

    OK

    When a table is non-clustered heap, SQL Server adds an 8 byte sequential uniqueifier to each record, and this uniqueifier is used as the bookmark in non-clustered indexes.

    Its my understanding that this is not the case. There is no need for a uniquefier unless you have a non unique key, and by definition heaps have no keys at all as without nonclustered indexes, there is no key to index, and heaps ALWAYS START WITHOUT AN INDEX. To prove me wrong, I'm going to need a citation 🙂

    The uniquifier is added to clustered indexes if they're not unique, not to other indexes or heaps.

    A heap's RID is unique because it's a location on the disk (although an RID can just be the location of a pointer to another RID to another RID, etc.). A heap table with a nonclustered index, the pointer in the nonclustered index is to the RID.

    If you had a unique key to a nonclustered index, then it's just unique based on the key. But, to get to the data from the nonclustered index, it's still pointing to the RID.

    For a full cite, see Kalen Delaney's book on SQL Server Internals.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Grant Fritchey (2/19/2015)


    patrickmcginnis59 10839 (2/19/2015)


    Eric M Russell (2/19/2015)


    patrickmcginnis59 10839 (2/19/2015)Ok I'll bite, why is the row lookup slower on heaps? There is still an (invisible) key to the row.

    OK

    When a table is non-clustered heap, SQL Server adds an 8 byte sequential uniqueifier to each record, and this uniqueifier is used as the bookmark in non-clustered indexes.

    Its my understanding that this is not the case. There is no need for a uniquefier unless you have a non unique key, and by definition heaps have no keys at all as without nonclustered indexes, there is no key to index, and heaps ALWAYS START WITHOUT AN INDEX. To prove me wrong, I'm going to need a citation 🙂

    The uniquifier is added to clustered indexes if they're not unique, not to other indexes or heaps.

    A heap's RID is unique because it's a location on the disk (although an RID can just be the location of a pointer to another RID to another RID, etc.). A heap table with a nonclustered index, the pointer in the nonclustered index is to the RID.

    If you had a unique key to a nonclustered index, then it's just unique based on the key. But, to get to the data from the nonclustered index, it's still pointing to the RID.

    For a full cite, see Kalen Delaney's book on SQL Server Internals.

    Thanks Grant! I've also finally understood Matt's point, I have found the linked list, they're the "forwarding stubs" described here:

    https://www.simple-talk.com/sql/database-administration/sql-server-storage-internals-101/

  • So, each Heap has 16 columns that will include player info such as Account#, first, last, dob, address, and then the rest that are regarding their play that includes number of pulls, money in, money won, jackpot won, etc.

    Because of this not sure what could be used as a primary key since there are multiple entries involved for players.

    The value of having an index, and what columns are in the index, will depend on what the developer's stored procedure does. So for example, if the developer's stored proc does a select on columns based on pulls, or last name, or dob, then those are the candidate columns to go in the index so that the selects run quicker.

    That is the reason to have an index. (Other than to comply with the rules mentioned above previously that all tables must have indexes. 🙂

    So you need to find out what queries are to be run against your tables and design indexes accordingly (based on the columns used in the where clauses in the sproc for example).

    Performance wise this may make something that currently takes an hour to run only take 30 seconds. Or it may not make any difference, depending on the queries in the stored proc. So you need to analyse the code in the stored proc.

    Also remember, when a table has many indexes then there is a performance hit when the table is being updated because the indexes ALSO have to be updated. This performance hit may not matter (if running overnight for example). Or it may be small enough to be outweighed by the performance improvement they bring.

    Can you arrange access to a test system with prod size tables where you can test the response times of with and without indexes using this developer's stored procedure?

    Rgds, Dave

  • A senior dev doesn't necessarily have the upper hand over a junior DBA. She has a different set of priorities to you, and in this case the structure of the table is more a DBA responsibility than a dev one. On the other hand you need to approach the problem carefully because you need to avoid attracting unnecessary criticism, which will damage your relationship or reputation in the future.

    First I would suggest you share your concerns with a senior DBA. You'll get credit for noticing the situation and they'll have the experience to know if anything stands in the way of your suggestion, or they'll have the ability to look and see. Or they'll give you the go-ahead to do as others have suggested - build a clustered version of one of the tables somewhere (a dev instance hopefully!), point a copy of the SP at it, and show how much faster it runs.

    Don't change the table under the dev's feet - that's unprofessional, and plain bad manners.

  • butcherking13 (2/18/2015)


    I was hoping the tenured DBA’s would chime in on this. I work with a developer who has more experience as I am a very junior DBA. She is creating several stored procedures that create reports based off single heaps. Each heap is a different customer, different data and has customer account numbers, but multiple entries so no primary key.

    I brought up creating a clustered index on the heaps to help performance since some have a few million rows, but she is against it and not sure why. In this situation would you just create the indexes without communication or her agreement? Is this normal?

    Thank you for any response!

    What reliable business data will reporting from a Heap provide?

    Not being sarcastic, just need to know to give you a good answer.

    If a Heap is required then you can't add a clustered Index. That would mean it is no longer a heap.

    You use a Non-clustered index on a heap table to improve performance. Could be why she is saying now.

    https://technet.microsoft.com/en-us/library/ms188270(v=sql.105).aspx

    Read this. Understand This. Then go back to your developer and ask for things that do not make her heap no longer a heap.

  • Luis Cazares (2/18/2015)


    An experienced developer that likes to work with heaps in SQL Server is as good as a newbie.

    I guess the correct option would be to prevent her on what you're going to do and create the required indexes. I can't see a reason of why would someone prefer a heap over a clustered index. I recently generated a clustered index on a table and a simple query went down from over ten minutes to a few seconds.

    Stated like a true Noob... lol...

  • Jeff Moden (2/18/2015)


    Guess I'll be one of the few dissenters here. Although it's a "best practice" to have clustered indexes on all permanent tables, remember that best practices are guidelines to help keep people, that might otherwise not know about something, out of deep Kimchie. The fact that the heaps the OP mentioned are very specialized may change some of the basic rules. An example of such a thing is the code that I created in the "Hierarchies on Sterioids #1" article where I actually document in the code that you don't actually want any form of index to be on the table at that point because of the negative impact having such an index has.

    The right thing to do is posted two posts above this one where it says...

    ...so I will do some tests with a few different heaps and document the difference.

    +1 to all that Jeff. Lot of opinions here from those that don't understand heap tables.

  • ZZartin (2/19/2015)


    Eric M Russell (2/19/2015)


    Grant Fritchey (2/19/2015)


    shane.green 1227 (2/19/2015)


    There are exceptions to every rule, so while "All tables should have at least one index" is good rule - it doesn't mean that should not be any exceptions.

    So to stick up for the senior developer (even if the supplied information is a little lacking :-))

    A reasonable example would be where some large character or binary data is stored (and is not based on a language) and the search criteria changes frequently.

    ...

    ...

    Welcome to the party and thanks for posting.

    No disagreements. You'll note my first response said that the heap might work better. They need to experiment. I still stand by that. As more information has come out though, I'm leaning towards the idea that they need a clustered index, but I'd still say test it to be sure.

    In the original post, it was stated that table contains no primary key. Regardless of what the table contains (datatypes or internal value structure), all tables at least need a primary key. Whether that key is also clustered or whether other non-unique indexes are included is optional.

    Ideally the table would have a narrow primary key that is also clustered and with the leading column sequentially incrmenting. For example on a table that contains a count of sales by Period and Region it would be: PeriodKey, SalesRegionKey.

    Why?

    Not all tables have a logically primary key, and throwing an identity column on a table just for the sake of making it have a primary key doesn't necessarily help anything.

    +1 one to all that.

  • PHYData DBA (2/25/2015)


    What reliable business data will reporting from a Heap provide?

    Not being sarcastic, just need to know to give you a good answer.

    If a Heap is required then you can't add a clustered Index. That would mean it is no longer a heap.

    The results of a query will not change between a heap and a table with a clustered index. That's part of the requirements for a relational database, that the structure not affect the results.

    The performance may be different, the operators used by the QP will be different, but the results can't be. If they are, that's a bug in SQL and the DBA should call CSS and log a case.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass

Viewing 15 posts - 31 through 45 (of 63 total)

You must be logged in to reply to this topic. Login to reply