DBA vs. Developer Question / Issue

  • I was hoping the tenured DBAs would chime in on this. I work with a developer who has more experience than I do, as I am a very junior DBA. She is creating several stored procedures that build reports off single heaps. Each heap holds a different customer's data and has customer account numbers, but there are multiple entries per account, so there is no primary key.

    I brought up creating a clustered index on the heaps to help performance, since some have a few million rows, but she is against it and I'm not sure why. In this situation, would you just create the indexes without communicating with her or getting her agreement? Is this normal?

    Thank you for any response!

  • What were her reasons why? And what were your reasons for wanting to add a clustered index?

    A clustered index does not automatically improve performance, and there may already be non-clustered indexes on the tables that she is using.

  • Best practice is that every permanent table has a clustered index. Clustered indexes enhance performance.

    That doesn't mean you need a unique primary key, just a clustered index at a minimum.

    Leaving a table as a heap has downsides, such as not releasing the space used by deleted/updated rows.

    This was just the first article I found on clustered vs. heap tables; it lists more than I could eloquently express:

    http://www.mssqltips.com/sqlservertip/1254/clustered-tables-vs-heap-tables/

    HEAP

    • Data is not stored in any particular order

    • Specific data cannot be retrieved quickly, unless there are also non-clustered indexes

    • Data pages are not linked, so sequential access needs to refer back to the index allocation map (IAM) pages

    • Since there is no clustered index, additional time is not needed to maintain the index

    • Since there is no clustered index, there is no need for additional space to store the clustered index tree

    • These tables have an index_id value of 0 in the sys.indexes catalog view

    Source: SQL Server 2005 Books Online

    Clustered Table

    • Data is stored in order based on the clustered index key

    • Data can be retrieved quickly based on the clustered index key, if the query uses the indexed columns

    • Data pages are linked for faster sequential access

    • Additional time is needed to maintain the clustered index on INSERTs, UPDATEs and DELETEs

    • Additional space is needed to store the clustered index tree

    • These tables have an index_id value of 1 in the sys.indexes catalog view
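
    If you want to see which tables in a database are still heaps, a quick check against the catalog views shows it. This is just a generic sys.indexes query, nothing specific to the OP's tables:

        SELECT t.name AS table_name,
               CASE i.index_id WHEN 0 THEN 'HEAP' ELSE 'CLUSTERED' END AS storage_type
        FROM sys.tables AS t
        JOIN sys.indexes AS i
            ON i.object_id = t.object_id
           AND i.index_id IN (0, 1)   -- 0 = heap, 1 = clustered index
        ORDER BY t.name;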

    Lowell


    --Help us help you! If you post a question, make sure you include a CREATE TABLE... statement and an INSERT INTO... statement for that table to give the volunteers here representative data. With your description of the problem, we can provide a tested, verifiable solution to your question! Asking the question the right way gets you a tested answer the fastest way possible!

  • An experienced developer who likes to work with heaps in SQL Server is as good as a newbie.

    I guess the correct option would be to warn her about what you're going to do and then create the required indexes. I can't see a reason why someone would prefer a heap over a clustered index. I recently created a clustered index on a table, and a simple query went from over ten minutes down to a few seconds.
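
    If you do go ahead, the change itself is a one-liner; the table and column names here are only placeholders, so adjust them to the actual heaps and the columns the reports filter on:

        -- Placeholder names; the account number isn't unique here, so this is a non-unique clustered index
        CREATE CLUSTERED INDEX CIX_CustomerReportData_AccountNumber
            ON dbo.CustomerReportData (AccountNumber);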

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Her reason for not wanting it was that it wouldn't make enough of a difference, given the size of the heaps and the calculations they do, but I don't think that is correct.

    Also, we use SQL Server 2014 BI edition, so we can't use a columnstore index, since the BI edition doesn't support it.

    I just don't understand how it would hurt, and everything I have read from Paul Randal recommends a clustered index.

    I guess I am looking for the proper protocol on this type of thing from the veterans. I am leaning toward just doing it, as I believe it will make a difference.

  • If you think it'll improve performance, and it likely will if there's no specific reason she thinks it won't, then prove her wrong; that's what dev environments are for 🙂

  • In my experience, developers do not have the same big-picture view of server health and performance that DBAs have (and need to have to do their work). Seasoned DBAs know that shaving milliseconds off queries and reducing CPU usage here and there can add up to massive benefits on a busy server.

    Protecting data integrity through the use of unique indexes, foreign keys and other constraints is also sometimes forgotten by developers in their rush to deliver the next great feature.

    DBAs, on the other hand, have server health and data integrity as prime concerns.

    Developers who adopt an attitude of "It's only small, so there is no need" (... to develop in line with best practice) are just saving themselves work. Lazy, in other words.

    Good luck. You're among friends here.

    Disclaimer: I'm a developer, with some DBA tendencies 🙂 And I can think of just one scenario where I would use a heap – where fast-and-loose data loads are needed.

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • Since neither of you is sure whether the clustered index will improve or hurt performance, I'd suggest an experiment to find out. The real question is: from the heap, are they filtering the data in any way, or simply doing a scan? If they're just doing a scan, it's possible that the heap will work as well as, maybe even better than, a clustered index.

    I'm absolutely of the opinion that the vast majority of tables should have a clustered index, but then, the vast majority of tables are read using some sort of filter most of the time. It's that filter that drives indexing. With no filter, indexes may not help you much.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Oh, and I wouldn't just do something to a table without consulting with the developer. If I have to show them the before and after execution plans, resource use and execution time, fine. Validating my assumptions and testing is a good thing. Educating my co-workers is also a good thing.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • butcherking13 (2/18/2015)


    I was hoping the tenured DBAs would chime in on this. I work with a developer who has more experience than I do, as I am a very junior DBA. She is creating several stored procedures that build reports off single heaps. Each heap holds a different customer's data and has customer account numbers, but there are multiple entries per account, so there is no primary key.

    I brought up creating a clustered index on the heaps to help performance, since some have a few million rows, but she is against it and I'm not sure why. In this situation, would you just create the indexes without communicating with her or getting her agreement? Is this normal?

    Thank you for any response!

    The real problem is that she's creating reports from a flat, denormalized table with no primary key, which means there is potential for duplicates and inaccurate reporting. A table should always have a primary key, but whether the table should be clustered is more subjective. One advantage of a clustered key is that it will benefit non-clustered indexes, because the clustered key is used as the bookmark for faster lookups.
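
    For illustration only (the table and column names are invented), the clustering key ends up as the row locator in every non-clustered index, so the lookup in the last query below resolves through the clustered index rather than a RID:

        -- Invented names for illustration
        CREATE CLUSTERED INDEX CIX_CustomerReportData_AccountNumber
            ON dbo.CustomerReportData (AccountNumber);

        CREATE NONCLUSTERED INDEX IX_CustomerReportData_ReportDate
            ON dbo.CustomerReportData (ReportDate);

        -- The non-clustered index stores AccountNumber as its row locator,
        -- so fetching Amount goes through the clustered index.
        SELECT AccountNumber, Amount
        FROM dbo.CustomerReportData
        WHERE ReportDate = '20150218';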

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • I appreciate the answers. To clarify, there is definitely filtering involved in the procedures, so I will do some tests with a few different heaps and document the difference, along the lines sketched below. Then I can move forward and approach the developer with my results.
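
    Something like this is what I have in mind (the procedure, table and column names are just placeholders for whatever the real reports use):

        -- Measure reads and CPU for the same report before and after adding the index
        SET STATISTICS IO ON;
        SET STATISTICS TIME ON;

        EXEC dbo.usp_CustomerReport @AccountNumber = 12345;   -- against the heap

        CREATE CLUSTERED INDEX CIX_CustomerData_AccountNumber
            ON dbo.CustomerData (AccountNumber);

        EXEC dbo.usp_CustomerReport @AccountNumber = 12345;   -- against the clustered table

        SET STATISTICS IO OFF;
        SET STATISTICS TIME OFF;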

    Thanks again for the responses!

  • Yeah, if you're filtering, you need a clustered index. That's pretty standard. And it should be on the most common access path to the data. Where possible, you should also look into making it as narrow as you can, unique, and monotonically increasing, but don't get overly hung up on those. Focus on the common access path to the data.
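
    For example, if the account number alone isn't unique, one option (names invented for illustration) is to pair it with an IDENTITY column so the key is narrow, unique and ever-increasing while still matching the common filter:

        -- Invented names for illustration; the IDENTITY column just makes the key unique
        ALTER TABLE dbo.CustomerData
            ADD CustomerDataID int IDENTITY(1,1) NOT NULL;

        CREATE UNIQUE CLUSTERED INDEX CIX_CustomerData_AccountNumber
            ON dbo.CustomerData (AccountNumber, CustomerDataID);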

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Guess I'll be one of the few dissenters here. Although it's a "best practice" to have clustered indexes on all permanent tables, remember that best practices are guidelines to help keep people who might otherwise not know about something out of deep Kimchie. The fact that the heaps the OP mentioned are very specialized may change some of the basic rules. An example of such a thing is the code I created in the "Hierarchies on Steroids #1" article, where I document in the code that you don't want any form of index on the table at that point because of the negative impact such an index has.

    The right thing to do is posted two posts above this one where it says...

    ...so I will do some tests with a few different heaps and document the difference.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • There are exceptions to every rule, so while "All tables should have at least one index" is a good rule, it doesn't mean there shouldn't be any exceptions.

    So to stick up for the senior developer (even if the supplied information is a little lacking :-))

    A reasonable example would be where some large character or binary data is stored (and is not based on a language) and the search criteria changes frequently.

    The search criteria could change in a number of ways:

    1) different length search values

    2) varying positions in a file

    and so on

    So, given that the rules change frequently, it would be easier to pass each record through some function to determine whether, and which, search criteria it meets. If you try to index the criteria, the index will need to be rebuilt all the time.

    Another possibility is when you have multiple databases with the same structures but different amounts of data (e.g. DB1.T1 has 100 rows; DB2.T1 has 1,000,000 rows). With indexes, the plan for reading this table will differ between the two databases (one will use an index, the other won't); without indexes, they will both be forced to perform full table scans (yes, there are other ways of causing this to happen). This is useful when you are grabbing all the records for a lopsided index value (e.g. 1% of rows = 1, 99% = 2).

    I know a lot of you will disagree with me - but maybe you will get hit with this challenge sometime in the future.

    (Don't write me off as a Newbie just because it is my first post - I have been a DBA for 17 years: Oracle & SQL Server)

  • shane.green 1227 (2/19/2015)


    There are exceptions to every rule, so while "All tables should have at least one index" is a good rule, it doesn't mean there shouldn't be any exceptions.

    So to stick up for the senior developer (even if the supplied information is a little lacking :-))

    A reasonable example would be where some large character or binary data is stored (and is not based on a language) and the search criteria changes frequently.

    The search criteria could change in a number of ways:

    1) different length search values

    2) varying positions in a file

    and so on

    So, given that the rules change frequently, it would be easier to pass each record through some function to determine whether, and which, search criteria it meets. If you try to index the criteria, the index will need to be rebuilt all the time.

    Another possibility is when you have multiple databases with the same structures but different amounts of data (e.g. DB1.T1 has 100 rows; DB2.T1 has 1,000,000 rows). With indexes, the plan for reading this table will differ between the two databases (one will use an index, the other won't); without indexes, they will both be forced to perform full table scans (yes, there are other ways of causing this to happen). This is useful when you are grabbing all the records for a lopsided index value (e.g. 1% of rows = 1, 99% = 2).

    I know a lot of you will disagree with me - but maybe you will get hit with this challenge sometime in the future.

    (Don't write me off as a Newbie just because it is my first post - I have been a DBA for 17 years: Oracle & SQL Server)

    Welcome to the party and thanks for posting.

    No disagreements. You'll note my first response said that the heap might work better. They need to experiment. I still stand by that. As more information has come out, though, I'm leaning towards the idea that they need a clustered index, but I'd still say test it to be sure.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

