Commercial Software with All Tables as Heaps and Terrible Performance

  • Hello both

    Kristen-173977

    In using the software it does seem a bit faster but it is a bit hard to benchmark (like your traffic light example!) as the test system is not being as heavily used across college but also the server has less resources and is running an additional component of the software not present on the live server. I was a bit disappointed the difference was not more visible.

    I can run a query once and it takes about 10 seconds to run. Then I run it again and it takes 5 minutes. I do wonder if it is the table heaps that are causing this variance in performance as if data has no pre-defined order presumably it will take different lengths of time to retrieve based on how fragmented it is?

    I am hoping primary keys/clustered indexes are the solution to all our woes as we have tried everything else and it hasn't really made much difference.

    GilaMonster

    Thanks a lot for your kind offer. As the system holds student data I'm not sure what would be required for this to be able to be agreed but I could ask. Also would you charge for your service and if so could you give me an estimate? We do need some kind of help as I don't know how much further I am going to get with this.

  • Please can I ask if anyone knows the answers to these questions:

    Primary Key vs Clustered Index

    Would a clustered index or a primary key (which seems to always create a clustered index anyway) be the better option generally in terms of performance and best practice?

    I have always added primary keys over clustered indexes but was just wondering if there was a case for one over the other.

    Foreign Key

    Am I likely to gain any performance improvement from adding foreign keys to the database (which do not exist either) or are they purely to maintain a good data model without any orphaned data? As we already seem to have orphaned data it looks like I will struggle to add these although I did manage to add a few.

    With (No Lock)

    I am being told I need to add No Lock to every query I write to avoid SELECT queries impacting other parts of the system. Is this really the case as I have never had to add no lock at all for any other system I have queried?

    I was wondering if no lock is necessary due to the fact that we have no clustered indexes which in turn means that a table scan is occurring for the duration of the query (which can be extensive with the way things are).

    Scalar Functions

    I have a few scalar functions which for example look at a student's enrolments on courses and pick the main one and I am bring told to avoid scalar functions completely due to the poor way they perform and to use cursors instead to insert data into temporary tables and then query those. From my understanding though cursors are bad and best avoided.

    Thanks

    Robin

  • For Heap tables, wasted page space (ie: ghost records) resulting from deletes and updates is not reclaimed, so that's one consideration. Also for Heap tables the file number, page number, slot number are used as the RowID in non-clustered indexes, unlike Clustered tables which use the clustering key as the RowID. What this means in terms of performance is that there are scenarios where a Clustered key based RowID can be used to cover columns in a query, so Clustered tables can improve performance in that indirect way.

    But how Heap vs. Clustered tables impacts performance in your database depends on the case usage of the application queries and whether the tables are frequently deleted or updated.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • robinwilson (9/23/2015)


    Primary Key vs Clustered Index

    Would a clustered index or a primary key (which seems to always create a clustered index anyway) be the better option generally in terms of performance and best practice?

    I have always added primary keys over clustered indexes but was just wondering if there was a case for one over the other.

    It's not an either-or. They're different things.

    Primary key - the column or set of columns which uniquely identifies a row in the table. Must be unique, must be non-nullable.

    Clustered index - index built on top of the actual data pages.

    You can have a heap with a primary key. You can have a table with a clustered index that has no primary key. You can have a table where the primary key is enforced by a clustered index. You can have a table where the primary key is enforced by a non-clustered index and the clustered index is on other columns.

    Foreign Key

    Am I likely to gain any performance improvement from adding foreign keys to the database (which do not exist either)

    .

    Potentially, yes. Don't expect it to be orders of magnitude, but there should be some gain. The problem will be in cleaning out all the bad data first.

    With (No Lock)

    I am being told I need to add No Lock to every query I write to avoid SELECT queries impacting other parts of the system.

    Gah!! No, no, no, no, no!!!! Terrible practice.

    Scalar Functions

    I have a few scalar functions which for example look at a student's enrolments on courses and pick the main one and I am bring told to avoid scalar functions completely due to the poor way they perform and to use cursors instead to insert data into temporary tables and then query those. From my understanding though cursors are bad and best avoided.

    Scalar functions are terrible, because they behave a bit like cursors in the background, they're executed one row at a time.

    And what I offered earlier is not free advice. I do performance tuning of systems for a living. If your bosses are willing to get someone in (for loose definitions of 'in', since I probably live on the other side of the world), drop me a PM with an email address and I'll have my boss talk to your bosses and discuss costs and things like that.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • robinwilson (9/23/2015)


    As the system holds student data I'm not sure what would be required for this to be able to be agreed but I could ask. Also would you charge for your service and if so could you give me an estimate? We do need some kind of help as I don't know how much further I am going to get with this.

    I also happen to know of several consultants with room for another client. As Gail would do, we would also charge.

    FTR - I deal with student data, health data, financial data, and many other types of data. For more info, you can look at SQL Solutions Group[/url] to contact us for more info.

    Gail is an excellent resource and I like to think my group of consultants is excellent as well.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • robinwilson (9/23/2015)


    Please can I ask if anyone knows the answers to these questions:

    Primary Key vs Clustered Index

    Would a clustered index or a primary key (which seems to always create a clustered index anyway) be the better option generally in terms of performance and best practice?

    I have always added primary keys over clustered indexes but was just wondering if there was a case for one over the other.

    What Gail said. Additionally, the PK may not be the best choice of columns to use for your CI. The CI should be carefully considered for each table.

    Foreign Key

    Am I likely to gain any performance improvement from adding foreign keys to the database (which do not exist either) or are they purely to maintain a good data model without any orphaned data? As we already seem to have orphaned data it looks like I will struggle to add these although I did manage to add a few.

    With a trusted Foreign Key, the Query Optimizer can make certain query plan improvements. See http://explainextended.com/2009/10/15/constraints-and-the-optimizer-in-sql-server-foreign-key/[/url] for more details.

    With (No Lock)

    I am being told I need to add No Lock to every query I write to avoid SELECT queries impacting other parts of the system. Is this really the case as I have never had to add no lock at all for any other system I have queried?

    I was wondering if no lock is necessary due to the fact that we have no clustered indexes which in turn means that a table scan is occurring for the duration of the query (which can be extensive with the way things are).

    What Gail said.

    Folks use NOLOCK because of their poor performing queries that scan entire tables, creating huge blocking situations, and this was a way to work around it. However, it comes at a cost - the accuracy of the data being reported is very suspect. As in, if you need accuracy then it can't be trusted. If you are having blocking problems, a band-aid is to use the snapshot isolation level until you can get those queries optimized properly. This is available starting with SQL 2005. The proper thing to do is to optimize those queries. Using the NOLOCK directive (it's not really a hint, but a command) will require SQL to use the specified "hint".

    Scalar Functions

    I have a few scalar functions which for example look at a student's enrolments on courses and pick the main one and I am bring told to avoid scalar functions completely due to the poor way they perform and to use cursors instead to insert data into temporary tables and then query those. From my understanding though cursors are bad and best avoided.

    Check out my blog post[/url] where I evaluate the performance of the various functions. Summary: use Inline Table-Valued Functions where possible.

    And don't use cursors (or their evil, worse cousin, a WHILE LOOP). They are best avoided.

    I notice that you posted in the SQL 7/2000 version forums. If you are on 2005, the window functions allow code to eliminate many cursors. And SQL 2012 enhances this even more, to the point that >99% of all cursors can be eliminated. From what it sounds like you are being told to use a cursor for, this (IMHO) just sounds like a poor performing query that can be optimized.

    Thanks

    Robin

    As Gail and Jason said, getting some experts in to help you out may be your best action. Gail, Jason and I are all MCMs - as Jason said, excellent resources to work with. We can all help you out, but it's not free (I work with Jason, so you can use the contact info that he's already posted).

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • WayneS (9/23/2015)


    However, it comes at a cost - the accuracy of the data being reported is very suspect. As in, if you need accuracy then it can't be trusted.

    Interestingly, in this case, since all tables are heaps there won't be any page splits to cause missing rows/duplicate rows under nolock.

    I still don't recommend it. If there are blocking problems, and tuning the code/indexes doesn't help (it usually does), then the database setting read_committed_snapshot should be considered and tested.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • I drafted this hours ago, got interrupted with something else, apologies if it now overlaps other posts.

    robinwilson (9/23/2015)


    Please can I ask if anyone knows the answers to these questions:

    Primary Key vs Clustered Index

    Would a clustered index or a primary key (which seems to always create a clustered index anyway) be the better option generally in terms of performance and best practice?

    I have always added primary keys over clustered indexes but was just wondering if there was a case for one over the other.

    Primary Key does not have to use a Clustered Index. You can have a C.I. and a different, non-clustered, index for your PKey.

    PKey requires all columns to be not-null. Clustered index is same as non-clustered - Null Columns allowed, if UNIQUE then only one row may be NULL. Ideally your clustered index will be [declared as] unique (otherwise SQL has to add a tie-break ID, which makes the keys wider)

    I think it is important that Clustered Index almost never changes. (If it changes all corresponding non-clustered index entries have to be updated)

    If you add a new Unique Index (clustered or otherwise) which turns out to then be NOT Unique it will cause a run-time error when users attempt to add a DUPe. Probably not a problem you are going to hit, given a carefully chosen set of key field(s), but I can imagine that causing any accidental, new, errors will weaken the case for the good work you are trying to do.

    Foreign Key

    Am I likely to gain any performance improvement from adding foreign keys to the database (which do not exist either) or are they purely to maintain a good data model without any orphaned data? As we already seem to have orphaned data it looks like I will struggle to add these although I did manage to add a few.

    Adding a foreign key will break the APP if it tries to create orphans. The Good is that you won't have any more orphans, the Bad is that users will get a new error. You could go with a Data Validation Report instead, reporting all orphans, to encourage people to fix-up the data over time (and more importantly any new ones that come into existence).

    With (No Lock)

    I am being told I need to add No Lock to every query I write to avoid SELECT queries impacting other parts of the system. Is this really the case as I have never had to add no lock at all for any other system I have queried?

    I hate seeing NOLOCK in any code. Gail has made an important point that this is not currently a threat because all your tables are heaps (although I'm not sure about page splits on any non clustered indexes?) and it could become (or become more of??) an issue once you start adding clustered indexes, but that apart my view is that NOLOCK brings a serious risk of, sooner or later, generating a report where either some records are totally missing or included in the report twice. Both situations are totally unacceptable to me, and I've yet to find anywhere where that is considered acceptable! [for an end-user report]

    NOLOCK will include rows that are subsequently rolled back. I doubt that is a real-world problem in most organisations.

    For me the more serious issue is index page splits; if someone else makes an INSERT which causes an index page split (i.e. the index page was already full) then under certain circumstances your query running with NOLOCK will either read both the original full index page and then the new second-half-page (because it was split and written before you got to that index page), or it will read the new first-half already-split index page and skip the second-half page because it was not written in time. For me that risk is unacceptable and we only use NOLOCK for DBA reports on Production Tables where we do not want to cause any blocking to the APP; the DBAs are fully aware that the data in such reports may be inaccurate / incomplete.

    I was wondering if no lock is necessary due to the fact that we have no clustered indexes which in turn means that a table scan is occurring for the duration of the query (which can be extensive with the way things are).

    I doubt that the need is any greater / lesser because of that. However, the faster that insert / update / delete operations run, and SELECT queries too, the smaller the chance of blocking, so if you dramatically improve performance you may also reduce the risk of blocking.

    We use READ COMMITTED SNAPSHOT [Isolation] which solves that problem (its a database setting, so easy to implement) but there are a few situations where existing code might not work the same (I would class them as rare, but that will be of no comfort if you have one of them in your code!). RCSI also has an impact, which can be significant, on TEMPDB. If it works for you you'd have to then work on getting rid of any NOLOCKS in the existing code, otherwise you would still have the page-split risk mentioned earlier (but only for code that includes NOLOCK), but at least your Selects would no longer block any Inserts/Updates/etc.

    Scalar Functions

    I have a few scalar functions which for example look at a student's enrolments on courses and pick the main one and I am bring told to avoid scalar functions completely due to the poor way they perform and to use cursors instead to insert data into temporary tables and then query those

    It depends πŸ™‚ A Scalar function on an expression in a SELECT is probably OK (probably no different to actually hard-coding the logic in the SELECT). A Scalar UDF, or indeed any other (SQL inbuilt) function, in the WHERE or a JOIN is potentially disastrous if it prevents the query being SARGable (and pretty much all use of functions does cause that) - i.e. a perfectly suitable Index on a Column(s) is not used because SQL cannot estimate what the value from the function will be. An answer, sometimes, is to run the function ahead of the query and store the value in a @WorkingVariable and use that in the query.

    WHERE MyColumn = @MyWorkingVariable

    will (given all the normal caveats πŸ™‚ ) use a suitable index on MyColumn

    WHERE MyColumn = AnyFunction(@SomeValue)

    WHERE AnyFunction(MyColumn) = @SomeValue

    will never (given all the normal caveats πŸ™‚ ) use any index on MyColumn

    I think it is worth always using WITH SCHEMABINDING in Scalar UDFs as it changes something in the query plan (sorry, forgotten what, but I can look it up if you need it) which helps with performance.

    From my understanding though cursors are bad and best avoided.

    If the operation can be performed as Set Based, instead of Row By Agonising Row ("RBAR") then definitely avoid the cursor. If it can't then personally I prefer to do it differently, but it is still a row-by-row loop and thus much the same as a cursor.

    For example, backing up all databases. This definitely needs a loop that makes the backup one-by-one. It doesn't matter if you use a Cursor on sys.databases ... or Select the database names into a #TEMP table with an IDENTITY column and then use a WHILE loop to select the rows one by one, its still a loop but you definitely can't do that Set-Based πŸ™‚

    Cursors have a bad name and are not usually looked at in a favourable light πŸ™‚ stick to set-based whenever possible.

  • Kristen-173977 (9/23/2015)


    A Scalar function on an expression in a SELECT is probably OK (probably no different to actually hard-coding the logic in the SELECT).

    No it's not OK, and it's way, way different.

    Functions in the select execute once per row, and they have an overhead. Data-accessing scalar functions in the select are essentially a hidden cursor inside a select and they are horrific in terms of performance.

    http://sqlinthewild.co.za/index.php/2009/04/29/functions-io-statistics-and-the-execution-plan/

    WHERE MyColumn = @MyWorkingVariable

    will (given all the normal caveats πŸ™‚ ) use a suitable index on MyColumn

    WHERE MyColumn = AnyFunction(@SomeValue)

    will never (given all the normal caveats πŸ™‚ ) use any index on MyColumn

    Both of those will very happily use an index on MyColumn, should such an index exist.

    What won't use an index on MyColumn is this:

    WHERE AnyFunction(MyColumn) = @SomeValue

    Functions applied to a column prevent index seeks, not functions applied to a constant, parameter or variable.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (9/23/2015)


    What won't use an index on MyColumn is this:

    WHERE AnyFunction(MyColumn) = @SomeValue

    Thanks Gail, that was what was in my mind .. but not what I typed.

  • GilaMonster (9/23/2015)


    Functions in the select execute once per row, and they have an overhead. Data-accessing scalar functions in the select are essentially a hidden cursor inside a select and they are horrific in terms of performance.

    Sorry, my mistake I was thinking of scalar functions which manipulate the parameters rather than data-accessing scalar functions. I've re-read the original post and see that wasn't what was being discussed at all.

  • Thanks all for the information and detailed responses which is all really useful.

    Heaps

    I was told this by the software supplier:

    I accept that clustered indexes do seem like a simple solution and with system tables, especially the extremely large audit table, we have resorted to implementing this approach, but wider usage will bloat the database size and can have detrimental impacts and so this has been avoided. Hopefully as is being shown efficient queries don’t need to rely on clustered indexes

    However it would see that more "bloat" comes from having heaps on tables due to the way data is paged and left after records are deleted or updated.

    However "efficient" my query is it would seem a table scan has to occur when no clustered index exists. A few random tables do actually have clustered indexes - one being user permissions and another being an audit table. We never query these though.

    Paid Support

    Sorry I wasn't meaning to imply that I expected the work to be free and understand we would be paying for a proper professional service. This may be something we would consider and thanks for the offers.

    Code, Cursors and NO LOCK

    We are being told that what we are trying to do is too complex for the system to handle.

    I sent them code samples and they have sent them back with NO LOCK on every table join and cursors generating temporary tables, which interestingly they then add a clustered index to and then they query these tables as part of the main query.

    It does run marginally faster but then this is probably because of the temporary clustered index.

    SQL Server 2014

    Apologies for posting in the SQL Server 2000 forum (not sure how that happened but must have been a memory allocation error on my part!) but it is SQL Server 2014 I am running on. It was 2008 R2 but we upgraded it along with the hardware when trying to work out how to resolve the issues with the database (the system was below the minimum spec initially).

    READ_COMMITTED_SNAPSHOT

    This option does look like a possible option and would seem better than the suggestion of using NO LOCK on everything. I have read up on this and can't really see a downside apart from possibly a higher level of activity in temp db. We do see a lot of locking in activity monitor.

    Table-Value Functions

    It looks like I need to re-write all the scalar-value functions as table-value ones in that case. I thought as I was only returning one value then scalar was better but I see now that is not the case. Have table-valued functions sort of replaced scalar-value ones as is there anything you can only do with a scalar one?

    Thanks for all the replies.

    Robin

  • robinwilson (9/23/2015)


    Heaps

    wider usage will bloat the database size

    However it would see that more "bloat" comes from having heaps on tables due to the way data is paged and left after records are deleted or updated.

    If you have a lot of deletions then that will be an issue, but surely any index could be described as "bloat"? - the file will be smaller without it. On that basis no one would ever create an index! by the same token creating an index that is never used is definitely bloat, for the rest there is a cost-benefit equation as to whether the time saved on queries is "paid for" in comparison to the increased time during insert/update etc. and the disk space to hold the index.

    Seems to me to be simply a matter of trying it on a test copy-database - if the improvement in speed is significant then keep the index, if not then drop it.

    A few random tables do actually have clustered indexes - one being user permissions and another being an audit table. We never query these though.

    Perhaps those tables are frequently used in JOINs in the APP?

    Code, Cursors and NO LOCK

    We are being told that what we are trying to do is too complex for the system to handle.

    Do you have trouble getting your head through the door at the end of each working day? πŸ˜€

    I sent them code samples and they have sent them back with NO LOCK on every table join and cursors generating temporary tables, which interestingly they then add a clustered index to and then they query these tables as part of the main query.

    It does run marginally faster but then this is probably because of the temporary clustered index.

    For me this would come down to trying the queries on a copy-database with indexes in place.

    READ_COMMITTED_SNAPSHOT

    This option does look like a possible option and would seem better than the suggestion of using NO LOCK on everything. I have read up on this and can't really see a downside apart from possibly a higher level of activity in temp db. We do see a lot of locking in activity monitor.

    I wonder what the liability is for Vendors who use/demand NOLOCK ...

    I think RCS is easier implemented, by the developers / vendor, from the outset - then there are no worries about subtle code issues (i.e. side effects because RCSI view of the data is as-of the start of the statement).

    Perhaps another option is to set the DB to ALLOW_SNAPSHOT_ISOLATION and then to explicitly use SET TRANSACTION ISOLATION LEVEL SNAPSHOT in your queries. You won't block Writers (and Writers won't block you); if there are any subtle code issues [i.e. which would be a problem if you set RCSI on the database] in Vendors APP they won't be effected (I think???). Importantly you won't need to use NOLOCK πŸ™‚

    Have table-valued functions sort of replaced scalar-value ones as is there anything you can only do with a scalar one?

    The type I was, wrongly, thinking of was fn_VolumeOfCube(@length, @width, @height) ... we use them for things like fn_String2DateSafe(@MyString) which takes care of a blank-string being converted to NULL and other similar presentational issues in SELECT clause and for things line "Tomorrow" for date column ranges.

  • robinwilson (9/23/2015)


    Thanks all for the information and detailed responses which is all really useful.

    Heaps

    I was told this by the software supplier:

    I accept that clustered indexes do seem like a simple solution and with system tables, especially the extremely large audit table, we have resorted to implementing this approach, but wider usage will bloat the database size and can have detrimental impacts and so this has been avoided. Hopefully as is being shown efficient queries don’t need to rely on clustered indexes

    Wahahahahahahahaha... *rotfl*

    No, clustered indexes won't bloat the DB, a well-chosen clustered index doesn't have detrimental effects and that really sounds like someone trying to use fancy words to hide that they don't have a clue.

    The SQL storage engine is implemented based on the assumption that all tables will have clustered indexes. It should be the norm in a SQL server DB, not the exception.

    Code, Cursors and NO LOCK

    We are being told that what we are trying to do is too complex for the system to handle.

    Again, hahahahahahaha!

    One client I have uses a single SQL instance and single sql DB to run an online accounting package with tens of thousands of users. I have multiple clients using SQL for stock market trading systems. Insurance processing systems, HR systems, online gambling, etc.

    I doubt what you're doing is too complex for SQL Server.

    Too complex for the vendor's design, now that's a possibility.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Kristen-173977 (9/23/2015)


    GilaMonster (9/23/2015)


    Functions in the select execute once per row, and they have an overhead. Data-accessing scalar functions in the select are essentially a hidden cursor inside a select and they are horrific in terms of performance.

    Sorry, my mistake I was thinking of scalar functions which manipulate the parameters rather than data-accessing scalar functions.

    Same thing, they still execute once per row, they still have an overhead. They are not in-line functions.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass

Viewing 15 posts - 31 through 45 (of 69 total)

You must be logged in to reply to this topic. Login to reply