Boost performance of your SQL Server tables with a TURBO button

Question

Boost performance of your SQL Server tables with a TURBO button

Steve Rezhener

SSChasing Mays

Points: 615
More actions
December 8, 2020 at 12:00 am

#3812127

Comments posted to this topic are about the item Boost performance of your SQL Server tables with a TURBO button

Viewing 11 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply

colin.Leversuch-Roberts SSC Guru Points: 52643 More actions · Answer 1

Yes it's a valid / good idea BUT ... if anyone dev or other has written a select * then adding the column will break the query.

[font="Comic Sans MS"]The GrumpyOldDBA[/font]
www.grumpyolddba.co.uk
http://sqlblogcasts.com/blogs/grumpyolddba/

Parker Smith SSChasing Mays Points: 619 More actions · Answer 2

I think for a first pass, just identifying your HEAP tables is enough - in most working environments, you will probably want to build the clustered index on existing columns. This is a great tool for preparing you to do that!

JoeyD Grasshopper Points: 16 More actions · Answer 3

If you never query the tables by the ID column you are adding to use in the clustered index, how does it help queries?

If I have a heap table with a [Name] column and do queries like WHERE [Name] = 'Smith' and then add the clustered index on a new ID INT IDENTITY(1,1), the query using the [Name] column is still going to do a table scan, isn't it?

Thomas Franz Hall of Fame Points: 3972 More actions · Answer 4

It does not help you in any way to add an identity column to your heaps, it will just add a useless information / memory + disk space (beside the fact, that almost every table in the wild will already have some sort of id column).

Contrary what you say at the end of your post, it will NOT add automatical a clustered index. And even if it would, it would be useless, since not a single one of your existing queries would use this additional column, so it will still to have to do a whole table scan.

It makes much more sense to

just identify the heap tables (or tables that gets a lot of scans)
find out how they are queried - either by checking your code (often the best / easiest way) or the Query Store (if available / enabled on your SQL Server) or using something as Brent Ozar's sp_blitz-procedures or even checking the recommended indexes sys tables (but you need to be very careful, since those suggestions are very often very bad too)
create fitting indexes for the most important queries
if possible modify other queries, so that they can use the created index(es) too

PS: you didn't even add the schema name to your query / create script nor did you QUOTENAME() the table name (so it will struggle, if you run over an unusal name with e.g. a space or hyphen in it).

I don't want to be rude and I appreciate your will to help the community, but this article will do more harm than it could do good for people with to less knowledge to find the many weak points in it.

colin.Leversuch-Roberts wrote:

if anyone dev or other has written a select * then adding the column will break the query.

In this case it is the developers fault and he deserves the trouble that it causes :-). The only case where you are allowed to use a SELECT * is, when you have an subquery and wants all of its (unchanged) columns in the same order in the outer query. But even in this case it has to be a SELECT sub.*

This reply was modified 5 years, 2 months ago by Thomas Franz.

God is real, unless declared integer.

Thomas Franz Hall of Fame Points: 3972 More actions · Answer 5

Regarding to your script:

instead of the bloating CASE statment for the latest_access, you could simply use a short subselect that you "call" via CROSS APPLY after the latest INNER JOIN
don't filter for pars.index_id = 0, particularly not in the WHERE condition. The correct way to do it, is to add AND pars.index_id = inds.index_id to the join of [pars]
an heap table has always the index_id = 0 (this is why your filter for pars.index_id) worked. You should better filter for this instead of the type_desc (and add a fitting comment for later), so that the SQL server does not needs to join and filter the description table (or however it gets the description) when executing your query.
don't mix join and where conditions the pars.row_count belongs to the join and should be placed there (it does not effect the result or how the server executes the query, it is just a form of good style to write everything to where it belongs)

So your final script should look more as those (but I still do NOT support the whole idea / concept, I just want to show, how I would make the script (hopefully) a bit better):

SELECT OBJECT_SCHEMA_NAME(inds.object_id)                                        AS schema_name
     , OBJECT_NAME(inds.object_id)                                               AS table_name
     , pars.row_count
     , calc.latest_access
     , stas.user_lookups + stas.user_scans + stas.user_seeks + stas.user_updates AS total_access
     , 'ALTER TABLE ' + OBJECT_SCHEMA_NAME(inds.object_id) + '.' + OBJECT_NAME(inds.object_id)
       + ' ADD ID INT NOT NULL IDENTITY (1,1) '                                  AS addid_script
  FROM sys.indexes                      AS inds
 INNER JOIN sys.dm_db_partition_stats   AS pars
    ON pars.object_id = inds.object_id
   AND pars.index_id = inds.index_id
   AND pars.row_count > 100
 INNER JOIN sys.dm_db_index_usage_stats AS stas
    ON stas.object_id = inds.object_id
   AND stas.index_id = inds.index_id
 CROSS APPLY
      (SELECT MAX(t.date) AS latest_access
         FROM (VALUES (stas.last_user_update), (stas.last_user_lookup), (stas.last_user_seek), (stas.last_user_scan)
              ) AS t (date) 
      )                                 AS calc
 WHERE inds.index_id = 0 -- heaps will have always id 0 -- indstype_desc = 'heap'
 ORDER BY pars.row_count DESC
        , schema_name
        , table_name;

God is real, unless declared integer.

Michael L John One Orange Chip Points: 27079 More actions · Answer 6

JoeyD wrote:

If you never query the tables by the ID column you are adding to use in the clustered index, how does it help queries?
If I have a heap table with a [Name] column and do queries like WHERE [Name] = 'Smith' and then add the clustered index on a new ID INT IDENTITY(1,1), the query using the [Name] column is still going to do a table scan, isn't it?

Maybe. I'm guessing it will do a key lookup.

Michael L John
If you assassinate a DBA, would you pull a trigger?
To properly post on a forum:
http://www.sqlservercentral.com/articles/61537/

JoeyD Grasshopper Points: 16 More actions · Answer 7

Just adding the ID column, it still does a table scan.

Adding the ID column AND a clustered index on the ID column, it then does a clustered index scan.

But since the index is not on the [Name] column, on a large table I don't think there is any positive impact.

Jeff Moden SSC Guru Points: 1004703 More actions · Answer 8

To the best of my knowledge (and I could be wrong), inserts into heaps follow no order. They're almost always added to the last logical page of the heap. That means two things... heaps will frequently not matter (more so than a lot of people may expect) and the article you cited is actually incorrect. 😉

Even with the citing of an article that does have some examples and especially since you made some pretty extreme claims of performance, it would have been nice if you included some code that clearly and definitely proved the performance increases you claim especially if you claim that that simple addition of an IDENTITY column as a clustered index would cause such an improvement in the panacea way that you seem to describe for most cases. In the process, you might also find that it's more frequent to not be true than you might suspect.

Also, the indexes in the back of a book are more like filtered non-clustered indexes, IMHO, than they are clustered indexes.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

trimjib SSC Veteran Points: 234 More actions · Answer 9

The notion that book indexes and table of contents are analogous to SQL clustered index and non-clustered index doesn't match up well with SQL indexes. Both clustered and non-clustered indexes navigate a tree. They differ in only the ordering of the leaf data. Non-ordered is a heap. A heap can have a non-clustered index and leaves won't be ordered on that index.

The Primary Key is a non-clustered index with unique values. Its just SSMS that defaults to making the PK a clustered index.

In my experience the heap is used when insertion performance is paramount. Adding any index slows the process. If scan performance is an issue, a separate query table with appropriate indexes can be used - possibly using async replication. The other usage is for very small tables that scan time is miniscule plus developer too lazy to bother with an index.

Identity(1,1) generally reflects the order in which sometime was added. I suspect a datetime field or such would work better. Seen a lot of identity(1,1) and datetime fields on same table.

As others have pointed out, unless the identity is the clustered key and is in a where clause, or used in a join, or the query uses a non-clustered index and has to lookup a value on the clustered index, the identity doesn't help query performance.

And then there is OLAP (BI) verses OLTP considerations.

Index or not, Identity or not, it always comes down to "it depends". One solution just doesn't fit all.

Jeff Moden SSC Guru Points: 1004703 More actions · Answer 10

trimjib wrote:

A heap can have a non-clustered index and leaves won't be ordered on that index.

If I'm reading that correctly, I'm really sure that's not true. All non-clustered indexes are based on a B-TREE that first reflect the order of the keys in the non-clustered index and then, at the very least, also have the RIDs of the heap in the leaf level but only after the non-clustered index. That means that the leaf level will, in fact, be at least in the logical order dictated by the keys of the non-clustered index and that does make them subject to the same fragmentation as if there were a clustered index and for the same reasons.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)