December 8, 2020 at 12:00 am
Comments posted to this topic are about the item Boost performance of your SQL Server tables with a TURBO button
December 8, 2020 at 11:02 am
Yes it's a valid / good idea BUT ... if anyone dev or other has written a select * then adding the column will break the query.
[font="Comic Sans MS"]The GrumpyOldDBA[/font]
www.grumpyolddba.co.uk
http://sqlblogcasts.com/blogs/grumpyolddba/
December 8, 2020 at 12:23 pm
I think for a first pass, just identifying your HEAP tables is enough - in most working environments, you will probably want to build the clustered index on existing columns. This is a great tool for preparing you to do that!
December 8, 2020 at 12:43 pm
If you never query the tables by the ID column you are adding to use in the clustered index, how does it help queries?
If I have a heap table with a [Name] column and do queries like WHERE [Name] = 'Smith' and then add the clustered index on a new ID INT IDENTITY(1,1), the query using the [Name] column is still going to do a table scan, isn't it?
December 8, 2020 at 12:57 pm
It does not help you in any way to add an identity column to your heaps, it will just add a useless information / memory + disk space (beside the fact, that almost every table in the wild will already have some sort of id column).
Contrary what you say at the end of your post, it will NOT add automatical a clustered index. And even if it would, it would be useless, since not a single one of your existing queries would use this additional column, so it will still to have to do a whole table scan.
It makes much more sense to
PS: you didn't even add the schema name to your query / create script nor did you QUOTENAME() the table name (so it will struggle, if you run over an unusal name with e.g. a space or hyphen in it).
I don't want to be rude and I appreciate your will to help the community, but this article will do more harm than it could do good for people with to less knowledge to find the many weak points in it.
if anyone dev or other has written a select * then adding the column will break the query.
In this case it is the developers fault and he deserves the trouble that it causes :-). The only case where you are allowed to use a SELECT * is, when you have an subquery and wants all of its (unchanged) columns in the same order in the outer query. But even in this case it has to be a SELECT sub.*
God is real, unless declared integer.
December 8, 2020 at 1:24 pm
Regarding to your script:
So your final script should look more as those (but I still do NOT support the whole idea / concept, I just want to show, how I would make the script (hopefully) a bit better):
SELECT OBJECT_SCHEMA_NAME(inds.object_id) AS schema_name
, OBJECT_NAME(inds.object_id) AS table_name
, pars.row_count
, calc.latest_access
, stas.user_lookups + stas.user_scans + stas.user_seeks + stas.user_updates AS total_access
, 'ALTER TABLE ' + OBJECT_SCHEMA_NAME(inds.object_id) + '.' + OBJECT_NAME(inds.object_id)
+ ' ADD ID INT NOT NULL IDENTITY (1,1) ' AS addid_script
FROM sys.indexes AS inds
INNER JOIN sys.dm_db_partition_stats AS pars
ON pars.object_id = inds.object_id
AND pars.index_id = inds.index_id
AND pars.row_count > 100
INNER JOIN sys.dm_db_index_usage_stats AS stas
ON stas.object_id = inds.object_id
AND stas.index_id = inds.index_id
CROSS APPLY
(SELECT MAX(t.date) AS latest_access
FROM (VALUES (stas.last_user_update), (stas.last_user_lookup), (stas.last_user_seek), (stas.last_user_scan)
) AS t (date)
) AS calc
WHERE inds.index_id = 0 -- heaps will have always id 0 -- indstype_desc = 'heap'
ORDER BY pars.row_count DESC
, schema_name
, table_name;
God is real, unless declared integer.
December 8, 2020 at 1:30 pm
If you never query the tables by the ID column you are adding to use in the clustered index, how does it help queries?
If I have a heap table with a [Name] column and do queries like WHERE [Name] = 'Smith' and then add the clustered index on a new ID INT IDENTITY(1,1), the query using the [Name] column is still going to do a table scan, isn't it?
Maybe. I'm guessing it will do a key lookup.
Michael L John
If you assassinate a DBA, would you pull a trigger?
To properly post on a forum:
http://www.sqlservercentral.com/articles/61537/
December 8, 2020 at 1:39 pm
Just adding the ID column, it still does a table scan.
Adding the ID column AND a clustered index on the ID column, it then does a clustered index scan.
But since the index is not on the [Name] column, on a large table I don't think there is any positive impact.
December 8, 2020 at 11:16 pm
To the best of my knowledge (and I could be wrong), inserts into heaps follow no order. They're almost always added to the last logical page of the heap. That means two things... heaps will frequently not matter (more so than a lot of people may expect) and the article you cited is actually incorrect. 😉
Even with the citing of an article that does have some examples and especially since you made some pretty extreme claims of performance, it would have been nice if you included some code that clearly and definitely proved the performance increases you claim especially if you claim that that simple addition of an IDENTITY column as a clustered index would cause such an improvement in the panacea way that you seem to describe for most cases. In the process, you might also find that it's more frequent to not be true than you might suspect.
Also, the indexes in the back of a book are more like filtered non-clustered indexes, IMHO, than they are clustered indexes.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 8, 2020 at 11:55 pm
The notion that book indexes and table of contents are analogous to SQL clustered index and non-clustered index doesn't match up well with SQL indexes. Both clustered and non-clustered indexes navigate a tree. They differ in only the ordering of the leaf data. Non-ordered is a heap. A heap can have a non-clustered index and leaves won't be ordered on that index.
The Primary Key is a non-clustered index with unique values. Its just SSMS that defaults to making the PK a clustered index.
In my experience the heap is used when insertion performance is paramount. Adding any index slows the process. If scan performance is an issue, a separate query table with appropriate indexes can be used - possibly using async replication. The other usage is for very small tables that scan time is miniscule plus developer too lazy to bother with an index.
Identity(1,1) generally reflects the order in which sometime was added. I suspect a datetime field or such would work better. Seen a lot of identity(1,1) and datetime fields on same table.
As others have pointed out, unless the identity is the clustered key and is in a where clause, or used in a join, or the query uses a non-clustered index and has to lookup a value on the clustered index, the identity doesn't help query performance.
And then there is OLAP (BI) verses OLTP considerations.
Index or not, Identity or not, it always comes down to "it depends". One solution just doesn't fit all.
December 9, 2020 at 12:15 am
A heap can have a non-clustered index and leaves won't be ordered on that index.
If I'm reading that correctly, I'm really sure that's not true. All non-clustered indexes are based on a B-TREE that first reflect the order of the keys in the non-clustered index and then, at the very least, also have the RIDs of the heap in the leaf level but only after the non-clustered index. That means that the leaf level will, in fact, be at least in the logical order dictated by the keys of the non-clustered index and that does make them subject to the same fragmentation as if there were a clustered index and for the same reasons.
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 11 posts - 1 through 10 (of 10 total)
You must be logged in to reply to this topic. Login to reply