An important operator to understand in execution plans is a scan. A Scan can be good and bad, so understanding the difference between them is a basic skill needed by a DBA or developer that is going to do performance tuning.
The Table Scan means that a table does not have a clustered index and the optimal query plan decided to scan all leaf level pages in the table. The table is stored as a Heap because there is not clustered index. A table can have only one clustered index.
We are going to drop the primary key/clustered index on the DatabaseLog table of the AdventureWorks database.
IF EXISTS (SELECT * FROM sys.indexes WHERE object_id = OBJECT_ID(N'[dbo].[DatabaseLog]') AND name = N'PK_DatabaseLog')
ALTER TABLE [dbo].[DatabaseLog] DROP CONSTRAINT [PK_DatabaseLog]
GO
Then, we are going to execute the following code with Include Actual Execution Plan query option activated. In SQL Server Management Studio open a query window, either right-click the query window and choose Include Actual Execution Plan or select Include Actual Execution Plan from the Query menu.
Execute this code:
SELECT [DatabaseLogID] ,[PostTime] ,[DatabaseUser] ,[Event]
,[Schema] ,[Object] ,[TSQL] ,[XmlEvent]
FROM [dbo].[DatabaseLog]
You will get the following execution plan table scan
The optimizer says a full Table Scan is the best plan. The operator Table Scan indicates that the table does not have a clustered index. By default, creating a primary key will make the key a clustered index, but that is not always the case. I am working on a Fact table at work where a Unique Constraint is the clustered index, but the primary key is a 7 column compound key of Surrogate Keys.
By adding a Clustered Index to table DatabaseLog (Primary Key), we can change this Table scan to a clustered index scan. A clustered index changes the logical/physical structure of the table to follow an index tree.
ALTER TABLE [dbo].[DatabaseLog]
ADD CONSTRAINT [PK_DatabaseLog]
PRIMARY KEY CLUSTERED
([DatabaseLogID] ASC)
We get the following if we execute the SELECT query again like above:
So, what is the difference in Cost? None.
So what is the big deal, they both costs the same. Well, when you start using a WHERE clause and the no clustered index table still does a full scan, but the clustered index might do an Clustered Index Seek if the density/cardinality gives the optimizer a better choice, you will see the performance difference. More on this is a later blog.
My advice, ALWAYS have a primary AND/OR a clustered index on all tables. The decision on making the primary key a clustered index is dependent on your experience in managing and planning the correct implementation.
All right, lastly we will see a Index Scan. First, we will add a Non-clustered index to the table DatabaseLog.
CREATE NONCLUSTERED INDEX [idxDatabaseLog_DatabaseUserScehmaObject]
ON [dbo].[DatabaseLog]
( [DatabaseUser], [Schema], [Object] )
Now, the query optimizer creates a plan to scan the new non-clustered index to retrieve the data needed in the SELECT statement. If we add a WHERE clause to the SELECT, it will change to a Index Seek.
SELECT [DatabaseUser], [Schema], [Object]
FROM [dbo].[DatabaseLog]
WHERE DatabaseUser = 'dbo'
Reading execution plans can be a lot easier than understanding them to help performance tuning. I know I spent many hours thinking I finally got it, before some article on SQL Server Central website or SQL Server Magazine changed the understanding I have about T-SQL statements and indexing for performance. But, you have to start somewhere, then keep improving your understanding.