Microsoft SQL Server has the feature to use “Included columns” for the creation of non-clustered indexes. This feature and its benefits and disadvantages will be explained in the following article.
Included columns can be used to create a covering indexes without including all the data into the key columns. This covering index has the advantage that the index contains all the columns that are needed for a query. This means key and non-key columns are included, so that it is not necessary to use lookup operations or to access the physical table. This means a reduced number of I/O operations to get the data.
The creation of Included Columns is only possible for non-clustered indexes and not for clustered indexes. Included columns in a non-clustered index can only be the non-key columns. In the included columns the storage only happens on the leaf level of the index, contrary to the key column of an index. The key column of an index is stored at all levels.
The non-key columns don’t have the same limitations as the index key columns in terms of the data type. Only the legacy data types, like TEXT, NTEXT, and IMAGE are not allowed. Even the large object data types(LOB) are allowed as non-key columns. These are the varchar (max), nvarchar(max) , varbinary(max), and XML data types.
The problem in using the LOB data type is that it can result in performance issues. The column values are copied into the leaf level of the non-clustered index. Due to this fact, the index can have large disk space requirements. Also, the buffer cache efficiency can be lower because fewer rows fit in memory, which can lead to increased I/O demands. It is possible to use up to 16 index keys in a Non-clustered index. On the Included columns you don’t have a limitation on the number of keys, but it is not recommended to use more keys than needed.
If you want to decide whether the usage of included columns in your indexes makes sense, you need to consider the characteristics of the database. The disadvantages are reduced performance of data modification operations and as mentioned before the lower buffer cache efficiency and the increased I/O demands in case of the use of LOB datatypes. Make sure, that the disadvantages of the Included columns are lower as the possible gains in query performance.
Let’s examine the table, Users, from the Stack Overflow database in the following as example. We can enter our query here: https://data.stackexchange.com/stackoverflow/query/new
SELECT Id, DisplayName, Location FROM Users WHERE Views > 100 AND UpVotes = 3
Now we create an index for this query:
CREATE INDEX idx_Users_ViewsUpVotes ON Users (Views, UpVotes)
This index has the columns that are needed for the WHERE clause, but it doesn’t cover the 3 columns from the SELECT. A lookup to the clustered index needs to be performed by SQL Server to get the 3 columns. To create a covering index, we need to add these 3 columns to the index. In this case it would make sense to use included columns. We don’t add the columns as key columns, because it would make the index wider than needed. Further we don’t use these columns for filtering or indexing, therefore we don’t need them as key columns.
The optimal index in this case looks like that:
CREATE INDEX idx_Users_ViewsUpVotes ON Users (Views, UpVotes) Include (Id, DisplayName, Location)
This index makes the query very efficient, because it contains all the needed columns and therefore it is not necessary to do lookups. The disadvantage is, that the index is a little bit larger and that more work needs to be done for data modifications.
Summary
As a conclusion you need to weigh between reading and updating the data. A covering index includes all the columns, that are used by a query. If a query can be covered with a covering index, then it can help ensure good performance.
But not all queries should be covered, because if you cover all the possible queries, it will negatively influence the size of the database and the performance of data modifications like insert, update, or delete operations. In other words, less disk space and less maintenance work are required for indexes with fewer number of columns. On the other hand a covering index will improve the reading performance.
In general it is recommended to test many scenarios to find the best solution for your specific situation. Indexes should regularly be reviewed to avoid unused indexes or to add new needed indexes.