Can NULLs Exist in the Columns of a Non-Clustered Index?

Question

Post reply

Can NULLs Exist in the Columns of a Non-Clustered Index?

Gail Wanabee

SSCrazy Eights

Points: 8471
More actions
January 9, 2019 at 2:42 pm

#381170
A fairly credible source (SolarWinds, DPA help function) had this advice for determining the causes of table scans:

Investigate Non-Clustered Indexes That Might Contain Nullable Columns as Search Criteria:
A non-clustered index on a nullable column does not contain entries for rows with NULL values. Therefore, and for example, if a "SELECT" query uses a WHERE clause that references a nullable column in an existing non-clustered index, SQL Server will not use the non-clustered index for its search. Instead, SQL Server will perform a full table scan for the search. In another example, if a query contains "WHERE col1 IS NULL" and the col1 column allows NULLs, a non-clustered index on col1 will not be used for the search.
Fixes for cases involving nullable columns:
- Modify the table definition of the referenced column to be NOT NULL.
- Do not use a nullable column as a search component of a non-clustered index.
This is the first time in 20+ years of using SQL Server that I have read this. Can someone verify the accuracy/truth of this? I've attempted to contrive a test to prove or disprove it with inconclusive results.
Thanks

Viewing 12 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply

drew.allen SSC Guru Points: 77014 More actions · Answer 1

I ran the following test on a SQL 2016 box and the query used an index seek, so this claim is false.

CREATE TABLE #TEST ( id TINYINT NULL )

CREATE NONCLUSTERED INDEX PK_Test ON #Test(id)

INSERT #TEST (id)
SELECT NULL
UNION ALL
SELECT TOP 100 ROW_NUMBER() OVER(ORDER BY @@VERSION)
FROM sys.columns

SELECT *
FROM #TEST
WHERE id IS NULL

DROP TABLE #TEST

J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA

Gail Wanabee SSCrazy Eights Points: 8471 More actions · Answer 2

Thanks for the query, Drew.
I believe you are correct. However, I commented out this line of code: --CREATE NONCLUSTERED INDEX PK_Test ON #Test(id) and re-executed the query.
The execution results are identical, so I'm not sure your code proves that the non-clustered index actually contains a NULL.
Can you think of any other way to test this? I've tried a half a dozen implementations but never was able to prove the actual number of rows that physically exist in the non-clustered index.

In my implementations, I thought using a table hint specifying an index would force the Optimizer to use my non-clustered index. But then, I read this in Books Online:
"Table Hints" - If the collection of hinted indexes do not include all columns referenced by the query, a fetch is performed to retrieve the remaining columns after the SQL Server Database Engine retrieves all the indexed columns.

Gail Shaw SSC Guru Points: 1004514 More actions · Answer 3

Gail Wanabee - Wednesday, January 9, 2019 2:42 PM
A fairly credible source (SolarWinds, DPA help function) had this advice for determining the causes of table scans:

Investigate Non-Clustered Indexes That Might Contain Nullable Columns as Search Criteria:
A non-clustered index on a nullable column does not contain entries for rows with NULL values. Therefore, and for example, if a "SELECT" query uses a WHERE clause that references a nullable column in an existing non-clustered index, SQL Server will not use the non-clustered index for its search. Instead, SQL Server will perform a full table scan for the search. In another example, if a query contains "WHERE col1 IS NULL" and the col1 column allows NULLs, a non-clustered index on col1 will not be used for the search.
Fixes for cases involving nullable columns:
Modify the table definition of the referenced column to be NOT NULL.
Do not use a nullable column as a search component of a non-clustered index.
This is the first time in 20+ years of using SQL Server that I have read this. Can someone verify the accuracy/truth of this? I've attempted to contrive a test to prove or disprove it with inconclusive results.
Thanks

Completely false, total garbage. The only time an index has fewer rows than the table is when the index is filtered.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Gail Shaw SSC Guru Points: 1004514 More actions · Answer 4

Gail Wanabee - Wednesday, January 9, 2019 4:39 PM
I believe you are correct. However, I commented out this line of code: --CREATE NONCLUSTERED INDEX PK_Test ON #Test(id) and re-executed the query.
The execution results are identical, so I'm not sure your code proves that the non-clustered index actually contains a NULL.

The execution plan makes it very clear that there's a seek on the nonclustered index happening.

If you comment out the create index line, you'll get a table scan instead.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

drew.allen SSC Guru Points: 77014 More actions · Answer 5

Gail Wanabee - Wednesday, January 9, 2019 4:39 PM
Thanks for the query, Drew.
I believe you are correct. However, I commented out this line of code: --CREATE NONCLUSTERED INDEX PK_Test ON #Test(id) and re-executed the query.
The execution results are identical, so I'm not sure your code proves that the non-clustered index actually contains a NULL.
Can you think of any other way to test this? I've tried a half a dozen implementations but never was able to prove the actual number of rows that physically exist in the non-clustered index.
In my implementations, I thought using a table hint specifying an index would force the Optimizer to use my non-clustered index. But then, I read this in Books Online:
"Table Hints" - If the collection of hinted indexes do not include all columns referenced by the query, a fetch is performed to retrieve the remaining columns after the SQL Server Database Engine retrieves all the indexed columns.

The execution plans cannot possibly be identical. Remember, there are two parts of the query: inserting the data and selecting the data. The first part will be essentially the same, because that part of the query hasn't changed, but the second part of the query cannot be identical, because one instance uses the index that doesn't exist in the other instance. Here is a better test:

CREATE TABLE #TEST ( id TINYINT NULL )

INSERT #TEST (id)
SELECT NULL
UNION ALL
SELECT TOP 100 ROW_NUMBER() OVER(ORDER BY @@VERSION)
FROM sys.columns

SELECT *
FROM #TEST
WHERE id IS NULL

CREATE NONCLUSTERED INDEX PK_Test ON #Test(id)

SELECT *
FROM #TEST
WHERE id IS NULL

DROP INDEX PK_Test ON #Test

SELECT *
FROM #TEST
WHERE id IS NULL

DROP TABLE #TEST

If you look at the execution plan, you'll see that the first SELECT uses a TABLE SCAN, the second uses an INDEX SEEK and the third goes back to a TABLE SCAN.

Drew

J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA

Gail Wanabee SSCrazy Eights Points: 8471 More actions · Answer 6

Gail Wanabee

SSCrazy Eights

Points: 8471

January 12, 2019 at 12:07 pm

#2018767

Gila, Drew,

Thank you both for your responses.

Jeff Moden SSC Guru Points: 1004701 More actions · Answer 7

Gail Wanabee - Wednesday, January 9, 2019 2:42 PM
A fairly credible source (SolarWinds, DPA help function) had this advice for determining the causes of table scans:

Investigate Non-Clustered Indexes That Might Contain Nullable Columns as Search Criteria:
A non-clustered index on a nullable column does not contain entries for rows with NULL values. Therefore, and for example, if a "SELECT" query uses a WHERE clause that references a nullable column in an existing non-clustered index, SQL Server will not use the non-clustered index for its search. Instead, SQL Server will perform a full table scan for the search. In another example, if a query contains "WHERE col1 IS NULL" and the col1 column allows NULLs, a non-clustered index on col1 will not be used for the search.
Fixes for cases involving nullable columns:
Modify the table definition of the referenced column to be NOT NULL.
Do not use a nullable column as a search component of a non-clustered index.
This is the first time in 20+ years of using SQL Server that I have read this. Can someone verify the accuracy/truth of this? I've attempted to contrive a test to prove or disprove it with inconclusive results.
Thanks

Please provide a link to the article of which you speak.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Gail Wanabee SSCrazy Eights Points: 8471 More actions · Answer 8

It wasn't an article, Jeff.

I copied the text excerpt from a Help function write-up in Solarwinds DPA (Database Performance Analyzer) product which my company uses to monitor our production OLTP server.

I was investigating a blocking chain problem. Over 80 queries were involved in a blocking event that persisted for over 3 hours and 36 minutes in the middle of the night. For one of the queries involved in the blocking, DPA identified 3 table scans on 3 different non-clustered indexes, even though the query plan DPA showed for the query did not have table scans on the referenced table. I wondered how that was possible? So I clicked on the Help function for that section of the UI and it displayed a long write-up on what could be causing the table scans (in general) and on non-clustered indexes (specifically), and how to prevent them.

Jeff Moden SSC Guru Points: 1004701 More actions · Answer 9

Gail Wanabee - Saturday, January 12, 2019 4:54 PM
It wasn't an article, Jeff.
I copied the text excerpt from a Help function write-up in Solarwinds DPA (Database Performance Analyzer) product which my company uses to monitor our production OLTP server.
I was investigating a blocking chain problem. Over 80 queries were involved in a blocking event that persisted for over 3 hours and 36 minutes in the middle of the night. For one of the queries involved in the blocking, DPA identified 3 table scans on 3 different non-clustered indexes, even though the query plan DPA showed for the query did not have table scans on the referenced table. I wondered how that was possible? So I clicked on the Help function for that section of the UI and it displayed a long write-up on what could be causing the table scans (in general) and on non-clustered indexes (specifically), and how to prevent them.

Interesting. Thanks for the feedback.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Gail Shaw SSC Guru Points: 1004514 More actions · Answer 10

I chatted with someone from SolarWinds. Please contact their customer support people and open a request to get this help file corrected

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Eric M Russell SSC Guru Points: 125622 More actions · Answer 11

By default, non-clustered indexes do in fact keep track of NULL values, and can answer for both "id IS NULL" or "id IS NOT NULL" predicates, regardless of the null-ability constraint on the column definition.
However, if you have a FILTERED non-clustered index (ex: "WHERE id > 1000"), and the query uses a predicate not answerable by the filter expression (ex: "id IS NOT NULL" or "id = 120"), then that would be a scenario where the non-clustered index is ignored.

"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho