July 12, 2007 at 1:50 pm
Hello,
I was just asked a question by my boss. Is there an advantage to indexing a column if you are going to search on it like this:
SELECT * FROM tblA
WHERE column1 IS NULL
Assuming that it is selective (few NULL values compared to the overall number of rows in the table).
Is it likely that creating an index in tblA.column1 will help the query?
July 12, 2007 at 2:37 pm
Why don't you run a simple test and show results to your boss?
_____________
Code for TallyGenerator
July 12, 2007 at 3:20 pm
"Non-sargable search arguments in the WHERE clause, such as "IS NULL", "", "!=", "!>", "!<", "NOT", "NOT EXISTS", "NOT IN", "NOT LIKE", and "LIKE '%500'" generally prevents (but not always) the query optimizer from using an index to perform a search." So no...it won't help because it most likely will not be used anyway...
src: http://www.sql-server-performance.com/transact_sql.asp
Ben Sullins
bensullins.com
Beer is my primary key...
July 12, 2007 at 4:10 pm
Ben, I would suggest to you as well to run a simple test.
You cannot believe how much stuff you can find in Internet is absolute cr.p.
And you probably missed this: "generally prevents (but not always)"
_____________
Code for TallyGenerator
July 13, 2007 at 5:39 am
I agree that test is the best way to find out.
I will not quote any authorities (true or would-be), but based on my experience, if the table is large and does not contain many NULLs in the column, such index would help a lot.
July 13, 2007 at 7:17 am
I would try to simulate it, but the scenario involves data to be populated in a table in the future. We're talking about 100s of thousands of rows, with only a few hundred rows will NULL values in the column we're looking for.
When my boss asked me the question, I thought the index wouldn't help. I just wanted to find out if anyone had any experience or knew of any definitive answer (I tried googling it, but I got a bunch of non-related results).
Ben, thanks for posting a constructive answer.
July 13, 2007 at 9:40 am
Your best bet here really is to run a test of your own as Sergiy and Vladan have stated. While you may find sources that say that it may or may not help, I would advise taking it with a grain of salt and testing it for yourself especially since this would be an easy mock up for you. Even with 1 million rows, you could create a simple test to prove or disprove the effects of adding the index with relative ease. As a rule, even when it makes total sense that a certain indexing scenario should provide improvements, I always run a test so I have proof that a change that I recommend is going to help.
July 13, 2007 at 11:50 am
I've got a small table with about 9k rows, and it has a column that has about 3k of nulls. When I select top 50 where column is a null, I get nice index seek against the column index. Remove the top and it changes to a clustered scan. No final answer there, but it would appear in my test that the index does help for the Top N cases.
I think its fair to point out the value of testing, but it's also nice to just ask a question and have a conversation about it.
July 13, 2007 at 6:07 pm
It's not exacly about TOP N.
It's about shortest way to the target.
When you do SELECT * in a query involving non-clustered index seek SQL Server needs to perform bookmark lookup in order to return all requested values.
In case of TOP 50 the cost of index seek + lookup is smaller than the cost of clustered index scan.
If you don't have TOP 50 statistics will tell optimiser that it's gonna be lookup for 30% of rows and it's gonna be faster to perform single clustered index scan than index seek with following bookmark lookup.
Change your index used in WHERE clause to be clustered and it will be "clustered index seek" in both occasions.
_____________
Code for TallyGenerator
July 13, 2007 at 9:35 pm
And, if you do have a clustered index in the right spot...
GO
--===== Create and populate a 1,000,000 row test table.
-- Column RowNum has a range of 1 to 1,000,000 unique numbers
-- Column "SomeInt" has a range of 1 to 50,000 non-unique numbers
-- Column "SomeString" has a range of "AA" to "ZZ" non-unique 2 character strings
-- Column "SomeNumber has a range of 0.0000 to 99.9999 non-unique numbers
-- Column "SomeDate" has a range of >=01/01/2000 and <01/01/2010 non-unique date/times
-- Takes about 42 seconds to execute.
SELECT TOP 1000000
RowNum = IDENTITY(INT,1,1),
SomeInt = CAST(RAND(CAST(NEWID() AS VARBINARY))*50000+1 AS INT),
SomeString = CHAR(STR(RAND(CAST(NEWID() AS VARBINARY))*25+65))
+ CHAR(STR(RAND(CAST(NEWID() AS VARBINARY))*25+65)),
SomeCSV = 'Part1,Part2,Part3,Part4,Part5,Part6,Part7,Part8,Part9,Part10',
SomeNumber = CAST(RAND(CAST(NEWID() AS VARBINARY))*100 AS MONEY),
SomeDate = CAST(RAND(CAST(NEWID() AS VARBINARY))*3653.0+36524.0 AS DATETIME)
INTO dbo.JBMTest
FROM Master.dbo.SysColumns sc1,
Master.dbo.SysColumns sc2 --Lack of join criteria makes this a CROSS-JOIN
--===== A table is not properly formed unless a Primary Key has been assigned
ALTER TABLE dbo.JBMTest
ADD PRIMARY KEY CLUSTERED (RowNum)
--===== Create an index on the test column
CREATE INDEX IX_JBMTest_SomeInt ON dbo.JBMTest (SomeInt)
--===== Change some of the data in the test column to NULL (more than 30%)
UPDATE dbo.JBMTest
SET SomeInt = NULL
WHERE SomeInt <= 16000
--======================================================================
-- Run some tests... "*" sucks...
--======================================================================
SET STATISTICS IO ON
SELECT * --===== Test1
FROM dbo.JBMTest
WHERE SomeInt IS NULL
SELECT RowNum,SomeInt --===== Test2
FROM dbo.JBMTest
WHERE SomeInt IS NULL
SELECT TOP 50 * --===== Test3
FROM dbo.JBMTest
WHERE SomeInt IS NULL
SELECT TOP 50 RowNum,SomeInt --===== Test4
FROM dbo.JBMTest
WHERE SomeInt IS NULL
SET STATISTICS IO OFF
--Jeff Moden
Change is inevitable... Change for the better is not.
July 13, 2007 at 11:27 pm
Of course if you drop the indexes altogether, 3 of the 4 tests should be slightly faster than with the indexes (the second won't be, as the columns are covered). That obviously only applies in certain data scenarios, but this happens to be one of them.
July 17, 2007 at 12:09 pm
Exactly. For example, having read the cited article before, I was never able to understand why something like "in" and "exists" are sargable but "not in" and "not exists" are not. The "not" only takes the result of the "in" or "exists" and inverts it. If one is sargable, the other has to be.
As for the specific question, I would fully expect an index to help. Null values would appear before any other value so the optimizer should know that as soon as a non-null appears, the search is over.
However, yes, do perform a test. My expectations have been dashed before...
Tomm Carr
--
Version Normal Form -- http://groups.google.com/group/vrdbms
Viewing 12 posts - 1 through 11 (of 11 total)
You must be logged in to reply to this topic. Login to reply