Non Clustered Index on Phone column

Question

Non Clustered Index on Phone column

sunilkmr284

Old Hand

Points: 309
More actions
August 18, 2018 at 3:48 am

#376457

Dear All,
I have a Lead table with below fields.
LeadId int (Not a primary key)
Telephone
...
...Table have 4 millions records
Client always search the data with the Telephone number.
I have created only Non clustered index on Telephone Number field.
Should I have also added the primary key constraint on leadId?
If I add the primary key on LeadId column, performance will be increase or not?

Viewing 15 posts - 1 through 15 (of 24 total)

You must be logged in to reply to this topic. Login to reply

Eirikur Eiriksson SSC Guru Points: 182951 More actions · Answer 1

sunilkmr284 - Saturday, August 18, 2018 3:48 AM
Dear All,
I have a Lead table with below fields.
LeadId int (Not a primary key)
Telephone
...
...Table have 4 millions records
Client always search the data with the Telephone number.
I have created only Non clustered index on Telephone Number field.
Should I have also added the primary key constraint on leadId?
If I add the primary key on LeadId column, performance will be increase or not?

Can you post the full DDL (create table) script and information on what of the data you want to bring back and how the users will do the search?
😎

There is a big difference between handling small result sets and large result sets produced by very generic queries such as searching for anything starting with one or two digits.

sunilkmr284 Old Hand Points: 309 More actions · Answer 2

Eirikur Eiriksson - Saturday, August 18, 2018 5:14 AM
sunilkmr284 - Saturday, August 18, 2018 3:48 AM
Dear All,
I have a Lead table with below fields.
LeadId int (Not a primary key)
Telephone
...
...Table have 4 millions records
Client always search the data with the Telephone number.
I have created only Non clustered index on Telephone Number field.
Should I have also added the primary key constraint on leadId?
If I add the primary key on LeadId column, performance will be increase or not?
Can you post the full DDL (create table) script and information on what of the data you want to bring back and how the users will do the search?
😎
There is a big difference between handling small result sets and large result sets produced by very generic queries such as searching for anything starting with one or two digits.

USE ContactDB
GO

/****** Object: Table [dbo].[tbl_Lead] Script Date: 08/18/2018 16:56:57 ******/
SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

SET ANSI_PADDING ON
GO

CREATE TABLE [dbo].[tbl_Lead](
    [LeadID] [int] IDENTITY(1,1) NOT NULL,
    [Company Name] [varchar](100) NULL,
    [Title] [varchar](50) NULL,
    [First Name] [varchar](50) NULL,
    [Last Name] [varchar](50) NULL,
    [Address1] [varchar](100) NULL,
    [Address2] [varchar](100) NULL,
    [Address3] [varchar](100) NULL,
    [Town] [varchar](50) NULL,
    [County] [varchar](50) NULL,
    [PostCode] [varchar](20) NULL,
    [Telephone] [varchar](20) NULL,
    [Email] [varchar](150) NULL
) ON [PRIMARY]

GO

SET ANSI_PADDING OFF
GO

Searching
CREATE PROCEDURE [dbo].[Lead_Get]
    @PhoneNo varchar(50)
AS
BEGIN
    -- SET NOCOUNT ON added to prevent extra result sets from
    -- interfering with SELECT statements.
    SET NOCOUNT ON;

select * from tbl_lead where Telephone=@PhoneNo
end

Jonathan AC Roberts SSCoach Points: 18408 More actions · Answer 3

sunilkmr284 - Saturday, August 18, 2018 5:30 AM
Eirikur Eiriksson - Saturday, August 18, 2018 5:14 AM
sunilkmr284 - Saturday, August 18, 2018 3:48 AM
Dear All,
I have a Lead table with below fields.
LeadId int (Not a primary key)
Telephone
...
...Table have 4 millions records
Client always search the data with the Telephone number.
I have created only Non clustered index on Telephone Number field.
Should I have also added the primary key constraint on leadId?
If I add the primary key on LeadId column, performance will be increase or not?
Can you post the full DDL (create table) script and information on what of the data you want to bring back and how the users will do the search?
😎
There is a big difference between handling small result sets and large result sets produced by very generic queries such as searching for anything starting with one or two digits.
USE ContactDB
GO
/****** Object: Table [dbo].[tbl_Lead] Script Date: 08/18/2018 16:56:57 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[tbl_Lead](
    [LeadID] [int] IDENTITY(1,1) NOT NULL,
    [Company Name] [varchar](100) NULL,
    [Title] [varchar](50) NULL,
    [First Name] [varchar](50) NULL,
    [Last Name] [varchar](50) NULL,
    [Address1] [varchar](100) NULL,
    [Address2] [varchar](100) NULL,
    [Address3] [varchar](100) NULL,
    [Town] [varchar](50) NULL,
    [County] [varchar](50) NULL,
    [PostCode] [varchar](20) NULL,
    [Telephone] [varchar](20) NULL,
    [Email] [varchar](150) NULL
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
Searching
CREATE PROCEDURE [dbo].[Lead_Get]
    @PhoneNo varchar(50)
AS
BEGIN
    -- SET NOCOUNT ON added to prevent extra result sets from
    -- interfering with SELECT statements.
    SET NOCOUNT ON;
select * from tbl_lead where Telephone=@PhoneNo
end

I would try a clustered primary key on [LeadID] and a unique (if it is unique) index on Telephone with an INCLUDE of all the other columns.

Eirikur Eiriksson SSC Guru Points: 182951 More actions · Answer 4

Jonathan AC Roberts - Saturday, August 18, 2018 6:46 PM
I would try a clustered primary key on [LeadID] and a unique (if it is unique) index on Telephone with an INCLUDE of all the other columns.

If all the searches are on an exact phone number, literally duplicating the data does not make too much sense. Therefore, I would suggest a non-clustered index on the Phone number column with the inclusion of the clustered index, searches will then find the matches and bring the full results back via key lookup. Because the phone numbers can be shared, it is futile to use a unique constraint on the index unless the business rules limit the entries to unique numbers, something that we do not know.
😎

Jonathan AC Roberts SSCoach Points: 18408 More actions · Answer 5

Eirikur Eiriksson - Sunday, August 19, 2018 3:57 AM
Jonathan AC Roberts - Saturday, August 18, 2018 6:46 PM
I would try a clustered primary key on [LeadID] and a unique (if it is unique) index on Telephone with an INCLUDE of all the other columns.
If all the searches are on an exact phone number, literally duplicating the data does not make too much sense. Therefore, I would suggest a non-clustered index on the Phone number column with the inclusion of the clustered index, searches will then find the matches and bring the full results back via key lookup. Because the phone numbers can be shared, it is futile to use a unique constraint on the index unless the business rules limit the entries to unique numbers, something that we do not know.
😎

I did say if it was unique.
If this is the query that's most common, and performance is the issues, and the data doesn't have massive amounts of updates; an include all columns will allow a seek without a lookup which will be slightly faster than a lookup.

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 6

Jonathan AC Roberts - Sunday, August 19, 2018 4:58 AM
Eirikur Eiriksson - Sunday, August 19, 2018 3:57 AM
Jonathan AC Roberts - Saturday, August 18, 2018 6:46 PM
I would try a clustered primary key on [LeadID] and a unique (if it is unique) index on Telephone with an INCLUDE of all the other columns.
If all the searches are on an exact phone number, literally duplicating the data does not make too much sense. Therefore, I would suggest a non-clustered index on the Phone number column with the inclusion of the clustered index, searches will then find the matches and bring the full results back via key lookup. Because the phone numbers can be shared, it is futile to use a unique constraint on the index unless the business rules limit the entries to unique numbers, something that we do not know.
😎
I did say if it was unique.
If this is the query that's most common, and performance is the issues, and the data doesn't have massive amounts of updates; an include all columns will allow a seek without a lookup which will be slightly faster than a lookup.

An INCLUDE of all the columns necessary would be a totally unnecessary duplication of data, in this case. That's all indexes really are is a copy of the data sorted in a different order.

We also don't know what the op means by improving the performance. Seems to me that a "lead" table would require singleton lookups by phone number and that just about won't ever cause a performance problem if there's an index on the telephone number. Even a HEAP will do fine in this area. If they're doing something in batches, the OP needs to identify that before we can make recommendations on "performance".

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 7

sunilkmr284 - Saturday, August 18, 2018 3:48 AM
Dear All,
I have a Lead table with below fields.
LeadId int (Not a primary key)
Telephone
...
...Table have 4 millions records
Client always search the data with the Telephone number.
I have created only Non clustered index on Telephone Number field.
Should I have also added the primary key constraint on leadId?
If I add the primary key on LeadId column, performance will be increase or not?

Performance of WHAT??? How is the table being used and what are the current issues with performance?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jonathan AC Roberts SSCoach Points: 18408 More actions · Answer 8

Jeff Moden - Sunday, August 19, 2018 1:36 PM
Jonathan AC Roberts - Sunday, August 19, 2018 4:58 AM
Eirikur Eiriksson - Sunday, August 19, 2018 3:57 AM
Jonathan AC Roberts - Saturday, August 18, 2018 6:46 PM
I would try a clustered primary key on [LeadID] and a unique (if it is unique) index on Telephone with an INCLUDE of all the other columns.
If all the searches are on an exact phone number, literally duplicating the data does not make too much sense. Therefore, I would suggest a non-clustered index on the Phone number column with the inclusion of the clustered index, searches will then find the matches and bring the full results back via key lookup. Because the phone numbers can be shared, it is futile to use a unique constraint on the index unless the business rules limit the entries to unique numbers, something that we do not know.
😎
I did say if it was unique.
If this is the query that's most common, and performance is the issues, and the data doesn't have massive amounts of updates; an include all columns will allow a seek without a lookup which will be slightly faster than a lookup.
An INCLUDE of all the columns necessary would be a totally unnecessary duplication of data, in this case. That's all indexes really are is a copy of the data sorted in a different order.
We also don't know what the op means by improving the performance. Seems to me that a "lead" table would require singleton lookups by phone number and that just about won't ever cause a performance problem if there's an index on the telephone number. Even a HEAP will do fine in this area. If they're doing something in batches, the OP needs to identify that before we can make recommendations on "performance".

The query sunilkmr284 gave as the one he wanted to improve the performance of is select * from tbl_lead where Telephone=@PhoneNo
If it is just this query and he wants to make it as fast as possible then an index on PhoneNo with all the other columns included will give the fastest results, hence my comment.

frederico_fonseca SSCoach Points: 16308 More actions · Answer 9

Jonathan AC Roberts - Sunday, August 19, 2018 1:59 PM
Jeff Moden - Sunday, August 19, 2018 1:36 PM
Jonathan AC Roberts - Sunday, August 19, 2018 4:58 AM
Eirikur Eiriksson - Sunday, August 19, 2018 3:57 AM
Jonathan AC Roberts - Saturday, August 18, 2018 6:46 PM
I would try a clustered primary key on [LeadID] and a unique (if it is unique) index on Telephone with an INCLUDE of all the other columns.
If all the searches are on an exact phone number, literally duplicating the data does not make too much sense. Therefore, I would suggest a non-clustered index on the Phone number column with the inclusion of the clustered index, searches will then find the matches and bring the full results back via key lookup. Because the phone numbers can be shared, it is futile to use a unique constraint on the index unless the business rules limit the entries to unique numbers, something that we do not know.
😎
I did say if it was unique.
If this is the query that's most common, and performance is the issues, and the data doesn't have massive amounts of updates; an include all columns will allow a seek without a lookup which will be slightly faster than a lookup.
An INCLUDE of all the columns necessary would be a totally unnecessary duplication of data, in this case. That's all indexes really are is a copy of the data sorted in a different order.
We also don't know what the op means by improving the performance. Seems to me that a "lead" table would require singleton lookups by phone number and that just about won't ever cause a performance problem if there's an index on the telephone number. Even a HEAP will do fine in this area. If they're doing something in batches, the OP needs to identify that before we can make recommendations on "performance".
The query sunilkmr284 gave as the one he wanted to improve the performance of is select * from tbl_lead where Telephone=@PhoneNo
If it is just this query and he wants to make it as fast as possible then an index on PhoneNo with all the other columns included will give the fastest results, hence my comment.

Possibly the fastest - but is really the best option? without knowing the amount of updates/inserts there are on the table we don't even know if having the clustered index being the one based on the telephone isn't the better (which would give the same performance as a index with all columns - and just one tiny index to contain the leadid (which we weren't even told if it is used at all).

Having a index with all the columns is in most of the cases I've seen an overkill. And on this case with 4million rows on a table it does add a lot to the size.

We also don't know what other indexes, if any, exist on the table - the DDL posted is from a "create script" from ssms which by default will not script these, neither the primary key (which from what the OP said is not the leadid - may note have one though)

Jonathan AC Roberts SSCoach Points: 18408 More actions · Answer 10

frederico_fonseca - Sunday, August 19, 2018 2:38 PM
Jonathan AC Roberts - Sunday, August 19, 2018 1:59 PM
Jeff Moden - Sunday, August 19, 2018 1:36 PM
Jonathan AC Roberts - Sunday, August 19, 2018 4:58 AM
Eirikur Eiriksson - Sunday, August 19, 2018 3:57 AM
Jonathan AC Roberts - Saturday, August 18, 2018 6:46 PM
I would try a clustered primary key on [LeadID] and a unique (if it is unique) index on Telephone with an INCLUDE of all the other columns.
If all the searches are on an exact phone number, literally duplicating the data does not make too much sense. Therefore, I would suggest a non-clustered index on the Phone number column with the inclusion of the clustered index, searches will then find the matches and bring the full results back via key lookup. Because the phone numbers can be shared, it is futile to use a unique constraint on the index unless the business rules limit the entries to unique numbers, something that we do not know.
😎
I did say if it was unique.
If this is the query that's most common, and performance is the issues, and the data doesn't have massive amounts of updates; an include all columns will allow a seek without a lookup which will be slightly faster than a lookup.
An INCLUDE of all the columns necessary would be a totally unnecessary duplication of data, in this case. That's all indexes really are is a copy of the data sorted in a different order.
We also don't know what the op means by improving the performance. Seems to me that a "lead" table would require singleton lookups by phone number and that just about won't ever cause a performance problem if there's an index on the telephone number. Even a HEAP will do fine in this area. If they're doing something in batches, the OP needs to identify that before we can make recommendations on "performance".
The query sunilkmr284 gave as the one he wanted to improve the performance of is select * from tbl_lead where Telephone=@PhoneNo
If it is just this query and he wants to make it as fast as possible then an index on PhoneNo with all the other columns included will give the fastest results, hence my comment.
Possibly the fastest - but is really the best option? without knowing the amount of updates/inserts there are on the table we don't even know if having the clustered index being the one based on the telephone isn't the better (which would give the same performance as a index with all columns - and just one tiny index to contain the leadid (which we weren't even told if it is used at all).
Having a index with all the columns is in most of the cases I've seen an overkill. And on this case with 4million rows on a table it does add a lot to the size.
We also don't know what other indexes, if any, exist on the table - the DDL posted is from a "create script" from ssms which by default will not script these, neither the primary key (which from what the OP said is not the leadid - may note have one though)

In my initial answer I did say "If this is the query that's most common, and performance is the issues, and the data doesn't have massive amounts of updates; an include all columns will allow a seek without a lookup which will be slightly faster than a lookup.". So that covers the amount updates/inserts. A clustered index on phone number would get fragmented but on an identity column it would be less prone to fragmentation. And I did say "I would try this index", I didn't say this is definitely the best option.

sunilkmr284 Old Hand Points: 309 More actions · Answer 11

sunilkmr284

Old Hand

Points: 309

August 20, 2018 at 3:00 am

#2002438

Thank you all for suggestion me.

codyleaf SSC Enthusiast Points: 116 More actions · Answer 12

I did notice that you have a field of the Telephone as varchar(20) and your lookup @PhoneNo is varchar(50). I would recommend that you have the same type and size so the optimizer will use the intended index correctly all the time. It sounds like this would not make a difference but I have had plenty of cases where that is all it took to speed up the query. Just make sure there is no margin for error as you have here.

Cody L.

Lynn Pettis SSC Guru Points: 442470 More actions · Answer 13

Jonathan AC Roberts - Sunday, August 19, 2018 3:09 PM
frederico_fonseca - Sunday, August 19, 2018 2:38 PM
Jonathan AC Roberts - Sunday, August 19, 2018 1:59 PM
Jeff Moden - Sunday, August 19, 2018 1:36 PM
Jonathan AC Roberts - Sunday, August 19, 2018 4:58 AM
Eirikur Eiriksson - Sunday, August 19, 2018 3:57 AM
Jonathan AC Roberts - Saturday, August 18, 2018 6:46 PM
I would try a clustered primary key on [LeadID] and a unique (if it is unique) index on Telephone with an INCLUDE of all the other columns.
If all the searches are on an exact phone number, literally duplicating the data does not make too much sense. Therefore, I would suggest a non-clustered index on the Phone number column with the inclusion of the clustered index, searches will then find the matches and bring the full results back via key lookup. Because the phone numbers can be shared, it is futile to use a unique constraint on the index unless the business rules limit the entries to unique numbers, something that we do not know.
😎
I did say if it was unique.
If this is the query that's most common, and performance is the issues, and the data doesn't have massive amounts of updates; an include all columns will allow a seek without a lookup which will be slightly faster than a lookup.
An INCLUDE of all the columns necessary would be a totally unnecessary duplication of data, in this case. That's all indexes really are is a copy of the data sorted in a different order.
We also don't know what the op means by improving the performance. Seems to me that a "lead" table would require singleton lookups by phone number and that just about won't ever cause a performance problem if there's an index on the telephone number. Even a HEAP will do fine in this area. If they're doing something in batches, the OP needs to identify that before we can make recommendations on "performance".
The query sunilkmr284 gave as the one he wanted to improve the performance of is select * from tbl_lead where Telephone=@PhoneNo
If it is just this query and he wants to make it as fast as possible then an index on PhoneNo with all the other columns included will give the fastest results, hence my comment.
Possibly the fastest - but is really the best option? without knowing the amount of updates/inserts there are on the table we don't even know if having the clustered index being the one based on the telephone isn't the better (which would give the same performance as a index with all columns - and just one tiny index to contain the leadid (which we weren't even told if it is used at all).
Having a index with all the columns is in most of the cases I've seen an overkill. And on this case with 4million rows on a table it does add a lot to the size.
We also don't know what other indexes, if any, exist on the table - the DDL posted is from a "create script" from ssms which by default will not script these, neither the primary key (which from what the OP said is not the leadid - may note have one though)
In my initial answer I did say "If this is the query that's most common, and performance is the issues, and the data doesn't have massive amounts of updates; an include all columns will allow a seek without a lookup which will be slightly faster than a lookup.". So that covers the amount updates/inserts. A clustered index on phone number would get fragmented but on an identity column it would be less prone to fragmentation. And I did say "I would try this index", I didn't say this is definitely the best option.

Okay, IF I was querying the table by the phone number most frequently and pulling back all the data each time then I would cluster the table on the phone number with the LeadId column as the second column of the index to make it unique.

Joe Torre SSChampion Points: 10248 More actions · Answer 14

@sunilkmr284,
If you are joining to other tables frequently by LeadID then I would create a clustered PK on that. Clustering on the phone number may lead to fragmentation and affect performance negatively. As Jeff already stated, a non-clustered index should perform well for singleton searches. What is the cardinality? Run this query and let us know the results:
SELECT Count(*) TotalCount, Count(DISTINCT Telephone) PhoneCount ROM dbo.tbl_Lead;