Clustered index on varchar column

Question

Clustered index on varchar column

faboudib

SSC Veteran

Points: 268
More actions
August 10, 2007 at 2:22 am

#197245

I have a table that should support a large amount of records. On populating the table with a large amount of sample data, I found that most of the query performance was degraded significantly. I changed the clustered index to a varchar column which most of the queries use in their WHERE condition. The queries are now much faster. However, I'm concerned that this action would have repercussions in the long run.
Does putting my clustered index on a varchar column cause any problems relating to size, performance, or any other aspects?
Thanks.

Viewing 15 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply

ChiragNS One Orange Chip Points: 26137 More actions · Answer 1

faboudib

If the values in this column will frequently change then this may not be a good candidate for creating a clustered index. In general its not a very good practice to create clustered index on varchar columns but again it depends.

"Keep Trying"

faboudib SSC Veteran Points: 268 More actions · Answer 2

If you mean by "If the values in this column will frequently change" that I would update this column, then the column can be safely assumed to be read only.

ChiragNS One Orange Chip Points: 26137 More actions · Answer 3

hi

If by "read only" you mean that the values in this column will change rarely then you may have clustered index on this column. Clustered index works best on numeric and date columns. But since your queries use this varchar column frequently then this may suffice.

Another thing you can try is to have the clustered index on another numeric or date time columns and have a non clustered index on this varchar column. Check out the performance of your queries in this situation.

"Keep Trying"

faboudib SSC Veteran Points: 268 More actions · Answer 4

Hi Chiraq,

Thanks for your help. My varchar column never changes. Furthermore, I tried both cases to see which is faster (varchar column and autonumber PK column as clustered index). Having the varchar column as my clustered index instead of a non-clustered index increases my query's performance three fold. So I will stick with it as my clustered index.

Best regards, and thanks for your help.

philcart SSC-Forever Points: 47796 More actions · Answer 5

faboudib

I'd like to see the table structure and query before recommending that you have a clustered index on a varchar column.

--------------------
Colt 45 - the original point and click interface

faboudib SSC Veteran Points: 268 More actions · Answer 6

Here it is:

CREATE TABLE [dbo].[Card] (

[CardId] [bigint] IDENTITY (1, 1) NOT NULL ,

[SerialNumber] [nvarchar] (50) NOT NULL ,

[StatusId] [smallint] NOT NULL ,

[PurchaseOrderId] [int] NOT NULL

) ON [PRIMARY]

GO

ALTER TABLE [dbo].[Card] ADD

CONSTRAINT [PK_Card] PRIMARY KEY

(

[CardId]

) ON [PRIMARY]

GO

philcart SSC-Forever Points: 47796 More actions · Answer 7

And the query you're using ...

--------------------
Colt 45 - the original point and click interface

faboudib SSC Veteran Points: 268 More actions · Answer 8

select * from card where serialnumber between @fromserial and @toserial and purchaseorderid = @poid.

updates are done using the primary key but only after the cardid is acquired by its serial number.

philcart SSC-Forever Points: 47796 More actions · Answer 9

My varchar column never changes

So why have the CardID column at all?

updates are done using the primary key but only after the cardid is acquired by its serial number

Why not update the record once you've found it via the serial number. Why make a second call to the database for the update?

What was the non-clustered index you used?

--------------------
Colt 45 - the original point and click interface

faboudib SSC Veteran Points: 268 More actions · Answer 10

There are some validations that need to be done according to some of the fields present in the card record.

The non-clustered index was card id.

After I found the card by it's serial number once, the update shouldn't once again have to find the card by its serial number (varchar => more expensive), rather the update uses the cardid to find the record to be updated. I don't know if you're getting my point.

philcart SSC-Forever Points: 47796 More actions · Answer 11

Ok, based on the query you provided a non-clustered index on card id won't help.

For the way you're explaining the updates, this is how I see it,

1) first return the matching cardid for the given serial number

2) update StatusId and/or PurchaseOrderId fields based on the cardid

Why isn't it just,

update StatusId and/or PurchaseOrderId fields for the given serial number.

The "(varchar => more expensive)" argument isn't really valid as you've already searched for the record by the varchar field anyway.

Firstly, I'd change the primary key definition to non-clustered. By default SQL Server will create the primary key as you've provided as a clustered index.

Secondly, I'd create a non-clustered index including SerialNumber and PurchaseOrderId. This covers both of your search criteria.

--------------------
Colt 45 - the original point and click interface

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 12

1) clustered indexes generally help with BETWEEN queries.

2) they do have a down side in that the clustering key is carried as the pointer for every non-clustered index on the table. Thus large varchars can make for fat (read inefficient) NC indexes.

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

Sergiy SSC Guru Points: 110208 More actions · Answer 13

Based on this query you need clustered index on 2 columns: (purchaseorderid , serialnumber )

_____________
Code for TallyGenerator

Sergiy SSC Guru Points: 110208 More actions · Answer 14

How many Cards your system inserts per second?

Because if you've got 1 new Card per second (3600 per hour, 86400 per day) you gonna reach the limit of "int" capacity in 120 years. At that point you gonna need "bigint" for your identity column.

_____________
Code for TallyGenerator