Clustered index on varchar column

  • I have a table that should support a large amount of records. On populating the table with a large amount of sample data, I found that most of the query performance was degraded significantly. I changed the clustered index to a varchar column which most of the queries use in their WHERE condition. The queries are now much faster. However, I'm concerned that this action would have repercussions in the long run.

    Does putting my clustered index on a varchar column cause any problems relating to size, performance, or any other aspects?

    Thanks.

  • faboudib

    If the values in this column will frequently change then this may not be a good candidate for creating a clustered index. In general its not a very good practice to create clustered index on varchar columns but again it depends.

    "Keep Trying"

  • If you mean by "If the values in this column will frequently change" that I would update this column, then the column can be safely assumed to be read only.

  • hi

    If by "read only" you mean that the values in this column will change rarely then you may have clustered index on this column. Clustered index works best on numeric and date columns. But since your queries use this varchar column frequently then this may suffice.

    Another thing you can try is to have the clustered index on another numeric or date time columns and have a non clustered index on this varchar column. Check out the performance of your queries in this situation.

    "Keep Trying"

  • Hi Chiraq,

    Thanks for your help. My varchar column never changes. Furthermore, I tried both cases to see which is faster (varchar column and autonumber PK column as clustered index). Having the varchar column as my clustered index instead of a non-clustered index increases my query's performance three fold. So I will stick with it as my clustered index.

    Best regards, and thanks for your help.

  • faboudib

    I'd like to see the table structure and query before recommending that you have a clustered index on a varchar column.

    --------------------
    Colt 45 - the original point and click interface

  • Here it is:

    CREATE TABLE [dbo].[Card] (

    [CardId] [bigint] IDENTITY (1, 1) NOT NULL ,

    [SerialNumber] [nvarchar] (50) NOT NULL ,

    [StatusId] [smallint] NOT NULL ,

    [PurchaseOrderId] [int] NOT NULL

    ) ON [PRIMARY]

    GO

    ALTER TABLE [dbo].[Card] ADD

    CONSTRAINT [PK_Card] PRIMARY KEY

    (

    [CardId]

    ) ON [PRIMARY]

    GO

  • And the query you're using ...

    --------------------
    Colt 45 - the original point and click interface

  • select * from card where serialnumber between @fromserial and @toserial and purchaseorderid = @poid.

    updates are done using the primary key but only after the cardid is acquired by its serial number.

  • My varchar column never changes

    So why have the CardID column at all?

    updates are done using the primary key but only after the cardid is acquired by its serial number

    Why not update the record once you've found it via the serial number. Why make a second call to the database for the update?

    What was the non-clustered index you used?

    --------------------
    Colt 45 - the original point and click interface

  • There are some validations that need to be done according to some of the fields present in the card record.

    The non-clustered index was card id.

    After I found the card by it's serial number once, the update shouldn't once again have to find the card by its serial number (varchar => more expensive), rather the update uses the cardid to find the record to be updated. I don't know if you're getting my point.

  • Ok, based on the query you provided a non-clustered index on card id won't help.

    For the way you're explaining the updates, this is how I see it,

    1) first return the matching cardid for the given serial number

    2) update StatusId and/or PurchaseOrderId fields based on the cardid

    Why isn't it just,

    update StatusId and/or PurchaseOrderId fields for the given serial number.

    The "(varchar => more expensive)" argument isn't really valid as you've already searched for the record by the varchar field anyway.

    Firstly, I'd change the primary key definition to non-clustered. By default SQL Server will create the primary key as you've provided as a clustered index.

    Secondly, I'd create a non-clustered index including SerialNumber and PurchaseOrderId. This covers both of your search criteria.

    --------------------
    Colt 45 - the original point and click interface

  • 1) clustered indexes generally help with BETWEEN queries.

    2) they do have a down side in that the clustering key is carried as the pointer for every non-clustered index on the table. Thus large varchars can make for fat (read inefficient) NC indexes.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Based on this query you need clustered index on 2 columns: (purchaseorderid , serialnumber )

    _____________
    Code for TallyGenerator

  • How many Cards your system inserts per second?

    Because if you've got 1 new Card per second (3600 per hour, 86400 per day) you gonna reach the limit of "int" capacity in 120 years. At that point you gonna need "bigint" for your identity column.

    _____________
    Code for TallyGenerator

Viewing 15 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply