August 10, 2007 at 2:22 am
I have a table that should support a large amount of records. On populating the table with a large amount of sample data, I found that most of the query performance was degraded significantly. I changed the clustered index to a varchar column which most of the queries use in their WHERE condition. The queries are now much faster. However, I'm concerned that this action would have repercussions in the long run.
Does putting my clustered index on a varchar column cause any problems relating to size, performance, or any other aspects?
Thanks.
August 10, 2007 at 3:57 am
faboudib
If the values in this column will frequently change then this may not be a good candidate for creating a clustered index. In general its not a very good practice to create clustered index on varchar columns but again it depends.
"Keep Trying"
August 10, 2007 at 4:41 am
If you mean by "If the values in this column will frequently change" that I would update this column, then the column can be safely assumed to be read only.
August 13, 2007 at 12:29 am
hi
If by "read only" you mean that the values in this column will change rarely then you may have clustered index on this column. Clustered index works best on numeric and date columns. But since your queries use this varchar column frequently then this may suffice.
Another thing you can try is to have the clustered index on another numeric or date time columns and have a non clustered index on this varchar column. Check out the performance of your queries in this situation.
"Keep Trying"
August 13, 2007 at 12:45 am
Hi Chiraq,
Thanks for your help. My varchar column never changes. Furthermore, I tried both cases to see which is faster (varchar column and autonumber PK column as clustered index). Having the varchar column as my clustered index instead of a non-clustered index increases my query's performance three fold. So I will stick with it as my clustered index.
Best regards, and thanks for your help.
August 13, 2007 at 7:16 am
faboudib
I'd like to see the table structure and query before recommending that you have a clustered index on a varchar column.
--------------------
Colt 45 - the original point and click interface
August 13, 2007 at 11:31 pm
Here it is:
CREATE TABLE [dbo].[Card] (
[CardId] [bigint] IDENTITY (1, 1) NOT NULL ,
[SerialNumber] [nvarchar] (50) NOT NULL ,
[StatusId] [smallint] NOT NULL ,
[PurchaseOrderId] [int] NOT NULL
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[Card] ADD
CONSTRAINT [PK_Card] PRIMARY KEY
(
[CardId]
) ON [PRIMARY]
GO
August 13, 2007 at 11:35 pm
And the query you're using ...
--------------------
Colt 45 - the original point and click interface
August 13, 2007 at 11:47 pm
select * from card where serialnumber between @fromserial and @toserial and purchaseorderid = @poid.
updates are done using the primary key but only after the cardid is acquired by its serial number.
August 13, 2007 at 11:52 pm
My varchar column never changes
So why have the CardID column at all?
updates are done using the primary key but only after the cardid is acquired by its serial number
Why not update the record once you've found it via the serial number. Why make a second call to the database for the update?
What was the non-clustered index you used?
--------------------
Colt 45 - the original point and click interface
August 14, 2007 at 2:21 am
There are some validations that need to be done according to some of the fields present in the card record.
The non-clustered index was card id.
After I found the card by it's serial number once, the update shouldn't once again have to find the card by its serial number (varchar => more expensive), rather the update uses the cardid to find the record to be updated. I don't know if you're getting my point.
August 14, 2007 at 4:30 am
Ok, based on the query you provided a non-clustered index on card id won't help.
For the way you're explaining the updates, this is how I see it,
1) first return the matching cardid for the given serial number
2) update StatusId and/or PurchaseOrderId fields based on the cardid
Why isn't it just,
update StatusId and/or PurchaseOrderId fields for the given serial number.
The "(varchar => more expensive)" argument isn't really valid as you've already searched for the record by the varchar field anyway.
Firstly, I'd change the primary key definition to non-clustered. By default SQL Server will create the primary key as you've provided as a clustered index.
Secondly, I'd create a non-clustered index including SerialNumber and PurchaseOrderId. This covers both of your search criteria.
--------------------
Colt 45 - the original point and click interface
August 14, 2007 at 9:29 am
1) clustered indexes generally help with BETWEEN queries.
2) they do have a down side in that the clustering key is carried as the pointer for every non-clustered index on the table. Thus large varchars can make for fat (read inefficient) NC indexes.
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
August 15, 2007 at 11:22 pm
Based on this query you need clustered index on 2 columns: (purchaseorderid , serialnumber )
_____________
Code for TallyGenerator
August 15, 2007 at 11:26 pm
How many Cards your system inserts per second?
Because if you've got 1 new Card per second (3600 per hour, 86400 per day) you gonna reach the limit of "int" capacity in 120 years. At that point you gonna need "bigint" for your identity column.
_____________
Code for TallyGenerator
Viewing 15 posts - 1 through 14 (of 14 total)
You must be logged in to reply to this topic. Login to reply