September 22, 2011 at 5:59 pm
Hello All,
I have a table and the DDL is show below:
CREATE
TABLE [dbo].[XYZ]
([cacheId] [numeric](10, 0) NULL, [TrackingNumber] [varchar](255) NULL, [FormID] [varchar](255) NULL, [CCIPortID] [varchar](255) NULL,
[OriginStationID] [varchar](255) NULL, [DestinationStationID] [varchar]
(255) NULL, [DestinationZipCode] [varchar](255) NULL, [CarrierOID] [varchar](255) NULL, [SentToSPSS] [bit]NULL, [ProcessedBySPSS] [bit]
NULL, [SpecialSorted] [varchar](255) NULL, [SepUID] [varchar](255) NULL, [UpdatedTrackingIndex] [bit]NULL, [OkToSendToSorter] [bit]
NULL, [RouteID] [varchar](255) NULL, [SentToSorterCount] [numeric]
(10, 0) NULL, [time_created$] [datetime]NULL, [time_last_modified$] [datetime]NULL, [parent$_id$] [numeric](19, 0) NULL, [id$] [numeric]
(19, 0) NOT NULL, [extId$] [varchar](255) NULL, [state$] [char]
(1) NULL) ON [PRIMARY]
GO
Here is the index info:
Index Type Column Name
id$ clustered index
sendtoSPSS filtered index
OkToSendSortor filtered index
trackingindex NC
Fill factor: 80% and pad index is enabled on all indexes
here is the DMLs that are frequently executed on the XYZ table
Select Statement:
select T.cacheId, T.id$, T.extId$, T.state$, T.time_created$, T.time_last_modified$, T.parent$_id$, T.TrackingNumber, T.FormID, T.CCIPortID, T.OriginStationID, T.DestinationStationID, T.DestinationZipCode, T.CarrierOID, T.SentToSPSS, T.ProcessedBySPSS, T.SpecialSorted, T.SepUID, T.UpdatedTrackingIndex, T.OkToSendToSorter, T.RouteID, T.SentToSorterCount from XYZ T
Update Statement:
update XYZ set cacheId=?, id$=?, extId$=?, state$=?, time_created$=?, time_last_modified$=?, parent$_id$=?, TrackingNumber=?, FormID=?, CCIPortID=?, OriginStationID=?, DestinationStationID=?, DestinationZipCode=?, CarrierOID=?, SentToSPSS=?, ProcessedBySPSS=?, SpecialSorted=?, SepUID=?, datedTrackingIndex=?, OkToSendToSorter=?, RouteID=?, SentToSorterCount=?
where id$=?
Delete Statement: delete from XYZ where id$=?
Insert Statement:
insert into XYZ(cacheId, id$, extId$, state$, time_created$, time_last_modified$, parent$_id$, TrackingNumber, FormID, CCIPortID, OriginStationID, DestinationStationID, DestinationZipCode, CarrierOID, SentToSPSS, ProcessedBySPSS, SpecialSorted, SepUID, UpdatedTrackingIndex, OkToSendToSorter, RouteID,
SentToSorterCount) values(?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)
My Question is: My Index are frequently fragmented and I can understand that there should be non-clustered index on id$ column and if so which column is best for Clustered index.
here is one more select statement that executes on same XYZ table:
select t4.TrackingNumber, t6.EffectiveInterceptZipCode, t6.EffectiveDiverterCode from XYZ t4 with (nolock), table-1 t5 with (nolock), table-2 t6 with (nolock),(Select distinct t1.TrackingNumber from XYZ t1 with (nolock), table-1 t2 with (nolock),table-2 t3 with (nolock) where exists (select * from (select count(1) as shipcount from XYZ tt1where tt1.TrackingNumber=t1.TrackingNumber) tt2 where tt2.shipcount>1)and t2.valPid$=0 and t1.SentToSPSS = 0 and t1.OkToSendToSorter = 1and t3.StatusExpirationTimeStamp_tm > GETDATE() and t1.id$=t2.pid$and t2.pid$=t3.parent$_id$ and t3.id$=t2.id$ ) t7
where t5.valPid$=0 and t6.StatusExpirationTimeStamp_tm > GETDATE()
and t4.TrackingNumber=t7.TrackingNumber and t4.id$=t5.pid$
and t5.pid$=t6.parent$_id$ and t6.id$=t5.id$
order by TrackingNumber, DuplicateRecordRanking
The reason I create the Filtered indexes on sendtoSPSS and OkToSendToSorter colums is the data in those columns is either 0 or 1.
Please advice
Thanks
September 23, 2011 at 2:26 am
Have a read through this: http://www.sqlservercentral.com/articles/Indexing/68563/
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
September 23, 2011 at 2:29 am
You're not giving enough information to get sensible advice.
First of all, give yourself a chance to get an answer and format the code you post. You can make it more readable using the IFCode tags [ code]your code goes here[/code]. I intentionally put a space between "[" and "code]" to make it display, but you should not include it.
However, your code, once formatted and surrounded in IFCode looks like this:
CREATE TABLE [dbo].[XYZ]
(
[cacheId] [numeric](10, 0) NULL,
[TrackingNumber] [varchar](255) NULL,
[FormID] [varchar](255) NULL,
[CCIPortID] [varchar](255) NULL,
[OriginStationID] [varchar](255) NULL,
[DestinationStationID] [varchar](255) NULL,
[DestinationZipCode] [varchar](255) NULL,
[CarrierOID] [varchar](255) NULL,
[SentToSPSS] [bit] NULL,
[ProcessedBySPSS] [bit] NULL,
[SpecialSorted] [varchar](255) NULL,
[SepUID] [varchar](255) NULL,
[UpdatedTrackingIndex] [bit] NULL,
[OkToSendToSorter] [bit] NULL,
[RouteID] [varchar](255) NULL,
[SentToSorterCount] [numeric](10, 0) NULL,
[time_created$] [datetime] NULL,
[time_last_modified$] [datetime] NULL,
[parent$_id$] [numeric](19, 0) NULL,
[id$] [numeric](19, 0) NOT NULL,
[extId$] [varchar](255) NULL,
[state$] [char](1) NULL
)
ON [PRIMARY]
GO
Statements:
--Select Statement:
SELECT T.cacheId,
T.id$,
T.extId$,
T.state$,
T.time_created$,
T.time_last_modified$,
T.parent$_id$,
T.TrackingNumber,
T.FormID,
T.CCIPortID,
T.OriginStationID,
T.DestinationStationID,
T.DestinationZipCode,
T.CarrierOID,
T.SentToSPSS,
T.ProcessedBySPSS,
T.SpecialSorted,
T.SepUID,
T.UpdatedTrackingIndex,
T.OkToSendToSorter,
T.RouteID,
T.SentToSorterCount
FROM XYZ T
--Update Statement:
UPDATE XYZ
SET cacheId = ?,
id$ = ?,
extId$ = ?,
state$ = ?,
time_created$ = ?,
time_last_modified$ = ?,
parent$_id$ = ?,
TrackingNumber = ?,
FormID = ?,
CCIPortID = ?,
OriginStationID = ?,
DestinationStationID = ?,
DestinationZipCode = ?,
CarrierOID = ?,
SentToSPSS = ?,
ProcessedBySPSS = ?,
SpecialSorted = ?,
SepUID = ?,
datedTrackingIndex = ?,
OkToSendToSorter = ?,
RouteID = ?,
SentToSorterCount = ?
WHERE id$ = ?
--Delete Statement:
DELETE
FROM XYZ
WHERE id$=?
--Insert Statement:
INSERT INTO XYZ
(
cacheId,
id$,
extId$,
state$,
time_created$,
time_last_modified$,
parent$_id$,
TrackingNumber,
FormID,
CCIPortID,
OriginStationID,
DestinationStationID,
DestinationZipCode,
CarrierOID,
SentToSPSS,
ProcessedBySPSS,
SpecialSorted,
SepUID,
UpdatedTrackingIndex,
OkToSendToSorter,
RouteID,
SentToSorterCount
)
VALUES (
?,
?,
?,
?,
?,
?,
?,
?,
?,
?,
?,
?,
?,
?,
?,
?,
?,
?,
?,
?,
?,
?
)
--here is one more select statement that executes on same XYZ table:
SELECT t4.TrackingNumber,
t6.EffectiveInterceptZipCode,
t6.EffectiveDiverterCode
FROM XYZ t4 WITH ( NOLOCK ),
[table-1] t5 WITH ( NOLOCK ),
[table-2] t6 WITH ( NOLOCK ),
( SELECT DISTINCT
t1.TrackingNumber
FROM XYZ t1 WITH ( NOLOCK ),
[table-1] t2 WITH ( NOLOCK ),
[table-2] t3 WITH ( NOLOCK )
WHERE EXISTS ( SELECT *
FROM ( SELECT COUNT(1) AS shipcount
FROM XYZ tt1
WHERE tt1.TrackingNumber = t1.TrackingNumber
) tt2
WHERE tt2.shipcount > 1 )
AND t2.valPid$ = 0
AND t1.SentToSPSS = 0
AND t1.OkToSendToSorter = 1
AND t3.StatusExpirationTimeStamp_tm > GETDATE()
AND t1.id$ = t2.pid$
AND t2.pid$ = t3.parent$_id$
AND t3.id$ = t2.id$
) t7
WHERE t5.valPid$ = 0
AND t6.StatusExpirationTimeStamp_tm > GETDATE()
AND t4.TrackingNumber = t7.TrackingNumber
AND t4.id$ = t5.pid$
AND t5.pid$ = t6.parent$_id$
AND t6.id$ = t5.id$
ORDER BY TrackingNumber,
DuplicateRecordRanking
Some general advice:
1) Don't use NOLOCK: you may end up with inconsistent data
2) Don't use $ and other symbols in your column names
3) Don't use "Table-1" as a table name: use an entity name. You already know it's a table, you don't need to state it again in its name.
4) A good clustering key is small, unique and ever-increasing. The column id$ looks small and (maybe) unique. Is it ever-increasing? If so, use it.
Generally speaking, the primary key is a good clustering key. Does this table have a primary key? Use it.
Hope this helps
Gianluca
-- Gianluca Sartori
September 23, 2011 at 7:33 am
Generally speaking, the primary key is a good clustering key. Does this table have a primary key? Use it.
Generally speaking, I'ld say this is not a good advice. I find all sorts of primary keys, including composit primary keys and uniqueidentifier (guid). Neither are good candidates for a clustering key. You cannot assume that an identity column (or other ever-increasing int) is the primary key.
September 23, 2011 at 7:54 am
okbangas (9/23/2011)
Generally speaking, the primary key is a good clustering key. Does this table have a primary key? Use it.
Generally speaking, I'ld say this is not a good advice. I find all sorts of primary keys, including composit primary keys and uniqueidentifier (guid). Neither are good candidates for a clustering key. You cannot assume that an identity column (or other ever-increasing int) is the primary key.
A composite key can be a good clustering key.
Also a uniqueidentifier can be a good clustering key.
Neither are perfect, but both can be good.
-- Gianluca Sartori
September 23, 2011 at 8:28 am
Hello Gianluca Sartori,
Thanks for the reply
The name table-1 is just a dummy name, it's not the real name of the table
We are using Nolocks because in my application select, Insert, delete and Updates happens contineously and some times simultaniously too and the business is okay with, not getting the latest data when we use NOLOCKS
The symbol $ is not recommneded by me and this application is going into production soon and I guess I can't remove $ symbol at this point of time
That id$ is not the identity column that we normaly use. That is a system generated integer and is always unique and here is the sample data in that id$ column
(3888088106,691956266,9855616042,10318775338,11919868970,11987993642,13390480426,27798766634,31674041386)
That id$ is the only column which has the unique data and the other columns has some duplicate data.
Thats where I am confused, ID$ column has clustered index and it is the only column which has the unique data but that id$ column is used in WHERE clause as (where id$ = "some data") but any column which has the where condition as (where id$ = "some data") it is good if it has nonclustered index on it.
But it has clustered index. thats where I am getting confusion.
Please advice
September 23, 2011 at 8:35 am
Gianluca Sartori (9/23/2011)
Also a uniqueidentifier can be a good clustering key.
Can be, maybe, if the person designing the table understands the complexities. Usually however it's about the worst clustered index possible (wideish and completely random)
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
September 23, 2011 at 8:38 am
okbangas (9/23/2011)
Generally speaking, the primary key is a good clustering key. Does this table have a primary key? Use it.
Generally speaking, I'ld say this is not a good advice. I find all sorts of primary keys, including composit primary keys and uniqueidentifier (guid). Neither are good candidates for a clustering key. You cannot assume that an identity column (or other ever-increasing int) is the primary key.
Totally agree!
September 23, 2011 at 9:00 am
The best clustered index is really dependent on the data and the access paths to it. For instance, if you have a datatime column in your database that is frequently used in range queries, you may find this to be a better candidate for a clustered index than a unique or primary key.
Since we don't know your application telling you which column (or columns) would make a better clustered index is nothing more than guess work. You need to look at your data and the queries that access the table, and the frequency in which those queries are run to make a good decision on the clustered index.
September 23, 2011 at 9:05 am
Hi Gila,
Thanks for the reply.
Here is the sample Insert statement of the XYZ table:
INSERT [dbo].[XYZ] ([cacheId], [TrackingNumber], [FormID], [CCIPortID], [OriginStationID], [DestinationStationID], [DestinationZipCode], [CarrierOID], [SentToSPSS], [ProcessedBySPSS], [SpecialSorted], [SepUID], [UpdatedTrackingIndex], [OkToSendToSorter], [RouteID], [SentToSorterCount], [time_created$], [time_last_modified$], [parent$_id$], [id$], [extId$], [state$])
VALUES (CAST(13 AS Numeric(10, 0)), N'868605045285', N'868605045285', N'CCII-ALL', N'CVRKK', N'MAZA', N'00604', N'2944556900', 0, 0, NULL, N'2455815000', 1, 1, NULL, CAST(0 AS Numeric(10, 0)), CAST(0x00009F660168CC26 AS DateTime), CAST(0x00009F660168CC26 AS DateTime), CAST(3888071673 AS Numeric(19, 0)), CAST(3888088106 AS Numeric(19, 0)), NULL, N'C')
INSERT [dbo].[XYZ] ([cacheId], [TrackingNumber], [FormID], [CCIPortID], [OriginStationID], [DestinationStationID], [DestinationZipCode], [CarrierOID], [SentToSPSS], [ProcessedBySPSS], [SpecialSorted], [SepUID], [UpdatedTrackingIndex], [OkToSendToSorter], [RouteID], [SentToSorterCount], [time_created$], [time_last_modified$], [parent$_id$], [id$], [extId$], [state$])
VALUES (CAST(93 AS Numeric(10, 0)), N'873270570640', N'873270570640', N'CCII-ALL', N'HRLA', N'QXZA', N'4061', N'2944555749', 0, 0, NULL, N'2455815000', 1, 1, NULL, CAST(0 AS Numeric(10, 0)), CAST(0x00009F6600B5D89E AS DateTime), CAST(0x00009F6600B5D89E AS DateTime), CAST(9691939833 AS Numeric(19, 0)), CAST(9691956266 AS Numeric(19, 0)), NULL, N'C')
INSERT [dbo].[XYZ] ([cacheId], [TrackingNumber], [FormID], [CCIPortID], [OriginStationID], [DestinationStationID], [DestinationZipCode], [CarrierOID], [SentToSPSS], [ProcessedBySPSS], [SpecialSorted], [SepUID], [UpdatedTrackingIndex], [OkToSendToSorter], [RouteID], [SentToSorterCount], [time_created$], [time_last_modified$], [parent$_id$], [id$], [extId$], [state$])
VALUES (CAST(91 AS Numeric(10, 0)), N'871711250160', N'871711250160', N'CCII-ALL', N'HRLA', N'QXZA', N'4061', N'2944555745', 0, 0, NULL, N'2455815000', 1, 1, NULL, CAST(0 AS Numeric(10, 0)), CAST(0x00009F6600B5D854 AS DateTime), CAST(0x00009F6600B5D854 AS DateTime), CAST(9855599609 AS Numeric(19, 0)), CAST(9855616042 AS Numeric(19, 0)), NULL, N'C')
So, considering the DML statements which column is better for clustered index. As I mentioned except the id$ column, rest of the columns has duplicates but this ID$ column is used in where condition.
Kindly advice. I am seeing frequest fragmentation of indexes on the XYZ tables and because of this deadlocks are occuring in the application.
Please advice.
Thanks a TON
September 23, 2011 at 9:17 am
As I said above, we really can't tell you what would be the best column (or columns) for you to use for a clustered index. You have to make that decision based on your analysis of the data, the different access paths (queries run against the data), the frequency in which each of these queries are run.
The best clustered index may or may not be a unique or primary key. It may be one of the datetime columns because of frequent range searchs on that column.
September 23, 2011 at 9:35 am
mpalaparthi (9/23/2011)
Kindly advice. I am seeing frequest fragmentation of indexes on the XYZ tables and because of this deadlocks are occuring in the application.
Fragmentation is not going to be the cause of the deadlocks. Poorly written queries or poor indexing is the likely cause.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
September 23, 2011 at 9:36 am
Lynn Pettis (9/23/2011)
The best clustered index is really dependent on the data and the access paths to it. For instance, if you have a datatime column in your database that is frequently used in range queries, you may find this to be a better candidate for a clustered index than a unique or primary key.
Funny you mention it. Datetime can be ever inceeasing, bit rarely unique, causing sql server to add a four byte identifier to each row. Om such a scenario, it may actually be benedicial to have a unique composite clustering key, the datetime and an identity column. The latter may be used as a primay key as well 🙂
September 23, 2011 at 9:41 am
okbangas (9/23/2011)
Funny you mention it. Datetime can be ever inceeasing, bit rarely unique, causing sql server to add a four byte identifier to each row. Om such a scenario, it may actually be benedicial to have a unique composite clustering key, the datetime and an identity column. The latter may be used as a primay key as well 🙂
So you'd add a 4-byte integer to every row to avoid having SQL add a 4-byte integer to only the duplicate rows?
The uniquifier's only added to rows with a duplicate value of the clustering key, not to every row.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
September 23, 2011 at 10:16 am
Viewing 15 posts - 1 through 15 (of 18 total)
You must be logged in to reply to this topic. Login to reply