October 10, 2007 at 7:39 am
Hi all.....
I "inherited" a project and have a couple questions about how some of the tables were designed that I need to change and thought I'd check here to see if I'm overlooking any pitfalls.
Right now the only primary key is on the identity column and since the datasets are pretty big (10 - 100k records) some of the queries are running pretty sluggish. I see a few indexes, but it doesn't seem to make much difference.
In this table are three columns that will never have duplicate data, so I intend to remove the existing key on the identity column and make those three columns primary keys. Then delete the existing indexes which are based on those three columns anyway. (apparently the indexes were created with the index tuning wizard)
From what I've read and come to understand (and here's where I need the insight) is that when you set primary keys, that also acts as an index and "should" make queries execute faster because of how the data's stored in the db.
Other than duplicate checking that needs to be done prior to inserts, what am I overlooking in this?
Thanks
Frank
October 10, 2007 at 7:49 am
Frank,
What are you querying on? Maybe you can give us the table schema and let us know how the queries are structured.
If there are indexes on those columns, and you query on them, it should run fast. The PK doesn't affect performance, it's all about integrity.
If you're querying on the identity field, it would be faster than the 3 columns because the index is smaller, more rows per page, runs quicker. If you're querying on non-PK fields (identity or otherwise), your change won't help. If anything, you'll run slowed because you're scanning a table with a larger index.
Couple more questions:
- is the PK clustered?
- are the other indexes compound or single? (send the DDL in).
October 10, 2007 at 8:38 am
Steve Jones - Editor (10/10/2007)
Frank,What are you querying on? Maybe you can give us the table schema and let us know how the queries are structured.
If there are indexes on those columns, and you query on them, it should run fast. The PK doesn't affect performance, it's all about integrity.
If you're querying on the identity field, it would be faster than the 3 columns because the index is smaller, more rows per page, runs quicker. If you're querying on non-PK fields (identity or otherwise), your change won't help. If anything, you'll run slowed because you're scanning a table with a larger index.
Couple more questions:
- is the PK clustered?
- are the other indexes compound or single? (send the DDL in).
Steve,
Thanks for the quick reply.. Here's the info.
Here's the create statement for this table:
[VaultDBTapes] (
[VaultTapeId] [int] IDENTITY (1, 1) NOT NULL ,
[GroupId] [int] NOT NULL ,
[VaultDBId] [int] NOT NULL ,
[RuleException] [char] (1) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[RuleExceptionDir] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[VaultDBLocID] [int] NOT NULL ,
[TapeNum] [varchar] (25) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
[Container] [varchar] (25) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[Slot] [varchar] (25) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[ExpDate] [datetime] NULL ,
[ShipDate] [datetime] NULL ,
[ReturnDate] [datetime] NULL ,
[OffsiteReceiveDate] [datetime] NULL ,
[OffsiteReturnDate] [datetime] NULL ,
[LastModified] [datetime] NULL ,
[ModifiedBy] [int] NULL ,
[Comments] [varchar] (255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[TrackingNo] [varchar] (25) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[ToLocationID] [int] NULL ,
[OldVaultTapeId] [int] NULL ,
[ImportDir] [char] (1) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[DRShip] [datetime] NULL ,
[DRReturn] [datetime] NULL ,
[DRLocation] [int] NULL ,
[ATDR] [char] (1) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL CONSTRAINT [DF_VaultDBTapes_ATDR] DEFAULT ('N'),
[InternalLocCode] [int] NULL ,
[InternalDate] [datetime] NULL ,
[InternalComments] [varchar] (255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
CONSTRAINT [PK_VaultDBTapes] PRIMARY KEY CLUSTERED
(
[GroupId],
[VaultDBId],
[TapeNum]
) WITH FILLFACTOR = 90 ON [PRIMARY] ,
CONSTRAINT [FK_VaultDBTapes_VaultDB] FOREIGN KEY
(
[VaultDBId]
) REFERENCES [VaultDB] (
[VaultDBId]
),
CONSTRAINT [FK_VaultDBTapes_VaultDBLoc] FOREIGN KEY
(
[VaultDBLocID]
) REFERENCES [VaultDBLoc] (
[VaultDBLocID]
),
CONSTRAINT [FK_VaultDBTapes_VaultDBLoc1] FOREIGN KEY
(
[ToLocationID]
) REFERENCES [VaultDBLoc] (
[VaultDBLocID]
)
) ON [PRIMARY]
Regarding the query itself (one of the many anyway), the fields that are always present (or should be, I'm still finding missing pieces) for any query on this table are GroupID, VaultDBID, and TapeNum. Some queries will also query non PK fields like dates and locations etc, but the PK fields will always be in the query.
I'll have to read up on a cpl things you pointed out but what I do remember reading is that the order the data is stored physically on the disk is by the PK info and from an experiment I made earlier, that does seem to make a huge impact on queries.
Q1 = where VaultDBID = 883 and TapeNum = '90000003'
doesnt run as fast as
Q2 = where VaultDBID = 883 and TapeNum = '90000003' and groupid = 150
Profiler shows a duration average of 398 (miliseconds?) for Q1 and 0 for Q2. When I did this I ran Q1 and Q1 checked the times then repeated 4 times with the same results.
Thanks again
Frank
Viewing 3 posts - 1 through 2 (of 2 total)
You must be logged in to reply to this topic. Login to reply