SQL Server 2005 Enterprise Edition - SP3: PKey has Identity Spec on and millions of rows...

Question

Post reply

SQL Server 2005 Enterprise Edition - SP3: PKey has Identity Spec on and millions of rows...

SQL_ME_RICH

SSChampion

Points: 11200
More actions
April 24, 2012 at 2:56 pm

#256962

Hey everyone -
So, I am working on a project currently where I have 2 tables in particular that between them constitutes nearly 1 billion rows of data (well - maybe a few hundred million short of that, but fairly close).
I have 2 tests that are going to be performed for the existing database. Here's the layout...
1. Filegroups have been put in place for the system and user db's that give it much better DiskIO than was previously in place, and
2. In addition to the above configuration, we have taken 1 of the 2 large tables, and partitioned it into 24 different partitions/Filegroups.
Now - here is where I need some advice. On both of the 2 large tables, the Primary Key/Clustered index has the Identity Specification turned on. It starts at 1 and increments by 1. I am wanting to suggest to them to recreate the Clustered Index for the PKey to a Non-Clustered Index, and put the Clustered index on one of the other columns (preferably a datetime data type).
One of the tables has the datetime data type available to it, but the other does not. Will my idea still work if I have the Clustered Index moved to a different column that is not a timedate data type? Also - and this is more sidebar than anything, but the partitioned test db did not include the Partition Key in the Primary Key for the Clustered Index on that partitioned table. Is that going to make a big difference in performance? Does it matter? I thought I had seen previous discussions where this had been said that it needs to be in there.
Sorry for the lack of DDL to show you the tables, but they are kinda funny about that around here. I may be able to just recontruct them for a better visual of the tables, function, scheme, indexes if anyone would like to have a better point of reference.
Thank you all for your time on this!
Rich

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply

SQL_ME_RICH SSChampion Points: 11200 More actions · Answer 1

Perhaps this will help me out a bit in my search for ideas...

The 2 tables in question in my first test db look like this...

/****** Object: Table [dbo].[TESTTBL1] Script Date: 04/13/2012 10:52:48 ******/

SET ANSI_NULLS ON

GO

SET QUOTED_IDENTIFIER ON

GO

CREATE TABLE [dbo].[TESTTBL1](

[LocationId] [int] IDENTITY(1,1) NOT FOR REPLICATION NOT NULL,

[Location] [nvarchar](128) NOT NULL,

[MinLat] [int] NOT NULL,

[MinLong] [int] NOT NULL,

[GeocoderId] [tinyint] NOT NULL,

[EffectiveGeocoderId] [tinyint] NOT NULL,

CONSTRAINT [PK_TESTTBL1_INT5] PRIMARY KEY CLUSTERED

(

[LocationId] ASC

)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [FG5]

) ON [FG5]

GO

CREATE NONCLUSTERED INDEX [IX_BW_LOCATION_GEOCODERID] ON [dbo].[TESTTBL1]

(

[GeocoderId] ASC,

[EffectiveGeocoderId] ASC

)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [NC3]

GO

CREATE NONCLUSTERED INDEX [IX_MinLat_MinLong5] ON [dbo].[TESTTBL1]

(

[MinLat] ASC,

[MinLong] ASC

)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [NC3]

GO

/****** Object: Table [dbo].[TESTTBL2] Script Date: 04/13/2012 10:51:18 ******/

SET ANSI_NULLS ON

GO

SET QUOTED_IDENTIFIER ON

GO

CREATE TABLE [dbo].[TESTTBL2](

[ID] [int] IDENTITY(1,1) NOT FOR REPLICATION NOT NULL,

[EVENT_CODE] [int] NOT NULL,

[LAT_COORD] [float] NOT NULL,

[LONG_COORD] [float] NOT NULL,

[ODOMETER] [int] NOT NULL,

[SPEED] [smallint] NOT NULL,

[GPS_RECORD_DATE] [datetime] NOT NULL,

[LocationId] [int] NULL,

[AWARE_DEVICE_FK] [int] NOT NULL,

[NUMBER_SATELLITE] [tinyint] NULL,

[ALTITUDE] [smallint] NULL,

[ACCELERATE] [tinyint] NULL,

[DECELERATE] [tinyint] NULL,

[RPM] [smallint] NULL,

[HEADING] [smallint] NULL,

[GPS_RECOVERED] [bit] NULL,

[HDOP] [float] NULL,

[FuelLevel] [tinyint] NULL,

[ReadStatus] [tinyint] NOT NULL,

CONSTRAINT [PK_TESTTBL2] PRIMARY KEY CLUSTERED

(

[ID] ASC

)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [FG2]

) ON [FG2]

GO

CREATE NONCLUSTERED INDEX [IX_DEVICE_AND_DATE] ON [dbo].[TESTTBL2]

(

[AWARE_DEVICE_FK] ASC,

[GPS_RECORD_DATE] ASC

)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [NC4]

GO

CREATE NONCLUSTERED INDEX [IX_EVENT_CODE] ON [dbo].[TESTTBL2]

(

[EVENT_CODE] ASC

)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [NC4]

GO

CREATE NONCLUSTERED INDEX [IX_RECORDDATE_EVENT_DEVICE_AND_ID] ON [dbo].[TESTTBL2]

(

[GPS_RECORD_DATE] ASC,

[EVENT_CODE] ASC,

[AWARE_DEVICE_FK] ASC,

[ID] ASC

)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [NC4]

GO

Now - in this first test db, the main difference between the way the database exists today out in our PROD environment, and the way we have it in our TEST environment is 2-fold.

1. The system db's are distrubuted to a RAID-5 Array, where they currently now reside on our PROD system on the same drive (a mirrored pair though - I believe RAID 1+0) on Drive C of the PROD box, and...

2. Filegroups have now been put together to split up all the various tables for better balance and IO, in a RAID-5 Array. Both Tables and Non-Clustered Indexes are managed this way.

Unfortunately because it is RAID-5 and not RAID 1+0, etc, I realize that there is no way to specify which disk a certain Filegroup resides on, but it's better than how it is currently configured today in PROD. The .MDF file and the .LDF files are on a separate controller than the RAID-5 Array, but I am not certain as to what RAID level that device is configured with.

Now the way that the next test db differs from this one, is that we have created additional Filegroups (24) to accomodate a table partition that has been put in place for TESTTBL2 (because it is the one with the highest row count, and gets hammered the hardest). The DDL for the 2 tables in the second test db is basically the same (except for Filegroup and Non-Clustered Index numeration), but here is the DDL for the Partition Function, Scheme, and reconstructed index for the TESTTBL2...

TESTTBL2

/****** Object: Table [dbo].[TESTTBL2] Script Date: 04/25/2012 11:26:04 ******/

SET ANSI_NULLS ON

GO

SET QUOTED_IDENTIFIER ON

GO

CREATE TABLE [dbo].[TESTTBL2](

[ID] [int] IDENTITY(1,1) NOT FOR REPLICATION NOT NULL,

[EVENT_CODE] [int] NOT NULL,

[LAT_COORD] [float] NOT NULL CONSTRAINT [DF__BW_RECORD__LAT_C__2B8A53B1] DEFAULT ((-1)),

[LONG_COORD] [float] NOT NULL CONSTRAINT [DF__BW_RECORD__LONG___2C7E77EA] DEFAULT ((-1)),

[ODOMETER] [int] NOT NULL CONSTRAINT [DF__BW_RECORD__ODOME__2D729C23] DEFAULT ((-1)),

[SPEED] [smallint] NOT NULL CONSTRAINT [DF__BW_RECORD__SPEED__2E66C05C] DEFAULT ((0)),

[GPS_RECORD_DATE] [datetime] NOT NULL CONSTRAINT [DF__BW_RECORD__GPS_R__2F5AE495] DEFAULT (getutcdate()),

[LocationId] [int] NULL,

[AWARE_DEVICE_FK] [int] NOT NULL,

[NUMBER_SATELLITE] [tinyint] NULL,

[ALTITUDE] [smallint] NULL,

[ACCELERATE] [tinyint] NULL,

[DECELERATE] [tinyint] NULL,

[RPM] [smallint] NULL,

[HEADING] [smallint] NULL,

[GPS_RECOVERED] [bit] NULL CONSTRAINT [DF__BW_RECORD__GPS_R__304F08CE] DEFAULT ((0)),

[HDOP] [float] NULL CONSTRAINT [DF__BW_RECORD___HDOP__31432D07] DEFAULT ((-1)),

[FuelLevel] [tinyint] NULL,

[ReadStatus] [tinyint] NOT NULL CONSTRAINT [DF__BW_RECORD__ReadS__63CEACD4] DEFAULT ((1)),

CONSTRAINT [PK_TESTTBL2] PRIMARY KEY CLUSTERED

(

[ID] ASC

)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [NC6]

) ON [NC6]

Partition Function

CREATE PARTITION FUNCTION RecordsPartitionFunction(datetime)

AS

RANGE LEFT FOR VALUES ( '20110430 23:59:59.997', -- Apr 2011

'20110531 23:59:59.997', -- May 2011

'20110630 23:59:59.997', -- Jun 2011

'20110731 23:59:59.997', -- Jul 2011

'20110831 23:59:59.997', -- Aug 2011

'20110930 23:59:59.997', -- Sep 2011

'20111031 23:59:59.997', -- Oct 2011

'20111130 23:59:59.997', -- Nov 2011

'20111231 23:59:59.997', -- Dec 2011

'20120131 23:59:59.997', -- Jan 2012

'20120229 23:59:59.997', -- Feb 2012

'20120331 23:59:59.997', -- Mar 2012

'20120430 23:59:59.997', -- Apr 2012

'20120531 23:59:59.997', -- May 2012

'20120630 23:59:59.997', -- Jun 2012

'20120731 23:59:59.997', -- Jul 2012

'20120831 23:59:59.997', -- Aug 2012

'20120930 23:59:59.997', -- Sep 2012

'20121031 23:59:59.997', -- Oct 2012

'20121130 23:59:59.997', -- Nov 2012

'20121231 23:59:59.997', -- Dec 2012

'20130131 23:59:59.997', -- Jan 2013

'20130228 23:59:59.997', -- Feb 2013

'20130331 23:59:59.997') -- Mar 2013

GO

Partition Scheme

CREATE PARTITION SCHEME [RecordsPartitionScheme]

AS

PARTITION RecordsPartitionFunction TO

( [FG11], [FG12], [FG13], [FG14], [FG15], [FG16],

[FG17], [FG18], [FG19], [FG20],[FG21],[FG22],

[FG23],[FG24],[FG25],[FG26],[FG27],[FG28],

[FG29],[FG30],[FG31],[FG32],[FG33],[FG34],

[PRIMARY] )

The main thing I am concerned with is that we decided not to include the Partition Key with the old PKey column to create the new PKey this time around, but that is easy enough to change as we test - I suppose. :w00t:

I guess I am wanting to know if there will be any benefit to changing the PKey in both scenarios from Clustered to Non-Clustered (since it is an Identity Specified column with literally hundreds of millions of rows in both TESTTBL1 and TESTTBL2) and placing the Clustered Index on a datetime column (which is only available - unfortunately - to TESTTBL2).

Am I just swinging in the wind here, so-to-speak, or could we see a benefit from doing this?. Again - TESTTBL2 is the largest concern right now because of ID issues (described in another thread about a week ago - can reference it if anyone wants a point of reference). Also - one other well needed point. There is little to no Cardinality in this database (about 90+ tables, of which only 15 have Fkeys on them). I know - I know, but trust me - this is a weird scheme that has been designed for speed, and not absolute accuracy of the data (which is mind boggling I know - but this is what they want) :w00t:

Again - any and all insight is welcomed. I appreciate your time in reading and evaluating all of this for me.

Thanks!

Rich

GSquared SSC Guru Points: 260824 More actions · Answer 2

Any particular reason to move the clustered index from an identity column to a datetime column? Usually, both of those just indicate insert-sequence, and accomplish the same thing structurally.

Unless the datetime column is appropriate as the leading edge for a lot of queries, and the ID is not.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

SQL_ME_RICH SSChampion Points: 11200 More actions · Answer 3

The datetime column is used in one query in particular with the DeviceID, but moreover - it is supposed to be the Partition Key column in the second database (but currently is not paired with it).

Do you see any advantage in the first database in having it take on the Clustered Index, while making the ID column's PKey Non-Clustered?

Also - if so, can that be done in the second test database as well (that's the one where this table has a partition function and scheme splitting it up for hopeful performance and management ease).

GSquared SSC Guru Points: 260824 More actions · Answer 4

Honestly, partitioning is complex enough and very sensitive to the actual data it's being used on, that all I can suggest is testing it out. You'll find out more from a couple of day's worth of tests on something like that than you'll find out from any amount of online discussion.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

SQL_ME_RICH SSChampion Points: 11200 More actions · Answer 5

We definitely have...It looks like it is going to work out nicely, but it took some debate on getting the partition key in with the previous lone column that made up the Primary Key of the original table.

I think we have a lock on it now. Thanks for all your insight though.