Primary Key Violation

  • I have a large, wide table (lots of columns and currently 300M rows, but it could grow to over a billion and beyond). The table has a primary key on an ID column, let's say AbcID, which is an nchar(14) data type. I am trying to partition the table on a date column (AbcDate), which I'm able to do, but I'm having trouble reestablishing the primary key.

    Here are the steps I am using, all of which work except for the last step because of a primary key violation error:

    1. Add a new filegroup, DatabaseArchivePartition.

    2. Add a new data file, DatabaseArchive.NDF, to the new archive partition filegroup.

    3. Create the partition function, LargeTable_PartitionFunction, that will be used to split out the archive data from the LargeTable.

    4. Create the partition scheme, LargeTable_PartitionScheme, to specify where the archive data will be stored.

    5. Rename the table LargeTable to LargeTable_PartitionStaging, from which to load the new partitioned table.

    6. Create the new LargeTable table, specifying how to partition data.

    7. Add Clustered Index on AbcDate.

    8. Add Primary Key on AbcID (this column "IS" unique).

    9. Copy data from Staging table into the new partitioned LargeTable table.

    If I create a non-unique index on the AbcID column everything works fine, but if it is unique then the primary key violation error occurs. I have double and triple checked the data and the AbcID is truly unique.

    What am I doing wrong?

  • Double check your data before applying the primary key:

    SELECT COUNT(*), AbcID
    FROM StagingTable
    GROUP BY AbcID
    HAVING COUNT(*) > 1

    Also verify there are no NULLs; a primary key column cannot be NULL.
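    A quick sketch of that check against the same staging table:

    SELECT COUNT(*) AS NullAbcIDs
    FROM StagingTable
    WHERE AbcID IS NULL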

    Wes
    (A solid design is always preferable to a creative workaround)

  • It has to be either duplicate data or NULL values. With a unique index you can have a single NULL value. As was stated, with a primary key, you can't have any.
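    A minimal sketch showing the difference (throwaway table name, just for illustration):

    CREATE TABLE dbo.DemoNullCheck (AbcID nchar(14) NULL);
    CREATE UNIQUE INDEX UX_DemoNullCheck_AbcID ON dbo.DemoNullCheck (AbcID);
    INSERT INTO dbo.DemoNullCheck VALUES (NULL); -- succeeds: a unique index allows one NULL
    INSERT INTO dbo.DemoNullCheck VALUES (NULL); -- fails: duplicate key
    -- a primary key is stricter: it cannot even be created on a nullable column
    --ALTER TABLE dbo.DemoNullCheck ADD CONSTRAINT PK_DemoNullCheck PRIMARY KEY (AbcID);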

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Already did this, as I said I triple checked and there are NO duplicate IDs and there are no nulls.

  • Here is the code, although the names have been changed to protect the innocent:

    SET ANSI_NULLS ON
    GO
    SET QUOTED_IDENTIFIER ON
    GO
    SET ANSI_PADDING ON
    GO

    USE master
    GO

    PRINT 'Adding filegroup, DatabaseArchivePartition, for archive partition data.'
    ALTER DATABASE MyDatabase
    ADD FILEGROUP DatabaseArchivePartition; -- separate filegroup to hold the archive table partitions
    GO

    PRINT 'Adding new data file, MyDatabaseArchive.NDF, to the new archive partition filegroup.'
    ALTER DATABASE MyDatabase
    ADD FILE
    ( NAME = MyDatabaseArchive
     ,FILENAME = 'H:\Databases\MyDatabaseArchive.NDF'
     ,SIZE = 10240MB
     ,FILEGROWTH = 10%
    )
    TO FILEGROUP DatabaseArchivePartition;
    GO

    USE MyDatabase
    GO

    PRINT 'Creating the partition function, LargeTable_PartitionFunction, that will be used to split out the archive data from the LargeTable table.'
    --DROP PARTITION FUNCTION LargeTable_PartitionFunction
    CREATE PARTITION FUNCTION LargeTable_PartitionFunction (DATETIME)
    AS RANGE LEFT FOR VALUES ('2013/08/01 00:00:00.000');
    GO

    PRINT 'Creating the partition scheme, LargeTable_PartitionScheme, to specify where the archive data will be stored.'
    --DROP PARTITION SCHEME LargeTable_PartitionScheme
    CREATE PARTITION SCHEME LargeTable_PartitionScheme
    AS PARTITION LargeTable_PartitionFunction
    TO ([PRIMARY], DatabaseArchivePartition);
    GO

    PRINT 'Renaming table LargeTable to LargeTable_PartitionStaging (partition staging, from which we will load the new table).'
    -- rename the LargeTable table
    EXEC sp_rename 'dbo.LargeTable', 'LargeTable_PartitionStaging';
    GO

    PRINT 'Creating new LargeTable table, specifying how to partition data.'
    CREATE TABLE [dbo].LargeTable
    (
     AbcID [nchar](14) NOT NULL
    ,SomeColumn varchar(20) NULL
    ,SomeOtherColumn varchar(50) NULL
    ,AbcDate [datetime] NOT NULL --...dozens of other columns
    )
    ON LargeTable_PartitionScheme(AbcDate);
    GO

    PRINT 'Adding Clustered Index on AbcDate.'
    CREATE CLUSTERED INDEX CX_LargeTable_AbcDate
    ON LargeTable(AbcDate)
    ON [PRIMARY];
    GO

    PRINT 'Adding Primary Key on AbcID.'
    ALTER TABLE [dbo].LargeTable
    ADD CONSTRAINT PK_LargeTable_AbcID PRIMARY KEY NONCLUSTERED (AbcID)
    ON [PRIMARY];
    GO

    /* the following will work...
    PRINT 'Adding indexes on AbcID and AbcDate.'
    CREATE INDEX IX_LargeTable_AbcDate ON LargeTable(AbcDate) ON [PRIMARY];
    GO
    CREATE INDEX IX_LargeTable_AbcID ON LargeTable(AbcID) ON [PRIMARY];
    GO
    */

    PRINT 'Copying data from Staging table into the new partitioned LargeTable table.' + CHAR(13)
    INSERT INTO LargeTable SELECT * FROM LargeTable_PartitionStaging
    GO

    -- check partition counts
    SELECT * FROM sys.partitions WHERE OBJECT_NAME(OBJECT_ID) = 'LargeTable';
    GO

    SELECT $PARTITION.LargeTable_PartitionFunction(AbcDate) AS PARTITIONID,
           COUNT(*) AS ROW_COUNT
    FROM dbo.LargeTable
    GROUP BY $PARTITION.LargeTable_PartitionFunction(AbcDate)
    ORDER BY PARTITIONID;
    GO

  • Let me ask this... Does the process of partitioning a table add the partition key to the primary key? This is the only thing I can think of that might be happening that would cause there to be duplicate primary keys.

  • todd 87764 (3/20/2014)


    Let me ask this... Does the process of partitioning a table add the partition key to the primary key? This is the only thing I can think of that might be happening that would cause there to be duplicate primary keys.

    The partitioning column is added to all unique indexes including PKs if the indexes are aligned to the data (you lose the ability to SWITCH in and out if they aren't). Even if the entire partitioning column had only 1 unique value (a made up scenario, for sure), adding it to what used to be a unique column would still create all unique values rather than duplicates.
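    For illustration only, an aligned PK on the posted table would have to carry the partitioning column in its key, something like this sketch (using the object names from the script you posted):

    ALTER TABLE dbo.LargeTable
    ADD CONSTRAINT PK_LargeTable_AbcID_AbcDate
    PRIMARY KEY NONCLUSTERED (AbcID, AbcDate)
    ON LargeTable_PartitionScheme (AbcDate);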

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • todd 87764 (3/20/2014)


    Already did this, as I said I triple checked and there are NO duplicate IDs and there are no nulls.

    Does the original table have a PK or UNIQUE index on the ABCID column?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Yes, the original table has a primary key on the same AbcID column. I added a step to drop the primary key constraint before renaming the table to staging but still get the primary key violation when copying the data from staging to the new table. I'm baffled.
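    For reference, the extra step was just a plain constraint drop ahead of the sp_rename (the constraint name here is a placeholder for the original PK name):

    ALTER TABLE dbo.LargeTable DROP CONSTRAINT PK_LargeTable; -- placeholder constraint name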

  • todd 87764 (3/20/2014)


    I have a large, wide table (lots of columns and currently 300M rows, but it could grow to over a billion and beyond). The table has a primary key on an ID column, let's say AbcID, which is an nchar(14) data type. I am trying to partition the table on a date column (AbcDate), which I'm able to do, but I'm having trouble reestablishing the primary key.

    Here are the steps I am using, all of which work except for the last step because of a primary key violation error:

    1. Add a new filegroup, DatabaseArchivePartition.

    2. Add a new data file, DatabaseArchive.NDF, to the new archive partition filegroup.

    3. Create the partition function, LargeTable_PartitionFunction, that will be used to split out the archive data from the LargeTable.

    4. Create the partition scheme, LargeTable_PartitionScheme, to specify where the archive data will be stored.

    5. Rename the table LargeTable to LargeTable_PartitionStaging, from which to load the new partitioned table.

    6. Create the new LargeTable table, specifying how to partition data.

    7. Add Clustered Index on AbcDate.

    8. Add Primary Key on AbcID (this column "IS" unique).

    9. Copy data from Staging table into the new partitioned LargeTable table.

    If I create a non-unique index on the AbcID column everything works fine, but if it is unique then the primary key violation error occurs. I have double and triple checked the data and the AbcID is truly unique.

    What am I doing wrong?

  • Not sure what that post was all about, canonzone??

    I have been trying to figure out what is going on and have queried both the renamed (staging) table and the new table, and here are some interesting findings:

    If I use a regular non-unique index on AbcID, my script runs completely and the final INSERT statement reports the correct number of rows inserted (the same number of rows that exist in the renamed staging table). SSMS also shows the same correct number of rows when you right-click the table, select Properties, then Storage, and look at Row Count. The query SELECT COUNT(*) FROM LargeTable also returns the correct total row count.

    But when I query the new table I inserted into for AbcIDs that appear more than once, the results show over half a million rows where the AbcID is duplicated - EVEN THOUGH THE ORIGINAL TABLE (LargeTable_PartitionStaging) HAS NO DUPLICATE AbcIDs:

    --results in zero/zip/zilch/nada duplicate AbcID's
    SELECT COUNT(*), ISNULL(AbcID, '') AS AbcID
    FROM LargeTable_PartitionStaging
    GROUP BY AbcID
    HAVING COUNT(*) > 1
    ORDER BY AbcID

    --results in over 500k duplicate AbcID's,
    --even though the table was populated by inserting from the Staging table that has no duplicates
    SELECT COUNT(*), ISNULL(AbcID, '') AS AbcID
    FROM LargeTable
    GROUP BY AbcID
    HAVING COUNT(*) > 1
    ORDER BY AbcID

  • Oh, and I ran DBCC CHECKDB on the database before and after the table renaming and new table creation, and there are no problems showing in the database.

  • It may be one of those "odd" things that happen behind the scenes when you do something odd. I've never seen anyone create the table on the partition scheme and then try to add a clustered index afterwards. I've always seen them create the table and then add the clustered index on the partition scheme.
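    In rough terms, something like this (sketched with the names from your script, untested):

    -- create the new table on a regular filegroup first
    CREATE TABLE dbo.LargeTable
    (
     AbcID   nchar(14) NOT NULL
    ,AbcDate datetime  NOT NULL
    --,...dozens of other columns
    ) ON [PRIMARY];
    GO
    -- then partition it by putting the clustered index on the partition scheme
    CREATE CLUSTERED INDEX CX_LargeTable_AbcDate
    ON dbo.LargeTable (AbcDate)
    ON LargeTable_PartitionScheme (AbcDate);
    GO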

    At this point, I'd give that a try to see if it works any better. If not, we'll take another whack at it.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Ok this is really driving me crazy. I followed your suggestion but have the same problem with duplicates.

    If I make a nonunique index on the ID column, the total record count in the old table and the new table match, and the partitions each hold the correct NUMBER of records, BUT the new table has duplicate IDs.

    There are no NULLs and no leading or trailing spaces in the nchar(14) AbcID column. I can SELECT * INTO ANewTable with no problem, and I can rename the table, create a new one, and copy the data from the renamed table to the new table without using any partitions, and there are no duplicates. Something with the partitioning is causing the duplicates. It's like some of the records are having the AbcID overwritten when copying into the new table with the partitioned clustered index.

    Any more ideas?

  • todd 87764 (3/25/2014)


    Ok this is really driving me crazy. I followed your suggestion but have the same problem with duplicates.

    If I make a nonunique index on the ID column, the total record count in the old table and the new table match, and the partitions each hold the correct NUMBER of records, BUT the new table has duplicate IDs.

    There are no NULLs and no leading or trailing spaces in the nchar(14) AbcID column. I can SELECT * INTO ANewTable with no problem, and I can rename the table, create a new one, and copy the data from the renamed table to the new table without using any partitions, and there are no duplicates. Something with the partitioning is causing the duplicates. It's like some of the records are having the AbcID overwritten when copying into the new table with the partitioned clustered index.

    Any more ideas?

    Dunno... it may be the sequence of how you built things which is different than what I would have done. I'll try to take a look tonight.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
