Primary Key Violation

  • I have a large, wide table (lots of columns and currently 300M rows, but it could grow to over a billion and beyond). The table has a primary key on an ID column, let's say AbcID, which is an nchar(14) data type. I am trying to partition the table on a date column (AbcDate), which I'm able to do, but I'm having trouble reestablishing the primary key.

    Here are the steps I am using, all of which work except for the last step because of a primary key violation error:

    1. Add a new filegroup, DatabaseArchivePartition.

    2. Add a new data file, DatabaseArchive.NDF, to the new archive partition filegroup.

    3. Create the partition function, LargeTable_PartitionFunction, that will be used to split out the archive data from the LargeTable.

    4. Create the partition scheme, LargeTable_PartitionScheme, to specify where the archive data will be stored.

    5. Rename the table LargeTable to LargeTable_PartitionStaging, from which to load the new partitioned table.

    6. Create the new LargeTable table, specifying how to partition data.

    7. Add Clustered Index on AbcDate.

    8. Add Primary Key on AbcID (this column "IS" unique).

    9. Copy data from Staging table into the new partitioned LargeTable table.

    If I create a non-unique index on the AbcID column everything works fine, but if it is unique then the primary key violation error occurs. I have double and triple checked the data and the AbcID is truly unique.

    What am I doing wrong?

  • Double check your data before applying the primary key:

    SELECT COUNT(*), AbcID
    FROM StagingTable
    GROUP BY AbcID
    HAVING COUNT(*) > 1

    Also verify there are no NULLs; a primary key column cannot be NULL.
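    A quick sketch of that check against the same staging table:

    SELECT COUNT(*) AS NullAbcIDs
    FROM StagingTable
    WHERE AbcID IS NULL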

    Wes
    (A solid design is always preferable to a creative workaround)

  • It has to be either duplicate data or NULL values. With a unique index you can have a single NULL value. As was stated, with a primary key, you can't have any.
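    A minimal sketch showing the difference (throwaway table name, just for illustration):

    CREATE TABLE dbo.DemoNullCheck (AbcID nchar(14) NULL);
    CREATE UNIQUE INDEX UX_DemoNullCheck_AbcID ON dbo.DemoNullCheck (AbcID);
    INSERT INTO dbo.DemoNullCheck VALUES (NULL); -- succeeds: a unique index allows one NULL
    INSERT INTO dbo.DemoNullCheck VALUES (NULL); -- fails: duplicate key
    -- a primary key is stricter: it cannot even be created on a nullable column
    --ALTER TABLE dbo.DemoNullCheck ADD CONSTRAINT PK_DemoNullCheck PRIMARY KEY (AbcID);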

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Already did this, as I said I triple checked and there are NO duplicate IDs and there are no nulls.

  • Here is the code, although the names have been changed to protect the innocent:

    SET ANSI_NULLS ON
    GO
    SET QUOTED_IDENTIFIER ON
    GO
    SET ANSI_PADDING ON
    GO

    USE master
    GO

    PRINT 'Adding filegroup, DatabaseArchivePartition, for archive partition data.'
    ALTER DATABASE MyDatabase
    ADD FILEGROUP DatabaseArchivePartition; -- separate filegroup to hold the archive table partitions
    GO

    PRINT 'Adding new data file, MyDatabaseArchive.NDF, to the new archive partition filegroup.'
    ALTER DATABASE MyDatabase
    ADD FILE
    ( NAME = MyDatabaseArchive
     ,FILENAME = 'H:\Databases\MyDatabaseArchive.NDF'
     ,SIZE = 10240MB
     ,FILEGROWTH = 10%
    )
    TO FILEGROUP DatabaseArchivePartition;
    GO

    USE MyDatabase
    GO

    PRINT 'Creating the partition function, LargeTable_PartitionFunction, that will be used to split out the archive data from the LargeTable table.'
    --DROP PARTITION FUNCTION LargeTable_PartitionFunction
    CREATE PARTITION FUNCTION LargeTable_PartitionFunction (DATETIME)
    AS RANGE LEFT FOR VALUES ('2013/08/01 00:00:00.000');
    GO

    PRINT 'Creating the partition scheme, LargeTable_PartitionScheme, to specify where the archive data will be stored.'
    --DROP PARTITION SCHEME LargeTable_PartitionScheme
    CREATE PARTITION SCHEME LargeTable_PartitionScheme
    AS PARTITION LargeTable_PartitionFunction
    TO ([PRIMARY], DatabaseArchivePartition);
    GO

    PRINT 'Renaming table LargeTable to LargeTable_PartitionStaging (partition staging, from which we will load the new table).'
    -- rename the LargeTable table
    EXEC sp_rename 'dbo.LargeTable', 'LargeTable_PartitionStaging';
    GO

    PRINT 'Creating new LargeTable table, specifying how to partition data.'
    CREATE TABLE [dbo].LargeTable
    (
     AbcID [nchar](14) NOT NULL
    ,SomeColumn varchar(20) NULL
    ,SomeOtherColumn varchar(50) NULL
    ,AbcDate [datetime] NOT NULL --...dozens of other columns
    )
    ON LargeTable_PartitionScheme(AbcDate);
    GO

    PRINT 'Adding Clustered Index on AbcDate.'
    CREATE CLUSTERED INDEX CX_LargeTable_AbcDate
    ON LargeTable(AbcDate)
    ON [PRIMARY];
    GO

    PRINT 'Adding Primary Key on AbcID.'
    ALTER TABLE [dbo].LargeTable
    ADD CONSTRAINT PK_LargeTable_AbcID PRIMARY KEY NONCLUSTERED (AbcID)
    ON [PRIMARY];
    GO

    /* the following will work...
    PRINT 'Adding indexes on AbcID and AbcDate.'
    CREATE INDEX IX_LargeTable_AbcDate ON LargeTable(AbcDate) ON [PRIMARY];
    GO
    CREATE INDEX IX_LargeTable_AbcID ON LargeTable(AbcID) ON [PRIMARY];
    GO
    */

    PRINT 'Copying data from Staging table into the new partitioned LargeTable table.' + CHAR(13)
    INSERT INTO LargeTable SELECT * FROM LargeTable_PartitionStaging
    GO

    -- check partition counts
    SELECT * FROM sys.partitions WHERE OBJECT_NAME(OBJECT_ID) = 'LargeTable';
    GO

    SELECT $PARTITION.LargeTable_PartitionFunction(AbcDate) AS PARTITIONID,
           COUNT(*) AS ROW_COUNT
    FROM dbo.LargeTable
    GROUP BY $PARTITION.LargeTable_PartitionFunction(AbcDate)
    ORDER BY PARTITIONID;
    GO

  • Let me ask this... Does the process of partitioning a table add the partition key to the primary key? This is the only thing I can think of that might be happening that would cause there to be duplicate primary keys.

  • todd 87764 (3/20/2014)


    Let me ask this... Does the process of partitioning a table add the partition key to the primary key? This is the only thing I can think of that might be happening that would cause there to be duplicate primary keys.

    The partitioning column is added to all unique indexes including PKs if the indexes are aligned to the data (you lose the ability to SWITCH in and out if they aren't). Even if the entire partitioning column had only 1 unique value (a made up scenario, for sure), adding it to what used to be a unique column would still create all unique values rather than duplicates.
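    For illustration only, an aligned PK on the posted table would have to carry the partitioning column in its key, something like this sketch (using the object names from the script you posted):

    ALTER TABLE dbo.LargeTable
    ADD CONSTRAINT PK_LargeTable_AbcID_AbcDate
    PRIMARY KEY NONCLUSTERED (AbcID, AbcDate)
    ON LargeTable_PartitionScheme (AbcDate);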

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • todd 87764 (3/20/2014)


    Already did this, as I said I triple checked and there are NO duplicate IDs and there are no nulls.

    Does the original table have a PK or UNIQUE index on the ABCID column?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Yes, the original table has a primary key on the same AbcID column. I added a step to drop the primary key constraint before renaming the table to staging but still get the primary key violation when copying the data from staging to the new table. I'm baffled.
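    For reference, the extra step was just a plain constraint drop ahead of the sp_rename (the constraint name here is a placeholder for the original PK name):

    ALTER TABLE dbo.LargeTable DROP CONSTRAINT PK_LargeTable; -- placeholder constraint name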

  • todd 87764 (3/20/2014)


    I have a large, wide table (lots of columns and currently 300M rows, but it could grow to over a billion and beyond). The table has a primary key on an ID column, let's say AbcID, which is an nchar(14) data type. I am trying to partition the table on a date column (AbcDate), which I'm able to do, but I'm having trouble reestablishing the primary key.

    Here are the steps I am using, all of which work except for the last step because of a primary key violation error:

    1. Add a new filegroup, DatabaseArchivePartition.

    2. Add a new data file, DatabaseArchive.NDF, to the new archive partition filegroup.

    3. Create the partition function, LargeTable_PartitionFunction, that will be used to split out the archive data from the LargeTable.

    4. Create the partition scheme, LargeTable_PartitionScheme, to specify where the archive data will be stored.

    5. Rename the table LargeTable to LargeTable_PartitionStaging, from which to load the new partitioned table.

    6. Create the new LargeTable table, specifying how to partition data.

    7. Add Clustered Index on AbcDate.

    8. Add Primary Key on AbcID (this column "IS" unique).

    9. Copy data from Staging table into the new partitioned LargeTable table.

    If I create a non-unique index on the AbcID column everything works fine, but if it is unique then the primary key violation error occurs. I have double and triple checked the data and the AbcID is truly unique.

    What am I doing wrong?

  • Not sure what that post was all about, canonzone??

    I have been trying to figure out what is going on and have queried both the renamed (staging) table and the new table, and here are some interesting findings:

    If I use a regular non-unique index on AbcID, my script runs completely and the final INSERT statement reports the correct number of rows inserted (the same number of rows that exist in the renamed staging table). SSMS also shows the same correct number of rows when you right-click the table, select Properties, then Storage, and look at Row Count. The query SELECT COUNT(*) FROM LargeTable also returns the correct total row count.

    But when I query the new table I inserted into for AbcIDs that appear more than once, the results show over half a million rows where the AbcID is duplicated - EVEN THOUGH THE ORIGINAL TABLE (LargeTable_PartitionStaging) HAS NO DUPLICATE AbcIDs:

    --results in zero/zip/zilch/nada duplicate AbcID's
    SELECT COUNT(*), ISNULL(AbcID, '') AS AbcID
    FROM LargeTable_PartitionStaging
    GROUP BY AbcID
    HAVING COUNT(*) > 1
    ORDER BY AbcID

    --results in over 500k duplicate AbcID's,
    --even though the table was populated by inserting from the Staging table that has no duplicates
    SELECT COUNT(*), ISNULL(AbcID, '') AS AbcID
    FROM LargeTable
    GROUP BY AbcID
    HAVING COUNT(*) > 1
    ORDER BY AbcID

  • Oh, and I ran DBCC CHECKDB on the database before and after the table renaming and new table creation, and there are no problems showing in the database.

  • It may be one of those "odd" things that happen behind the scenes when you do something odd. I've never seen anyone create the table on the partition scheme and then try to add a clustered index afterwards. I've always seen them create the table and then add the clustered index on the partition scheme.
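    In rough terms, something like this (sketched with the names from your script, untested):

    -- create the new table on a regular filegroup first
    CREATE TABLE dbo.LargeTable
    (
     AbcID   nchar(14) NOT NULL
    ,AbcDate datetime  NOT NULL
    --,...dozens of other columns
    ) ON [PRIMARY];
    GO
    -- then partition it by putting the clustered index on the partition scheme
    CREATE CLUSTERED INDEX CX_LargeTable_AbcDate
    ON dbo.LargeTable (AbcDate)
    ON LargeTable_PartitionScheme (AbcDate);
    GO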

    At this point, I'd give that a try to see if it works any better. If not, we'll take another whack at it.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Ok this is really driving me crazy. I followed your suggestion but have the same problem with duplicates.

    If I make a nonunique index on the ID column, the total record count in the old table and the new table match, and the partitions each hold the correct NUMBER of records, BUT the new table has duplicate IDs.

    There are no NULLs and no leading or trailing spaces in the nchar(14) AbcID column. I can SELECT * INTO ANewTable with no problem, and I can rename the table, create a new one, and copy the data from the renamed table to the new table without using any partitions, and there are no duplicates. Something with the partitioning is causing the duplicates. It's like some of the records are having the AbcID overwritten when copying into the new table with the partitioned clustered index.

    Any more ideas?

  • todd 87764 (3/25/2014)


    Ok this is really driving me crazy. I followed your suggestion but have the same problem with duplicates.

    If I make a nonunique index on the ID column, the total record count in the old table and the new table match, and the partitions each hold the correct NUMBER of records, BUT the new table has duplicate IDs.

    There are no NULLs and no leading or trailing spaces in the nchar(14) AbcID column. I can SELECT * INTO ANewTable with no problem, and I can rename the table, create a new one, and copy the data from the renamed table to the new table without using any partitions, and there are no duplicates. Something with the partitioning is causing the duplicates. It's like some of the records are having the AbcID overwritten when copying into the new table with the partitioned clustered index.

    Any more ideas?

    Dunno... it may be the sequence of how you built things which is different than what I would have done. I'll try to take a look tonight.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
