March 20, 2014 at 2:46 pm
I have a large, wide table (lots of columns and currently 300M rows, but it could grow to over a billion and beyond). The table has a primary key on an ID column, let's say AbcID, which is an nchar(14) data type. I am trying to partition the table on a date field (AbcDate), which I'm able to do, but I'm having trouble reestablishing the primary key.
Here are the steps I am using, all of which work except for the last step because of a primary key violation error:
1. Add a new filegroup, DatabaseArchivePartition.
2. Add a new data file, DatabaseArchive.NDF, to the new archive partition filegroup.
3. Create the partition function, LargeTable_PartitionFunction, that will be used to split out the archive data from the LargeTable.
4. Create the partition scheme, LargeTable_PartitionScheme, to specify where the archive data will be stored.
5. Rename the table LargeTable to LargeTable_PartitionStaging, from which to load the new partitioned table.
6. Create the new LargeTable table, specifying how to partition data.
7. Add Clustered Index on AbcDate.
8. Add Primary Key on AbcID (this column "IS" unique).
9. Copy data from Staging table into the new partitioned LargeTable table.
If I create a non-unique index on the AbcID column everything works fine, but if it is unique then the primary key violation error occurs. I have double and triple checked the data and the AbcID is truly unique.
What am I doing wrong?
March 20, 2014 at 3:43 pm
Double-check your data before applying the primary key:
SELECT COUNT(*), AbcID
FROM StagingTable
GROUP BY AbcID
HAVING COUNT(*) > 1
Also verify there are no NULLs; a primary key cannot be NULL.
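A quick NULL check might look like this (assuming the same StagingTable name as above):

```sql
-- count rows where the key is NULL; a primary key candidate must return 0
SELECT COUNT(*) AS NullKeyCount
FROM StagingTable
WHERE AbcID IS NULL;
```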
Wes
(A solid design is always preferable to a creative workaround)
March 20, 2014 at 4:16 pm
It has to be either duplicate data or NULL values. With a unique index you can have a single NULL value. As was stated, with a primary key, you can't have any.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
March 20, 2014 at 4:51 pm
Already did this, as I said I triple checked and there are NO duplicate IDs and there are no nulls.
March 20, 2014 at 5:16 pm
Here is the code, although the names have been changed to protect the innocent:
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
USE master
go
PRINT 'Adding filegroup, DatabaseArchivePartition, for archive partition data.'
ALTER DATABASE MyDatabase
ADD FILEGROUP DatabaseArchivePartition; -- separate filegroup to hold the archive table partitions
GO
PRINT 'Adding new data file, MyDatabaseArchive.NDF, to the new archive partition filegroup.'
ALTER DATABASE MyDatabase
ADD FILE
( NAME = MyDatabaseArchive
,FILENAME = 'H:\Databases\MyDatabaseArchive.NDF'
,SIZE = 10240 MB
,FILEGROWTH = 10%
)
TO FILEGROUP DatabaseArchivePartition;
GO
USE MyDatabase
GO
PRINT 'Creating the partition function that will be used to split out the archive data from the LargeTable table.'
--drop PARTITION FUNCTION LargeTable_PartitionFunction
CREATE PARTITION FUNCTION LargeTable_PartitionFunction
(DATETIME)
AS RANGE LEFT FOR VALUES
('2013/08/01 00:00:00.000');
GO
PRINT 'Creating the partition scheme, LargeTable_PartitionScheme, to specify where the archive data will be stored.'
--drop PARTITION SCHEME LargeTable_PartitionScheme
CREATE PARTITION SCHEME LargeTable_PartitionScheme
AS PARTITION LargeTable_PartitionFunction
TO ([PRIMARY], DatabaseArchivePartition);
GO
PRINT 'Renaming table LargeTable to LargeTable_PartitionStaging (partition staging, from which we will load the new table).'
-- rename the LargeTable table
EXEC sp_rename 'dbo.LargeTable', 'LargeTable_PartitionStaging';
go
PRINT 'Creating new LargeTable table, specifying how to partition data.'
CREATE TABLE [dbo].LargeTable
(
AbcID [nchar](14) NOT NULL
,SomeColumn varchar(20) NULL
,SomeOtherColumn varchar(50) NULL
,AbcDate [datetime] NOT NULL --...dozens of other columns
)
ON LargeTable_PartitionScheme(AbcDate);
GO
PRINT 'Adding Clustered Index on AbcDate.'
CREATE CLUSTERED INDEX CX_LargeTable_AbcDate
ON LargeTable(AbcDate)
ON [PRIMARY];
GO
PRINT 'Adding Primary Key on AbcID'
ALTER TABLE [dbo].Largetable
ADD CONSTRAINT PK_LargeTable_AbcID PRIMARY KEY NONCLUSTERED (AbcID)
ON [PRIMARY];
GO
/* the following will work...
PRINT 'Adding indexes on AbcID and AbcDate.'
CREATE INDEX IX_LargeTable_AbcDate ON LargeTable(AbcDate) ON [PRIMARY];
go
CREATE INDEX IX_LargeTable_AbcID ON LargeTable(AbcID) ON [PRIMARY];
go
*/
PRINT 'Copying data from Staging table into the new partitioned LargeTable table.' + CHAR(13)
INSERT INTO LargeTable SELECT * FROM LargeTable_PartitionStaging
go
--check partition counts
SELECT * FROM sys.partitions WHERE OBJECT_NAME(OBJECT_ID)='LargeTable';
GO
SELECT $PARTITION.LargeTable_PartitionFunction(AbcDate) AS PARTITIONID,
COUNT(*) AS ROW_COUNT
FROM dbo.LargeTable
GROUP BY $PARTITION.LargeTable_PartitionFunction(AbcDate)
ORDER BY PARTITIONID;
GO
March 20, 2014 at 5:17 pm
Let me ask this... Does the process of partitioning a table add the partition key to the primary key? This is the only thing I can think of that might be happening that would cause there to be duplicate primary keys.
March 20, 2014 at 7:41 pm
todd 87764 (3/20/2014)
Let me ask this... Does the process of partitioning a table add the partition key to the primary key? This is the only thing I can think of that might be happening that would cause there to be duplicate primary keys.
The partitioning column is added to all unique indexes including PKs if the indexes are aligned to the data (you lose the ability to SWITCH in and out if they aren't). Even if the entire partitioning column had only 1 unique value (a made up scenario, for sure), adding it to what used to be a unique column would still create all unique values rather than duplicates.
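One way to see which key columns an index actually ended up with (and so confirm whether AbcDate was appended anywhere) is to query the catalog views; a sketch, using the table name from earlier in the thread:

```sql
-- list the key columns of every index on LargeTable, in key order;
-- if the partitioning column was appended, it will show up here
SELECT i.name       AS index_name,
       c.name       AS column_name,
       ic.key_ordinal
FROM sys.indexes i
JOIN sys.index_columns ic
  ON ic.object_id = i.object_id AND ic.index_id = i.index_id
JOIN sys.columns c
  ON c.object_id = ic.object_id AND c.column_id = ic.column_id
WHERE i.object_id = OBJECT_ID('dbo.LargeTable')
  AND ic.key_ordinal > 0
ORDER BY i.name, ic.key_ordinal;
```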
--Jeff Moden
Change is inevitable... Change for the better is not.
March 20, 2014 at 8:26 pm
todd 87764 (3/20/2014)
Already did this, as I said I triple checked and there are NO duplicate IDs and there are no nulls.
Does the original table have a PK or UNIQUE index on the ABCID column?
--Jeff Moden
Change is inevitable... Change for the better is not.
March 20, 2014 at 8:40 pm
Yes, the original table has a primary key on the same AbcID column. I added a step to drop the primary key constraint before renaming the table to staging but still get the primary key violation when copying the data from staging to the new table. I'm baffled.
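For reference, the drop-before-rename step can be sketched like this (the old constraint name here is a guess; check sys.key_constraints for the real one):

```sql
-- drop the existing primary key before renaming the table,
-- so the constraint name is free for the new partitioned table
ALTER TABLE dbo.LargeTable
    DROP CONSTRAINT PK_LargeTable_AbcID; -- hypothetical name
GO
EXEC sp_rename 'dbo.LargeTable', 'LargeTable_PartitionStaging';
GO
```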
March 20, 2014 at 8:40 pm
todd 87764 (3/20/2014)
I have a large, wide table (lots of columns and currently 300M rows, but it could grow to over a billion and beyond). The table has a primary key on an ID column, let's say AbcID, which is an nchar(14) data type. I am trying to partition the table on a date field (AbcDate), which I'm able to do, but I'm having trouble reestablishing the primary key. Here are the steps I am using, all of which work except for the last step because of a primary key violation error:
1. Add a new filegroup, DatabaseArchivePartition.
2. Add a new data file, DatabaseArchive.NDF, to the new archive partition filegroup.
3. Create the partition function, LargeTable_PartitionFunction, that will be used to split out the archive data from the LargeTable.
4. Create the partition scheme, LargeTable_PartitionScheme, to specify where the archive data will be stored.
5. Rename the table LargeTable to LargeTable_PartitionStaging, from which to load the new partitioned table.
6. Create the new LargeTable table, specifying how to partition data.
7. Add Clustered Index on AbcDate.
8. Add Primary Key on AbcID (this column "IS" unique).
9. Copy data from Staging table into the new partitioned LargeTable table.
If I create a non-unique index on the AbcID column everything works fine, but if it is unique then the primary key violation error occurs. I have double and triple checked the data and the AbcID is truly unique.
What am I doing wrong?
March 21, 2014 at 12:47 pm
Not sure what that post was all about, canonzone??
I have been trying to figure out what is going on and have queried the renamed (staging) table as well as the new table, and here are some interesting findings:
If I use a regular non-unique index on AbcID my script runs completely and the final Insert statement shows a result of the correct number of rows inserted (same number of rows that exist in the renamed staging table). SSMS also shows the same correct number of rows when you right-click the table, select Properties then Storage and look at Row Count. And the query, SELECT COUNT(*) FROM LargeTable, also returns the correct count of total rows.
But when I query the new table that I inserted into for unique AbcID's and counts for those records where the AbcID is duplicated the results show over a half million rows where the AbcID is duplicated - EVEN THOUGH THE ORIGINAL TABLE (LargeTable_PartitionStaging) HAS NO DUPLICATE AbcID's:
--results in zero/zip/zilch/nada duplicate AbcID's
SELECT COUNT(*), isnull(AbcID, '') as AbcID
FROM LargeTable_PartitionStaging
GROUP BY AbcID
HAVING COUNT(*) > 1
ORDER BY AbcID
--results in over 500k duplicate AbcID's,
--even though the table was populated by inserting from the Staging table that has no duplicates
SELECT COUNT(*), isnull(AbcID, '') as AbcID
FROM LargeTable
GROUP BY AbcID
HAVING COUNT(*) > 1
ORDER BY AbcID
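Since it looks like key values are changing in flight, one more diagnostic that might help narrow it down is an EXCEPT comparison between the two tables (assuming both still have identical column lists; shown here on just the two key columns):

```sql
-- (AbcID, AbcDate) pairs present in the new table but not in staging;
-- an empty result means no key values were altered during the copy
SELECT AbcID, AbcDate FROM LargeTable
EXCEPT
SELECT AbcID, AbcDate FROM LargeTable_PartitionStaging;
```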
March 21, 2014 at 12:51 pm
Oh and I ran DBCC CHECKDB on the database before and after the table renaming and new table creations and there are no problems showing in the database.
March 22, 2014 at 10:07 pm
It may be one of those "odd" things that happen behind the scenes when you do something odd. I've never seen anyone create the table on the partition scheme and then try to add a clustered index afterwards. I've always seen them create the table and then add the clustered index on the partition scheme.
At this point, I'd give that a try to see if it works any better. If not, we'll take another whack at it.
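A sketch of that sequence, reusing the names from the script above (abbreviated column list; not tested against your schema):

```sql
-- create the table on a regular filegroup first...
CREATE TABLE dbo.LargeTable
(
     AbcID   nchar(14) NOT NULL
    ,AbcDate datetime  NOT NULL
    -- ...dozens of other columns...
) ON [PRIMARY];
GO

-- ...then create the clustered index ON the partition scheme,
-- which rebuilds the table as a partitioned clustered index
CREATE CLUSTERED INDEX CX_LargeTable_AbcDate
    ON dbo.LargeTable (AbcDate)
    ON LargeTable_PartitionScheme (AbcDate);
GO
```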
--Jeff Moden
Change is inevitable... Change for the better is not.
March 25, 2014 at 12:14 pm
Ok this is really driving me crazy. I followed your suggestion but have the same problem with duplicates.
If I make a nonunique index on the ID column, the total record count in the old table and the new table match, and the partitions each hold the correct NUMBER of records, BUT the new table has duplicate IDs.
There are no NULLs and there are no leading or trailing spaces in the nchar(14) AbcId column. I can SELECT * INTO AnewTable with no problem, and I can rename the table, create a new one, and copy the data from the renamed table to the new table without using any partitions, and there are no duplicates. Something with the partitioning is causing the duplicates. It's like some of the records are having the AbcId overwritten when copying into the new table with the partitioned clustered index.
Any more ideas?
March 25, 2014 at 3:02 pm
todd 87764 (3/25/2014)
Ok this is really driving me crazy. I followed your suggestion but have the same problem with duplicates. If I make a nonunique index on the ID column, the total record count in the old table and the new table match, and the partitions each hold the correct NUMBER of records, BUT the new table has duplicate IDs.
There are no NULLs and there are no leading or trailing spaces in the nchar(14) AbcId column. I can SELECT * INTO AnewTable with no problem, and I can rename the table, create a new one, and copy the data from the renamed table to the new table without using any partitions, and there are no duplicates. Something with the partitioning is causing the duplicates. It's like some of the records are having the AbcId overwritten when copying into the new table with the partitioned clustered index.
Any more ideas?
Dunno... it may be the sequence of how you built things which is different than what I would have done. I'll try to take a look tonight.
--Jeff Moden
Change is inevitable... Change for the better is not.