October 14, 2010 at 8:55 am
Hi,
I have a problem which I cannot solve.
I have inherited an SSIS package which has a bug in it. The package accepts data from an Excel file, and then inserts it into a database.
The bug is - when the string 'MC²' is inserted, after 'MC2' then some of the data becomes corrupted. The SSIS package then breaks at some distance away from where the data described above is inserted. This problem is obviously caused by a Collation conflict, I am quite sure of this. The correlation of the database is Latin1_General_CI_AS, and I need to apply Latin1_General_CS_AS at the appropriate point. Hope this makes sense so far.
The problem is the package is quite complex and breaks at an insert (OLE DB Destination block) right at the end. The insert gives the following error:
"Violation of PRIMARY KEY constraint 'PK_LiveDataInstance'. Cannot insert duplicate key in object 'dbo.LiveItemInstance'."
However when I examine the data the insert process is NOT attempting to insert a duplicate primary key. The data table creation script is:
CREATE TABLE [dbo].[LiveItemInstance](
[PK_LiveItemInstanceID] [uniqueidentifier] ROWGUIDCOL NOT NULL CONSTRAINT [DF__LiveDataInstance__rowgui__31EC6D26] DEFAULT (newid()),
[PK_ID] [int] IDENTITY(1,1) NOT NULL,
[FK_LiveItemAbstract_Id] [uniqueidentifier] NULL,
[FK_Request_Id] [uniqueidentifier] NOT NULL,
[OverrideTX] [nvarchar](255) NULL,
[OverrideRX] [nvarchar](255) NULL,
[LidArrayPosition] [int] NOT NULL,
[LidBitPosition] [int] NOT NULL,
[DataPos] [smallint] NOT NULL,
[DataSize] [smallint] NOT NULL,
[DataMask] [int] NOT NULL,
[isDigital] [bit] NOT NULL,
[isString] [bit] NOT NULL,
[isSigned] [bit] NOT NULL,
[isLittleEndian] [bit] NOT NULL,
[SimpleScale] [real] NOT NULL,
[SimpleOffset] [real] NOT NULL,
[DecimalPlaces] [int] NOT NULL,
[FK_Scaling_Id] [uniqueidentifier] NULL,
[FK_LiveItemMap_Id] [uniqueidentifier] NULL,
[FK_ReleaseStatus_Id] [uniqueidentifier] NOT NULL,
[IsTombStone] [bit] NOT NULL,
CONSTRAINT [PK_LiveDataInstance] PRIMARY KEY CLUSTERED
(
[PK_LiveItemInstanceID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = /* ON */ OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
The constraint named PK_LiveItemInstanceID is causing the conflict.
If I allow the inserting of duplicate keys (commented out above) then the data goes in OK. The system using the database does not work with this change however.
So, can someone suggest why the constraint is not being met ?? I am thinking it is one of the Foreign keys is the problem. I have found some suspicious duplicate data, but I am not sure if this is the problem or not.
Any ideas would be very welcome, as I have been stuck on this for a few days. I can supply the data with the duplicate key constraint removed.
Many Thanks in advance,
Baxter.
October 15, 2010 at 1:31 am
Just a long shot, but you didn't accidentely map one of your input columns with the [PK_LiveItemInstanceID] column?
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
October 15, 2010 at 3:15 am
Not directly, only through foreign key constraints.
That being the case I would expect the error to be a foreign key constraint problem, not a primary key constraint problem.
October 15, 2010 at 3:51 am
is the table empty when you load the data?
and are you 100% certain the data does not have duplicates.
I would put some error handling on the destination to send the invalid row to a text file and then see what exact row of data is causing the problem and work back from there.
October 15, 2010 at 4:07 am
Hi,
Thanks for the interest.
I've done all that. The database is populated before it fails.
I have included a copy of the data in Excel form. Definately no duplicates in the primary key column. There are some in the foreign key columns, I suspect PK_LiveItemAbstractID is the column causing the problem.
October 15, 2010 at 4:11 am
what happens when you add error handling to the destination?
October 15, 2010 at 4:15 am
That's the Error Output section in the Excel file.
The package runs fine with the error handling in it, but the consuming C# app does not display any data, so I guess the error it not a red herring.
October 15, 2010 at 4:27 am
If you try to manually insert one of the error rows into the table via t-sql does it give any more specific errors?
are there any triggers on the table?
you could try and run a trace on the database to see what exactly SSIS is passing it.
October 15, 2010 at 4:37 am
I'll give your suggestions a try.
I'll be back !!
October 15, 2010 at 10:36 am
Have you tried adding a data viewer to your package so you can watch the data populate in BIDs and see exactly when the error occurs and on what row?
When I have problem packages like this, I run them in Debug mode from BIDs so I can watch everything as it goes.
October 18, 2010 at 2:20 am
Hi,
Thanks for the interest. The attached Excel doc has an error output, and the affending rows are there.
As can be seen, the Primary key column (newGiud) does not have any duplicates when compared to the data above. The problem is very consistant, with each run of the package producing the same error output, but each time the new primary key column data is random GIUDs, and each one gets rejected as a duplicate key.
I have now decided to attack the problem where the offending data is going into the database, and I have made some progress, the problem seems to be a 'merge join' component needing to be given a different collation property (Latin1_General_CS_AS) than the default database collation property (Latin1_General_CI_AS). Unfortunately I do not think this is possible, so a major re-design may be the only solution. I may have to use pure SQL tasks to do the job correctly.
I'll keep you all posted on my progress.
October 18, 2010 at 3:03 am
Hi,
Thanks for the interest. The attached Excel doc has an error output, and the affending rows are there.
As can be seen, the Primary key column (newGiud) does not have any duplicates when compared to the data above. The problem is very consistant, with each run of the package producing the same error output, but each time the new primary key column data is random GIUDs, and each one gets rejected as a duplicate key.
I have now decided to attack the problem where the offending data is going into the database, and I have made some progress, the problem seems to be a 'merge join' component needing to be given a different collation property (Latin1_General_CS_AS) than the default database collation property (Latin1_General_CI_AS). Unfortunately I do not think this is possible, so a major re-design may be the only solution. I may have to use pure SQL tasks to do the job correctly.
I'll keep you all posted on my progress.
October 18, 2010 at 4:05 am
I don't see an attached Excel file.
EDIT: Nevermind. I thought you meant attached to current posts. Not your previously attached file.
October 18, 2010 at 4:11 am
I've looked at your file and I notice you have a First Pass and a Second Pass. Can you explain these better please?
Because when I sort that file by GUID, without changing anything in the Excel sheet, there is at least 1 duplicate GUID that exists in both passes.
October 18, 2010 at 8:17 am
Hi,
Thanks for the interest.
Please read the first post, in that I attempt to explain what the problem is.
It's pretty complicated, but my database has the wrong collation property set on it, which causes superscript 2's and normal 2s to be seen as the same data. They are 128 decimal apart in the extended ASCII table.
The problem only occurs when strings (NVarChar(4000)) have the superscript 2s and normal 2s in the same position, as here:
M/C2
M/C²
The first pass data contains one of the two examples above, and the second pass contins the other. They are seen as the same at some point in the SSIS package, and with a collation of Latin1_General_CI_AS applied to a select statement they are seen as the same by the database engine.
I have worked out that applying a collation of Latin1_General_CS_AS fixes the problem with a select statement.
So I am attempting to find the point a which the new collation needs to be applied, and as the package is very large this is taking some time.
However this does not explain the Primary Key constraint error, the column which is the prrimary key column is NewGiud, this value is mapped to PK_LiveItemInstanceID which is the primary key as defined in the database creation script in the first post. I have searched for duplicates in this column, are there are none.
The way I understand it the data should go into the database as long as the primary keys aren't duplicated. However in this case that is not being acheived.
Viewing 15 posts - 1 through 15 (of 23 total)
You must be logged in to reply to this topic. Login to reply