April 4, 2018 at 7:16 am
Hi, we have a flat table in SQL server that has no PK field and no constraints. The table is manually updated currently.
I have set up an SSIS package to automatically update the table however I now need to stop duplicate records.
I have a copy of the table with a clustered PK using 3 fields, 2 that exist and 1 new field.
If I try to add this PK to the live table, the null values that will be present in historic records for the new field will not be allowed.
The PK looks like this:
Ref Item_ID Traveller
12345 1 Dave
12345 2 John
12345 3 Steve
23456 1 Aaron
23456 2 Bill
45678 1 Graham
56789 1 Walter
The new field "Item_ID" is an int data type so my thought is to loop through the existing records using "ref" and add a sequential int to the "Item_ID" field starting again at 1 when a new ref arises.
Can anyone point me in the direction of how to do this?
I would rather avoid cursors if possible.
Dave
April 4, 2018 at 7:24 am
Check out the ROW_NUMBER() function in BOL
April 4, 2018 at 7:41 am
Don't confuse a clustered index with a primary key and further, if there are business constraints, those should be mirrored as constraints in the database.
😎
To assist you with this problem, we need more information, i.e. what kind of duplication!
Can you please post the DDL (create table) scripts of both the tables and some sample data which includes some duplication examples as an insert statement.
April 4, 2018 at 9:46 am
Hi sorry security prohibits me sharing anything further I am afraid. I appreciate that without it, you will not be able to assist. Thanks for your time.
Dave
April 4, 2018 at 11:10 am
;WITH cte_assign_Item_ID_values AS (
SELECT *, ROW_NUMBER() OVER(PARTITION BY Ref ORDER BY Traveller) AS row_num
FROM dbo.table_name
)
UPDATE cte_assign_Item_ID_values
SET Item_ID = row_num
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
April 5, 2018 at 1:21 am
ScottPletcher - Wednesday, April 4, 2018 11:10 AM
;WITH cte_assign_Item_ID_values AS (
SELECT *, ROW_NUMBER() OVER(PARTITION BY Ref ORDER BY Traveller) AS row_num
FROM dbo.table_name
)
UPDATE cte_assign_Item_ID_values
SET Item_ID = row_num
Absolutely Perfect!!
I can't thank you enough.
Dave
April 6, 2018 at 1:50 pm
What you posted makes no sense. I have never heard the term "flat table" in all the years I've worked with SQL. But I do know that by definition, a table must have a key (constraints are optional, but are a damn good idea).You don't know that fields have a special meaning in this language; it looks like you meant columns which are totally different. Rows are nothing like records. Nulls are not values. Basically are telling us you have a deck of punch cards written in SQL instead of it proper schema?
Since you were also too rude to post DDL, we have to make guesses at it. Here's mine.
CREATE TABLE Trips -- meaningful name!
(trip_id CHAR(5) NOT NULL,
trip_seq INTEGER NOT NULL
CHECK (trip_seq BETWEEN 1 AND 4),
PRIMARY KEY (trip_id, trip_seq)
traveler_name VARCHAR(15) NOT NULL);
INSERT INTO Foobar
VALUES
('12345 ', 1 , 'Dave'),
('12345 ', 2 , 'John'),
('12345 ', 3 , 'Steve'),
('23456 ', 1 , 'Aaron'),
('23456 ', 2 , 'Bill'),
('45678 ', 1 , 'Graham'),
('56789 ', 1 , 'Walter');
>> I have set up an SSIS package to automatically update the table however I now need to stop duplicate records [sic]. <<
Keys are how we prevent duplicate rows. Unfortunately, your deck of punch cards if it is missing what I am calling the trip sequence in the DDL, then the lack of the other half of the compound key is a serious design flaw. I don't know if it would make sense. In your case, but I suggest you Google around for some of my articles on the CREATE SEQUENCE statement; it might help. Otherwise, the DDL I posted for you to protect the data integrity, but you still have to do the work of inserting things.
>> I have a copy of the table with a clustered PK using 3 fields [sic] 2 that exist and 1 new field [sic]. <<
This is one of the many many reasons that DDL has been a requirement on SQL forums for over 30 years. You don't add a primary key to a table; it should be an innate part of your data model and you more or less "discover" it while you're creating the table. Looking at the sample data. you gave us, you don't need this third column to have a key. What you seem to have is called a super key, and the travelers name is dependent on the first two columns.
>> If I try to add this PK to the live table, the null values [sic] that will be present in historic records [sic] for the new field [sic] will not be allowed. <<
Again, you're missing basic terms.
>> The new field [sic] "Item_id" is an INTEGER data type so my thought is to loop through the existing records [sic] using "ref" and add a sequential INTEGER to the "Item_id" field starting again at 1 when a new ref arises. <<
No, what you were calling "item_id"
Your choice of data element names was awful. I know it's long, boring and a real pain, but can you spend some time with a book on basic data modeling and learn some of the ISO 11179 naming rules? And yes, avoiding cursors and loops in a declarative language is always preferred to mimicking the behavior of a procedural language scanning a deck of punch cards.
>> I have set up an SSIS package to automatically update the table however I now need to stop duplicate records [sic]. <<Keys are how we prevent duplicate rows. Unfortunately, your deck of punch cards if it is missing what I am calling the trip sequence in the DDL, then the lack of the other half of the compound key is a serious design flaw. I don't know if it would make sense. In your case, but I suggest you Google around for some of my articles on the CREATE SEQUENCE statement; it might help. Otherwise, the DDL I posted for you to protect the data integrity, but you still have to do the work of inserting things.>> I have a copy of the table with a clustered PK using 3 fields [sic] 2 that exist and 1 new field [sic]. <<This is one of the many many reasons that DDL has been a requirement on SQL forums for over 30 years. You don't add a primary key to a table; it should be an innate part of your data model and you more or less "discover" it while you're creating the table. Looking at the sample data. you gave us, you don't need this third column to have a key. What you seem to have is called a super key, and the travelers name is dependent on the first two columns.>> If I try to add this PK to the live table, the null values [sic] that will be present in historic records [sic] for the new field [sic] will not be allowed. <<Again, you're missing basic terms.>> The new field [sic] "Item_id" is an INTEGER data type so my thought is to loop through the existing records [sic] using "ref" and add a sequential INTEGER to the "Item_id" field starting again at 1 when a new ref arises. <<No, what you were calling "item_id"Your choice of data element names was awful. I know it's long, boring and a real pain, but can you spend some time with a book on basic data modeling and learn some of the ISO 11179 naming rules? And yes, avoiding cursors and loops in a declarative language is always preferred to mimicking the behavior of a procedural language scanning a deck of punch cards.
Please post DDL and follow ANSI/ISO standards when asking for help.
April 6, 2018 at 2:02 pm
Well thanks for enlightening me to my character flaws so eloquently and for confirming that I know no more about SQL than a 2 year old.
If you had taken the time to read the thread, you would have seen that ScottPletcher kindly assisted me with a very simple and perfectly working solution.
Perhaps when you decide to respond to a thread on a forum you might consider being slightly less arrogant and whole bunch less rude as it does tend to get peoples backs up a tad. I would also like to thank you for taking the time to post. Best Regards
Dave
Viewing 8 posts - 1 through 7 (of 7 total)
You must be logged in to reply to this topic. Login to reply