May 7, 2008 at 6:57 am
Hi there ...here comes a tricky one.
I have a database table which needs to make the Index "ParentREF, UniqueName" unique - but this fails because duplicate keys are found. Thus I now need to cleanup these duplicate rows - but I cannot just delete the duplicates, because they might have rows in detail tables.
This means that all duplicate rows needs an update on the "UniqueName" value - but not the first (valid) one!
I can find those rows by
SELECT OID, UniqueName, ParentREF, CreatedUTC, ModifiedUTC FROM dbo.CmsContent AS table0
WHERE EXISTS (
SELECT OID, UniqueName, ParentREF FROM dbo.CmsContent AS table1
WHERE table0.ParentREF = table1.ParentREF
AND table0.UniqueName = table1.UniqueName
AND table0.OID != table1.OID
)
ORDER BY ParentREF, UniqueName, ModifiedUTC desc
...but I struggle to make the required SQL (SP?) to update the "invalid" rows.
Note: the "valid" row is the one with the newest ModifiedUTC value - this row must kept unchanged!
ATM the preferred (cause easiest) way is to rename the invalid rows with
UniqueName = OID
because if I use any other name I risk to create another double entry.
Thanks in advance to whoever can help me
Heiko
May 7, 2008 at 7:27 am
I think this code will do what you need:
[font="Courier New"];WITH cteInvalidRecords AS
(
SELECT
UniqueName,
ParentRef,
MIN(ModifiedUTC) AS ModfiedUTC
FROM
dbo.CmsContent
GROUP BY
UniqueName,
ParentREF
HAVING
COUNT(OID) > 1
)
UPDATE CC
SET UniqueName = CC.OID
FROM
dbo.CmcContent CC JOIN
cteInvalidRecords IR ON
CC.UniqueName = IR.UniqueName AND
CC.ParentREF = IR.ParentREF AND
CC.ModifiedUTC = IR.ModifiedUTC[/font]
The CTE in the header gets all the records with dupes and the earliest ModifiedUTC for each record then I join the CTE with the table to update the Invalid records. The only issue is if you have NULLS in ModifiedUTC. You may want to put your dupes in a table variable or temp table with a default value for Null ModifiedUTC records.
Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question
May 7, 2008 at 10:07 am
Hi Jack
thanks for the fast response 🙂
It works ... only 1 question ... it does not do all matches in 1 run?!
I had 1936 matches and your method required a couple of runs until it reached 0
:: affected rows ::
- 563
- 43x
- 3xx
- 1xx
..
- 3
- 1
- 0
Do you know why??
If it has to be that way, then I would need that method in a 'while loop' until it reaches 0
**EDIT: I assume this is because there are a certain number of rows with 3 duplicates .and 4 duplicates etc. ?! thus need additional runs to find those?!
thanks in advance
Heiko
May 7, 2008 at 10:14 am
The only reason this would take more than 1 pass is that there are more than 2 records for some of the UniqueName and ParentREF combinations.
You could reverse the logic in the CTE and get the valid rows and delete the non-valid rows using a join where
name <> name
parentRef <> parentRef
date <> date
Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question
May 7, 2008 at 10:23 am
Thanks again, Jack 🙂
Hm, I dont follow :satisfied:
Can you show it as SQL what you mean?!
*remind that I cannot delete
best regards
Heiko
May 8, 2008 at 4:29 am
Hi Jack
One more problem ... in some rare cases ther can exist such rows with identical ModifiedUTC.
That would result in renaming all (and therefore not left 1 "valid" we wanted)
Do you have another idea how to solve that?!
Here is what we do now (first we run the "turnedaround" method to find and rename all "invalid" rows) . then for the case that there were some with identical ModifiedUTC (and thus being left in the Table as double entries) we run it with MIN (your suggest) again to elmininate those as well.
The only issue is as described above, that we "loose" all rows for these and dont keep 1 valid.
BEGIN
;WITH cteInvalidRecords AS
(
SELECT
UniqueName,
ParentRef,
MAX(ModifiedUTC) AS ModifiedUTC
FROM
dbo.CmsContent
GROUP BY
UniqueName,
ParentREF
HAVING
COUNT(OID) > 1
)
UPDATE CC
SET UniqueName = CC.OID
FROM
dbo.CmsContent CC JOIN
cteInvalidRecords IR ON
CC.UniqueName = IR.UniqueName AND
CC.ParentREF = IR.ParentREF AND
CC.ModifiedUTC <> IR.ModifiedUTC
END
BEGIN
;WITH cteInvalidRecords AS
(
SELECT
UniqueName,
ParentRef,
MIN(ModifiedUTC) AS ModifiedUTC
FROM
dbo.CmsContent
GROUP BY
UniqueName,
ParentREF
HAVING
COUNT(OID) > 1
)
UPDATE CC
SET UniqueName = CC.OID
FROM
dbo.CmsContent CC JOIN
cteInvalidRecords IR ON
CC.UniqueName = IR.UniqueName AND
CC.ParentREF = IR.ParentREF AND
CC.ModifiedUTC = IR.ModifiedUTC
END
kind regards
Heiko
May 9, 2008 at 6:23 am
All right. Does it matter which one of the records with duplicate ModifiedUTC, you keep? Here's an idea:
[font="Courier New"];WITH cteDuplicateValidRecords AS -- get all the records with more than 1 match and keep the first 1?(Min(OID)
( --assuming OID is an uincrementing numeric
SELECT
MIN(OID) AS OID,
UniqueName,
ParentRef
MAX(ModifiedUTC)
FROM
dbo.CmsContent CC ON
VR.UniqueName = CC.UniqueName AND
VR.ParentRef = CC.ParentRef AND
VR.ModifiedUTC = CC.ModifiedUTC
GROUP BY
UniqueName,
ParentRef
HAVING
COUNT(OID > 1
),
cteInvalidRecords AS -- Find all the non-matching OID's
(
SELECT -- this will get all the duplicates that do not have matching ModifiedUTC
CC.UniqueName,
CC.ParentRef,
MAX(CC.ModifiedUTC) AS ModfiedUTC
FROM
dbo.CmsContent CC JOIN
cteDuplicateValidRecords DVR ON
CC.OID <> DVR.OID
GROUP BY
CC.UniqueName,
CC.ParentREF
HAVING
COUNT(CC.OID) > 1
UNION -- will remove duplicates since not UNION ALL
SELECT -- this gets the records with matching ModifiedUTC
CC.UniqueName,
CC.ParentRef,
MAX(CC.ModifiedUTC) AS ModfiedUTC
FROM
dbo.CmsContent CC JOIN
cteDuplicateValidRecords DVR ON
CC.OID <> DVR.OID
GROUP BY
CC.UniqueName,
CC.ParentREF
)
UPDATE CC
SET UniqueName = CC.OID
FROM
dbo.CmcContent CC JOIN
cteInvalidRecords IR ON
CC.UniqueName = IR.UniqueName AND
CC.ParentREF = IR.ParentREF AND
CC.ModifiedUTC = IR.ModifiedUTC
[/font]
Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question
May 9, 2008 at 8:48 am
Thanks Jack
Yes which one of the Multiple rows with same ModifiedUtc doesnt matter then.
But OID is an Guid field (i saw your assumption in comment)
...and it doesnt validate:
First it complains about the ( at first MAX
ParentRef
MAX(ModifiedUTC)
obviously because it misses the , behind the ParentRef.
But then it says "Wrong syntax near ON in line 9 ... but I dont see why... the only thing I see is that the "VR" seems to be nowhere "declared"?! Hmm ....
thanks in advance
Heiko
May 9, 2008 at 1:41 pm
My bad, remove from the ON to the GROUP BY in the first CTE and correct the other issues you have found. Just bad typing and deleting by me.:D
Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question
May 12, 2008 at 4:16 am
Hi Jack.
Okay 🙂 ....
unfortunately it doesnt work because OID is an Guid, and not an integer.
Any alternate idea?!
Thanks in advance
Heiko
Viewing 10 posts - 1 through 9 (of 9 total)
You must be logged in to reply to this topic. Login to reply