January 19, 2010 at 8:58 am
Hello...
Once upon a time, we had done a bulk insert into a table and somehow repeated the same data in the mailing address field without catching it. I am trying to clean these up and I am having a difficult time finding a clean way to look for all the records where this has happened.
For example:
MailingAddress varchar(40)
1016 ROCKY CREEK DRIVE 1016 ROCKY CREEK
1100 CASTLE ROCK VILLAS DRIVE 1100 CASTL
In these examples, you can see where it was somehow inserted twice up to the point where the data was maxed in the field. I need to keep the correct instance and clean up the extra characters. Any ideas?
Thank you!
January 19, 2010 at 9:08 am
This solution assumes that the first "word" (the left characters up to the first space in the address) do not repeat in the actual address. See if this helps.
CREATE TABLE #temp (mailingAddress VARCHAR(40))
INSERT INTO #temp
SELECT '1016 ROCKY CREEK DRIVE 1016 ROCKY CREEK '
UNION ALL
SELECT '1100 CASTLE ROCK VILLAS DRIVE 1100 CASTL'
SELECT RTRIM(LEFT(mailingAddress,CHARINDEX(LEFT(mailingAddress,CHARINDEX(' ',mailingAddress)),mailingAddress,LEN(LEFT(mailingAddress,CHARINDEX(' ',mailingAddress))))-1)), * FROM #temp T
DROP TABLE #temp
January 19, 2010 at 9:18 am
That does seem to clean up my two examples, but what would be the best way to track down all the records in my db that are like this so I can insert them into the temp table for update?
January 19, 2010 at 9:24 am
I would try this on a copy of your table first as I haven't tested this fully with all variations of addresses but couldn't you just update your table in place?
CREATE TABLE #temp (mailingAddress VARCHAR(40))
INSERT INTO #temp
SELECT '1016 ROCKY CREEK DRIVE 1016 ROCKY CREE'
UNION ALL
SELECT '1100 CASTLE ROCK VILLAS DRIVE 1100 CASTL'
UNION ALL
SELECT 'THIS ADDRESS SHOULD NOT BE CHANGED'
UPDATE #temp
SET mailingAddress = RTRIM(LEFT(mailingAddress,CHARINDEX(LEFT(mailingAddress,CHARINDEX(' ',mailingAddress)),mailingAddress,LEN(LEFT(mailingAddress,CHARINDEX(' ',mailingAddress))))-1))
WHERE CHARINDEX(LEFT(mailingAddress,CHARINDEX(' ',mailingAddress)),mailingAddress,LEN(LEFT(mailingAddress,CHARINDEX(' ',mailingAddress)))) > 1
SELECT * FROM #temp
DROP TABLE #temp
Where #temp is your table name?
January 19, 2010 at 9:45 am
Thanks Matt... This is going to work.
I did find other records that were duplicated a bit different. I am assuming I will need to tweak your code to change those.
example
1105 1105 Brookshire Ln
2450 2450 Middleset Trail
In these.. I would need to delete one of the numbers.
January 19, 2010 at 9:49 am
... for each row
....... what is the index of the last character
........... of the largest substring
............... repeating no more than once
Is the above statement a basis for a solution to the problem?
i.e. Having found the above for each row, take the left LEFT(<address>, <lastindex>) and use this in the UPDATE? (if it it a large table of course, it may be better to INSERT into a new one, check the results, and then replace the old).
With the SUBSTRING function and an Auxiliary Table of Numbers, a scan can be created to calculate a substring combinations (starting from the begining of the address) for one, and then working backwards for the comparator (Would the article 'Optimising “Ends With” searches with REVERSE' published on this site help here?).
The first substring length should be >= to the second (which may be zero-length).
If previous replies did not solve the issue, would the above help as the basis for a solution (if I understood the problem correctly?)?.
January 19, 2010 at 9:51 am
Yeah, the update statement above won't work and would actually change the addresses you just gave to be 1105 and 2450 respectively. If you are sure those are the only 2 you could just ignore them in a where clause but if you need to code for this case then you would have to change the current where clause.
Viewing 7 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply