June 4, 2009 at 7:50 pm
Hi guys, need some help with a mailing file problem. On some customer addresses, we have received data where the address appears twice within the same field - mainly on rural property names eg. Penrose Penrose or Kentucky Kentucky. I need to re-format it for mailing. I'm ok where it's a 1-word property that's repeated - I used this code below to update the field [addr_1] to just the first word -
update ##tbl_xx
set addr_1 = substring(addr_1, 1, (CHARINDEX(' ',addr_1)-1) )
where substring(addr_1,1,charindex(' ',addr_1)) = substring(addr_1,(charindex(' ',addr_1)+1),99)
But having problems where it's two-worded name that is repeated eg. wilson farm wilson farm
I need a query that 1) finds these repeated addresses, and then 2) updates the field to just select the first two words. The closest I can get is the code below -
select addr_1,
rtrim(substring(addr_1,1,charindex(' ',addr_1)) +substring(addr_1,(charindex(' ',addr_1)+1), charindex(' ',addr_1))) as reformatted_addr
from ##tbl_xx
where addr_1 like '% % % %' and addr_1 not like '%[0-9]%'
I tried using patindex but don't have much experience with it, so couldn't get it to work for me. Help!
June 5, 2009 at 5:25 pm
The only difference between PATINDEX and CHARINDEX is that PATINDEX supports wildcard characters for the match whereas CHARINDEX does not.
This code uses Charindex to fill in parameters of SUBSTRING. The first WHEN in the case statement uses it to find the first space in field "address" and then compares the substring from the start of address to the space with the substring from after the space to the end. The second when is similar, but it uses charindex in the "start position" parameter to find the next space. It then can compare the two halves of address, before and after that second space. In either case, if a match is found, the first half is used as the returned value.
One other thought does occur here. In general, it can be dangerous to completely automate data clean-up efforts like this. My friends in a certain town in Washington state would probably wonder when they were moved to "Walla". You may want to consider using this sort of query to produce a candidates listing for manual review and action.
Declare @address table (address char(50))
insert into @address
select 'Lincoln'
union all
select 'Kentucky Kentucky'
union all
select 'Walla Walla'
union all
select 'New York New York'
union all
select 'Ohio'
union all
select 'Iowa'
select case when substring(address, 1, charindex(' ',rtrim(address))) --Substring up to 1st space
= substring(address, charindex(' ', rtrim(address)) +1, len(address)) --Substring from 1st space to end
then substring(address, 1, charindex(' ',rtrim(address))) --Substring up to 1st space
when substring(address, 1, charindex(' ', rtrim(address), charindex(' ', rtrim(address)) +1) ) --Substring up to 2nd space
= substring(address, charindex(' ',rtrim(address), charindex(' ',rtrim(address)) +1 ) +1, len(address)) --Substring from 2nd space to end
then substring(address, 1, charindex(' ',rtrim(address),charindex(' ',rtrim(address)) +1) ) --Substring up to 2nd space
else address
end
from @address
June 5, 2009 at 8:29 pm
Borrowing on John's good data creation code and assuming the column is actually a VARCHAR instead of a char, here's a super simple way to take care of double, triple, or more words without having to know how many words there are...
Declare @address table (address VARchar(50))
insert into @address
select 'Lincoln'
union all
select 'Kentucky Kentucky'
union all
select 'Walla Walla'
union all
select 'New York New York'
union all
select 'New York CITY New York CITY'
union all
select 'Ohio'
union all
select 'Iowa'
SELECT CASE
WHEN LEFT(Address,LEN(Address)/2)=RIGHT(Address,LEN(Address)/2)
THEN LEFT(Address,LEN(Address)/2)
ELSE Address
END
FROM @Address
--Jeff Moden
Change is inevitable... Change for the better is not.
June 6, 2009 at 1:05 pm
Jeff, your response is valuable to the original poster in that it gives him a straight-forward, scalable solution, but may be more valuable to me and others on SSC in that it demonstrates the practice of cutting to the core requirement. I had started with the question as posed -- how to parse the field with charindex() -- but the real problem was how to find fields with the first half the same as the second.
Somewhere on SSC a while ago, I saw the expression "You think you need an answer? No. You need a question." When stated "how to find first half equal to second half", the simpler and better solution almost writes itself. This is to say, the genius was in stating the problem accurately.
Let's make it work with address defined as a char(50) by using RTRIM() so that the RIGHT() function works as intended. Note that the LEN() function does an implied RTRIM already.--Declare @address table (address VARchar(50))
Declare @address table (address char(50))
insert into @address
select 'Lincoln'
union all
select 'Kentucky Kentucky'
union all
select 'Walla Walla'
union all
select 'New York New York'
union all
select 'New York CITY New York CITY'
union all
select 'Ohio'
union all
select 'Iowa'
SELECT
CASE
WHEN LEFT(Address,LEN(Address)/2)=RIGHT(rtrim(Address),LEN(rtrim(Address))/2)
--WHEN LEFT(Address,LEN(Address)/2)=RIGHT(Address,LEN(Address)/2)
THEN LEFT(Address,LEN(Address)/2)
ELSE Address
END
FROM @Address
June 6, 2009 at 6:44 pm
Thanks for the awesome feedback, John. I really appreciate it.
Thanks for including an explanation, as well. I normally try to explain the code for the OP in the form of embedded comments, but it was getting late and I got a bit lazy. You covered it for me very nicely.
As a sidebar, perhaps we can make one small optimization that probably won't matter unless there are millions of rows at hand... it'll also give me a chance to redeem myself for being so lazy last night...
[font="Courier New"]--===== Create and populate a test table.
-- This is NOT a part of the solution.
DECLARE @Address TABLE (Address CHAR(50))
INSERT INTO @Address
(Address)
SELECT 'Lincoln' UNION ALL
SELECT 'Kentucky Kentucky' UNION ALL
SELECT 'Walla Walla' UNION ALL
SELECT 'New York New York' UNION ALL
SELECT 'New York CITY New York CITY' UNION ALL
SELECT 'New York CITYNew York CITY' UNION ALL
SELECT 'New York CITYNew York CITY ' UNION ALL
SELECT 'Ohio' UNION ALL
SELECT 'Iowa'
--===== Return only single addresses.
-- If you compare the left half of the string to the right half
-- and both halves are the same, then use just the left half.
-- If the halves are not the same, then the address has not been
-- doubled up and we can use the address as it is.
--
-- Notice that each half can be either separated by a space or
-- jammed together (see test data above) when a dupe is present.
-- The divide by 2 in the code below works in either case because
-- LEN() procudes an INT and the 2 is an INT which forces only
-- Integer math to take place. For example, if your string is
-- "AB AB" or "ABAB", the length of one half for the first is
-- 5/2 = 2 and 4/2 = 2 for the second. Both equal 2 in integer math.
-- The cool part about LEN() is that it won't count trailing spaces
-- so no special calculation for CHAR(50) needs to be made. We do,
-- however, have to RTRIM() the RIGHT side of the address when we
-- split the address in half so we don't pick up on any trailing
-- spaces.
SELECT CASE
WHEN LEFT(Address,LEN(Address)/2) = RIGHT(RTRIM(Address),LEN(Address)/2)
THEN LEFT(Address,LEN(Address)/2)
ELSE Address
END AS SingleAddress
FROM @Address[/font]
--Jeff Moden
Change is inevitable... Change for the better is not.
June 9, 2009 at 5:06 pm
Thanks so much for your help Jeff & John, really appreciate it! 🙂
June 9, 2009 at 7:17 pm
hugh.mileshkin (6/9/2009)
Thanks so much for your help Jeff & John, really appreciate it! 🙂
Thanks, Hugh. Good feedback like that is the only pay we get for this type of thing. I appreciate it.
--Jeff Moden
Change is inevitable... Change for the better is not.
June 10, 2009 at 9:14 am
Good explanation Jeff. A small point, your solution fails if there is more than 1 space between the 2 words. e.g 'Lincoln Lincoln' has length 16. So the left half is 'Lincoln ' and the right half is ' Lincoln'.
This is fixed by adding RTRIM and LTRIM to left and right half respectively:
WHEN RTRIM(LEFT(Address,LEN(Address)/2)) = LTRIM(RIGHT(RTRIM(Address),LEN(Address)/2))
Of course, there are probably other scenarios that don't work, but the OP needs to test with his data. 🙂
Terri
To speak algebraically, Mr. M. is execrable, but Mr. C. is
(x+1)-ecrable.
Edgar Allan Poe
[Discussing fellow writers Cornelius Mathews and William Ellery Channing.]
June 12, 2009 at 7:36 pm
Terri (6/10/2009)
Good explanation Jeff. A small point, your solution fails if there is more than 1 space between the 2 words. e.g 'Lincoln Lincoln' has length 16. So the left half is 'Lincoln ' and the right half is ' Lincoln'.This is fixed by adding RTRIM and LTRIM to left and right half respectively:
WHEN RTRIM(LEFT(Address,LEN(Address)/2)) = LTRIM(RIGHT(RTRIM(Address),LEN(Address)/2))
Of course, there are probably other scenarios that don't work, but the OP needs to test with his data. 🙂
Yep... good enhancement. The best thing, of course, would be to pork chop the silly person who made the mistake of combining the data to begin with. 😉 Solutions for to correct these types of mistakes are bound to have a flaw here and there.
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 9 posts - 1 through 8 (of 8 total)
You must be logged in to reply to this topic. Login to reply