August 4, 2011 at 1:08 pm
I have a difficult task here. Say there's a column in a table called CityState. The column can have values such as:
Las Angeles, CA
Anaheim, C.A.
San Diego CA
Seattle W.A.
I can't just do a simple substrings on this, because the state could potentially be 4 characters on the right. And I would want it so that it is always just the two character state (if it is C.A. it should be CA). And with a potential comma, the city name isn't as simple either.
If I wanted to select the city name (without the comma) and the state name (without the period) to insert into another table, what would be the best approach here?
Thanks in advance. 🙂
August 4, 2011 at 1:25 pm
i think if you REVERSE the string, and grab up to a space, that might be pretty close to what you are after:
you can always REPLACE() to remove any commas in the city, and REPLACE() to remove periods in the state
SELECT
CityState,
REVERSE(CityState),
CHARINDEX(' ',REVERSE(CityState)),
SUBSTRING(REVERSE(CityState),1,CHARINDEX(' ',REVERSE(CityState))),
SUBSTRING(REVERSE(CityState),CHARINDEX(' ',REVERSE(CityState))+ 1,30),
REVERSE(SUBSTRING(REVERSE(CityState),1,CHARINDEX(' ',REVERSE(CityState)))),
REVERSE(SUBSTRING(REVERSE(CityState),CHARINDEX(' ',REVERSE(CityState))+ 1,30))
FROM
(SELECT 'Las Angeles, CA' As CityState UNION ALL
SELECT 'Anaheim, C.A.' UNION ALL
SELECT 'San Diego CA' UNION ALL
SELECT 'Seattle W.A.'
)X
Lowell
August 4, 2011 at 1:59 pm
I took Lowell's code and tossed in a couple of bells and whistles. Like standardizing the commas.
-- Dude, periods in state abbreviations... seriously?
SELECT CityState,City+', '+st as fixed,city,st
FROM
(SELECT *,
ltrim(rtrim(reverse(left(revcity,(CHARINDEX(' ',revcity,1)))))) as st,
LTRIM(rtrim(reverse(right(revcity,len(revcity)-(CHARINDEX(' ',revcity,1)))))) as city
FROM
(SELECT
CityState
,ltrim(rtrim(reverse(REPLACE(REPLACE(citystate,'.',space(0)),',',SPACE(1))))) as revcity
FROM
(SELECT 'Las Angeles, CA' As CityState UNION ALL
SELECT 'Anaheim, C.A.' UNION ALL
SELECT 'San Diego CA' UNION ALL
SELECT 'Seattle WASH.'
) X
) Y
) Z
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
August 4, 2011 at 5:09 pm
So it ended up being even worse than originally thought of. The data was even messier than that. In some entries, the zip code was appended to the CityState column. (got to love user input :rolleyes:)
So I ended up having to do a REPLACE() on all numeric characters with an empty character. And there was tons of spaces all over, so I had to trim when I went to split them up.
If anybody is interested in seeing the final resulting query then I will gladly post. Talk about a few hours of frustration...
August 4, 2011 at 10:48 pm
trstringer (8/4/2011)
So it ended up being even worse than originally thought of. The data was even messier than that. In some entries, the zip code was appended to the CityState column. (got to love user input :rolleyes:)So I ended up having to do a REPLACE() on all numeric characters with an empty character. And there was tons of spaces all over, so I had to trim when I went to split them up.
If anybody is interested in seeing the final resulting query then I will gladly post. Talk about a few hours of frustration...
Absolutely... post the code. I'd also be interested to see 10 or 20 of the really wierd entries you had to put up with so I can take a crack at simplifying the code for you.
--Jeff Moden
Change is inevitable... Change for the better is not.
August 4, 2011 at 11:28 pm
Here's my take on it. I don't know for sure if I have all the bases covered but I've avoided the very expense REVERSE function altogether thanks to my buddies ISNULL, NULLIF, and RTRIM.
Here's the test table I set up...
--===== Create and populate a test table on-the-fly
-- This is NOT a part of the solution.
SELECT d.MixedName
INTO #CityState
FROM (
SELECT 'Las Angeles, CA' As CityState UNION ALL
SELECT 'Anaheim, C.A.' UNION ALL
SELECT 'San Diego CA' UNION ALL
SELECT 'Seattle W.A.' UNION ALL
SELECT 'Auburn Hills , M.I., 48325-1234 ' UNION ALL
SELECT 'Some Really Long City Name , C.A 90210 '
) d (MixedName)
;
Here's a possible solution...
--===== Solve the problem without any REVERSE's
WITH
cteFirstPass AS
(
SELECT CityState =
RTRIM(
REPLACE(
REPLACE(
SUBSTRING(
MixedName
,1,ISNULL(NULLIF(PATINDEX('%[0-9][0-9][0-9][0-9][0-9]%',MixedName)-1,-1),8000))
,',','')
,'.','')
),
Zip =
RTRIM(
SUBSTRING(
MixedName
,ISNULL(NULLIF(PATINDEX('%[0-9][0-9][0-9][0-9][0-9]%',MixedName),0),7000),10)
)
FROM #CityState
)
SELECT City = RTRIM(SUBSTRING(CityState,1, LEN(CityState)-2)),
State = RIGHT(CityState,2),
Zip
FROM cteFirstPass
;
Now... the premise of the code above is that the last two letters in the text will always be a State abbreviation whether they're separated by periods, spaces, or something else. The comma means nothing in the code above.
It also assumes that Zip codes will consist of at least 5 contiguous digits to the right of the State. 😀
--Jeff Moden
Change is inevitable... Change for the better is not.
August 5, 2011 at 7:31 am
First off, here's some data that I saw:
Anaheim CA
Los Angeles, C.A.
San Diego,C.A. (notice the lack of space between the comma and the state abbr)
Tampa Bay, F L 33703 (notice the zip code)
Miami F.L. 331 (yes, that's not a typo. It is a partial zip code)
There are no typos there, sometimes there were zero spaces, sometimes there were excess spaces. Sometimes there was 5-digit zips, sometimes there was partial zips. Unbelievable. 🙂 Anyways, this is what I came up with. I'm sure it's not the most efficient piece of code but it seems to get the job done on 20,000+ records rather fast.
select mailingcitystate
into #MailingCityStateTemp1
from OriginalData
update #MailingCityStateTemp1
set MailingCityState = REPLACE(mailingcitystate, '0', '')
update #MailingCityStateTemp1
set MailingCityState = REPLACE(mailingcitystate, '1', '')
update #MailingCityStateTemp1
set MailingCityState = REPLACE(mailingcitystate, '2', '')
update #MailingCityStateTemp1
set MailingCityState = REPLACE(mailingcitystate, '3', '')
update #MailingCityStateTemp1
set MailingCityState = REPLACE(mailingcitystate, '4', '')
update #MailingCityStateTemp1
set MailingCityState = REPLACE(mailingcitystate, '5', '')
update #MailingCityStateTemp1
set MailingCityState = REPLACE(mailingcitystate, '6', '')
update #MailingCityStateTemp1
set MailingCityState = REPLACE(mailingcitystate, '7', '')
update #MailingCityStateTemp1
set MailingCityState = REPLACE(mailingcitystate, '8', '')
update #MailingCityStateTemp1
set MailingCityState = REPLACE(mailingcitystate, '9', '')
update #MailingCityStateTemp1
set MailingCityState = REPLACE(mailingcitystate, '.', '')
update #MailingCityStateTemp1
set MailingCityState = REPLACE(mailingcitystate, ',', ' ')
update #MailingCityStateTemp1
set MailingCityState = RTRIM(mailingcitystate)
select ltrim(LEFT(mailingcitystate, LEN(mailingcitystate) - 3)) as MailingCity,
replace(RIGHT(mailingcitystate, 3), ' ', '') as MailingState
into #MailingCityStateTemp2
from #MailingCityStateTemp1
A bug that took me a while to find was that I had originally replaced the comma (',') with nothing (''). But that was putting the city and the state's name with no space right next to each other, throwing off when I split the string. The simple solution was to just replace the comma with a space, because all leading and trailing spaces end up getting trimmed.
Please be gentle on the code criticism! 😀
August 7, 2011 at 1:46 am
Did you give Jeff's solution a try?
August 7, 2011 at 9:16 am
trstringer (8/6/2011)
Nobody?! I had to further tweak that code because it was moreso a SELECTed column in an INSERT VALUES clause. But that's the gyst of it.
Heh... you've not tried my code, yet, I see. For the mess on Zip Codes, just alter my code to look for 3 digits or maybe even just one and Bob's your uncle. 😉
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 10 posts - 1 through 9 (of 9 total)
You must be logged in to reply to this topic. Login to reply