November 3, 2014 at 12:12 pm
Hi there,
I have a table with address field of 4 million records. This field has Street address + city, province, zipcode in the address. I want to separate them as different columns.
Here are different values of addresses
First 5 digits are zipcode. kraj,kray & oblast mean region, anything after G and before comma is city name
create table temp (address varchar(200) not null)
insert into temp values('12345, Sverdlovskaya oblast, g Ussurijsk, per Leningradskij, d')
insert into temp values('12345, Primorskij kraj, g Ussurijsk, ul Vorovskogo, d 91')
insert into temp values('12345, g Ussurijsk, ul Vorovskogo, d 91')
insert into temp values('12345, STAVROPOL KRAY, g Ussurijsk, ul Vorovskogo, d 91')
I know how to parse zipcode
select SUBSTRING(Address,0,6) as "zipcode" from temp
but I am clueless how to go about writing a parser for city and region
Thanks
November 3, 2014 at 12:31 pm
use patindex to find the location of your delimiters and use those loactions for start and end in substring.
November 3, 2014 at 12:43 pm
I'm not completely sure that this will work for you, as I'm not sure if I'm reading it right. You can use Jeff Moden's DelimitedSplit8K function to split out the different parts of your address and then query them by position. If you aren't familiar with DelimitedSplit8K, check the second link in my signature. It's well worth the time to read the article and I think you'll be happy with performance.
select address,
zip = MAX(case when s.itemnumber = 1 then s.item end),
region = MAX(case when s.itemnumber = 2 then s.item end),
city = MAX(case when s.itemnumber = 3 then s.item end),
address = MAX(case when s.itemnumber = 4 then s.item end)
from temp
cross apply DelimitedSplit8K(address, ',') s
group by address;
This approach just requires that the order of the delimited parts is in the same sequence in the string. Please adjust the parts of the address if I got them wrong. HTH.
November 3, 2014 at 1:15 pm
A variation on Ed's query to take into account the rules that you mentioned. It still uses the Delimited8kSplitter, but checks the items rather than the itemnumbers
SELECT address,
MAX(CASE WHEN itemnumber = 1 THEN item END) postcode,
MAX(CASE WHEN item like '% oblast' or item like '% kra[yj]' THEN reverse(stuff(rev,1,charindex(' ',rev),'')) END) region,
MAX(CASE WHEN item like 'g %' THEN stuff(item,1,2,'') END) city,
MAX(CASE WHEN item like 'd %' THEN stuff(item,1,2,'') END) num
FROM temp t
CROSS APPLY (
SELECT itemnumber,
ltrim(item) item,
reverse(ltrim(item)) rev
FROM DelimitedSplit8K(address,','))x
GROUP BY address;
I hope I was right in guessing that you would want the identifier strings stripped out of the result.
EDIT: Cleaned it up a little
November 3, 2014 at 1:24 pm
Nice work, Micky. Thanks for picking up the part that wasn't sinking in for me.
Viewing 5 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply