April 30, 2015 at 7:39 pm
Hi,
Version : sqlserver 2008 R2
sample data to play with,
declare @Address table(Id int primary key identity(1,1),Address varchar(100),city varchar(20),State varchar(20))
insert into @Address
select '123 sheril street, Newyork,NY','Newyork','NY' union all
select '333 bamboo street, Manhattan,NY', null,null union all
select '123 ketty street, Albany,NY','Albany','NY' union all
select '123 sibul street, Buffalo,NY',null,null
This is just sample data. i have around 100000 address for different city and states.Basically i need to remove the city and state from address
Here is waht i need to achieve,
1. Get the city,state check exists on address column if found then remove the city and state. if not leave it as it.
2. If the city and state column is null and if that is present in address column then remove that from address clumn and upate it in city and state column
respectively.
Is this possible to achieve without cursor or looping? any sample query please .
April 30, 2015 at 7:50 pm
/* remove City and State from the Address */
SELECT Address
, LEFT(Address,CHARINDEX(',',Address,1)-1) AS StreetAddr
, RIGHT(Address,2) AS AddrState
FROM Address1;
April 30, 2015 at 8:35 pm
Simple solution using the DelimitedSplit8K [/url]function
😎
USE tempdb;
GO
SET NOCOUNT ON;
declare @Address table(Id int primary key identity(1,1),Address varchar(100),city varchar(20),State varchar(20))
insert into @Address
select '123 sheril street, Newyork,NY','Newyork','NY' union all
select '333 bamboo street, Manhattan,NY', null,null union all
select '123 ketty street, Albany,NY','Albany','NY' union all
select '123 sibul street, Buffalo,NY',null,null ;
/* Split and Cross-tab */
SELECT
A.Id
,MAX(CASE WHEN X.ItemNumber = 1 THEN LTRIM(X.Item) END) AS Address
,MAX(CASE WHEN X.ItemNumber = 2 THEN LTRIM(X.Item)
WHEN A.City IS NOT NULL THEN A.City END) AS City
,MAX(CASE WHEN X.ItemNumber = 3 THEN LTRIM(X.Item)
WHEN A.State IS NOT NULL THEN A.State END) AS State
FROM @Address A
OUTER APPLY dbo.DelimitedSplit8K(A.Address,',') AS X
GROUP BY A.Id;
Results
Id Address City State
---- ------------------- ------------ ------
1 123 sheril street Newyork NY
2 333 bamboo street Manhattan NY
3 123 ketty street Albany NY
4 123 sibul street Buffalo NY
May 1, 2015 at 7:49 pm
Hey Guys,
thanks for your reply.
I Eirik,
before i though of posting on this forum, i thought about using Jeff's delimit function what you have suggested. But i am sorry that i forgot to mention about some of the address messed up in my system like, it will not have proper comma separated address.
Ex : larser&tubro street hudson TX
Ex : peppy and tugo road pitsburgh PA
Not sure how do i achieve this without looping it. I thought of creating temp table will have city and state as separate column and using that i need to match with my actual table to do the operation. is it possible to do without cursor? any suggestion or sample please
May 2, 2015 at 8:43 am
KGJ-Dev (5/1/2015)
Hey Guys,thanks for your reply.
I Eirik,
before i though of posting on this forum, i thought about using Jeff's delimit function what you have suggested. But i am sorry that i forgot to mention about some of the address messed up in my system like, it will not have proper comma separated address.
Ex : larser&tubro street hudson TX
Ex : peppy and tugo road pitsburgh PA
Not sure how do i achieve this without looping it. I thought of creating temp table will have city and state as separate column and using that i need to match with my actual table to do the operation. is it possible to do without cursor? any suggestion or sample please
Thought it was too good to be true, normally this type of information wouldn't be that consistent;-)
😎
Do you have a list of Cities and States in a table/tables?
May 2, 2015 at 7:54 pm
Hi Eirik,
thanks for your time on this, please assume that i have city,state,zip in a separate table called tbl_city_state. please proceed with your suggestion/sample based on tbl_city_state table
thanks
May 2, 2015 at 11:22 pm
Here is a partial solution, it matches the city/state but does not remove them from the address.
😎
Sample data
USE tempdb;
GO
SET NOCOUNT ON;
IF OBJECT_ID(N'dbo.TBL_ADDRESS_WORKTABLE') IS NOT NULL DROP TABLE dbo.TBL_ADDRESS_WORKTABLE;
CREATE TABLE dbo.TBL_ADDRESS_WORKTABLE
(
Id INT IDENTITY(1,1) NOT NULL
,Address VARCHAR(100) NOT NULL
,city VARCHAR(20) NULL
,State VARCHAR(20) NULL
);
INSERT INTO dbo.TBL_ADDRESS_WORKTABLE
SELECT '123 sheril street, Newyork,NY','New york','NY' UNION ALL
SELECT '124 sheril street New York, NY','Newyork','NY' UNION ALL
SELECT '333 bamboo street, Manhattan ,NY ', null,null UNION ALL
SELECT '334 birch street Manhattan NY', null,null UNION ALL
SELECT '123 ketty street Albany,NY','Albany','NY' UNION ALL
SELECT '123 sibul street, Buffalo,NY',null,null ;
IF OBJECT_ID(N'dbo.tbl_city_state') IS NOT NULL DROP TABLE dbo.tbl_city_state;
CREATE TABLE dbo.tbl_city_state
(
city_state_ID INT IDENTITY(1,1) NOT NULL CONSTRAINT PK_DBO_TBL_CITY_STATE_CITY_STATE_ID PRIMARY KEY CLUSTERED
,State CHAR(2) NOT NULL
,City VARCHAR(50) NOT NULL
,County VARCHAR(100) NOT NULL
);
INSERT INTO dbo.tbl_city_state(State,City,County)
VALUES
('NY','Albany','Albany')
,('NY','Manhattan','New York')
,('NY','Amsterdam','Montgomery')
,('NY','Auburn','Cayuga')
,('NY','Batavia','Genesee')
,('NY','Beacon','Dutchess')
,('NY','Binghamton','Broome')
,('NY','Buffalo','Erie')
,('NY','Canandaigua','Ontario')
,('NY','Cohoes','Albany')
,('NY','Corning','Steuben')
,('NY','Cortland','Cortland')
,('NY','Dunkirk','Chautauqua')
,('NY','Elmira','Chemung')
,('NY','Fulton','Oswego')
,('NY','Geneva','Ontario[A]')
,('NY','Glen Cove','Nassau')
,('NY','Glens Falls','Warren')
,('NY','Gloversville','Fulton')
,('NY','Hornell','Steuben')
,('NY','Hudson','Columbia')
,('NY','Ithaca','Tompkins')
,('NY','Jamestown','Chautauqua')
,('NY','Johnstown','Fulton')
,('NY','Kingston','Ulster')
,('NY','Lackawanna','Erie')
,('NY','Little Falls','Herkimer')
,('NY','Lockport','Niagara')
,('NY','Long Beach','Nassau')
,('NY','Mechanicville','Saratoga')
,('NY','Middletown','Orange')
,('NY','Mount Vernon','Westchester')
,('NY','New Rochelle','Westchester')
,('NY','New York','Bronx, Kings, New York,Queens, and Richmond')
,('NY','Newburgh','Orange')
,('NY','Niagara Falls','Niagara')
,('NY','North Tonawanda','Niagara')
,('NY','Norwich','Chenango')
,('NY','Ogdensburg','St. Lawrence')
,('NY','Olean','Cattaraugus')
,('NY','Oneida','Madison')
,('NY','Oneonta','Otsego')
,('NY','Oswego','Oswego')
,('NY','Peekskill','Westchester')
,('NY','Plattsburgh','Clinton')
,('NY','Port Jervis','Orange')
,('NY','Poughkeepsie','Dutchess')
,('NY','Rensselaer','Rensselaer')
,('NY','Rochester','Monroe')
,('NY','Rome','Oneida')
,('NY','Rye','Westchester')
,('NY','Salamanca','Cattaraugus')
,('NY','Saratoga Springs','Saratoga')
,('NY','Schenectady','Schenectady')
,('NY','Sherrill','Oneida')
,('NY','Syracuse','Onondaga')
,('NY','Tonawanda','Erie')
,('NY','Troy','Rensselaer')
,('NY','Utica','Oneida')
,('NY','Watertown','Jefferson')
,('NY','Watervliet','Albany')
,('NY','White Plains','Westchester')
,('NY','Yonkers','Westchester');
The query
USE tempdb;
GO
SET NOCOUNT ON;
SELECT
AW.Id
,CS.city_state_ID
,AW.Address
,CS.CITY
,CS.State
FROM dbo.TBL_ADDRESS_WORKTABLE AW
CROSS APPLY dbo.tbl_city_state CS
WHERE
(
AW.city = CS.City
AND
AW.State = CS.State
)
OR
(
CHARINDEX(CS.CITY, AW.Address,1) > 0
AND
CHARINDEX(CS.State, AW.Address,1) > 0
)
OR
(
CHARINDEX(REPLACE(CS.CITY,CHAR(32),''), AW.Address,1) > 0
AND
CHARINDEX(CS.State, AW.Address,1) > 0
)
;
Sample results
Id city_state_ID Address CITY State
--- ------------- ---------------------------------- ---------- -----
1 34 123 sheril street, Newyork,NY New York NY
2 34 124 sheril street New York, NY New York NY
3 2 333 bamboo street, Manhattan ,NY Manhattan NY
4 2 334 birch street Manhattan NY Manhattan NY
5 1 123 ketty street Albany,NY Albany NY
6 8 123 sibul street, Buffalo,NY Buffalo NY
May 3, 2015 at 5:41 am
Hi Eirik,
appreciated your time and wonderful help. My ultimate goal is to clean up the address field. address field should only have the address. Any thoughts please.
May 3, 2015 at 6:32 am
Here is an addition to the previous code, nothing fancy but it should work most of the time
😎
USE tempdb;
GO
SET NOCOUNT ON;
SELECT
AW.Id
,CS.city_state_ID
,AW.Address
,CASE
WHEN CHARINDEX(CS.City,AW.Address,1) = 0 THEN AW.Address
WHEN CHARINDEX(CS.City,AW.Address,1) > 0 THEN
CASE
WHEN RIGHT(SUBSTRING(AW.Address,1,CHARINDEX(CS.City,AW.Address,1) - 2),1) IN (' ',',') THEN SUBSTRING(AW.Address,1,CHARINDEX(CS.City,AW.Address,1) - 3)
ELSE SUBSTRING(AW.Address,1,CHARINDEX(CS.City,AW.Address,1) - 2)
END
ELSE ''
END
,CS.CITY
,CS.State
FROM dbo.TBL_ADDRESS_WORKTABLE AW
CROSS APPLY dbo.tbl_city_state CS
WHERE
(
AW.city = CS.City
AND
AW.State = CS.State
)
OR
(
CHARINDEX(CS.CITY, AW.Address,1) > 0
AND
CHARINDEX(CS.State, AW.Address,1) > 0
)
OR
(
CHARINDEX(REPLACE(CS.CITY,CHAR(32),''), AW.Address,1) > 0
AND
CHARINDEX(CS.State, AW.Address,1) > 0
)
;
Quick thought, mis-spellings could be handled via alternative name/spelling table.
May 3, 2015 at 10:53 am
Wonderful. thank you so much. i will work with my real data and post back to you in case of any tricky situation. much appreciated.
May 14, 2015 at 3:01 pm
Hi Eirik,
I am into trouble now about some street names having names city name and the current logic removed the the street name as well
ex:
address = '555 Manhatten street, Manhattan ,NY '
logically the the text has to be strip from second Manhattan but as per the current logic it strips @ first occurrence.
i am only getting the out put as 555 where i supposed to get '555 Manhatten street' as output
Any suggestion how to handle this please
May 16, 2015 at 2:06 am
born2achieve (5/14/2015)
Hi Eirik,I am into trouble now about some street names having names city name and the current logic removed the the street name as well
ex:
address = '555 Manhatten street, Manhattan ,NY '
logically the the text has to be strip from second Manhattan but as per the current logic it strips @ first occurrence.
i am only getting the out put as 555 where i supposed to get '555 Manhatten street' as output
Any suggestion how to handle this please
Here is yet another solution which is based on the previous one with the two additions, firstly the logic of determine whether the Address has a State code. The second addition is to use the length of the City and State values to find last separator before those values and cut the Address string there if and only if it has a State code at the end.
😎
This is not bullet proof but it should work in most cases. You will probably have to handle the exceptions outside this code.
Sample data
USE tempdb;
GO
SET NOCOUNT ON;
IF OBJECT_ID(N'dbo.TBL_ADDRESS_WORKTABLE') IS NOT NULL DROP TABLE dbo.TBL_ADDRESS_WORKTABLE;
CREATE TABLE dbo.TBL_ADDRESS_WORKTABLE
(
Id INT IDENTITY(1,1) NOT NULL
,Address VARCHAR(100) NOT NULL
,city VARCHAR(20) NULL
,State VARCHAR(20) NULL
);
INSERT INTO dbo.TBL_ADDRESS_WORKTABLE
SELECT '123 sheril street, Newyork,NY','New york','NY' UNION ALL
SELECT '124 sheril street New York, NY','Newyork','NY' UNION ALL
SELECT '125 sheril street New York, NY',NULL,'NY' UNION ALL
SELECT 'PBOX 12345 New York, NY','Newyork','NY' UNION ALL
SELECT '333 bamboo street, Manhattan ,NY ', null,null UNION ALL
SELECT '334 birch street Manhattan NY', null,null UNION ALL
SELECT '335 Manhattan street','Manhattan','NY' UNION ALL
SELECT '336 Manhattan street, Manhattan NY','Manhattan','NY' UNION ALL
SELECT '337 Manhattan street, Manhattan NY',null,'NY' UNION ALL
SELECT 'Manhattan street 338 Manhattan NY',null,NULL UNION ALL
SELECT '123 ketty street Albany,NY','Albany','NY' UNION ALL
SELECT '123 sibul street, Buffalo,NY',null,null ;
IF OBJECT_ID(N'dbo.tbl_city_state') IS NOT NULL DROP TABLE dbo.tbl_city_state;
CREATE TABLE dbo.tbl_city_state
(
city_state_ID INT IDENTITY(1,1) NOT NULL CONSTRAINT PK_DBO_TBL_CITY_STATE_CITY_STATE_ID PRIMARY KEY CLUSTERED
,State CHAR(2) NOT NULL
,City VARCHAR(50) NOT NULL
,County VARCHAR(100) NOT NULL
);
INSERT INTO dbo.tbl_city_state(State,City,County)
VALUES
('NY','Albany','Albany')
,('NY','Manhattan','New York')
,('NY','Amsterdam','Montgomery')
,('NY','Auburn','Cayuga')
,('NY','Batavia','Genesee')
,('NY','Beacon','Dutchess')
,('NY','Binghamton','Broome')
,('NY','Buffalo','Erie')
,('NY','Canandaigua','Ontario')
,('NY','Cohoes','Albany')
,('NY','Corning','Steuben')
,('NY','Cortland','Cortland')
,('NY','Dunkirk','Chautauqua')
,('NY','Elmira','Chemung')
,('NY','Fulton','Oswego')
,('NY','Geneva','Ontario[A]')
,('NY','Glen Cove','Nassau')
,('NY','Glens Falls','Warren')
,('NY','Gloversville','Fulton')
,('NY','Hornell','Steuben')
,('NY','Hudson','Columbia')
,('NY','Ithaca','Tompkins')
,('NY','Jamestown','Chautauqua')
,('NY','Johnstown','Fulton')
,('NY','Kingston','Ulster')
,('NY','Lackawanna','Erie')
,('NY','Little Falls','Herkimer')
,('NY','Lockport','Niagara')
,('NY','Long Beach','Nassau')
,('NY','Mechanicville','Saratoga')
,('NY','Middletown','Orange')
,('NY','Mount Vernon','Westchester')
,('NY','New Rochelle','Westchester')
,('NY','New York','Bronx, Kings, New York,Queens, and Richmond')
,('NY','Newburgh','Orange')
,('NY','Niagara Falls','Niagara')
,('NY','North Tonawanda','Niagara')
,('NY','Norwich','Chenango')
,('NY','Ogdensburg','St. Lawrence')
,('NY','Olean','Cattaraugus')
,('NY','Oneida','Madison')
,('NY','Oneonta','Otsego')
,('NY','Oswego','Oswego')
,('NY','Peekskill','Westchester')
,('NY','Plattsburgh','Clinton')
,('NY','Port Jervis','Orange')
,('NY','Poughkeepsie','Dutchess')
,('NY','Rensselaer','Rensselaer')
,('NY','Rochester','Monroe')
,('NY','Rome','Oneida')
,('NY','Rye','Westchester')
,('NY','Salamanca','Cattaraugus')
,('NY','Saratoga Springs','Saratoga')
,('NY','Schenectady','Schenectady')
,('NY','Sherrill','Oneida')
,('NY','Syracuse','Onondaga')
,('NY','Tonawanda','Erie')
,('NY','Troy','Rensselaer')
,('NY','Utica','Oneida')
,('NY','Watertown','Jefferson')
,('NY','Watervliet','Albany')
,('NY','White Plains','Westchester')
,('NY','Yonkers','Westchester');
The query
USE tempdb;
GO
SET NOCOUNT ON;
/**********************************************************
Separate City and State from an Address
1. Take the last three characters from the Address and
remove space/comma.
2. Check if it matches a State from tbl_city_state or
the value in the State column.
3. Cross apply the tbl_city_state to match with either
the column values or pattern in the Address column.
4. Find the first comma/space before the City part of
the Address if the Address ends with a State code
and return the position otherwise the length of the
Address.
5. Substring the Address from position 1 to the value
from 4. as the "clean" Address.
**********************************************************/
;WITH BASE_DATA AS
(
SELECT
AW.Id
,AW.Address
,REPLACE(REPLACE(RIGHT(AW.Address,3),' ',''),',','') AS ADDR_END
,AW.city
,AW.State
FROM dbo.TBL_ADDRESS_WORKTABLE AW
)
,CHECK_STATE_IN_ADDRESS AS
(
SELECT
BD.Id
,BD.Address
,BD.city
,BD.State
,BD.ADDR_END
,CASE
WHEN BD.ADDR_END = BD.State AND BD.State IS NOT NULL THEN 1
WHEN BD.State IS NULL AND EXISTS (SELECT * FROM dbo.tbl_city_state WHERE State = BD.ADDR_END) THEN 1
ELSE 0
END AS STATE_IN_ADDR
FROM BASE_DATA BD
)
,ADDRESS_WITH_CITY_STATE AS
(
SELECT
CSIA.Id
,CSIA.Address
,CS.city
,CS.State
,CHARINDEX(CHAR(32),REVERSE(CSIA.Address),(1 + LEN(CS.city ) + LEN(CS.State))) AS POINT_A
,CHARINDEX(CHAR(44),REVERSE(CSIA.Address),(1 + LEN(CS.city ) + LEN(CS.State))) AS POINT_B
,CSIA.STATE_IN_ADDR
FROM CHECK_STATE_IN_ADDRESS CSIA
CROSS APPLY dbo.tbl_city_state CS
WHERE
(
CSIA.city = CS.City
AND
CSIA.State = CS.State
)
OR
(
CHARINDEX(CS.CITY, CSIA.Address,1) > 0
AND
CHARINDEX(CS.State, CSIA.Address,1) > 0
)
OR
(
CHARINDEX(REPLACE(CS.CITY,CHAR(32),''), CSIA.Address,1) > 0
AND
CHARINDEX(CS.State, CSIA.Address,1) > 0
)
)
SELECT
AWCS.Id
,SUBSTRING(AWCS.Address,1,
CASE
WHEN AWCS.STATE_IN_ADDR = 1 AND AWCS.POINT_B = 0 THEN LEN(AWCS.Address) - AWCS.POINT_A
WHEN AWCS.STATE_IN_ADDR = 1 AND AWCS.POINT_B > 0 THEN LEN(AWCS.Address) - AWCS.POINT_B
ELSE LEN(AWCS.Address)
END) AS Address
,AWCS.city
,AWCS.State
FROM ADDRESS_WITH_CITY_STATE AWCS;
Results
Id Address city State
--- ---------------------- ----------- -----
1 123 sheril street New York NY
2 124 sheril street New York NY
3 125 sheril street New York NY
4 PBOX 12345 New York NY
5 333 bamboo stree Manhattan NY
6 334 birch street Manhattan NY
7 335 Manhattan street Manhattan NY
8 336 Manhattan street Manhattan NY
9 337 Manhattan street Manhattan NY
10 Manhattan street 338 Manhattan NY
11 123 ketty street Albany NY
12 123 sibul street Buffalo NY
May 16, 2015 at 2:01 pm
Another approach, not perfect but not bad either;-) Additional metrics can be used to enhance the logic.
😎
USE tempdb;
GO
SET NOCOUNT ON;
;WITH BASE_DATA AS
(
SELECT
AW.Id
,AW.Address
,AW.city
,AW.State
,LEN(AW.Address) AS Address_LEN
,LEN(AW.city) AS city_LEN
,LEN(AW.State) AS State_LEN
,CHARINDEX(REVERSE(AW.city), REVERSE(AW.Address),1) AS CITY_POS
,CHARINDEX(CHAR(32),REVERSE(AW.Address), CHARINDEX(REVERSE(AW.city), REVERSE(AW.Address),1)) AS SPACE_POS
,CHARINDEX(CHAR(44),REVERSE(AW.Address), CHARINDEX(REVERSE(AW.city), REVERSE(AW.Address),1)) AS COMMA_POS
,CHARINDEX(REVERSE(AW.State), REVERSE(AW.Address),1) AS State_POS
,CS.city_state_ID
FROM dbo.TBL_ADDRESS_WORKTABLE AW
LEFT OUTER JOIN dbo.tbl_city_state CS
ON AW.city = CS.City
AND AW.State = CS.State
)
SELECT
BD.Id
,BD.Address
,BD.city
,BD.State
,SUBSTRING(BD.Address,1, BD.Address_LEN - COALESCE(CASE
WHEN BD.CITY_POS > 0 AND BD.COMMA_POS > BD.SPACE_POS THEN BD.COMMA_POS
WHEN BD.CITY_POS > 0 AND BD.COMMA_POS = 0 THEN BD.SPACE_POS
END,0)) AS Cleaned_Candidate_Address
,BD.Address_LEN
,BD.city_LEN
,BD.State_LEN
,BD.CITY_POS
,BD.SPACE_POS
,BD.COMMA_POS
,BD.State_POS
,BD.city_state_ID
FROM BASE_DATA BD;[/code]
May 16, 2015 at 2:35 pm
Hi Eirik,
This is excellent approach and did clean almost. Thank you very much Gentle Man!!!.
May 16, 2015 at 3:14 pm
born2achieve (5/16/2015)
Hi Eirik,This is excellent approach and did clean almost. Thank you very much Gentle Man!!!.
You are most welcome.
😎
Viewing 15 posts - 1 through 14 (of 14 total)
You must be logged in to reply to this topic. Login to reply