August 3, 2012 at 8:09 am
Hi,
Consider the folllwing two records:
if object_id('test') > 0 drop table dbo.test
create table dbo.test (
master_idint,
Titlevarchar(20),
Forenamevarchar(50),
Surnamevarchar(50),
Address1varchar(200),
Address2varchar(200),
Address3varchar(200),
Address4varchar(200),
Townvarchar(100),
Countyvarchar(100),
Postcodevarchar(20),
Hierarchyint
)
insert into dbo.test
select 75984,'Mr','O','SQLady','6 Vivien Avenue Midsomer Norto','Midsomer Norton',NULL, NULL, 'RADSTOCK',NULL, 'BA6 2VG',2 union all
select 76144,'Mrs','N','Replacemento','56 Vivien Avenue Midsomer Nort','Midsomer Norton',NULL, NULL, 'RADSTOCK',NULL, 'BA6 2VG',2
select * from dbo.test
As you can see, Address1 contains part of address 2 which is incorrect. I tried to use replace but this doesn't find the whole of address2 in adress1 so nothing changes. Any ideas how to remove the Midsomer Nort from address 1?
Thanks in advance.
---------------------------------------------------------
It takes a minimal capacity for rational thought to see that the corporate 'free press' is a structurally irrational and biased, and extremely violent, system of elite propaganda.
David Edwards - Media lens[/url]
Society has varying and conflicting interests; what is called objectivity is the disguise of one of these interests - that of neutrality. But neutrality is a fiction in an unneutral world. There are victims, there are executioners, and there are bystanders... and the 'objectivity' of the bystander calls for inaction while other heads fall.
Howard Zinn
August 3, 2012 at 8:15 am
I've seperated the street number but if anyone can help me remove the redundant information from the address1 I'd be very grateful.
update a
set a.streetno = F.StreetNo
from dbo.test as a
cross apply (select charindex(' ', Address1) AS CommaPos) F0
cross apply (select case when CommaPos = 0 then 'Bad Address' else SUBSTRING(Address1,1,CommaPos - 1) end as StreetNo) F
---------------------------------------------------------
It takes a minimal capacity for rational thought to see that the corporate 'free press' is a structurally irrational and biased, and extremely violent, system of elite propaganda.
David Edwards - Media lens[/url]
Society has varying and conflicting interests; what is called objectivity is the disguise of one of these interests - that of neutrality. But neutrality is a fiction in an unneutral world. There are victims, there are executioners, and there are bystanders... and the 'objectivity' of the bystander calls for inaction while other heads fall.
Howard Zinn
August 3, 2012 at 8:17 am
It's one of those things that sounds trivially easy until you start trying to do it.
You could set it to remove subsets, but then you end up with Address2 = "No 6", and you end up deleting "No" out of "1234 North Main St" in Address1, so you end up with "1234 rth Main St".
How important is this to the business you work for/with? In the US (that doesn't look like a US address, but I didn't check), you can buy software that will clean up addresses for you. It's called "CASSing", and the vendors have to be "PAVE Certified" by the USPS. Canada has a similar system available from various vendors. I believe so do most of the European countries. I haven't checked Latin American countries for that kind of thing (the address looks like it might be LATAM or Spain is why I mention those).
If address validation and clean-up matters, I'd look into getting something like that. You can Bing/Google the companies that provide it. If you go that route (assuming it's available for this address' local), it'll clean those kinds of things up for you, and help you identify the ones that it can't auto-clean. Very efficient.
Addresses, like human names, are a lot trickier to clean up than they look at first glance.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
August 3, 2012 at 8:26 am
GSquared, everything you say is correct.
I'm in the UK and yes we do use software to clean addresses for some customer data but others don't actually want their addresses rebuilding so we're having to manually cleanse the address fields which is a nightmare to do.
My main project is the data merge but without correctly formatted address fields this process is proving very difficult. Another member Chris@Work/Chris@Home has given some great hints in another thread but I feel like Iβm' been going round in circles this week. I can't even get my household merge to work!
The current scripts used, although not very efficient but they do the job. To get past this problem the previous developer basically did a comparison of the LEFT(Address1, 12).
I was hoping to be able to standardise my addresses somehow and use the fuzzy mating ratio suggested by Chris to work out my dupes.
Itβs the end of my second week in this job and Iβm already starting to panic! :unsure:
---------------------------------------------------------
It takes a minimal capacity for rational thought to see that the corporate 'free press' is a structurally irrational and biased, and extremely violent, system of elite propaganda.
David Edwards - Media lens[/url]
Society has varying and conflicting interests; what is called objectivity is the disguise of one of these interests - that of neutrality. But neutrality is a fiction in an unneutral world. There are victims, there are executioners, and there are bystanders... and the 'objectivity' of the bystander calls for inaction while other heads fall.
Howard Zinn
August 3, 2012 at 8:35 am
Thinking about this more, I wonder if the following will work.
Street names in the UK tend to end with words like Street, Avenue, Crescent etc, if I replace these with say an asterisk then I can remove anything right of the asterisk to get my street name.
Will need to test on a large address data set to see how it performs.
---------------------------------------------------------
It takes a minimal capacity for rational thought to see that the corporate 'free press' is a structurally irrational and biased, and extremely violent, system of elite propaganda.
David Edwards - Media lens[/url]
Society has varying and conflicting interests; what is called objectivity is the disguise of one of these interests - that of neutrality. But neutrality is a fiction in an unneutral world. There are victims, there are executioners, and there are bystanders... and the 'objectivity' of the bystander calls for inaction while other heads fall.
Howard Zinn
August 6, 2012 at 6:07 am
Abu Dina (8/3/2012)
Thinking about this more, I wonder if the following will work.Street names in the UK tend to end with words like Street, Avenue, Crescent etc, if I replace these with say an asterisk then I can remove anything right of the asterisk to get my street name.
Will need to test on a large address data set to see how it performs.
Be careful with that. Take into account that "Street" is often abbreviated "St", but so is "Saint". So, if you have "1234 St George Way", you'll end up losing part of the address that matters.
You might also want to make sure appartment/unit/suite numbers, and directionals don't get lost through something like "1234 1st Ave S, #6", where you'd lose the "South" directional, and the unit number.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
August 6, 2012 at 6:41 am
GSquared (8/6/2012)
Abu Dina (8/3/2012)
Thinking about this more, I wonder if the following will work.Street names in the UK tend to end with words like Street, Avenue, Crescent etc, if I replace these with say an asterisk then I can remove anything right of the asterisk to get my street name.
Will need to test on a large address data set to see how it performs.
Be careful with that. Take into account that "Street" is often abbreviated "St", but so is "Saint". So, if you have "1234 St George Way", you'll end up losing part of the address that matters.
You might also want to make sure appartment/unit/suite numbers, and directionals don't get lost through something like "1234 1st Ave S, #6", where you'd lose the "South" directional, and the unit number.
You've been through this pain before, haven't you? π
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel
August 6, 2012 at 7:40 am
How about "cleaning" all address into another table with a pointer back to the original "unclean" address.
You can than compare the clean addresses to identify duplicates.
August 7, 2012 at 6:22 am
Phil Parkin (8/6/2012)
GSquared (8/6/2012)
Abu Dina (8/3/2012)
Thinking about this more, I wonder if the following will work.Street names in the UK tend to end with words like Street, Avenue, Crescent etc, if I replace these with say an asterisk then I can remove anything right of the asterisk to get my street name.
Will need to test on a large address data set to see how it performs.
Be careful with that. Take into account that "Street" is often abbreviated "St", but so is "Saint". So, if you have "1234 St George Way", you'll end up losing part of the address that matters.
You might also want to make sure appartment/unit/suite numbers, and directionals don't get lost through something like "1234 1st Ave S, #6", where you'd lose the "South" directional, and the unit number.
You've been through this pain before, haven't you? π
Was the DBA for a direct-mail marketing company for 7 years. If there's one type of data I'm overqualified to suffer from, it's name-and-address tables.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
August 8, 2012 at 3:07 am
Below works! If I understood correctly π
create table #test (
master_idint,
Titlevarchar(20),
Forenamevarchar(50),
Surnamevarchar(50),
Address1varchar(200),
Address2varchar(200),
Address3varchar(200),
Address4varchar(200),
Townvarchar(100),
Countyvarchar(100),
Postcodevarchar(20),
Hierarchyint
)
insert into #test
select 75984,'Mr','O','SQLady','6 Vivien Avenue Midsomer Norto','Midsomer Norton',NULL, NULL, 'RADSTOCK',NULL, 'BA6 2VG',2 union all
select 76144,'Mrs','N','Replacemento','56 Vivien Avenue Midsomer Nort','Midsomer Norton',NULL, NULL, 'RADSTOCK',NULL, 'BA6 2VG',2
select * from #test
DECLARE @index INT
DECLARE @Add1 VARCHAR(MAX)
DECLARE @Add2 VARCHAR(MAX)
DECLARE Addres CURSOR FOR SELECT address1, address2 FROM #test
OPEN addres
FETCH NEXT FROM addres INTO @add1, @add2
WHILE @@FETCH_STATUS = 0
BEGIN
SET @index = 1
WHILE @index <= LEN(@add2)
BEGIN
IF @Add2 = RIGHT(@add1, LEN(@Add2))
BEGIN
UPDATE #test
SET address1 = REPLACE(address1, @Add2, '')
WHERE address1 = @add1
END
SET @add2 = LEFT(@add2, LEN(@Add2)-1)
END
FETCH NEXT FROM addres INTO @add1, @add2
END
CLOSE addres
DEALLOCATE addres
SELECT * FROM #test
DROP TABLE #test
August 8, 2012 at 3:56 am
Yes, that works for the sample data provided (although the cursor isn't necessary) but what happens when you have '11 High Street' and 'Chester-le-Street' as your address1 and address2?
I think Michael's idea is the best, if this exercise is really necessary. It's going to involve a lot of manual work. I can't understand why you'd give your customers a choice about what format their address is stored in. The format approved by Royal Mail should be the one to use. There may be something I don't know about your business, so perhaps I'm wrong.
As Gus mentions, beware of assuming street names follow a standard. Here's some examples of some well-known streets in London that don't: Strand, Kensington Gore, Eastcheap, New Change, Green Lanes.
John
August 8, 2012 at 4:15 am
John - Didn't understood that how code will fail there? May you please elaborate a bit.
August 8, 2012 at 4:22 am
My apologies. I performed a visual inspection but didn't actually test it. Try it with '11 Village Green' and 'Greenford' instead.
John
August 8, 2012 at 4:33 am
Well John I agree if such addresses exists in UK then obviously my code will fail. May be there is a lot I need to learn about UK. A newbie is newbie and an expert is an expert. π
August 8, 2012 at 6:05 am
John Mitchell-245523 (8/8/2012)
I think Michael's idea is the best, if this exercise is really necessary. It's going to involve a lot of manual work. I can't understand why you'd give your customers a choice about what format their address is stored in. The format approved by Royal Mail should be the one to use. There may be something I don't know about your business, so perhaps I'm wrong.
John
Yes, MVJ's idea sounds good.
I'm working on a project for the data services department. The source data comes from various customers so we don't have control on data quality I'm afraid!
I will create a copy of the data which links back to the original source and rebuild the address fields using PAF software. How I get this bit automated is another problem I will need to work out!
---------------------------------------------------------
It takes a minimal capacity for rational thought to see that the corporate 'free press' is a structurally irrational and biased, and extremely violent, system of elite propaganda.
David Edwards - Media lens[/url]
Society has varying and conflicting interests; what is called objectivity is the disguise of one of these interests - that of neutrality. But neutrality is a fiction in an unneutral world. There are victims, there are executioners, and there are bystanders... and the 'objectivity' of the bystander calls for inaction while other heads fall.
Howard Zinn
Viewing 15 posts - 1 through 15 (of 16 total)
You must be logged in to reply to this topic. Login to reply