October 4, 2016 at 8:06 am
Hi Guys,
First post for me here, so i dont know if i am at the right place.
I have a street that i need to dissect a streetname in 3 columns
adres=molenweg 9a
streetname = molenweg
housenumber = 9
suffix = a
I have found the following on the internet
SELECT top 1 LEFT(Adres,PATINDEX('% [0-9]%',Adres)-1)'Street'
, SUBSTRING(Adres,PATINDEX('% [0-9]%',Adres)+1,PATINDEX('%[0-9],%',Adres+ ',')-PATINDEX('% [0-9]%',Adres))'House Number'
FROM t
it works when i dont have a suffix.
Can someone point me in the right direction ?
October 4, 2016 at 8:23 am
maikel01 (10/4/2016)
Hi Guys,First post for me here, so i dont know if i am at the right place.
I have a street that i need to dissect a streetname in 3 columns
adres=molenweg 9a
streetname = molenweg
housenumber = 9
suffix = a
I have found the following on the internet
SELECT top 1 LEFT(Adres,PATINDEX('% [0-9]%',Adres)-1)'Street'
, SUBSTRING(Adres,PATINDEX('% [0-9]%',Adres)+1,PATINDEX('%[0-9],%',Adres+ ',')-PATINDEX('% [0-9]%',Adres))'House Number'
FROM t
it works when i dont have a suffix.
Can someone point me in the right direction ?
Can you provide some specific parsing rules, please? Eg,
* Streetname = the first word after the equals sign
* House number = the penultimate character in the string
* Suffix = the final character in the string
Note: I do not think that these rules are sophisticated enough, though they work for your example. I presented them in order to make you think about other cases.
Once we have the detailed rules, we can get you up and running.
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel
October 4, 2016 at 8:52 am
The following partial solution is dependant on PatExclude8K available here ... http://www.sqlservercentral.com/scripts/T-SQL/117890/
If your data is "adres=molenweg 9a" then you will need to strip out the "adres=" first.
DECLARE @SourceTable TABLE (
ID INT IDENTITY(1,1)
, adres VARCHAR(255)
);
INSERT INTO @SourceTable (adres)
VALUES (' molenweg 9a '), (' molenweg 9 ')
;WITH cteSplitOne AS (
SELECT ID, adres
-- Find the first number with a preceding space, and use that to extract the street name.
, StreetName = LTRIM(RTRIM(LEFT(adres,PATINDEX('% [0-9]%',adres)-1)))
-- The rest of the adress is the housenumber and suffix
, HouseAndSuffix = LTRIM(RTRIM(RIGHT(adres, LEN(adres) - PATINDEX('% [0-9]%',adres) +1)))
FROM @SourceTable
)
SELECT cte.ID, cte.adres
, cte.StreetName
, HouseNumber = num.NewString
, Suffix = suff.NewString
FROM cteSplitOne AS cte
CROSS APPLY dbo.PatExclude8K(HouseAndSuffix, '%[^0-9]%') AS num
CROSS APPLY dbo.PatExclude8K(HouseAndSuffix, '%[^A-Za-z]%') AS suff
October 4, 2016 at 9:06 am
Here's an option that doesn't need additional functions.
This will fail without warning if the street name has a number.
DECLARE @SourceTable TABLE (
ID INT IDENTITY(1,1)
, adres VARCHAR(255)
);
INSERT INTO @SourceTable (adres)
VALUES (' molenweg 9a '), (' molenweg 9 '), ('Two words 5423 X')
SELECT ID,
adres,
streetname,
LEFT( CompoNum, PATINDEX( '%[^0-9]%', CompoNum+'.')-1) AS housenumber,
STUFF( CompoNum, 1, PATINDEX( '%[^0-9]%', CompoNum+'.')-1, '') AS suffix
FROM @SourceTable
CROSS APPLY (SELECT LEFT( adres, PATINDEX( '%[0-9]%', adres)-1) AS streetname,
STUFF( adres, 1, PATINDEX( '%[0-9]%', adres)-1, '') CompoNum)x;
October 5, 2016 at 1:48 am
Hi Phil,
First of thanks for your reply i have read through the links u provided and it helped me understand what you mean.
The rules are the following:
* Streetname = the first word in the string there are a few rules it can have special chars in it e.g. '&', '.', ',','-','/','`'
* Housenumber = the first numeric character in the string it can look like '48' or '48-50'
* Suffix = the final character in the string
I would like to thank you already for giving me the pointer of giving more information.
October 5, 2016 at 1:51 am
Thanks for your reply DesNorton and Luis Cazares. It worked for a lot of addresses but i did not give not enough information how it could be build up that it does not work for all.
October 5, 2016 at 6:24 am
based on experience, you won't find a one-function-to-rule-them-all address cleanup. break it up into smaller pieces that complement each other.
there's too many exceptions to the patterns;
what I've done previously is create a separate table with "cleaned" addresses.
and changed the logic of the join criteria to use cleaned addresses if found, else original address.
with that in place, create a waterfall type of cleanup of addresses...clean records for like 70% with one method, then a second method works on what was left to clean up maybe 10%, with as many as ten methods to finally complete the "cleaned" addresses initiative until the diminishing returns made me spend time on other projects.
Lowell
October 5, 2016 at 7:47 am
Following the rules you posted, here's an alternative.
DECLARE @SourceTable TABLE (
ID INT IDENTITY(1,1)
, adres VARCHAR(255)
);
INSERT INTO @SourceTable (adres)
VALUES (' molenweg 9a '), (' molenweg 9 '), ('Two words 48-50')
SELECT ID,
adres,
LEFT( adres, numberstart - 1) AS streetname,
SUBSTRING( adres, numberstart, LEN(adres) - CASE WHEN RTRIM(adres) LIKE '%[^0-9]' THEN 1 ELSE 0 END) AS housenumber,
CASE WHEN RTRIM(adres) LIKE '%[^0-9]' THEN RIGHT(RTRIM(adres), 1) ELSE '' END AS suffix
FROM @SourceTable
CROSS APPLY (SELECT PATINDEX( '%[0-9]%', adres) AS numberstart)x;
October 6, 2016 at 1:42 am
I think you are right Lowell wil need to do it that way. Thanks luis this is gonna help me a lot.
Viewing 9 posts - 1 through 8 (of 8 total)
You must be logged in to reply to this topic. Login to reply