January 9, 2018 at 4:10 am
For those who don't know, a PAON is the part of an address that identifies a building. A SOAN is the part that identifies a part of a property. ie, in "Flat 24 Loddon House", "Flat 24" would be the SAON, "Loddon House" would be the PAON. in "56 Rice Road", there is no SAON and the PAON would be "56".
I have a data source in which the SAON part has frequently being included in the PAON and I need to clean them out. It's not going to be an exact science and I'm just looking for a "best I can get" result.
I have a cursor based function that's getting me close enough to be happy but the dataset is quite large so I'm hoping for a more efficient approach (this runs nightly so performance isn't really crucial but I'd still rather get as efficient as possible).
I've attached a script which shows the cursor approach as well as a script to create and populate a small table of SAON parts.
Can anyone suggest a non cursor based approach for this?
January 24, 2018 at 3:26 am
Do you save extracted SAON and POAN parts o addresses?
_____________
Code for TallyGenerator
January 24, 2018 at 3:57 am
Neither is actually saved but both parts are used in a subsequent lookup to a table of UPRNs (a UPRN is a unique code given to every property in the UK)
I'm really concerned about getting the PAON though.
January 24, 2018 at 6:32 am
Do you really expect PAON part of an address to be changed over time?
Because I cannot see any other reason for not saving PAON and SAON parts once they are found.
Actually, I cannot see any reason for saving the whole address as a single string.
You can always get it by concatenating SAON and PAON parts.
Not so easy for the opposite transformation, as you already know.
Ideally, different parts of addresses must be separated on entry form, but if it's not done then you better split the address string into parts on saving it in the database.
Newly recorded addresses may be bound for manual verification, and correction if needed.
Saved and verified split would overwrite whatever fuzzy and probably not perfect logic you put in your function.
Having PAON parts saved could also help identifying the correct way of splitting of new addresses.
_____________
Code for TallyGenerator
January 24, 2018 at 8:24 am
In principle I agree with you 100% but this isn't something I have control over unfortunately. The data is coming from a third party source.
Some context that may or not be useful: I work on a program that aims to identify the most "at risk" families in the country so that social services can intervene early. Basically we take a load of data feeds from a load of sources (police, education, social services, NHS... about 40 different agencies in total), and try and identify the individuals across those sources - so lots of fuzzy matching around names, aliases, dates of birth, known addresses, and known ID codes (NI number, UPN, NHS number etc.) and so on.
The particular problem I'm having here is on trying to match addresses against the post office database to get UPRNs. The post office's DB is in a nice consistent state but the data I'm trying to match against it is... well... I get what I'm given and have to make the best of it:doze:
January 24, 2018 at 10:33 pm
For an address "2/56 Rice Road" what would be the interpretation?
And what if it's "Apt.22 2/56 Rice Road" ?
_____________
Code for TallyGenerator
January 25, 2018 at 2:07 am
As I said in my original post, it's not an exact science and I'm really just looking for a "good enough" solution. Because we don't control the incoming data we could get anything and nothing I write is ever going to accommodate every situation. Neither of those patterns occurs frequently enough in the data to be worth considering.
I'm not really concerned about that side of things though. As I said, my current approach is getting me close enough to be happy with the result. I'm only really concerned with performance and replacing the cursor.
January 25, 2018 at 4:24 am
If all you want to do is remove the cursor from your function, this should work. It assumes that (apart from the number) there is no more than one "Part" to be removed from the address. If you have a lot of addresses to check (or even if you only have a few) you'll want to make this into an inline table-valued function for optimum performance.
DECLARE @ReturnString nvarchar(max) = 'Flat 24 Loddon House';
SELECT @ReturnString = STUFF(LTRIM(@ReturnString),1,LEN(Candidate),'') -- Remove the part
FROM tbl_Matrix_SAONParts
WHERE @ReturnString LIKE Candidate + '%';
SET @ReturnString = SUBSTRING(@ReturnString,PATINDEX('%[A-Z]%', @ReturnString), LEN(@ReturnString)); -- Remove the number
SELECT @ReturnString;
John
January 25, 2018 at 4:45 am
Yes, I think I can make the assumption that there will only be a single part to remove. So that should do what I'm looking for. I hadn't considered using the Where clause like that but it seems quite obvious in hindsight.
I only want to remove the numeric part if a SAON part has been removed, though, or else I risk removing road numbers. (Ie "Flat 12 Blodwin House" should become "Blodwin House" but I wouldn't want "22 Acacia Avenue" to become "Acacia Avenue"). I should be able to handle that by checking the row count before doing that step though.
I'm going to have a play with it to test it but it looks like exactly what I'm after. Thanks. 🙂
January 25, 2018 at 4:56 am
Or just put it in the SELECT:
DECLARE @ReturnString nvarchar(max) = 'Flat 24 Loddon House';
SELECT @ReturnString = SUBSTRING(
STUFF(LTRIM(@ReturnString),1,LEN(Candidate),'') -- part
, PATINDEX('%[A-Z]%',@ReturnString),LEN(@ReturnString) -- number
)
FROM tbl_Matrix_SAONParts
WHERE @ReturnString LIKE Candidate + '%';
SELECT @ReturnString;
John
January 26, 2018 at 8:36 am
Just a thought - if the data you are receiving has a SAON in it the data is being sourced from either a local government property gazetteer or the national one - in either case the UPRN is in the dataset. Could you not just ask the originator to include the UPRN in the data extract?
January 26, 2018 at 11:01 am
The UPRN is included in a lot of the sources we get but many of them don't include it. I wish they did, my life would be a whole lot easier.🙂
We do ask for it but any datasets we get are provided for us voluntarily (despite the fact that we're local govt ourselves). The various source agencies have no obligation to provide them. We're not paying for them and we have almost no clout when it comes to such requests. Some more may start to include it in the future but I can't wait for it and I can't rely it.
Viewing 12 posts - 1 through 11 (of 11 total)
You must be logged in to reply to this topic. Login to reply