September 8, 2015 at 6:24 am
Hi All,
I hope that you can help.
I am querying with a SELECT statement against an address table that has a post code column with corrupt data.
Sample record:- Northants NN4 0NB
Should be NN4 0NB
I have used the following on another column:-
LEFT(A.Addr1,CASE WHEN CHARINDEX ('(', A.Addr1) =0 THEN LEN(A.Addr1) ELSE CHARINDEX('(' ,A.Addr1) -1 END) AS 'Address 1'
but unable to correct this problem?
Thanks in advance
September 8, 2015 at 6:27 am
Is it always a single word, followed by a space, followed by the correct data?
Or, is it always two 3-character strings with a space at the end with the illegal data in front of that?
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
September 8, 2015 at 6:43 am
Hi,
Most of the records in this column are correct i.e. NN4 0NB (two strings separated with a space), Then occasionally I return corrupt records i.e. Derbyshire S42 5BX (three strings separated with spaces). The latter is incorrect i.e. I would only want to SELECT S42 5BX (with no spaces before the start of the string)
Hope that my explanation is sufficient.
Regards
September 8, 2015 at 6:49 am
can you make an assumption based on the length? your example using charindex of the left parenthesis doesn't match your sample you posted.
;WITH MyCTE([PostalCode])
AS
(
SELECT 'Derbyshire S42 5BX' UNION ALL
SELECT 'NN4 0NB' UNION ALL
SELECT 'S42 5BX' UNION ALL
SELECT 'Northants NN4 0NB'
)
SELECT RIGHT([PostalCode],7) As PossibleValue, * FROM MyCTE WHERE LEN([PostalCode]) > 7
Lowell
September 8, 2015 at 7:02 am
Pack_Star (9/8/2015)
Hi,Most of the records in this column are correct i.e. NN4 0NB (two strings separated with a space), Then occasionally I return corrupt records i.e. Derbyshire S42 5BX (three strings separated with spaces). The latter is incorrect i.e. I would only want to SELECT S42 5BX (with no spaces before the start of the string)
Hope that my explanation is sufficient.
Regards
Are you sure your incorrect data is always going to be three strings? If the county is West Yorkshire, it could be four strings and if it's Tyne And Wear then that's five strings.
How to post a question to get the most help http://www.sqlservercentral.com/articles/Best+Practices/61537
September 8, 2015 at 7:05 am
Lowell (9/8/2015)
can you make an assumption based on the length? your example using charindex of the left parenthesis doesn't match your sample you posted.
;WITH MyCTE([PostalCode])
AS
(
SELECT 'Derbyshire S42 5BX' UNION ALL
SELECT 'NN4 0NB' UNION ALL
SELECT 'S42 5BX' UNION ALL
SELECT 'Northants NN4 0NB'
)
SELECT RIGHT([PostalCode],7) As PossibleValue, * FROM MyCTE WHERE LEN([PostalCode]) > 7
A correctly formed postcode can be 7 or 8 characters - eg SW1A 4WW so change RIGHT(PostalCode, 7) to RIGHT(PostalCode, 8) and LTRIM the result
<Edit for poor wording>
I'm a DBA.
I'm not paid to solve problems. I'm paid to prevent them.
September 8, 2015 at 7:30 am
A correctly formed postcode can be 7 or 8 characters - eg SW1A 4WW so change RIGHT(PostalCode, 7) to RIGHT(PostalCode, 8) and LTRIM the result
Postcodes can be 6,7 or 8 characters, M1 2DF is also a valid format.
How to post a question to get the most help http://www.sqlservercentral.com/articles/Best+Practices/61537
September 8, 2015 at 7:50 am
Hi All,
I believe that this could be possibly be resolved with a Case Statement?
The select needs to do the following. If there are three separate strings in the column then select from the first character of the second string else select the full column. Please note that each of the strings in the column can be of variable lengths.
Hope that this make sense.
Kind Regards
September 8, 2015 at 7:59 am
Pack_Star (9/8/2015)
Hi All,I believe that this could be possibly be resolved with a Case Statement?
The select needs to do the following. If there are three separate strings in the column then select from the first character of the second string else select the full column. Please note that each of the strings in the column can be of variable lengths.
Hope that this make sense.
Kind Regards
This is the approach I would take, but I'm gonna word it differently.
If there's only one space character (possibly after trimming) use the whole value, otherwise, take the characters after the 2nd to last space to the end of the string.
For best practices on asking questions, please read the following article: Forum Etiquette: How to post data/code on a forum to get the best help[/url]
September 8, 2015 at 8:03 am
SELECT
ltrim(right(postalcode,charindex(' ',reverse(postalcode),PATINDEX('% %',reverse(postalcode))+1)))
FROM yourdata
I've tested this over 14500 county+[space]+postcode combinations and the only ones it didn't work on were cases where there was no space in the postcode on our data.
Edited to simplify and clarify
How to post a question to get the most help http://www.sqlservercentral.com/articles/Best+Practices/61537
September 8, 2015 at 8:05 am
If it's an option for you, concentrate on cleaning the data. You're always going to find edge cases that you hadn't accounted for - maybe the two parts of the postcode separated by two spaces or by no spaces, or perhaps the second part is missing altogether. Once you've cleaned the data, make sure that input is validated. It may be worth investing in an address database from Royal Mail. If you can't do that, regular expressions[/url] might be a good way to validate or verify data in the column.
John
September 8, 2015 at 8:16 am
BWFC (9/8/2015)
SELECT
ltrim(right(postalcode,charindex(' ',reverse(postalcode),PATINDEX('% %',reverse(postalcode))+1)))
FROM yourdata
I've tested this over 14500 county+[space]+postcode combinations and the only ones it didn't work on were cases where there was no space in the postcode on our data.
Edited to simplify and clarify
If you're dealing with multiple countries, you might have to include the country in the CASE statement, for example:
CASE
WHEN COUNTRY IN ('Country1', 'Country2', ...) AND PostalCode ..... THEN ...
WHEN COUNTRY IN ('Country10', 'Country1', ...) AND PostalCode ..... THEN ...
ELSE ... END
For best practices on asking questions, please read the following article: Forum Etiquette: How to post data/code on a forum to get the best help[/url]
September 8, 2015 at 8:19 am
John Mitchell-245523 (9/8/2015)
If it's an option for you, concentrate on cleaning the data. You're always going to find edge cases that you hadn't accounted for - maybe the two parts of the postcode separated by two spaces or by no spaces, or perhaps the second part is missing altogether. Once you've cleaned the data, make sure that input is validated. It may be worth investing in an address database from Royal Mail. If you can't do that, regular expressions[/url] might be a good way to validate or verify data in the column.John
Regular Expressions in TSQL? I'll be happy when Microsoft makes that possible without using CLR.
For best practices on asking questions, please read the following article: Forum Etiquette: How to post data/code on a forum to get the best help[/url]
September 8, 2015 at 10:46 am
Hi All,
For now I am unable to "clean" any of these records as doing so will cause problems elsewhere.
Thanks for the suggestions. They assisted me in finding a solution:-
CASE WHEN postalcode NOT LIKE '_[a-Z]_[a-Z]%'
THEN LTRIM(postalcode)
ELSE RIGHT(RTRIM(postalcode),8) END AS 'Corrected Postcode'
For future reference for others who come across this issue, please see the following:-
https://www.mrs.org.uk/pdf/postcodeformat.pdf
Kind regards
September 9, 2015 at 3:00 am
I don't know what your data looks like but that will only work for an eight character postcode. If you take the rightmost eight characters of your string you will take the last character of the preceding string followed by a space if you have a six character postcode and the preceding space if you have a seven character postcode.
with example(postalcode) as
(
select 'Hertfordshire AL1 3JW' union all
select 'West Midlands B4 6BJ' union all
select 'Somerset BS15 4PP'
)
select
CASE WHEN postalcode NOT LIKE '_[a-Z]_[a-Z]%'
THEN LTRIM(postalcode)
ELSE RIGHT(RTRIM(postalcode),8) END AS 'Corrected Postcode'
from example
How to post a question to get the most help http://www.sqlservercentral.com/articles/Best+Practices/61537
Viewing 15 posts - 1 through 15 (of 25 total)
You must be logged in to reply to this topic. Login to reply