Splitting postcodes

  • I have many post codes where there is no space between the first part and second part of the postcode. For example I have:

    GL34EJ should be GL3 4EJ

    WN22NH should be WN2 2NH

    DH62QU should be DH6 2QU

    UB40EA should be UB4 0EA

    DA14PE should be DA1 4PE

    WA34LQ should be WA3 4LQ

    etc...

    How can I achieve this?

    ---------------------------------------------------------

    It takes a minimal capacity for rational thought to see that the corporate 'free press' is a structurally irrational and biased, and extremely violent, system of elite propaganda.
    David Edwards - Media lens[/url]

    Society has varying and conflicting interests; what is called objectivity is the disguise of one of these interests - that of neutrality. But neutrality is a fiction in an unneutral world. There are victims, there are executioners, and there are bystanders... and the 'objectivity' of the bystander calls for inaction while other heads fall.
    Howard Zinn

  • This I've answered my own question with this:

    select postcode, LEFT(postcode,LEN(postcode)-3) + ' ' + RIGHT(postcode,3)

    from flatrecord_SemiPerm

    where PATINDEX('[A-Z][A-Z][0-9][0-9][0-9][A-Z]%', postcode) > 0

    ---------------------------------------------------------

    It takes a minimal capacity for rational thought to see that the corporate 'free press' is a structurally irrational and biased, and extremely violent, system of elite propaganda.
    David Edwards - Media lens[/url]

    Society has varying and conflicting interests; what is called objectivity is the disguise of one of these interests - that of neutrality. But neutrality is a fiction in an unneutral world. There are victims, there are executioners, and there are bystanders... and the 'objectivity' of the bystander calls for inaction while other heads fall.
    Howard Zinn

  • Abu Dina (1/31/2013)


    This I've answered my own question with this:

    select postcode, LEFT(postcode,LEN(postcode)-3) + ' ' + RIGHT(postcode,3)

    from flatrecord_SemiPerm

    where PATINDEX('[A-Z][A-Z][0-9][0-9][0-9][A-Z]%', postcode) > 0

    What country are you doing? I'm going to assume UK, as that is where I'm from so it's where I understand the validation methods.

    What about B11AA ? Your code will filter that out as being incorrect, but it is in fact a valid Birmingham post-code (should read B1 1AA). Take a look at this --> http://en.wikipedia.org/wiki/Postcodes_in_the_United_Kingdom%5B/url%5D, you'll see some more examples of postcodes that you'll get rid of.


    --EDIT--


    Here's some sample data for valid formats of post codes in the UK. They're not technically valid, just the correct formats 🙂

    IF object_id('tempdb..#testEnvironment') IS NOT NULL

    BEGIN

    DROP TABLE #testEnvironment

    END

    SET NOCOUNT ON;

    WITH t1(N) AS (SELECT 1 UNION ALL SELECT 1),

    t2(N) AS (SELECT 1 FROM t1 x, t1 y),

    t3(N) AS (SELECT 1 FROM t2 x, t2 y),

    Tally(N) AS (SELECT TOP 51 ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM t3 x, t3 y),

    Tally2(N) AS (SELECT TOP 26 ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM t3 x, t3 y),

    letters(N) AS (SELECT LTRIM(RTRIM(SUBSTRING(c.poss,a.N,2)))+LTRIM(RTRIM(SUBSTRING(d.poss,b.N,1)))

    FROM Tally a

    CROSS JOIN Tally b

    CROSS APPLY (SELECT 'A B C D E F G H I J K L M N O P Q R S T U V W X Y Z') c(poss)

    CROSS APPLY (SELECT 'A B C D E F G H I J K L M N O P Q R S T U V W X Y Z') d(poss)),

    bignumbers(N) AS (SELECT TOP 100 CAST((ABS(CHECKSUM(NEWID())) % 99) + 1 AS VARCHAR(2)) FROM t3 x, t3 y),

    smallnumbers(N) AS (SELECT TOP 100 CAST((ABS(CHECKSUM(NEWID())) % 9) + 1 AS VARCHAR(1)) FROM t3 x, t3 y)

    SELECT TOP 1000000 IDENTITY(INT,1,1) AS ID, a.N+b.N+c.N+d.N AS postCode

    INTO #testEnvironment

    FROM letters a

    CROSS JOIN bignumbers b

    CROSS JOIN smallnumbers c

    CROSS JOIN letters d;

    There WHERE clause for this needs to look like this: -

    SELECT *

    FROM #testEnvironment

    WHERE (postCode LIKE '[A-Z][A-Z0-9][0-9][A-Z][A-Z]' OR

    postCode LIKE '[A-Z][A-Z0-9][0-9][0-9][A-Z][A-Z]' OR

    postCode LIKE '[A-Z][A-Z][0-9][0-9][A-Z][A-Z]' OR

    postCode LIKE '[A-Z][A-Z][0-9][0-9][0-9][A-Z][A-Z]' OR

    postCode LIKE '[A-Z][0-9][A-Z][0-9][A-Z][A-Z]' OR

    postCode LIKE '[A-Z][A-Z][0-9][A-Z][0-9][A-Z][A-Z]');

    Again, this is not exactly correct. If you need valid post codes rather than just valid formats, then the WHERE clause gets rather complicated with lots of "OR" paths.

    So, how to we go about splitting it? Well, that's easy really.

    SELECT LEFT(postCode,LEN(postCode)-3) + ' ' + RIGHT(postCode,3)

    FROM #testEnvironment

    WHERE (postCode LIKE '[A-Z][A-Z0-9][0-9][A-Z][A-Z]' OR

    postCode LIKE '[A-Z][A-Z0-9][0-9][0-9][A-Z][A-Z]' OR

    postCode LIKE '[A-Z][A-Z][0-9][0-9][A-Z][A-Z]' OR

    postCode LIKE '[A-Z][A-Z][0-9][0-9][0-9][A-Z][A-Z]' OR

    postCode LIKE '[A-Z][0-9][A-Z][0-9][A-Z][A-Z]' OR

    postCode LIKE '[A-Z][A-Z][0-9][A-Z][0-9][A-Z][A-Z]');

    What about the performance though?

    IF object_id('tempdb..#testEnvironment') IS NOT NULL

    BEGIN

    DROP TABLE #testEnvironment

    END

    SET NOCOUNT ON;

    WITH t1(N) AS (SELECT 1 UNION ALL SELECT 1),

    t2(N) AS (SELECT 1 FROM t1 x, t1 y),

    t3(N) AS (SELECT 1 FROM t2 x, t2 y),

    Tally(N) AS (SELECT TOP 51 ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM t3 x, t3 y),

    Tally2(N) AS (SELECT TOP 26 ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM t3 x, t3 y),

    letters(N) AS (SELECT LTRIM(RTRIM(SUBSTRING(c.poss,a.N,2)))+LTRIM(RTRIM(SUBSTRING(d.poss,b.N,1)))

    FROM Tally a

    CROSS JOIN Tally b

    CROSS APPLY (SELECT 'A B C D E F G H I J K L M N O P Q R S T U V W X Y Z') c(poss)

    CROSS APPLY (SELECT 'A B C D E F G H I J K L M N O P Q R S T U V W X Y Z') d(poss)),

    bignumbers(N) AS (SELECT TOP 100 CAST((ABS(CHECKSUM(NEWID())) % 99) + 1 AS VARCHAR(2)) FROM t3 x, t3 y),

    smallnumbers(N) AS (SELECT TOP 100 CAST((ABS(CHECKSUM(NEWID())) % 9) + 1 AS VARCHAR(1)) FROM t3 x, t3 y)

    SELECT TOP 1000000 IDENTITY(INT,1,1) AS ID, a.N+b.N+c.N+d.N AS postCode

    INTO #testEnvironment

    FROM letters a

    CROSS JOIN bignumbers b

    CROSS JOIN smallnumbers c

    CROSS JOIN letters d;

    CREATE NONCLUSTERED INDEX nc_postCode ON #testEnvironment (postCode);

    DECLARE @HOLDER VARCHAR(10), @Duration CHAR(12), @StartTime DATETIME;

    DBCC FREEPROCCACHE WITH NO_INFOMSGS;

    DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS;

    SELECT @StartTime = GETDATE();

    SELECT @HOLDER = LEFT(postCode,LEN(postCode)-3) + ' ' + RIGHT(postCode,3)

    FROM #testEnvironment

    WHERE (postCode LIKE '[A-Z][A-Z0-9][0-9][A-Z][A-Z]' OR

    postCode LIKE '[A-Z][A-Z0-9][0-9][0-9][A-Z][A-Z]' OR

    postCode LIKE '[A-Z][A-Z][0-9][0-9][A-Z][A-Z]' OR

    postCode LIKE '[A-Z][A-Z][0-9][0-9][0-9][A-Z][A-Z]' OR

    postCode LIKE '[A-Z][0-9][A-Z][0-9][A-Z][A-Z]' OR

    postCode LIKE '[A-Z][A-Z][0-9][A-Z][0-9][A-Z][A-Z]');

    SELECT @Duration = CONVERT(CHAR(12),GETDATE()-@StartTime,114);

    RAISERROR('Duration: %s',0,1,@Duration) WITH NOWAIT;

    Duration: 00:00:01:820


    Forever trying to learn
    My blog - http://www.cadavre.co.uk/
    For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/
    For better, quicker answers on SQL Server performance related questions, click on the following...http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply