Extract Blocks of Number

  • Hi,

    I'm trying to extract blocks of numbers from a column in SQL. I've been looking online for a while and I've tried using a few different functions to get the outcome but none of them solve my problem.

    I want to be able to use a function or similar to separate out numbers that are 11 digits in length and start with 07. So as an example;

    Declare @Temp Table(Data VarChar(8000))

    Insert Into @Temp Values('hello my name is john 07999999999 smith 07888888888 this last number doesnt count 999')

    I want the outcome to be 07999999999 07888888888. In functions that i've seen online they can extract numbers generally not specifically beginning with 07 or 11 digits in length and when they output it doesn't have a space between them.

    Thank you in advance for help that anyone can offer.

    P

  • As you did not provide detailed DDL, INSERT and desired results, I've had to guess. Does this help?

    DROP TABLE IF EXISTS #t1;

    CREATE TABLE #t1
    (
    SomeText VARCHAR(500)
    );

    INSERT #t1
    (
    SomeText
    )
    VALUES
    ('07111111111,smdcvsd lsdlhkj sflh sfljkh ')
    ,('dflgkjhdf godhg 07222222222,smdcvsd lsdlhkj sflh sfljkh ')
    ,('no numbers here');

    SELECT t.SomeText
    ,Extracted = IIF(pos.StartPos > 0, SUBSTRING(t.SomeText, pos.StartPos, 11), '')
    FROM #t1 t
    CROSS APPLY
    (
    SELECT StartPos = PATINDEX('%07[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]%', t.SomeText)
    ) pos;

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • Hi Phil,

    That does work for extracting one number but I was hoping there is a solution if the column contains multiple numbers begining with 07.

    Can you think of a way?

    Thanks again for your time and help.

    P

  • How many occurrences do you wish to handle? What do you want the output to look like?

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • A recursive CTE will allow you to select more than one number from a line

    DROP TABLE #t1
    go

    CREATE TABLE #t1
    (
    SomeText VARCHAR(500)
    );

    INSERT #t1
    (
    SomeText
    )
    VALUES
    ('07111111111,smdcvsd lsdlhkj sflh sfljkh ')
    ,('two numbers my name is john 07999999999 smith 07888888888 this last number doesnt count 999')
    ,('at end 07999999999')
    ,('not long enough 0711111111 smith')
    ,('not long enough at end 0711111111')
    ,('too long 072222222222 smith')
    ,('too long at end 072222222222')
    ,('dflgkjhdf godhg ,smdcvsd lsdlhkj sflh sfljkh 073333333333')
    ,('no numbers here');
    ;WITH CTE AS(
    SELECT t.SomeText
    ,Extracted = IIF(pos.StartPos > 0, SUBSTRING(t.SomeText, pos.StartPos, 11), '')
    ,Remaining = IIF(pos.StartPos > 0, SUBSTRING(t.SomeText, pos.StartPos+11, 8000 ), '')
    FROM #t1 t
    CROSS APPLY (SELECT StartPos = PATINDEX('%[^0-9]07[^0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][^0-9]%', ' ' + t.SomeText+' ')) pos
    UNION ALL
    SELECT t.SomeText
    ,Extracted = IIF(pos.StartPos > 0, SUBSTRING(t.Remaining, pos.StartPos, 11), '')
    ,Remaining = IIF(pos.StartPos > 0, SUBSTRING(t.Remaining, pos.StartPos+11, 8000 ), '')
    FROM CTE t
    CROSS APPLY (SELECT StartPos = PATINDEX('%[^0-9]07[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][^0-9]%',' '+ t.Remaining+' ')) pos
    WHERE Len(Remaining)>0
    )
    select SomeText,Extracted
    from cte
    -- where extracted <> ''

    • This reply was modified 5 years ago by  Jonathan AC Roberts. Reason: Changed PATINDEX('%[^0-9]07[^0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][^0-9]%', ' ' + t.SomeText+' ')
  • Hi,

    There is a maximum of three occurrences.

    I want the output to be the numbers separated by a comma.

    07999999999, 07888888888

    Thanks,

    Paul

  • dramaqueen wrote:

    Hi,

    There is a maximum of three occurrences.

    I want the output to be the numbers separated by a comma.

    07999999999, 07888888888

    Thanks,

    Paul

    Once you have the numbers in separate rows you can make them into a CSV with FOR XML PATH with a STUFF. There are plenty of examples of how to do this on SqlServerCentral  or a quick internet search will find some.

    https://stackoverflow.com/questions/31211506/how-stuff-and-for-xml-path-work-in-sql-server

  • Thank you Phil and Jonathan.

  • Grab a copy of NGrams8k and you can do this:

    Declare @Temp Table(SomeId INT IDENTITY, [Data] VarChar(8000))

    Insert @Temp ([Data])
    Values('hello my name is john 07999999999 smith 07888888888 this last number doesnt count 999'),
    ('More numbers here: 07123456789,07987654321, 07555555555, 0798765432 (<<10 19278398127398729387129837129837')
    ;

    SELECT
    t.SomeId, ng.position, ng.token
    FROM @Temp AS t
    CROSS APPLY dbo.NGrams8k(t.[Data],11) AS ng
    WHERE ng.token LIKE CONCAT('07',REPLICATE('[0-9]',9))

    Easy peasy!

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Alan Burstein wrote:

    Grab a copy of NGrams8k and you can do this:

    Declare @Temp Table(SomeId INT IDENTITY, [Data] VarChar(8000))

    Insert @Temp ([Data])
    Values('hello my name is john 07999999999 smith 07888888888 this last number doesnt count 999'),
    ('More numbers here: 07123456789,07987654321, 07555555555, 0798765432 (<<10 19278398127398729387129837129837')
    ;

    SELECT
    t.SomeId, ng.position, ng.token
    FROM @Temp AS t
    CROSS APPLY dbo.NGrams8k(t.[Data],11) AS ng
    WHERE ng.token LIKE CONCAT('07',REPLICATE('[0-9]',9))

    Easy peasy!

    If she changes the WHERE to:

    WHERE ' ' + ng.token + ' ' LIKE CONCAT('[^0-9]07',REPLICATE('[0-9]',9),'[^0-9]')

    it will eliminate numbers like 307999999999 and numbers that are too long.

    Link to NGrams8k: https://www.sqlservercentral.com/articles/nasty-fast-n-grams-part-1-character-level-unigrams

  • Jonathan AC Roberts wrote:

    Alan Burstein wrote:

    Grab a copy of NGrams8k and you can do this:

    Declare @Temp Table(SomeId INT IDENTITY, [Data] VarChar(8000))

    Insert @Temp ([Data])
    Values('hello my name is john 07999999999 smith 07888888888 this last number doesnt count 999'),
    ('More numbers here: 07123456789,07987654321, 07555555555, 0798765432 (<<10 19278398127398729387129837129837')
    ;

    SELECT
    t.SomeId, ng.position, ng.token
    FROM @Temp AS t
    CROSS APPLY dbo.NGrams8k(t.[Data],11) AS ng
    WHERE ng.token LIKE CONCAT('07',REPLICATE('[0-9]',9))

    Easy peasy!

    If she changes the WHERE to:

    WHERE ' ' + ng.token + ' ' LIKE CONCAT('[^0-9]07',REPLICATE('[0-9]',9),'[^0-9]')

    it will eliminate numbers like 307999999999 and numbers that are too long.

    Link to NGrams8k: https://www.sqlservercentral.com/articles/nasty-fast-n-grams-part-1-character-level-unigrams

    Well Played!

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Alan Burstein wrote:

    Jonathan AC Roberts wrote:

    Alan Burstein wrote:

    Grab a copy of NGrams8k and you can do this:

    Declare @Temp Table(SomeId INT IDENTITY, [Data] VarChar(8000))

    Insert @Temp ([Data])
    Values('hello my name is john 07999999999 smith 07888888888 this last number doesnt count 999'),
    ('More numbers here: 07123456789,07987654321, 07555555555, 0798765432 (<<10 19278398127398729387129837129837')
    ;

    SELECT
    t.SomeId, ng.position, ng.token
    FROM @Temp AS t
    CROSS APPLY dbo.NGrams8k(t.[Data],11) AS ng
    WHERE ng.token LIKE CONCAT('07',REPLICATE('[0-9]',9))

    Easy peasy!

    If she changes the WHERE to:

    WHERE ' ' + ng.token + ' ' LIKE CONCAT('[^0-9]07',REPLICATE('[0-9]',9),'[^0-9]')

    it will eliminate numbers like 307999999999 and numbers that are too long.

    Link to NGrams8k: https://www.sqlservercentral.com/articles/nasty-fast-n-grams-part-1-character-level-unigrams

    Well Played!

    Thank you Alan, I had to make the same change to my code too.

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply