November 21, 2019 at 1:00 pm
Hi,
I'm trying to extract blocks of numbers from a column in SQL. I've been looking online for a while and I've tried using a few different functions to get the outcome but none of them solve my problem.
I want to be able to use a function or similar to separate out numbers that are 11 digits in length and start with 07. So as an example;
Declare @Temp Table(Data VarChar(8000))
Insert Into @Temp Values('hello my name is john 07999999999 smith 07888888888 this last number doesnt count 999')
I want the outcome to be 07999999999 07888888888. In functions that i've seen online they can extract numbers generally not specifically beginning with 07 or 11 digits in length and when they output it doesn't have a space between them.
Thank you in advance for help that anyone can offer.
P
November 21, 2019 at 1:34 pm
As you did not provide detailed DDL, INSERT and desired results, I've had to guess. Does this help?
DROP TABLE IF EXISTS #t1;
CREATE TABLE #t1
(
SomeText VARCHAR(500)
);
INSERT #t1
(
SomeText
)
VALUES
('07111111111,smdcvsd lsdlhkj sflh sfljkh ')
,('dflgkjhdf godhg 07222222222,smdcvsd lsdlhkj sflh sfljkh ')
,('no numbers here');
SELECT t.SomeText
,Extracted = IIF(pos.StartPos > 0, SUBSTRING(t.SomeText, pos.StartPos, 11), '')
FROM #t1 t
CROSS APPLY
(
SELECT StartPos = PATINDEX('%07[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]%', t.SomeText)
) pos;
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel
November 21, 2019 at 1:42 pm
Hi Phil,
That does work for extracting one number but I was hoping there is a solution if the column contains multiple numbers begining with 07.
Can you think of a way?
Thanks again for your time and help.
P
November 21, 2019 at 1:45 pm
How many occurrences do you wish to handle? What do you want the output to look like?
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel
November 21, 2019 at 1:53 pm
A recursive CTE will allow you to select more than one number from a line
DROP TABLE #t1
go
CREATE TABLE #t1
(
SomeText VARCHAR(500)
);
INSERT #t1
(
SomeText
)
VALUES
('07111111111,smdcvsd lsdlhkj sflh sfljkh ')
,('two numbers my name is john 07999999999 smith 07888888888 this last number doesnt count 999')
,('at end 07999999999')
,('not long enough 0711111111 smith')
,('not long enough at end 0711111111')
,('too long 072222222222 smith')
,('too long at end 072222222222')
,('dflgkjhdf godhg ,smdcvsd lsdlhkj sflh sfljkh 073333333333')
,('no numbers here');
;WITH CTE AS(
SELECT t.SomeText
,Extracted = IIF(pos.StartPos > 0, SUBSTRING(t.SomeText, pos.StartPos, 11), '')
,Remaining = IIF(pos.StartPos > 0, SUBSTRING(t.SomeText, pos.StartPos+11, 8000 ), '')
FROM #t1 t
CROSS APPLY (SELECT StartPos = PATINDEX('%[^0-9]07[^0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][^0-9]%', ' ' + t.SomeText+' ')) pos
UNION ALL
SELECT t.SomeText
,Extracted = IIF(pos.StartPos > 0, SUBSTRING(t.Remaining, pos.StartPos, 11), '')
,Remaining = IIF(pos.StartPos > 0, SUBSTRING(t.Remaining, pos.StartPos+11, 8000 ), '')
FROM CTE t
CROSS APPLY (SELECT StartPos = PATINDEX('%[^0-9]07[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][^0-9]%',' '+ t.Remaining+' ')) pos
WHERE Len(Remaining)>0
)
select SomeText,Extracted
from cte
-- where extracted <> ''
November 21, 2019 at 1:54 pm
Hi,
There is a maximum of three occurrences.
I want the output to be the numbers separated by a comma.
07999999999, 07888888888
Thanks,
Paul
November 21, 2019 at 2:07 pm
Hi,
There is a maximum of three occurrences.
I want the output to be the numbers separated by a comma.
07999999999, 07888888888
Thanks,
Paul
Once you have the numbers in separate rows you can make them into a CSV with FOR XML PATH with a STUFF. There are plenty of examples of how to do this on SqlServerCentral or a quick internet search will find some.
https://stackoverflow.com/questions/31211506/how-stuff-and-for-xml-path-work-in-sql-server
November 21, 2019 at 2:10 pm
Thank you Phil and Jonathan.
November 21, 2019 at 2:54 pm
Grab a copy of NGrams8k and you can do this:
Declare @Temp Table(SomeId INT IDENTITY, [Data] VarChar(8000))
Insert @Temp ([Data])
Values('hello my name is john 07999999999 smith 07888888888 this last number doesnt count 999'),
('More numbers here: 07123456789,07987654321, 07555555555, 0798765432 (<<10 19278398127398729387129837129837')
;
SELECT
t.SomeId, ng.position, ng.token
FROM @Temp AS t
CROSS APPLY dbo.NGrams8k(t.[Data],11) AS ng
WHERE ng.token LIKE CONCAT('07',REPLICATE('[0-9]',9))
Easy peasy!
-- Itzik Ben-Gan 2001
November 21, 2019 at 3:04 pm
Grab a copy of NGrams8k and you can do this:
Declare @Temp Table(SomeId INT IDENTITY, [Data] VarChar(8000))
Insert @Temp ([Data])
Values('hello my name is john 07999999999 smith 07888888888 this last number doesnt count 999'),
('More numbers here: 07123456789,07987654321, 07555555555, 0798765432 (<<10 19278398127398729387129837129837')
;
SELECT
t.SomeId, ng.position, ng.token
FROM @Temp AS t
CROSS APPLY dbo.NGrams8k(t.[Data],11) AS ng
WHERE ng.token LIKE CONCAT('07',REPLICATE('[0-9]',9))Easy peasy!
If she changes the WHERE to:
WHERE ' ' + ng.token + ' ' LIKE CONCAT('[^0-9]07',REPLICATE('[0-9]',9),'[^0-9]')
it will eliminate numbers like 307999999999 and numbers that are too long.
Link to NGrams8k: https://www.sqlservercentral.com/articles/nasty-fast-n-grams-part-1-character-level-unigrams
November 21, 2019 at 3:29 pm
Alan Burstein wrote:Grab a copy of NGrams8k and you can do this:
Declare @Temp Table(SomeId INT IDENTITY, [Data] VarChar(8000))
Insert @Temp ([Data])
Values('hello my name is john 07999999999 smith 07888888888 this last number doesnt count 999'),
('More numbers here: 07123456789,07987654321, 07555555555, 0798765432 (<<10 19278398127398729387129837129837')
;
SELECT
t.SomeId, ng.position, ng.token
FROM @Temp AS t
CROSS APPLY dbo.NGrams8k(t.[Data],11) AS ng
WHERE ng.token LIKE CONCAT('07',REPLICATE('[0-9]',9))Easy peasy!
If she changes the WHERE to:
WHERE ' ' + ng.token + ' ' LIKE CONCAT('[^0-9]07',REPLICATE('[0-9]',9),'[^0-9]')it will eliminate numbers like 307999999999 and numbers that are too long.
Link to NGrams8k: https://www.sqlservercentral.com/articles/nasty-fast-n-grams-part-1-character-level-unigrams
Well Played!
-- Itzik Ben-Gan 2001
November 21, 2019 at 3:49 pm
Jonathan AC Roberts wrote:Alan Burstein wrote:Grab a copy of NGrams8k and you can do this:
Declare @Temp Table(SomeId INT IDENTITY, [Data] VarChar(8000))
Insert @Temp ([Data])
Values('hello my name is john 07999999999 smith 07888888888 this last number doesnt count 999'),
('More numbers here: 07123456789,07987654321, 07555555555, 0798765432 (<<10 19278398127398729387129837129837')
;
SELECT
t.SomeId, ng.position, ng.token
FROM @Temp AS t
CROSS APPLY dbo.NGrams8k(t.[Data],11) AS ng
WHERE ng.token LIKE CONCAT('07',REPLICATE('[0-9]',9))Easy peasy!
If she changes the WHERE to:
WHERE ' ' + ng.token + ' ' LIKE CONCAT('[^0-9]07',REPLICATE('[0-9]',9),'[^0-9]')it will eliminate numbers like 307999999999 and numbers that are too long.
Link to NGrams8k: https://www.sqlservercentral.com/articles/nasty-fast-n-grams-part-1-character-level-unigrams
Well Played!
Thank you Alan, I had to make the same change to my code too.
Viewing 12 posts - 1 through 11 (of 11 total)
You must be logged in to reply to this topic. Login to reply