From each string, extract numbers following a '#' and create separate row

Question

Post reply

From each string, extract numbers following a '#' and create separate row

SqlRookie

SSC Enthusiast

Points: 198
More actions
October 24, 2024 at 11:02 am

#4473138

Hi there,
This was posted in the wrong forum, hence duplicating here. I have SQL version 2019 (thanks to Phil, Scott and Steve for their previous inputs).
I've tried CROSS APPLY, PATINDEX and many other functions, but can't nail this.
For each record, I want to extract all numbers which follow a '#' and then create a new row for each.
Example String 1: "Hello world. #1234 has been replaced by #014521"
To return:
1234
014521
Example String 2: "#687459"
To return:
687459
If there is no '#', then return blank.
Thanks in advance. 🙂

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply

Alan Burstein SSC Guru Points: 61141 More actions · Answer 1

Post got submitted twice. Sorry.

This reply was modified 2 months ago by Alan Burstein. Reason: Dupe

"I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

-- Itzik Ben-Gan 2001

Alan Burstein SSC Guru Points: 61141 More actions · Answer 2

For this you can use a "splitter" (AKA "tokenizer" function.)

DECLARE @String VARCHAR(8000) = 'Hello world. #1234 has been replaced by #014521';

SELECT SomeNbr = REPLACE(split.[value],'#','')
FROM STRING_SPLIT(@String,' ') AS split
WHERE split.[value] LIKE '#[0-9]%';

Returns:

SomeNbr
----------
1234
014521

Against a table it would look like this:

DECLARE @Strings TABLE (StringID INT IDENTITY, SomeString VARCHAR(1000));
INSERT @Strings(SomeString)
VALUES 
('Hello world. #1234 has been replaced by #014521'),
('Numbers, numbers... #012 #999 #Numbers'),
('The quick brown fox...'),
('#000 Another row... #123') ;

SELECT
StringID = s.StringID,
SomeNbr = REPLACE(split.[value],'#','')
FROM @Strings AS s
CROSS APPLY STRING_SPLIT(s.SomeString,' ') AS split
WHERE split.[value] LIKE '#[0-9]%';

This returns:

StringID    SomeNbr
----------- -----------
1           1234
1           014521
2           012
2           999
4           000
4           123

This gets us what we need except for "If there is no '#', then return blank." For the blanks we'll need to push our logic into a subquery and leverage OUTER APPLY like so:

SELECT
  StringID = s.StringID,
  SomeNbr = ISNULL(split.SomeNbr,'')
FROM @Strings AS s
OUTER APPLY
(
  SELECT SomeNbr = REPLACE(split.[value],'#','')
  FROM STRING_SPLIT(s.SomeString,' ') AS split
  WHERE split.[value] LIKE '#[0-9]%'
) AS split;

Returns:

StringID   SomeNbr
----------- -----------
1           1234
1           014521
2           012
2           999
4           000
4           123

"I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

-- Itzik Ben-Gan 2001

SqlRookie SSC Enthusiast Points: 198 More actions · Answer 3

Hi Alan,

Many thanks for the code you have provided. I am still evaluating how to apply this to an existing table, but I do have an observation. If there is no space between the number and '#', I get this:-

DECLARE @String VARCHAR(8000) = 'Hello world. #1234 has been replaced by #014521#2345';

SELECT SomeNbr = REPLACE(split.[value],'#','')
FROM STRING_SPLIT(@String,' ') AS split
WHERE split.[value] LIKE '#[0-9]%';

Result:

1234
0145212345

Should be:

1234

014521

2345

Also, if I declare this string with parenthesis around a number, it give me just the first occurrence of a #, 1234:-

DECLARE @String VARCHAR(8000) = 'Hello world. #1234 has been replaced by (#014521) (#2345)

Result:

1234

Thank you again! 🙂

p.s. How do I avoid the extra spaces between each line of text? lol

This reply was modified 2 months ago by SqlRookie.
This reply was modified 2 months ago by SqlRookie.

Steve Collins SSCrazy Points: 2174 More actions · Answer 4

Or what's the question? It seems the topic's scope has crept after solutions were posted

Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

PangyuSong SSC Enthusiast Points: 114 More actions · Answer 5

PangyuSong

SSC Enthusiast

Points: 114

October 25, 2024 at 10:56 pm

#4473878

It's too difficult to answer. I am a new learner.

PangyuSong SSC Enthusiast Points: 114 More actions · Answer 6

PangyuSong

SSC Enthusiast

Points: 114

October 25, 2024 at 10:57 pm

#4473879

It is too difficult for me.

Alan Burstein SSC Guru Points: 61141 More actions · Answer 7

UPDATED 10/31/2024

The was an error in my code, I just changed "bernieML.samd.ngrams8K" to "dbo.ngrams8K". I was using code from my own DB (BernieML) with a different schema (samd).

@sqlrookie-2

Sorry for the late reply - I didn't get the email saying there were replies...

To deal with cases where there aren't spaces before "#" or situations like (#123), or (for fun) numbers without preceding #'s, we can do this:

SELECT
  Item = SUBSTRING(split.[value],1,IIF(i.Pos=0,8000,i.Pos-1))
FROM        STRING_SPLIT(@String,'#')                    AS split
CROSS APPLY (VALUES(PATINDEX('%[^0-9]%',split.[Value]))) AS i(Pos)
WHERE       split.[value] LIKE '[0-9]%';

On a pre-2019 system I would recommend delimitedSplit8K. The solution would look like this:

SELECT
  Item = SUBSTRING(split.[value],1,IIF(i.Pos=0,8000,i.Pos-1))
FROM        STRING_SPLIT(@String,'#')                    AS split
CROSS APPLY (VALUES(PATINDEX('%[^0-9]%',split.[Value]))) AS i(Pos)
WHERE       split.[value] LIKE '[0-9]%';

This parameter value includes each of the aforementioned issues, and will still be handled correctly:

DECLARE @String VARCHAR(8000) =
'Emp 223 says: (#1234) has been replaced by #014521#2345. Please call me at 555-333-1234.';

For customized split requirements I use NGrams8K. Note the code below and results.

DECLARE @String VARCHAR(8000) =
  'Emp 223 says: (#1234) has been replaced by #014521#2345. Please call me at 555-333-1234.';

;--==== (1) Get the position of each delimiter (ng.Position)
SELECT ng.Position
FROM   dbo.NGrams8k(@String,1) AS ng
WHERE  ng.Token = '#';

;--==== (2) Build out suffixes from each delimiter position (s.Suffix)
SELECT
  ng.Position,
  s.Suffix
FROM        dbo.NGrams8k(@String,1)               AS ng
CROSS APPLY (VALUES(SUBSTRING(@String,ng.Position+1,8000))) AS s(Suffix)
WHERE       ng.Token = '#';

;--==== (3) Find the end position for each item (i.Pos)
SELECT
  ng.Position,
  s.Suffix,
  i.Pos
FROM        dbo.NGrams8k(@String,1)               AS ng
CROSS APPLY (VALUES(SUBSTRING(@String,ng.Position+1,8000))) AS s(Suffix)
CROSS APPLY (VALUES(PATINDEX('%[^0-9]%',s.Suffix)))         AS i(Pos)
WHERE       ng.Token = '#';

;--==== (4) use s.Suffix and i.Pos to build the "Item"
SELECT
  Item = SUBSTRING(s.Suffix,1,IIF(i.Pos=0,8000,i.Pos-1))
FROM        dbo.NGrams8k(@String,1)                        AS ng
CROSS APPLY (VALUES(SUBSTRING(@String,ng.Position+1,8000))) AS s(Suffix)
CROSS APPLY (VALUES(PATINDEX('%[^0-9]%',s.Suffix)))         AS i(Pos)
WHERE       ng.Token = '#';

Results from each:

This reply was modified 1 month, 3 weeks ago by Alan Burstein.

"I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

-- Itzik Ben-Gan 2001

Jeff Moden SSC Guru Points: 1004414 More actions · Answer 8

Alan... you might want to explain what the bernieML.samd. stuff is.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004414 More actions · Answer 9

SqlRookie wrote:

I am still evaluating how to apply this to an existing table..

And there's the rub. Please provide 5 to 10 rows of readily consumable data for use to operate with. Please see the article at the first link in my signature line below for one way to provide such a thing.

In the meantime, the following works for the variable version...

DECLARE @String VARCHAR(8000) = 'Hello world. #1234 has been replaced by #014521#2345';

 SELECT Number = TRIM(sp2.Number)
   FROM STRING_SPLIT(@String,'#') sp1
  CROSS APPLY (SELECT value FROM STRING_SPLIT(sp1.value,' '))sp2(Number)
  WHERE sp2.Number LIKE ('[0-9]%')
;

Here's the result...

This would make a good, fairly high performance iTVF (inline Table Valued Function).

Looking forward to some readily consumable data to take it there.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Alan Burstein SSC Guru Points: 61141 More actions · Answer 10

Jeff Moden wrote:

Alan... you might want to explain what the bernieML.samd. stuff is.

Thanks Jeff - I fixed my code. I was using my own DB and forgot to remove the DB.Schema reference. It's been a couple years since I posted here - rust and lack of coffee 😉

"I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

-- Itzik Ben-Gan 2001