October 24, 2024 at 11:02 am
Hi there,
This was posted in the wrong forum, hence duplicating here. I have SQL version 2019 (thanks to Phil, Scott and Steve for their previous inputs).
I've tried CROSS APPLY, PATINDEX and many other functions, but can't nail this.
For each record, I want to extract all numbers which follow a '#' and then create a new row for each.
Example String 1: "Hello world. #1234 has been replaced by #014521"
To return:
1234
014521
Example String 2: "#687459"
To return:
687459
If there is no '#', then return blank.
Thanks in advance. 🙂
October 24, 2024 at 5:36 pm
Post got submitted twice. Sorry.
-- Itzik Ben-Gan 2001
October 24, 2024 at 5:36 pm
For this you can use a "splitter" (AKA "tokenizer" function.)
DECLARE @String VARCHAR(8000) = 'Hello world. #1234 has been replaced by #014521';
SELECT SomeNbr = REPLACE(split.[value],'#','')
FROM STRING_SPLIT(@String,' ') AS split
WHERE split.[value] LIKE '#[0-9]%';
Returns:
SomeNbr
----------
1234
014521
Against a table it would look like this:
DECLARE @Strings TABLE (StringID INT IDENTITY, SomeString VARCHAR(1000));
INSERT @Strings(SomeString)
VALUES
('Hello world. #1234 has been replaced by #014521'),
('Numbers, numbers... #012 #999 #Numbers'),
('The quick brown fox...'),
('#000 Another row... #123') ;
SELECT
StringID = s.StringID,
SomeNbr = REPLACE(split.[value],'#','')
FROM @Strings AS s
CROSS APPLY STRING_SPLIT(s.SomeString,' ') AS split
WHERE split.[value] LIKE '#[0-9]%';
This returns:
StringID SomeNbr
----------- -----------
1 1234
1 014521
2 012
2 999
4 000
4 123
This gets us what we need except for "If there is no '#', then return blank." For the blanks we'll need to push our logic into a subquery and leverage OUTER APPLY like so:
SELECT
StringID = s.StringID,
SomeNbr = ISNULL(split.SomeNbr,'')
FROM @Strings AS s
OUTER APPLY
(
SELECT SomeNbr = REPLACE(split.[value],'#','')
FROM STRING_SPLIT(s.SomeString,' ') AS split
WHERE split.[value] LIKE '#[0-9]%'
) AS split;
Returns:
StringID SomeNbr
----------- -----------
1 1234
1 014521
2 012
2 999
4 000
4 123
-- Itzik Ben-Gan 2001
October 25, 2024 at 9:45 am
Hi Alan,
Many thanks for the code you have provided. I am still evaluating how to apply this to an existing table, but I do have an observation. If there is no space between the number and '#', I get this:-
DECLARE @String VARCHAR(8000) = 'Hello world. #1234 has been replaced by #014521#2345';
SELECT SomeNbr = REPLACE(split.[value],'#','')
FROM STRING_SPLIT(@String,' ') AS split
WHERE split.[value] LIKE '#[0-9]%';
Result:
1234
0145212345
Should be:
1234
014521
2345
Also, if I declare this string with parenthesis around a number, it give me just the first occurrence of a #, 1234:-
DECLARE @String VARCHAR(8000) = 'Hello world. #1234 has been replaced by (#014521) (#2345)
Result:
1234
Thank you again! 🙂
p.s. How do I avoid the extra spaces between each line of text? lol
October 25, 2024 at 7:47 pm
Or what's the question? It seems the topic's scope has crept after solutions were posted
Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können
October 25, 2024 at 10:56 pm
It's too difficult to answer. I am a new learner.
October 25, 2024 at 10:57 pm
It is too difficult for me.
October 30, 2024 at 9:09 pm
UPDATED 10/31/2024
The was an error in my code, I just changed "bernieML.samd.ngrams8K" to "dbo.ngrams8K". I was using code from my own DB (BernieML) with a different schema (samd).
Sorry for the late reply - I didn't get the email saying there were replies...
To deal with cases where there aren't spaces before "#" or situations like (#123), or (for fun) numbers without preceding #'s, we can do this:
SELECT
Item = SUBSTRING(split.[value],1,IIF(i.Pos=0,8000,i.Pos-1))
FROM STRING_SPLIT(@String,'#') AS split
CROSS APPLY (VALUES(PATINDEX('%[^0-9]%',split.[Value]))) AS i(Pos)
WHERE split.[value] LIKE '[0-9]%';
On a pre-2019 system I would recommend delimitedSplit8K. The solution would look like this:
SELECT
Item = SUBSTRING(split.[value],1,IIF(i.Pos=0,8000,i.Pos-1))
FROM STRING_SPLIT(@String,'#') AS split
CROSS APPLY (VALUES(PATINDEX('%[^0-9]%',split.[Value]))) AS i(Pos)
WHERE split.[value] LIKE '[0-9]%';
This parameter value includes each of the aforementioned issues, and will still be handled correctly:
DECLARE @String VARCHAR(8000) =
'Emp 223 says: (#1234) has been replaced by #014521#2345. Please call me at 555-333-1234.';
For customized split requirements I use NGrams8K. Note the code below and results.
DECLARE @String VARCHAR(8000) =
'Emp 223 says: (#1234) has been replaced by #014521#2345. Please call me at 555-333-1234.';
;--==== (1) Get the position of each delimiter (ng.Position)
SELECT ng.Position
FROM dbo.NGrams8k(@String,1) AS ng
WHERE ng.Token = '#';
;--==== (2) Build out suffixes from each delimiter position (s.Suffix)
SELECT
ng.Position,
s.Suffix
FROM dbo.NGrams8k(@String,1) AS ng
CROSS APPLY (VALUES(SUBSTRING(@String,ng.Position+1,8000))) AS s(Suffix)
WHERE ng.Token = '#';
;--==== (3) Find the end position for each item (i.Pos)
SELECT
ng.Position,
s.Suffix,
i.Pos
FROM dbo.NGrams8k(@String,1) AS ng
CROSS APPLY (VALUES(SUBSTRING(@String,ng.Position+1,8000))) AS s(Suffix)
CROSS APPLY (VALUES(PATINDEX('%[^0-9]%',s.Suffix))) AS i(Pos)
WHERE ng.Token = '#';
;--==== (4) use s.Suffix and i.Pos to build the "Item"
SELECT
Item = SUBSTRING(s.Suffix,1,IIF(i.Pos=0,8000,i.Pos-1))
FROM dbo.NGrams8k(@String,1) AS ng
CROSS APPLY (VALUES(SUBSTRING(@String,ng.Position+1,8000))) AS s(Suffix)
CROSS APPLY (VALUES(PATINDEX('%[^0-9]%',s.Suffix))) AS i(Pos)
WHERE ng.Token = '#';
Results from each:
-- Itzik Ben-Gan 2001
October 31, 2024 at 1:16 am
Alan... you might want to explain what the bernieML.samd. stuff is.
--Jeff Moden
Change is inevitable... Change for the better is not.
October 31, 2024 at 1:42 am
I am still evaluating how to apply this to an existing table..
And there's the rub. Please provide 5 to 10 rows of readily consumable data for use to operate with. Please see the article at the first link in my signature line below for one way to provide such a thing.
In the meantime, the following works for the variable version...
DECLARE @String VARCHAR(8000) = 'Hello world. #1234 has been replaced by #014521#2345';
SELECT Number = TRIM(sp2.Number)
FROM STRING_SPLIT(@String,'#') sp1
CROSS APPLY (SELECT value FROM STRING_SPLIT(sp1.value,' '))sp2(Number)
WHERE sp2.Number LIKE ('[0-9]%')
;
Here's the result...
This would make a good, fairly high performance iTVF (inline Table Valued Function).
Looking forward to some readily consumable data to take it there.
--Jeff Moden
Change is inevitable... Change for the better is not.
October 31, 2024 at 3:59 pm
Alan... you might want to explain what the bernieML.samd. stuff is.
Thanks Jeff - I fixed my code. I was using my own DB and forgot to remove the DB.Schema reference. It's been a couple years since I posted here - rust and lack of coffee 😉
-- Itzik Ben-Gan 2001
Viewing 11 posts - 1 through 10 (of 10 total)
You must be logged in to reply to this topic. Login to reply