Please help to get, Alphabet with underscore i.e special charecter

Question

Post reply

Please help to get, Alphabet with underscore i.e special charecter

DBA.k

Ten Centuries

Points: 1113
More actions
July 12, 2022 at 6:22 pm
Go to Answer
#4062300

EMA_20210526T211254_0000_MRNMM0000001240_PMS110460PAT000001287_PID15235307_OR_Driver_Licensefrom.pdf
In the above string, i would like to get as below like alphabet and '_', i tried to use the function fn_GetAlphabetsOnly but getting only alphabet.
EMA_MRNMM_PMS_PID_OR_Driver_Licensefrom

ScottPletcher

SSC Guru

Points: 101093

July 12, 2022 at 11:18 pm

Answer

#4062446

SET ANSI_NULLS ON
SET QUOTED_IDENTIFIER ON
GO
CREATE FUNCTION [dbo].[fn_GetAlphabetsAndUnderscoresOnly]
(
    @string VARCHAR(8000)
)
RETURNS VARCHAR(8000)
AS
BEGIN
DECLARE @invalidCharLocation SMALLINT
SET @string = LEFT(@string, CHARINDEX('.', @string + '.') - 1)
SET @invalidCharLocation = PATINDEX('%[^A-Za-z_]%', @string)
WHILE @invalidCharLocation > 0
BEGIN
    SET @string = STUFF(@string, @invalidCharLocation, 1, '')
    SET @invalidCharLocation = PATINDEX('%[^A-Za-z_]%', @string)
END --WHILE
SET @string = @string
RETURN @string
END --FUNCTION

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

Viewing 15 posts - 1 through 15 (of 19 total)

You must be logged in to reply to this topic. Login to reply

ratbak Ten Centuries Points: 1358 More actions · Answer 1

We don't have your fn_GetAlphabetsOnly function -- that's a user-defined function.

The following returns EMA_T__MRNMM_PMSPAT_PID_OR_Dr:

DECLARE @string VARCHAR(255) = 'EMA_20210526T211254_0000_MRNMM0000001240_PMS110460PAT000001287_PID15235307_OR_Dr'
SELECT @string,REPLACE(@string,'%[^0-9]%',''), REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(@string,'0',''),'1',''),'2',''),'3',''),'4',''),'5',''),'6',''),'7',''),'8',''),'9','');

Note: Your sample results don't seem to reflect the fact that eliminating numeric digits could result in two or more consecutive underscore characters. If you're actually trying to consolidate those, that's another issue.

See also https://www.sqlservercentral.com/scripts/patreplace8k

kaj SSCommitted Points: 1605 More actions · Answer 2

Just to be sure. You want to?

Locate the first numeric character
Strip everything from the first numeric character up to the next underscore
Repeat from step 1 using the current underscore as the starting position
Return the stripped-down string

Steve Collins SSCrazy Points: 2157 More actions · Answer 3

Instead of a WHILE loop you could try a numbers table or tally function approach

declare @string varchar(255) = 'EMA_20210526T211254_0000_MRNMM0000001240_PMS110460PAT000001287_PID15235307_OR_Dr';

select string_agg(v.chr, '') within group (order by fn.n) az_only
from dbo.fnTally(1, len(@string)) fn
     cross apply (values (substring(@string, fn.n, 1))) v(chr)
where v.chr like '[A-Za-z_]';

Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

Jeff Moden SSC Guru Points: 1004318 More actions · Answer 4

With the understanding that I've not done any deep analysis on any of the code posted and going only on what I've seen in the past, the nested REPLACEs that RatBak posted should would a treat for form, function, and performance. There IS a performance problem with REPLACE when it comes to certain collations but adding a binary collation to the code will work a real treat there.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

DBA.k Ten Centuries Points: 1113 More actions · Answer 5

DBA.k

Ten Centuries

Points: 1113

July 13, 2022 at 3:48 pm

#4062833

Thank you, So much!!

DBA.k Ten Centuries Points: 1113 More actions · Answer 6

Thank you so much, Scott!! Really appreciate you help..

kaj SSCommitted Points: 1605 More actions · Answer 7

Jeff Moden wrote:

With the understanding that I've not done any deep analysis on any of the code posted and going only on what I've seen in the past, the nested REPLACEs that RatBak posted should would a treat for form, function, and performance. There IS a performance problem with REPLACE when it comes to certain collations but adding a binary collation to the code will work a real treat there.

Yes, had it produced the desired outcome. 🙂

ScottPletcher SSC Guru Points: 101093 More actions · Answer 8

I thought later of one possible correction to removing the file extension, just in case two or more periods (.) appear in the value.

SET ANSI_NULLS ON
SET QUOTED_IDENTIFIER ON
GO
ALTER  FUNCTION [dbo].[fn_GetAlphabetsAndUnderscoresOnly]
(
    @string VARCHAR(8000)
)
RETURNS VARCHAR(8000)
AS
BEGIN
DECLARE @invalidCharLocation SMALLINT
/* strip file extension from name */
SET @string = LEFT(@string, LEN(@string) - CHARINDEX('.', REVERSE('.' + @string)))
SET @invalidCharLocation = PATINDEX('%[^A-Za-z_]%', @string)
WHILE @invalidCharLocation > 0
BEGIN
    SET @string = STUFF(@string, @invalidCharLocation, 1, '')
    SET @invalidCharLocation = PATINDEX('%[^A-Za-z_]%', @string)
END --WHILE
SET @string = @string
RETURN @string
END --FUNCTION

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

Jeff Moden SSC Guru Points: 1004318 More actions · Answer 9

kaj wrote:

Jeff Moden wrote:
With the understanding that I've not done any deep analysis on any of the code posted and going only on what I've seen in the past, the nested REPLACEs that RatBak posted should would a treat for form, function, and performance. There IS a performance problem with REPLACE when it comes to certain collations but adding a binary collation to the code will work a real treat there.
Yes, had it produced the desired outcome. 🙂

Agreed. In fact, none of the code posted above produces the actual outcome posted by the OP. Not even the selected "Correct" answer.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004318 More actions · Answer 10

@dba.k ,

Look again. None of the code posted actually produces the desired output that you posted. "Must look eye".

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Steve Collins SSCrazy Points: 2157 More actions · Answer 11

Afaik this actually produces the desired output. Should work with SQL 2014. It uses the ordinal splitter DelimitedSplit8K_Lead. The query: splits the input string on '_' underscore, finds the offset location of the first integer within the split string, substrings to extract the lead characters, and removes the file extension. Then it concatenates the calculated strings 'string_wo_ext' separated by underscores using the FOR XML trick.

declare @string varchar(255) = 'EMA_20210526T211254_0000_MRNMM0000001240_PMS110460PAT000001287_PID15235307_OR_Driver_Licensefrom.pdf';

select stuff((select '_' + calc.string_wo_ext
              from dbo.DelimitedSplit8K_Lead(@string, '_') ds
                   cross apply (values (patindex('%[0-9]%', ds.Item))) frst(tndx)
                   cross apply (values (iif(frst.tndx=0, Item, substring(ds.Item, 1, frst.tndx-1)))) string(string_to_agg)
                   cross apply (values (left(string.string_to_agg, len(string.string_to_agg) - charindex('.', reverse(string.string_to_agg))))) calc(string_wo_ext)
              where Item like '[A-Za-z_]%'
              order by ds.ItemNumber 
              for xml path('')), 1, 1, '') string_val;

CREATE FUNCTION [dbo].[DelimitedSplit8K_LEAD]
--===== Define I/O parameters
        (@pString VARCHAR(8000), @pDelimiter CHAR(1))
RETURNS TABLE WITH SCHEMABINDING AS
 RETURN
  WITH E1(N) AS (
                 SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL 
                 SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL 
                 SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
                ),                          --10E+1 or 10 rows
       E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
       E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
 cteTally(N) AS (--==== This provides the "zero base" and limits the number of rows right up front
                     -- for both a performance gain and prevention of accidental "overruns"
                 SELECT 0 UNION ALL
                 SELECT TOP (DATALENGTH(ISNULL(@pString,1))) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
                ),
cteStart(N1) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter)
                 SELECT t.N+1
                   FROM cteTally t
                  WHERE (SUBSTRING(@pString,t.N,1) = @pDelimiter OR t.N = 0) 
                )
--===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.
 SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY s.N1),
        Item = SUBSTRING(@pString,s.N1,ISNULL(NULLIF((LEAD(s.N1,1,1) OVER (ORDER BY s.N1) - 1),0)-s.N1,8000))
   FROM cteStart s
;

Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

ScottPletcher SSC Guru Points: 101093 More actions · Answer 12

Steve Collins wrote:

Afaik this actually produces the desired output. Should work with SQL 2014. It uses the ordinal splitter DelimitedSplit8K_Lead. The query: splits the input string on '_' underscore, finds the offset location of the first integer within the split string, substrings to extract the lead characters, and removes the file extension. Then it concatenates the calculated strings 'string_wo_ext' separated by underscores using the FOR XML trick.

declare @string varchar(255) = 'EMA_20210526T211254_0000_MRNMM0000001240_PMS110460PAT000001287_PID15235307_OR_Driver_Licensefrom.pdf';

select stuff((select '_' + calc.string_wo_ext
              from dbo.DelimitedSplit8K_Lead(@string, '_') ds
                   cross apply (values (patindex('%[0-9]%', ds.Item))) frst(tndx)
                   cross apply (values (iif(frst.tndx=0, Item, substring(ds.Item, 1, frst.tndx-1)))) string(string_to_agg)
                   cross apply (values (left(string.string_to_agg, len(string.string_to_agg) - charindex('.', reverse(string.string_to_agg))))) calc(string_wo_ext)
              where Item like '[A-Za-z_]%'
              order by ds.ItemNumber 
              for xml path('')), 1, 1, '') string_val;

CREATE FUNCTION [dbo].[DelimitedSplit8K_LEAD]
--===== Define I/O parameters
        (@pString VARCHAR(8000), @pDelimiter CHAR(1))
RETURNS TABLE WITH SCHEMABINDING AS
 RETURN
  WITH E1(N) AS (
                 SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL 
                 SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL 
                 SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
                ),                          --10E+1 or 10 rows
       E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
       E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
 cteTally(N) AS (--==== This provides the "zero base" and limits the number of rows right up front
                     -- for both a performance gain and prevention of accidental "overruns"
                 SELECT 0 UNION ALL
                 SELECT TOP (DATALENGTH(ISNULL(@pString,1))) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
                ),
cteStart(N1) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter)
                 SELECT t.N+1
                   FROM cteTally t
                  WHERE (SUBSTRING(@pString,t.N,1) = @pDelimiter OR t.N = 0) 
                )
--===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.
 SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY s.N1),
        Item = SUBSTRING(@pString,s.N1,ISNULL(NULLIF((LEAD(s.N1,1,1) OVER (ORDER BY s.N1) - 1),0)-s.N1,8000))
   FROM cteStart s
;

A couple of potential issues: if there's an additional period (.) in the file name and/or an underscore in the extension.

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

Jeff Moden SSC Guru Points: 1004318 More actions · Answer 13

ScottPletcher wrote:

A couple of potential issues: if there's an additional period (.) in the file name and/or an underscore in the extension.

At least it produces the correct output when using the given data. 😉 And, it doesn't use a While loop inside of a scalar function, which will prevent parallelism.

Also, if you RUN at the following, you'll find that an extra period in the file name doesn't seem to cause a problem but the extra underscore in the extension is a problem. Here's the example with Steve's code.

declare @string varchar(255) = 'EMA_20210526T211254_0000_MRNMM0000001240_PMS110460PAT0000.01287_PID15235307_OR_Driver_Licensefrom.p_df';

select stuff((select '_' + calc.string_wo_ext
              from dbo.DelimitedSplit8K_Lead(@string, '_') ds
                   cross apply (values (patindex('%[0-9]%', ds.Item))) frst(tndx)
                   cross apply (values (iif(frst.tndx=0, Item, substring(ds.Item, 1, frst.tndx-1)))) string(string_to_agg)
                   cross apply (values (left(string.string_to_agg, len(string.string_to_agg) - charindex('.', reverse(string.string_to_agg))))) calc(string_wo_ext)
              where Item like '[A-Za-z_]%'
              order by ds.ItemNumber 
              for xml path('')), 1, 1, '') string_val;

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)