EMA_20210526T211254_0000_MRNMM0000001240_PMS110460PAT000001287_PID15235307_OR_Driver_Licensefrom.pdf
In the above string, i would like to get as below like alphabet and '_', i tried to use the function fn_GetAlphabetsOnly but getting only alphabet.
EMA_MRNMM_PMS_PID_OR_Driver_Licensefrom
July 12, 2022 at 6:58 pm
We don't have your fn_GetAlphabetsOnly function -- that's a user-defined function.
The following returns EMA_T__MRNMM_PMSPAT_PID_OR_Dr:
DECLARE @string VARCHAR(255) = 'EMA_20210526T211254_0000_MRNMM0000001240_PMS110460PAT000001287_PID15235307_OR_Dr'
SELECT @string,REPLACE(@string,'%[^0-9]%',''), REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(@string,'0',''),'1',''),'2',''),'3',''),'4',''),'5',''),'6',''),'7',''),'8',''),'9','');
Note: Your sample results don't seem to reflect the fact that eliminating numeric digits could result in two or more consecutive underscore characters. If you're actually trying to consolidate those, that's another issue.
See also https://www.sqlservercentral.com/scripts/patreplace8k
July 12, 2022 at 7:27 pm
Just to be sure. You want to?
SET ANSI_NULLS ON
SET QUOTED_IDENTIFIER ON
GO
CREATE FUNCTION [dbo].[fn_GetAlphabetsAndUnderscoresOnly]
(
@string VARCHAR(8000)
)
RETURNS VARCHAR(8000)
AS
BEGIN
DECLARE @invalidCharLocation SMALLINT
SET @string = LEFT(@string, CHARINDEX('.', @string + '.') - 1)
SET @invalidCharLocation = PATINDEX('%[^A-Za-z_]%', @string)
WHILE @invalidCharLocation > 0
BEGIN
SET @string = STUFF(@string, @invalidCharLocation, 1, '')
SET @invalidCharLocation = PATINDEX('%[^A-Za-z_]%', @string)
END --WHILE
SET @string = @string
RETURN @string
END --FUNCTION
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
July 13, 2022 at 12:01 pm
Instead of a WHILE loop you could try a numbers table or tally function approach
declare @string varchar(255) = 'EMA_20210526T211254_0000_MRNMM0000001240_PMS110460PAT000001287_PID15235307_OR_Dr';
select string_agg(v.chr, '') within group (order by fn.n) az_only
from dbo.fnTally(1, len(@string)) fn
cross apply (values (substring(@string, fn.n, 1))) v(chr)
where v.chr like '[A-Za-z_]';
Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können
July 13, 2022 at 3:41 pm
With the understanding that I've not done any deep analysis on any of the code posted and going only on what I've seen in the past, the nested REPLACEs that RatBak posted should would a treat for form, function, and performance. There IS a performance problem with REPLACE when it comes to certain collations but adding a binary collation to the code will work a real treat there.
--Jeff Moden
Change is inevitable... Change for the better is not.
July 13, 2022 at 3:48 pm
Thank you, So much!!
July 13, 2022 at 3:51 pm
Thank you so much, Scott!! Really appreciate you help..
July 13, 2022 at 3:52 pm
With the understanding that I've not done any deep analysis on any of the code posted and going only on what I've seen in the past, the nested REPLACEs that RatBak posted should would a treat for form, function, and performance. There IS a performance problem with REPLACE when it comes to certain collations but adding a binary collation to the code will work a real treat there.
Yes, had it produced the desired outcome. 🙂
July 13, 2022 at 5:42 pm
I thought later of one possible correction to removing the file extension, just in case two or more periods (.) appear in the value.
SET ANSI_NULLS ON
SET QUOTED_IDENTIFIER ON
GO
ALTER FUNCTION [dbo].[fn_GetAlphabetsAndUnderscoresOnly]
(
@string VARCHAR(8000)
)
RETURNS VARCHAR(8000)
AS
BEGIN
DECLARE @invalidCharLocation SMALLINT
/* strip file extension from name */
SET @string = LEFT(@string, LEN(@string) - CHARINDEX('.', REVERSE('.' + @string)))
SET @invalidCharLocation = PATINDEX('%[^A-Za-z_]%', @string)
WHILE @invalidCharLocation > 0
BEGIN
SET @string = STUFF(@string, @invalidCharLocation, 1, '')
SET @invalidCharLocation = PATINDEX('%[^A-Za-z_]%', @string)
END --WHILE
SET @string = @string
RETURN @string
END --FUNCTION
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
July 13, 2022 at 11:00 pm
Jeff Moden wrote:With the understanding that I've not done any deep analysis on any of the code posted and going only on what I've seen in the past, the nested REPLACEs that RatBak posted should would a treat for form, function, and performance. There IS a performance problem with REPLACE when it comes to certain collations but adding a binary collation to the code will work a real treat there.
Yes, had it produced the desired outcome. 🙂
Agreed. In fact, none of the code posted above produces the actual outcome posted by the OP. Not even the selected "Correct" answer.
--Jeff Moden
Change is inevitable... Change for the better is not.
July 13, 2022 at 11:11 pm
@dba.k ,
Look again. None of the code posted actually produces the desired output that you posted. "Must look eye".
--Jeff Moden
Change is inevitable... Change for the better is not.
July 14, 2022 at 3:17 am
Afaik this actually produces the desired output. Should work with SQL 2014. It uses the ordinal splitter DelimitedSplit8K_Lead. The query: splits the input string on '_' underscore, finds the offset location of the first integer within the split string, substrings to extract the lead characters, and removes the file extension. Then it concatenates the calculated strings 'string_wo_ext' separated by underscores using the FOR XML trick.
declare @string varchar(255) = 'EMA_20210526T211254_0000_MRNMM0000001240_PMS110460PAT000001287_PID15235307_OR_Driver_Licensefrom.pdf';
select stuff((select '_' + calc.string_wo_ext
from dbo.DelimitedSplit8K_Lead(@string, '_') ds
cross apply (values (patindex('%[0-9]%', ds.Item))) frst(tndx)
cross apply (values (iif(frst.tndx=0, Item, substring(ds.Item, 1, frst.tndx-1)))) string(string_to_agg)
cross apply (values (left(string.string_to_agg, len(string.string_to_agg) - charindex('.', reverse(string.string_to_agg))))) calc(string_wo_ext)
where Item like '[A-Za-z_]%'
order by ds.ItemNumber
for xml path('')), 1, 1, '') string_val;
CREATE FUNCTION [dbo].[DelimitedSplit8K_LEAD]
--===== Define I/O parameters
(@pString VARCHAR(8000), @pDelimiter CHAR(1))
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
WITH E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
), --10E+1 or 10 rows
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
cteTally(N) AS (--==== This provides the "zero base" and limits the number of rows right up front
-- for both a performance gain and prevention of accidental "overruns"
SELECT 0 UNION ALL
SELECT TOP (DATALENGTH(ISNULL(@pString,1))) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
),
cteStart(N1) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter)
SELECT t.N+1
FROM cteTally t
WHERE (SUBSTRING(@pString,t.N,1) = @pDelimiter OR t.N = 0)
)
--===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.
SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY s.N1),
Item = SUBSTRING(@pString,s.N1,ISNULL(NULLIF((LEAD(s.N1,1,1) OVER (ORDER BY s.N1) - 1),0)-s.N1,8000))
FROM cteStart s
;
Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können
July 14, 2022 at 4:28 am
Afaik this actually produces the desired output. Should work with SQL 2014. It uses the ordinal splitter DelimitedSplit8K_Lead. The query: splits the input string on '_' underscore, finds the offset location of the first integer within the split string, substrings to extract the lead characters, and removes the file extension. Then it concatenates the calculated strings 'string_wo_ext' separated by underscores using the FOR XML trick.
declare @string varchar(255) = 'EMA_20210526T211254_0000_MRNMM0000001240_PMS110460PAT000001287_PID15235307_OR_Driver_Licensefrom.pdf';
select stuff((select '_' + calc.string_wo_ext
from dbo.DelimitedSplit8K_Lead(@string, '_') ds
cross apply (values (patindex('%[0-9]%', ds.Item))) frst(tndx)
cross apply (values (iif(frst.tndx=0, Item, substring(ds.Item, 1, frst.tndx-1)))) string(string_to_agg)
cross apply (values (left(string.string_to_agg, len(string.string_to_agg) - charindex('.', reverse(string.string_to_agg))))) calc(string_wo_ext)
where Item like '[A-Za-z_]%'
order by ds.ItemNumber
for xml path('')), 1, 1, '') string_val;CREATE FUNCTION [dbo].[DelimitedSplit8K_LEAD]
--===== Define I/O parameters
(@pString VARCHAR(8000), @pDelimiter CHAR(1))
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
WITH E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
), --10E+1 or 10 rows
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
cteTally(N) AS (--==== This provides the "zero base" and limits the number of rows right up front
-- for both a performance gain and prevention of accidental "overruns"
SELECT 0 UNION ALL
SELECT TOP (DATALENGTH(ISNULL(@pString,1))) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
),
cteStart(N1) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter)
SELECT t.N+1
FROM cteTally t
WHERE (SUBSTRING(@pString,t.N,1) = @pDelimiter OR t.N = 0)
)
--===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.
SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY s.N1),
Item = SUBSTRING(@pString,s.N1,ISNULL(NULLIF((LEAD(s.N1,1,1) OVER (ORDER BY s.N1) - 1),0)-s.N1,8000))
FROM cteStart s
;
A couple of potential issues: if there's an additional period (.) in the file name and/or an underscore in the extension.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
July 14, 2022 at 12:59 pm
A couple of potential issues: if there's an additional period (.) in the file name and/or an underscore in the extension.
At least it produces the correct output when using the given data. 😉 And, it doesn't use a While loop inside of a scalar function, which will prevent parallelism.
Also, if you RUN at the following, you'll find that an extra period in the file name doesn't seem to cause a problem but the extra underscore in the extension is a problem. Here's the example with Steve's code.
declare @string varchar(255) = 'EMA_20210526T211254_0000_MRNMM0000001240_PMS110460PAT0000.01287_PID15235307_OR_Driver_Licensefrom.p_df';
select stuff((select '_' + calc.string_wo_ext
from dbo.DelimitedSplit8K_Lead(@string, '_') ds
cross apply (values (patindex('%[0-9]%', ds.Item))) frst(tndx)
cross apply (values (iif(frst.tndx=0, Item, substring(ds.Item, 1, frst.tndx-1)))) string(string_to_agg)
cross apply (values (left(string.string_to_agg, len(string.string_to_agg) - charindex('.', reverse(string.string_to_agg))))) calc(string_wo_ext)
where Item like '[A-Za-z_]%'
order by ds.ItemNumber
for xml path('')), 1, 1, '') string_val;
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 15 posts - 1 through 15 (of 19 total)
You must be logged in to reply to this topic. Login to reply