get a substring value from string

Question

Post reply

get a substring value from string

sonika2015

SSC Enthusiast

Points: 172
More actions
December 14, 2022 at 4:10 pm

#4125562

Hi,
I am new to SQL Server.
I have table
create table tmp1
(
serialnumber varchar(500)
)
insert into tmp1 values('1232132|2343345435|34543534534534|3344562534'),('23243|23423432432|34543534534235|3344562534')
select * from tmp1
I need to fetch the value before the last pipe symbol
the output would be as below
34543534534534
34543534534235
anyone can help

December 15, 2022 at 6:59 am

This was removed by the editor as SPAM

Viewing 15 posts - 1 through 15 (of 46 total)

You must be logged in to reply to this topic. Login to reply

Eirikur Eiriksson SSC Guru Points: 182943 More actions · Answer 1

Quick question, are you consistently looking for the third value in the string?

Here is a quick suggestion, mind you that there is no guarantee that the ordinals are correct

USE TEEST;
GO
SET NOCOUNT ON;
GO
DECLARE @TMP table 
(
serialnumber varchar(500)
);
INSERT INTO @TMP VALUES('1232132|2343345435|34543534534534|3344562534')
                      ,('23243|23423432432|34543534534235|3344562534');
;WITH BASE_SPLIT AS
(
    SELECT
        SP.[value] AS ITEM_OF_INTEREST
       ,ROW_NUMBER() OVER 
                (
                    PARTITION BY T.serialnumber
                    ORDER BY @@VERSION) AS RID
    FROM @TMP T
    CROSS APPLY string_split(T.serialnumber,'|') SP
)
SELECT
    BS.RID
   ,BS.ITEM_OF_INTEREST
FROM BASE_SPLIT BS
WHERE BS.RID = 3;

Result set

RID  ITEM_OF_INTEREST
---- -----------------
3    34543534534534
3    34543534534235

ScottPletcher SSC Guru Points: 101180 More actions · Answer 2

select *, 
    case when last_bar = 0 or next_to_last_bar = 0 then ''
         else SUBSTRING(serialnumber, LEN(serialnumber) - last_bar - next_to_last_bar + 2, next_to_last_bar - 1) end
from tmp1
cross apply (
    select charindex('|', reverse(serialnumber)) as last_bar
) as ca1
cross apply (
    select charindex('|', reverse(left(serialnumber, len(serialnumber)-last_bar))) as next_to_last_bar
) as ca2

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

Ronzo Hall of Fame Points: 3917 More actions · Answer 3

SET NOCOUNT ON
;
DECLARE @TMP
TABLE 
(
serialnumber
VARCHAR (500)
)
;
INSERT INTO
@TMP 
VALUES
('1232132|2343345435|34543534534534|3344562534')
,
('23243|23423432432|34543534534235|3344562534')
;
SELECT
Parsed.value
FROM
(
SELECT
*
FROM
@TMP
CROSS APPLY
string_split (serialnumber, '|', 1)
)
AS
Parsed
INNER JOIN 
(
SELECT
serialnumber
,
MAX (ordinal)
AS
MaxOrdinal
FROM
@TMP
CROSS APPLY
string_split (serialnumber, '|', 1)
GROUP BY
serialnumber
)
AS
EntryCount
ON
Parsed.serialnumber = EntryCount.serialnumber
AND
Parsed.ordinal = MaxOrdinal - 1
;

Have Fun!
Ronzo

Jeff Moden SSC Guru Points: 1004561 More actions · Answer 4

The new STRING_SPLIT with the 3rd operand is great for this. The old STRING_SPLIT without it is not. Even MS has stated that the order of the output is NOT guaranteed in any way, shape, or form. It would be better to use multiple CHARINDEXes to get the location of the 3rd string or use DelimitedSplit8K if you don't really have SQL 2022.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

carsten.saastamoinen Old Hand Points: 341 More actions · Answer 5

-- Finding second last - problems if this should be more generel
DECLARE @Tmp TABLE 
(
SerialNoVARCHAR (500)
);

INSERT INTO @Tmp VALUES
('1232132|2343345435|34543534534534|3344562534'),
('23243|23423432432|34543534534235|3344562534'),
('23243|23|42343|2432|A3454335B|3344562534'),
('23|24|3|2342|3432432|X34545Y|3344562534');

WITH ReverseSN 
AS
(
SELECT REVERSE (SerialNo) AS SN
FROM @Tmp
),
SecondLast
AS
(
SELECT
CHARINDEX ('|', SN) + 1 AS f,
CHARINDEX ( '|', SN, CHARINDEX (SN, '|') - 1) AS s,
SUBSTRING (SN, CHARINDEX ('|', SN) + 1, 
CHARINDEX ('|', SN, CHARINDEX ('|', SN) + 1) - CHARINDEX ('|', SN)  - 1) AS ReverseSLast
FROM ReverseSN
)
SELECT REVERSE (ReverseSLast)
FROM SecondLast;
GO
-- A more general recursive solution

DECLARE @Tmp TABLE 
(
SerialNo VARCHAR (500)
);

INSERT INTO @Tmp VALUES
('1232132|2343345435|34543534534534|3344562534'),
('23243|23423432432|34543534534235|3344562534'),
('23243|23|42343|2432|A3454335B|3344562534'),
('23|24|3|2342|3432432|X34545Y|3344562534');

WITH Data 
AS
(
SELECTNEWID() AS ID,-- ID for each row
CONCAT (SerialNo, '|') AS SerialNo-- For stop
FROM @Tmp
),
Split
AS
(
SELECTID,
LEFT (SerialNo, CHARINDEX ('|', SerialNo) - 1) AS No,
1 AS Position,
RIGHT (SerialNo, LEN (SerialNo) - CHARINDEX ('|', SerialNo)) AS Rest
FROM Data
UNION ALL
SELECTID,
LEFT (Rest, CHARINDEX ('|', Rest) - 1) AS No,
Position + 1,
RIGHT (Rest, LEN (Rest) - CHARINDEX ('|', Rest))
FROM Split
WHERE LEN (Rest) > 1
),
SplitRes
AS
(
SELECT*,
ROW_NUMBER() OVER(PARTITION BY ID ORDER BY Position DESC) AS RowNum
FROM Split
)
SELECT No
FROM SplitRes
WHERE RowNum = 2;-- 2 could be a parameter

This reply was modified 2 years, 3 months ago by carsten.saastamoinen.

Jeff Moden SSC Guru Points: 1004561 More actions · Answer 6

@carsten.saastamoinen

That "split" contains an incremental recursive CTE and it's going to be absolutely terrible for performance. See here...

https://www.sqlservercentral.com/articles/hidden-rbar-counting-with-recursive-ctes

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

carsten.saastamoinen Old Hand Points: 341 More actions · Answer 7

As expected!!!! On my desk it takes only 14 seconds to process 131,000 rows. Hardly not a statement that will be executed several times every minute or several times every hour. Then the values should be stored in a table and therefore only be execute ones using 14 seconds. But it is an absolutely terrible solution if, instead, a solution has to be made for the last number, another for the second last number, one for the third last, .... Solutions that will give very different times with that solution, as you suggest. The recursive solution will use the same time for all solutions and can even handle an unknown number of values in the string. Hopefully, your proposal will never need to be changed, so the same change should be made for all the similar solutions. Redundant code is as dangerous as redundant data.

Jonathan AC Roberts SSCoach Points: 18400 More actions · Answer 8

Jeff Moden wrote:

The new STRING_SPLIT with the 3rd operand is great for this. The old STRING_SPLIT without it is not. Even MS has stated that the order of the output is NOT guaranteed in any way, shape, or form. It would be better to use multiple CHARINDEXes to get the location of the 3rd string or use DelimitedSplit8K if you don't really have SQL 2022.

As long as last value is the third value they could install this function (with ordinal):

IF OBJECT_ID('[dbo].[STRING_SPLIT]','IF') IS NULL BEGIN
    EXEC ('CREATE FUNCTION [dbo].[STRING_SPLIT] () RETURNS TABLE AS RETURN SELECT 1 X') 
END
GO
ALTER FUNCTION [dbo].[STRING_SPLIT]
(
    @string    nvarchar(MAX), 
    @separator nvarchar(MAX)
)
RETURNS TABLE WITH SCHEMABINDING 
AS RETURN
   WITH X(N) AS (SELECT 'Table1' FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) T(C)),
        Y(N) AS (SELECT 'Table2' FROM X A1, X A2, X A3, X A4, X A5, X A6, X A7, X A8) , -- Up to 16^8 = 4 billion
        T(N) AS (SELECT TOP(ISNULL(LEN(@string),0)+1) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) -1 N FROM Y),
        Delim(Pos) AS (SELECT t.N FROM T WHERE SUBSTRING(@string, t.N, LEN(@separator+'x')-1) LIKE @separator OR t.N = 0),
        Separated(value,position) AS (SELECT SUBSTRING(@string, d.Pos + CASE WHEN Pos=0 THEN 1 ELSE LEN(@separator+'x')-1 END, 
                                                       LEAD(d.Pos,1,2147483647) OVER (ORDER BY (SELECT NULL)) 
                                                       - CASE WHEN d.Pos=0 THEN 2-LEN(@separator+'x') 
                                                              ELSE d.Pos END - (LEN(@separator+'x')-1)),
                                             d.pos      
                                        FROM Delim d
                                       WHERE @string IS NOT NULL)
       SELECT s.value value,
              ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) ordinal
         FROM Separated s
        WHERE s.value <> @separator;
GO

Then the following SQL will do the job:

-- Create table variable with data
DECLARE @TMP table 
(
    serialnumber varchar(500)
);
INSERT INTO @TMP VALUES('1232132|2343345435|34543534534534|3344562534')
                      ,('23243|23423432432|34543534534235|3344562534');

SELECT X.value 
  FROM @TMP t
 CROSS APPLY dbo.STRING_SPLIT(t.serialnumber,'|') X
 WHERE X.ordinal = 3

Jeff Moden SSC Guru Points: 1004561 More actions · Answer 9

carsten.saastamoinen wrote:

As expected!!!! On my desk it takes only 14 seconds to process 131,000 rows. Hardly not a statement that will be executed several times every minute or several times every hour. Then the values should be stored in a table and therefore only be execute ones using 14 seconds. But it is an absolutely terrible solution if, instead, a solution has to be made for the last number, another for the second last number, one for the third last, .... Solutions that will give very different times with that solution, as you suggest. The recursive solution will use the same time for all solutions and can even handle an unknown number of values in the string. Hopefully, your proposal will never need to be changed, so the same change should be made for all the similar solutions. Redundant code is as dangerous as redundant data.

Performance problem "expected". "Only 14 seconds on 131,ooo" rows.

Why not just do it the right way so that someone that looks at your code doesn't use it for something worse. It doesn't take any longer to do it right, especially in SQL Server 2022.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

carsten.saastamoinen Old Hand Points: 341 More actions · Answer 10

Most people are still using a version earlier than 2022! So just using the 2022 version to solve this problem is not the right solution - for some developers it will takes year before they change to 2022.

If the value is to be used several times a minute, a solution will hopefully be prepared where the value is stored in a separate column. And then 131,000 rows per minute are unlikely to be inserted. So not a problem!!!

The task is not to find the third value, but the second last. A solution with CHARINDEX is excluded, as it is not a general solution. What about changes in the string to 5 values. It is not the solution of this kind of problems to simply work out a solution if there are 4, if there are 5, ... Redundant code is as dangerous as redundant data. As far as possible, general solutions should be prepared!

As I have previously proven, Tally is only the most efficient if a lot of data is processed and the plan is cached. Hardly two prerequisites that are met for this type of problem - as indicated earlier, the solution must probably be in another way, where the individual values are extracted at the time of insertion.

And when we get to version 2022, we can fortunately forget about Tally. So making a bad solution with Tally in 2022 soon 2023, where this outdated method is used, can hardly be called the correct solution!

Eirikur Eiriksson SSC Guru Points: 182943 More actions · Answer 11

Here is a slightly different approach that performs quite well given that only the two delimiter positions needed are used in the output.

USE TEEST;
GO
SET NOCOUNT ON;
GO
DECLARE @TMP table 
(
serialnumber varchar(500)
);
INSERT INTO @TMP VALUES('1232132|2343345435|34543534534534|3344562534')
                      ,('23243|23423432432|34543534534235|3344562534')
                      ,('23243|23423432432|34543534534236|3344562534')
                      ,('23243|23423432432|34543534534237|3344562534')
                      ,('23243|23423432432|34543534534238|3344562534')
                      ,('23243|23423432432|34543534534239|3344562534')
;

SELECT
    SUBSTRING(T.serialnumber,CI2.CI + 1,(CI3.CI - CI2.CI) -1)  AS ITEM_OF_INTEREST
FROM    @TMP    T
CROSS APPLY (SELECT CHARINDEX(CHAR(124),T.serialnumber,         1) ) CI1(CI)
CROSS APPLY (SELECT CHARINDEX(CHAR(124),T.serialnumber,CI1.CI + 1) ) CI2(CI)
CROSS APPLY (SELECT CHARINDEX(CHAR(124),T.serialnumber,CI2.CI + 1) ) CI3(CI)
;

Result set

ITEM_OF_INTEREST
-----------------
  34543534534534
  34543534534235
  34543534534236
  34543534534237
  34543534534238
  34543534534239

Eirikur Eiriksson SSC Guru Points: 182943 More actions · Answer 12

Simple test harness to demonstrate the inefficiency of the string_split method

USE TEEST;
GO
SET NOCOUNT ON;
GO
DECLARE @SAMPLE_SIZE BIGINT      = 1000000;
DECLARE @STR_BUCKET  VARCHAR(20) = NULL;
DECLARE @TIMER  TABLE (
    T_TXT   VARCHAR(100)    NOT NULL
   ,T_TS    DATETIME2(7)    NOT NULL DEFAULT (SYSDATETIME())
   ,T_CPU   FLOAT           NOT NULL DEFAULT (@@CPU_BUSY * CONVERT(FLOAT,@@TIMETICKS,0))
   ,T_IDLE  FLOAT           NOT NULL DEFAULT (@@IDLE     * CONVERT(FLOAT,@@TIMETICKS,0))
   ,T_IO    FLOAT           NOT NULL DEFAULT (@@IO_BUSY  * CONVERT(FLOAT,@@TIMETICKS,0))
);
-- Uncomment the next line to skip generating the test set.
-- /*
IF OBJECT_ID('dbo.TBL_TEST_SPLIT') IS NOT NULL DROP TABLE dbo.TBL_TEST_SPLIT;
CREATE TABLE dbo.TBL_TEST_SPLIT
(
    T_STR   VARCHAR(250) NOT NULL
)
;
INSERT INTO @TIMER(T_TXT) VALUES ('TEST GEN');

;WITH T(N) AS (SELECT X.N FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) X(N))
,  NUMS(N) AS (SELECT TOP(@SAMPLE_SIZE) 
               ROW_NUMBER() OVER (ORDER BY @@VERSION) AS N FROM T T1,T T2,T T3,T T4,T T5,T T6,T T7,T T8,T T9)
,TEST_STRINGS(T_STR) AS 
(
    SELECT
        CONCAT
        (
            ABS(CHECKSUM(NEWID()))
           ,CHAR(124)
           ,ABS(CHECKSUM(NEWID()))
           ,CHAR(124)
           ,ABS(CHECKSUM(NEWID()))
           ,CHAR(124)
           ,ABS(CHECKSUM(NEWID()))
        )
    FROM NUMS NM
)
INSERT INTO dbo.TBL_TEST_SPLIT(T_STR)
SELECT
    TS.T_STR
FROM TEST_STRINGS TS;
INSERT INTO @TIMER(T_TXT) VALUES ('TEST GEN');
-- */
INSERT INTO @TIMER(T_TXT) VALUES ('DRY RUN');
SELECT
    @STR_BUCKET = TS.T_STR
FROM dbo.TBL_TEST_SPLIT TS;
INSERT INTO @TIMER(T_TXT) VALUES ('DRY RUN');

INSERT INTO @TIMER(T_TXT) VALUES ('STRING_SPLIT');
;WITH BASE_SPLIT AS
(
    SELECT
        SP.[value] AS ITEM_OF_INTEREST
       ,ROW_NUMBER() OVER 
                (
                    PARTITION BY TS.T_STR
                    ORDER BY @@VERSION) AS RID
    FROM dbo.TBL_TEST_SPLIT TS
    CROSS APPLY string_split(TS.T_STR,'|') SP
)
SELECT
     @STR_BUCKET = BS.ITEM_OF_INTEREST
FROM BASE_SPLIT BS
WHERE BS.RID = 3;
INSERT INTO @TIMER(T_TXT) VALUES ('STRING_SPLIT');

INSERT INTO @TIMER(T_TXT) VALUES ('CHARINDEX');
SELECT
    @STR_BUCKET = SUBSTRING(TS.T_STR,CI2.CI + 1,(CI3.CI - CI2.CI) -1) 
FROM    dbo.TBL_TEST_SPLIT TS
CROSS APPLY (SELECT CHARINDEX(CHAR(124),TS.T_STR,         1) ) CI1(CI)
CROSS APPLY (SELECT CHARINDEX(CHAR(124),TS.T_STR,CI1.CI + 1) ) CI2(CI)
CROSS APPLY (SELECT CHARINDEX(CHAR(124),TS.T_STR,CI2.CI + 1) ) CI3(CI)
;
INSERT INTO @TIMER(T_TXT) VALUES ('CHARINDEX');

SELECT
    T.T_TXT
   ,MAX(T.T_CPU)  - MIN(T.T_CPU)              AS CPU
   ,MAX(T.T_IDLE) - MIN(T.T_IDLE)             AS IDLE
   ,MAX(T.T_IO) - MIN(T.T_IO)                 AS IO
   ,DATEDIFF(MICROSECOND,MIN(T.T_TS),MAX(T.T_TS)) AS DURATION
FROM    @TIMER  T
GROUP BY T.T_TXT
ORDER BY DURATION ASC;

Result set (Gen11 i5 laptop)

T_TXT         CPU      IDLE      IO     DURATION
------------- -------- --------- ------ ---------
DRY RUN         93750    656250      0   102985
TEST GEN      1375000  10375000      0  1486256
CHARINDEX     3250000  24000000      0  3498358
STRING_SPLIT  6218750  45250000  31250  6586345

This is kind of obvious as any string-splitting method generates intermediate cardinality that is much greater than the charindex method.

Jonathan AC Roberts SSCoach Points: 18400 More actions · Answer 13

Here is a solution to get the value before the last pipe symbol, it will work on SQL Server 2012 or higher using this function: https://www.sqlservercentral.com/scripts/a-varcharmax-string_split-function-for-sql-2012-and-above-2, it is also faster than carsten.saastamoinen's solution.

SELECT REVERSE(X.value) ValueBeforeLastPipe
  FROM #TMP t
 CROSS APPLY dbo.STRING_SPLIT(REVERSE(t.SerialNumber),'|') X
 WHERE X.ordinal = 2

Here is some code for a comparison which does the operation on 130k rows:

-- Finding second last
DROP TABLE IF EXISTS  #Tmp ;
create table #Tmp 
(
    SerialNumber VARCHAR (500)
);


INSERT INTO #Tmp 
SELECT top(130000) T.* FROM (VALUES ('1232132|2343345435|2ndLast1|3344562534'),
('23243|23423432432|2ndLast2|3344562534'),
('23243|23|42343|2432|2ndLast3|3344562534'),
('23|24|3|2342|3432432|2ndLast4|3344562534'))t(c)
,SYS.all_columns A, SYS.all_columns B;


GO
set statistics time on
DECLARE @X VARCHAR(500);
WITH Data AS
(
    SELECT NEWID() AS ID,-- ID for each row
           CONCAT (SerialNumber, '|') AS SerialNumber-- For stop
      FROM #Tmp
),
Split AS
(
    SELECT ID,
           LEFT (SerialNumber, CHARINDEX ('|', SerialNumber) - 1) AS No,
           1 AS Position,
           RIGHT (SerialNumber, LEN (SerialNumber) - CHARINDEX ('|', SerialNumber)) AS Rest
      FROM Data
     UNION ALL
    SELECT ID,
           LEFT (Rest, CHARINDEX ('|', Rest) - 1) AS No,
           Position + 1,
           RIGHT(Rest, LEN (Rest) - CHARINDEX ('|', Rest))
      FROM Split
     WHERE LEN (Rest) > 1
),
SplitRes AS
(
    SELECT *,
           ROW_NUMBER() OVER(PARTITION BY ID ORDER BY Position DESC) AS RowNum
      FROM Split
)
SELECT No ValueBeforeLastPipe, COUNT(*) Cound
  FROM SplitRes
 WHERE RowNum = 2
 GROUP BY No;-- 2 could be a parameter


GO
DECLARE @X VARCHAR(500)
SELECT REVERSE(X.value) ValueBeforeLastPipe, COUNT(*) Count
  FROM #TMP t
 CROSS APPLY dbo.STRING_SPLIT(REVERSE(t.SerialNumber),'|') X
 WHERE X.ordinal = 2
 GROUP BY X.value;

On my machine carsten.saastamoinen's solution runs in 12228 ms and mine in 814 ms

get a substring value from string

Cookies on SQLServerCentral