get a substring value from string

  • Hi,

    I am new to SQL Server.

    I have table

    create table tmp1

    (

    serialnumber varchar(500)

    )

    insert into tmp1 values('1232132|2343345435|34543534534534|3344562534'),('23243|23423432432|34543534534235|3344562534')

    select * from tmp1

    I need to fetch the value before the last pipe symbol

    the output would be as below

    34543534534534

    34543534534235

    anyone can help

  • Quick question, are you consistently looking for the third value in the string?

    😎

    Here is a quick suggestion, mind you that there is no guarantee that the ordinals are correct 😉

    USE TEEST;
    GO
    SET NOCOUNT ON;
    GO
    DECLARE @TMP table
    (
    serialnumber varchar(500)
    );
    INSERT INTO @TMP VALUES('1232132|2343345435|34543534534534|3344562534')
    ,('23243|23423432432|34543534534235|3344562534');
    ;WITH BASE_SPLIT AS
    (
    SELECT
    SP.[value] AS ITEM_OF_INTEREST
    ,ROW_NUMBER() OVER
    (
    PARTITION BY T.serialnumber
    ORDER BY @@VERSION) AS RID
    FROM @TMP T
    CROSS APPLY string_split(T.serialnumber,'|') SP
    )
    SELECT
    BS.RID
    ,BS.ITEM_OF_INTEREST
    FROM BASE_SPLIT BS
    WHERE BS.RID = 3;

    Result set

    RID  ITEM_OF_INTEREST
    ---- -----------------
    3 34543534534534
    3 34543534534235
  • select *, 
    case when last_bar = 0 or next_to_last_bar = 0 then ''
    else SUBSTRING(serialnumber, LEN(serialnumber) - last_bar - next_to_last_bar + 2, next_to_last_bar - 1) end
    from tmp1
    cross apply (
    select charindex('|', reverse(serialnumber)) as last_bar
    ) as ca1
    cross apply (
    select charindex('|', reverse(left(serialnumber, len(serialnumber)-last_bar))) as next_to_last_bar
    ) as ca2

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • SET NOCOUNT ON
    ;
    DECLARE @TMP
    TABLE
    (
    serialnumber
    VARCHAR (500)
    )
    ;
    INSERT INTO
    @TMP
    VALUES
    ('1232132|2343345435|34543534534534|3344562534')
    ,
    ('23243|23423432432|34543534534235|3344562534')
    ;
    SELECT
    Parsed.value
    FROM
    (
    SELECT
    *
    FROM
    @TMP
    CROSS APPLY
    string_split (serialnumber, '|', 1)
    )
    AS
    Parsed
    INNER JOIN
    (
    SELECT
    serialnumber
    ,
    MAX (ordinal)
    AS
    MaxOrdinal
    FROM
    @TMP
    CROSS APPLY
    string_split (serialnumber, '|', 1)
    GROUP BY
    serialnumber
    )
    AS
    EntryCount
    ON
    Parsed.serialnumber = EntryCount.serialnumber
    AND
    Parsed.ordinal = MaxOrdinal - 1
    ;

    Have Fun!
    Ronzo

  • This was removed by the editor as SPAM

  • The new STRING_SPLIT with the 3rd operand is great for this.  The old STRING_SPLIT without it is not.  Even MS has stated that the order of the output is NOT guaranteed in any way, shape, or form.  It would be better to use multiple CHARINDEXes to get the location of the 3rd string or use DelimitedSplit8K if you don't really have SQL 2022.

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • -- Finding second last - problems if this should be more generel
    DECLARE @Tmp TABLE
    (
    SerialNoVARCHAR (500)
    );

    INSERT INTO @Tmp VALUES
    ('1232132|2343345435|34543534534534|3344562534'),
    ('23243|23423432432|34543534534235|3344562534'),
    ('23243|23|42343|2432|A3454335B|3344562534'),
    ('23|24|3|2342|3432432|X34545Y|3344562534');

    WITH ReverseSN
    AS
    (
    SELECT REVERSE (SerialNo) AS SN
    FROM @Tmp
    ),
    SecondLast
    AS
    (
    SELECT
    CHARINDEX ('|', SN) + 1 AS f,
    CHARINDEX ( '|', SN, CHARINDEX (SN, '|') - 1) AS s,
    SUBSTRING (SN, CHARINDEX ('|', SN) + 1,
    CHARINDEX ('|', SN, CHARINDEX ('|', SN) + 1) - CHARINDEX ('|', SN) - 1) AS ReverseSLast
    FROM ReverseSN
    )
    SELECT REVERSE (ReverseSLast)
    FROM SecondLast;
    GO
    -- A more general recursive solution

    DECLARE @Tmp TABLE
    (
    SerialNo VARCHAR (500)
    );

    INSERT INTO @Tmp VALUES
    ('1232132|2343345435|34543534534534|3344562534'),
    ('23243|23423432432|34543534534235|3344562534'),
    ('23243|23|42343|2432|A3454335B|3344562534'),
    ('23|24|3|2342|3432432|X34545Y|3344562534');

    WITH Data
    AS
    (
    SELECTNEWID() AS ID,-- ID for each row
    CONCAT (SerialNo, '|') AS SerialNo-- For stop
    FROM @Tmp
    ),
    Split
    AS
    (
    SELECTID,
    LEFT (SerialNo, CHARINDEX ('|', SerialNo) - 1) AS No,
    1 AS Position,
    RIGHT (SerialNo, LEN (SerialNo) - CHARINDEX ('|', SerialNo)) AS Rest
    FROM Data
    UNION ALL
    SELECTID,
    LEFT (Rest, CHARINDEX ('|', Rest) - 1) AS No,
    Position + 1,
    RIGHT (Rest, LEN (Rest) - CHARINDEX ('|', Rest))
    FROM Split
    WHERE LEN (Rest) > 1
    ),
    SplitRes
    AS
    (
    SELECT*,
    ROW_NUMBER() OVER(PARTITION BY ID ORDER BY Position DESC) AS RowNum
    FROM Split
    )
    SELECT No
    FROM SplitRes
    WHERE RowNum = 2;-- 2 could be a parameter
  • @carsten.saastamoinen

    That "split" contains an incremental recursive CTE and it's going to be absolutely terrible for performance.  See here...

    https://www.sqlservercentral.com/articles/hidden-rbar-counting-with-recursive-ctes

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • As expected!!!! On my desk it takes only 14 seconds to process 131,000 rows. Hardly not a statement that will be executed several times every minute or several times every hour.  Then the values should be stored in a table and therefore only be execute ones using 14 seconds. But it is an absolutely terrible solution if, instead, a solution has to be made for the last number, another for the second last number, one for the third last, .... Solutions that will give very different times with that solution, as you suggest. The recursive solution will use the same time for all solutions and can even handle an unknown number of values in the string. Hopefully, your proposal will never need to be changed, so the same change should be made for all the similar solutions. Redundant code is as dangerous as redundant data.

  • Jeff Moden wrote:

    The new STRING_SPLIT with the 3rd operand is great for this.  The old STRING_SPLIT without it is not.  Even MS has stated that the order of the output is NOT guaranteed in any way, shape, or form.  It would be better to use multiple CHARINDEXes to get the location of the 3rd string or use DelimitedSplit8K if you don't really have SQL 2022. 

    As long as last value is the third value they could install this function (with ordinal):

    IF OBJECT_ID('[dbo].[STRING_SPLIT]','IF') IS NULL BEGIN
    EXEC ('CREATE FUNCTION [dbo].[STRING_SPLIT] () RETURNS TABLE AS RETURN SELECT 1 X')
    END
    GO
    ALTER FUNCTION [dbo].[STRING_SPLIT]
    (
    @string nvarchar(MAX),
    @separator nvarchar(MAX)
    )
    RETURNS TABLE WITH SCHEMABINDING
    AS RETURN
    WITH X(N) AS (SELECT 'Table1' FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) T(C)),
    Y(N) AS (SELECT 'Table2' FROM X A1, X A2, X A3, X A4, X A5, X A6, X A7, X A8) , -- Up to 16^8 = 4 billion
    T(N) AS (SELECT TOP(ISNULL(LEN(@string),0)+1) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) -1 N FROM Y),
    Delim(Pos) AS (SELECT t.N FROM T WHERE SUBSTRING(@string, t.N, LEN(@separator+'x')-1) LIKE @separator OR t.N = 0),
    Separated(value,position) AS (SELECT SUBSTRING(@string, d.Pos + CASE WHEN Pos=0 THEN 1 ELSE LEN(@separator+'x')-1 END,
    LEAD(d.Pos,1,2147483647) OVER (ORDER BY (SELECT NULL))
    - CASE WHEN d.Pos=0 THEN 2-LEN(@separator+'x')
    ELSE d.Pos END - (LEN(@separator+'x')-1)),
    d.pos
    FROM Delim d
    WHERE @string IS NOT NULL)
    SELECT s.value value,
    ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) ordinal
    FROM Separated s
    WHERE s.value <> @separator;
    GO

    Then the following SQL will do the job:

    -- Create table variable with data
    DECLARE @TMP table
    (
    serialnumber varchar(500)
    );
    INSERT INTO @TMP VALUES('1232132|2343345435|34543534534534|3344562534')
    ,('23243|23423432432|34543534534235|3344562534');

    SELECT X.value
    FROM @TMP t
    CROSS APPLY dbo.STRING_SPLIT(t.serialnumber,'|') X
    WHERE X.ordinal = 3

     

  • carsten.saastamoinen wrote:

    As expected!!!! On my desk it takes only 14 seconds to process 131,000 rows. Hardly not a statement that will be executed several times every minute or several times every hour.  Then the values should be stored in a table and therefore only be execute ones using 14 seconds. But it is an absolutely terrible solution if, instead, a solution has to be made for the last number, another for the second last number, one for the third last, .... Solutions that will give very different times with that solution, as you suggest. The recursive solution will use the same time for all solutions and can even handle an unknown number of values in the string. Hopefully, your proposal will never need to be changed, so the same change should be made for all the similar solutions. Redundant code is as dangerous as redundant data.

    Performance problem "expected".  "Only 14 seconds on 131,ooo" rows.

    Why not just do it the right way so that someone that looks at your code doesn't use it for something worse.  It doesn't take any longer to do it right, especially in SQL Server 2022.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Most people are still using a version earlier than 2022! So just using the 2022 version to solve this problem is not the right solution - for some developers it will takes year before they change to 2022.

    If the value is to be used several times a minute, a solution will hopefully be prepared where the value is stored in a separate column. And then 131,000 rows per minute are unlikely to be inserted. So not a problem!!!

    The task is not to find the third value, but the second last. A solution with CHARINDEX is excluded, as it is not a general solution. What about changes in the string to 5 values. It is not the solution of this kind of problems to simply work out a solution if there are 4, if there are 5, ... Redundant code is as dangerous as redundant data. As far as possible, general solutions should be prepared!

    As I have previously proven, Tally is only the most efficient if a lot of data is processed and the plan is cached. Hardly two prerequisites that are met for this type of problem - as indicated earlier, the solution must probably be in another way, where the individual values are extracted at the time of insertion.

    And when we get to version 2022, we can fortunately forget about Tally. So making a bad solution with Tally in 2022 soon 2023, where this outdated method is used, can hardly be called the correct solution!

  • Here is a slightly different approach that performs quite well given that only the two delimiter positions needed are used in the output.

    😎

    USE TEEST;
    GO
    SET NOCOUNT ON;
    GO
    DECLARE @TMP table
    (
    serialnumber varchar(500)
    );
    INSERT INTO @TMP VALUES('1232132|2343345435|34543534534534|3344562534')
    ,('23243|23423432432|34543534534235|3344562534')
    ,('23243|23423432432|34543534534236|3344562534')
    ,('23243|23423432432|34543534534237|3344562534')
    ,('23243|23423432432|34543534534238|3344562534')
    ,('23243|23423432432|34543534534239|3344562534')
    ;

    SELECT
    SUBSTRING(T.serialnumber,CI2.CI + 1,(CI3.CI - CI2.CI) -1) AS ITEM_OF_INTEREST
    FROM @TMP T
    CROSS APPLY (SELECT CHARINDEX(CHAR(124),T.serialnumber, 1) ) CI1(CI)
    CROSS APPLY (SELECT CHARINDEX(CHAR(124),T.serialnumber,CI1.CI + 1) ) CI2(CI)
    CROSS APPLY (SELECT CHARINDEX(CHAR(124),T.serialnumber,CI2.CI + 1) ) CI3(CI)
    ;

    Result set

    ITEM_OF_INTEREST
    -----------------
    34543534534534
    34543534534235
    34543534534236
    34543534534237
    34543534534238
    34543534534239
  • Simple test harness to demonstrate the inefficiency of the string_split method 😉

    😎

    USE TEEST;
    GO
    SET NOCOUNT ON;
    GO
    DECLARE @SAMPLE_SIZE BIGINT = 1000000;
    DECLARE @STR_BUCKET VARCHAR(20) = NULL;
    DECLARE @TIMER TABLE (
    T_TXT VARCHAR(100) NOT NULL
    ,T_TS DATETIME2(7) NOT NULL DEFAULT (SYSDATETIME())
    ,T_CPU FLOAT NOT NULL DEFAULT (@@CPU_BUSY * CONVERT(FLOAT,@@TIMETICKS,0))
    ,T_IDLE FLOAT NOT NULL DEFAULT (@@IDLE * CONVERT(FLOAT,@@TIMETICKS,0))
    ,T_IO FLOAT NOT NULL DEFAULT (@@IO_BUSY * CONVERT(FLOAT,@@TIMETICKS,0))
    );
    -- Uncomment the next line to skip generating the test set.
    -- /*
    IF OBJECT_ID('dbo.TBL_TEST_SPLIT') IS NOT NULL DROP TABLE dbo.TBL_TEST_SPLIT;
    CREATE TABLE dbo.TBL_TEST_SPLIT
    (
    T_STR VARCHAR(250) NOT NULL
    )
    ;
    INSERT INTO @TIMER(T_TXT) VALUES ('TEST GEN');

    ;WITH T(N) AS (SELECT X.N FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) X(N))
    , NUMS(N) AS (SELECT TOP(@SAMPLE_SIZE)
    ROW_NUMBER() OVER (ORDER BY @@VERSION) AS N FROM T T1,T T2,T T3,T T4,T T5,T T6,T T7,T T8,T T9)
    ,TEST_STRINGS(T_STR) AS
    (
    SELECT
    CONCAT
    (
    ABS(CHECKSUM(NEWID()))
    ,CHAR(124)
    ,ABS(CHECKSUM(NEWID()))
    ,CHAR(124)
    ,ABS(CHECKSUM(NEWID()))
    ,CHAR(124)
    ,ABS(CHECKSUM(NEWID()))
    )
    FROM NUMS NM
    )
    INSERT INTO dbo.TBL_TEST_SPLIT(T_STR)
    SELECT
    TS.T_STR
    FROM TEST_STRINGS TS;
    INSERT INTO @TIMER(T_TXT) VALUES ('TEST GEN');
    -- */
    INSERT INTO @TIMER(T_TXT) VALUES ('DRY RUN');
    SELECT
    @STR_BUCKET = TS.T_STR
    FROM dbo.TBL_TEST_SPLIT TS;
    INSERT INTO @TIMER(T_TXT) VALUES ('DRY RUN');

    INSERT INTO @TIMER(T_TXT) VALUES ('STRING_SPLIT');
    ;WITH BASE_SPLIT AS
    (
    SELECT
    SP.[value] AS ITEM_OF_INTEREST
    ,ROW_NUMBER() OVER
    (
    PARTITION BY TS.T_STR
    ORDER BY @@VERSION) AS RID
    FROM dbo.TBL_TEST_SPLIT TS
    CROSS APPLY string_split(TS.T_STR,'|') SP
    )
    SELECT
    @STR_BUCKET = BS.ITEM_OF_INTEREST
    FROM BASE_SPLIT BS
    WHERE BS.RID = 3;
    INSERT INTO @TIMER(T_TXT) VALUES ('STRING_SPLIT');

    INSERT INTO @TIMER(T_TXT) VALUES ('CHARINDEX');
    SELECT
    @STR_BUCKET = SUBSTRING(TS.T_STR,CI2.CI + 1,(CI3.CI - CI2.CI) -1)
    FROM dbo.TBL_TEST_SPLIT TS
    CROSS APPLY (SELECT CHARINDEX(CHAR(124),TS.T_STR, 1) ) CI1(CI)
    CROSS APPLY (SELECT CHARINDEX(CHAR(124),TS.T_STR,CI1.CI + 1) ) CI2(CI)
    CROSS APPLY (SELECT CHARINDEX(CHAR(124),TS.T_STR,CI2.CI + 1) ) CI3(CI)
    ;
    INSERT INTO @TIMER(T_TXT) VALUES ('CHARINDEX');

    SELECT
    T.T_TXT
    ,MAX(T.T_CPU) - MIN(T.T_CPU) AS CPU
    ,MAX(T.T_IDLE) - MIN(T.T_IDLE) AS IDLE
    ,MAX(T.T_IO) - MIN(T.T_IO) AS IO
    ,DATEDIFF(MICROSECOND,MIN(T.T_TS),MAX(T.T_TS)) AS DURATION
    FROM @TIMER T
    GROUP BY T.T_TXT
    ORDER BY DURATION ASC;

    Result set (Gen11 i5 laptop)

    T_TXT         CPU      IDLE      IO     DURATION
    ------------- -------- --------- ------ ---------
    DRY RUN 93750 656250 0 102985
    TEST GEN 1375000 10375000 0 1486256
    CHARINDEX 3250000 24000000 0 3498358
    STRING_SPLIT 6218750 45250000 31250 6586345

    This is kind of obvious as any string-splitting method generates intermediate cardinality that is much greater than the charindex method.

  • Here is a solution to get the value before the last pipe symbol, it will work on SQL Server 2012 or higher using this function: https://www.sqlservercentral.com/scripts/a-varcharmax-string_split-function-for-sql-2012-and-above-2, it is also faster than carsten.saastamoinen's solution.

    SELECT REVERSE(X.value) ValueBeforeLastPipe
    FROM #TMP t
    CROSS APPLY dbo.STRING_SPLIT(REVERSE(t.SerialNumber),'|') X
    WHERE X.ordinal = 2

    Here is some code for a comparison which does the operation on 130k rows:

    -- Finding second last
    DROP TABLE IF EXISTS #Tmp ;
    create table #Tmp
    (
    SerialNumber VARCHAR (500)
    );


    INSERT INTO #Tmp
    SELECT top(130000) T.* FROM (VALUES ('1232132|2343345435|2ndLast1|3344562534'),
    ('23243|23423432432|2ndLast2|3344562534'),
    ('23243|23|42343|2432|2ndLast3|3344562534'),
    ('23|24|3|2342|3432432|2ndLast4|3344562534'))t(c)
    ,SYS.all_columns A, SYS.all_columns B;


    GO
    set statistics time on
    DECLARE @X VARCHAR(500);
    WITH Data AS
    (
    SELECT NEWID() AS ID,-- ID for each row
    CONCAT (SerialNumber, '|') AS SerialNumber-- For stop
    FROM #Tmp
    ),
    Split AS
    (
    SELECT ID,
    LEFT (SerialNumber, CHARINDEX ('|', SerialNumber) - 1) AS No,
    1 AS Position,
    RIGHT (SerialNumber, LEN (SerialNumber) - CHARINDEX ('|', SerialNumber)) AS Rest
    FROM Data
    UNION ALL
    SELECT ID,
    LEFT (Rest, CHARINDEX ('|', Rest) - 1) AS No,
    Position + 1,
    RIGHT(Rest, LEN (Rest) - CHARINDEX ('|', Rest))
    FROM Split
    WHERE LEN (Rest) > 1
    ),
    SplitRes AS
    (
    SELECT *,
    ROW_NUMBER() OVER(PARTITION BY ID ORDER BY Position DESC) AS RowNum
    FROM Split
    )
    SELECT No ValueBeforeLastPipe, COUNT(*) Cound
    FROM SplitRes
    WHERE RowNum = 2
    GROUP BY No;-- 2 could be a parameter


    GO
    DECLARE @X VARCHAR(500)
    SELECT REVERSE(X.value) ValueBeforeLastPipe, COUNT(*) Count
    FROM #TMP t
    CROSS APPLY dbo.STRING_SPLIT(REVERSE(t.SerialNumber),'|') X
    WHERE X.ordinal = 2
    GROUP BY X.value;

    On my machine carsten.saastamoinen's solution runs in 12228 ms and mine in 814 ms

     

     

     

     

Viewing 15 posts - 1 through 15 (of 46 total)

You must be logged in to reply to this topic. Login to reply