December 14, 2022 at 4:10 pm
Hi,
I am new to SQL Server.
I have table
create table tmp1
(
serialnumber varchar(500)
)
insert into tmp1 values('1232132|2343345435|34543534534534|3344562534'),('23243|23423432432|34543534534235|3344562534')
select * from tmp1
I need to fetch the value before the last pipe symbol
the output would be as below
34543534534534
34543534534235
anyone can help
December 14, 2022 at 5:52 pm
Quick question, are you consistently looking for the third value in the string?
😎
Here is a quick suggestion, mind you that there is no guarantee that the ordinals are correct 😉
USE TEEST;
GO
SET NOCOUNT ON;
GO
DECLARE @TMP table
(
serialnumber varchar(500)
);
INSERT INTO @TMP VALUES('1232132|2343345435|34543534534534|3344562534')
,('23243|23423432432|34543534534235|3344562534');
;WITH BASE_SPLIT AS
(
SELECT
SP.[value] AS ITEM_OF_INTEREST
,ROW_NUMBER() OVER
(
PARTITION BY T.serialnumber
ORDER BY @@VERSION) AS RID
FROM @TMP T
CROSS APPLY string_split(T.serialnumber,'|') SP
)
SELECT
BS.RID
,BS.ITEM_OF_INTEREST
FROM BASE_SPLIT BS
WHERE BS.RID = 3;
Result set
RID ITEM_OF_INTEREST
---- -----------------
3 34543534534534
3 34543534534235
December 14, 2022 at 8:07 pm
select *,
case when last_bar = 0 or next_to_last_bar = 0 then ''
else SUBSTRING(serialnumber, LEN(serialnumber) - last_bar - next_to_last_bar + 2, next_to_last_bar - 1) end
from tmp1
cross apply (
select charindex('|', reverse(serialnumber)) as last_bar
) as ca1
cross apply (
select charindex('|', reverse(left(serialnumber, len(serialnumber)-last_bar))) as next_to_last_bar
) as ca2
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
December 15, 2022 at 12:29 am
SET NOCOUNT ON
;
DECLARE @TMP
TABLE
(
serialnumber
VARCHAR (500)
)
;
INSERT INTO
@TMP
VALUES
('1232132|2343345435|34543534534534|3344562534')
,
('23243|23423432432|34543534534235|3344562534')
;
SELECT
Parsed.value
FROM
(
SELECT
*
FROM
@TMP
CROSS APPLY
string_split (serialnumber, '|', 1)
)
AS
Parsed
INNER JOIN
(
SELECT
serialnumber
,
MAX (ordinal)
AS
MaxOrdinal
FROM
@TMP
CROSS APPLY
string_split (serialnumber, '|', 1)
GROUP BY
serialnumber
)
AS
EntryCount
ON
Parsed.serialnumber = EntryCount.serialnumber
AND
Parsed.ordinal = MaxOrdinal - 1
;
December 15, 2022 at 6:59 am
This was removed by the editor as SPAM
December 15, 2022 at 9:27 pm
The new STRING_SPLIT with the 3rd operand is great for this. The old STRING_SPLIT without it is not. Even MS has stated that the order of the output is NOT guaranteed in any way, shape, or form. It would be better to use multiple CHARINDEXes to get the location of the 3rd string or use DelimitedSplit8K if you don't really have SQL 2022.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 16, 2022 at 10:12 am
-- Finding second last - problems if this should be more generel
DECLARE @Tmp TABLE
(
SerialNoVARCHAR (500)
);
INSERT INTO @Tmp VALUES
('1232132|2343345435|34543534534534|3344562534'),
('23243|23423432432|34543534534235|3344562534'),
('23243|23|42343|2432|A3454335B|3344562534'),
('23|24|3|2342|3432432|X34545Y|3344562534');
WITH ReverseSN
AS
(
SELECT REVERSE (SerialNo) AS SN
FROM @Tmp
),
SecondLast
AS
(
SELECT
CHARINDEX ('|', SN) + 1 AS f,
CHARINDEX ( '|', SN, CHARINDEX (SN, '|') - 1) AS s,
SUBSTRING (SN, CHARINDEX ('|', SN) + 1,
CHARINDEX ('|', SN, CHARINDEX ('|', SN) + 1) - CHARINDEX ('|', SN) - 1) AS ReverseSLast
FROM ReverseSN
)
SELECT REVERSE (ReverseSLast)
FROM SecondLast;
GO
-- A more general recursive solution
DECLARE @Tmp TABLE
(
SerialNo VARCHAR (500)
);
INSERT INTO @Tmp VALUES
('1232132|2343345435|34543534534534|3344562534'),
('23243|23423432432|34543534534235|3344562534'),
('23243|23|42343|2432|A3454335B|3344562534'),
('23|24|3|2342|3432432|X34545Y|3344562534');
WITH Data
AS
(
SELECTNEWID() AS ID,-- ID for each row
CONCAT (SerialNo, '|') AS SerialNo-- For stop
FROM @Tmp
),
Split
AS
(
SELECTID,
LEFT (SerialNo, CHARINDEX ('|', SerialNo) - 1) AS No,
1 AS Position,
RIGHT (SerialNo, LEN (SerialNo) - CHARINDEX ('|', SerialNo)) AS Rest
FROM Data
UNION ALL
SELECTID,
LEFT (Rest, CHARINDEX ('|', Rest) - 1) AS No,
Position + 1,
RIGHT (Rest, LEN (Rest) - CHARINDEX ('|', Rest))
FROM Split
WHERE LEN (Rest) > 1
),
SplitRes
AS
(
SELECT*,
ROW_NUMBER() OVER(PARTITION BY ID ORDER BY Position DESC) AS RowNum
FROM Split
)
SELECT No
FROM SplitRes
WHERE RowNum = 2;-- 2 could be a parameter
December 16, 2022 at 4:43 pm
@carsten.saastamoinen
That "split" contains an incremental recursive CTE and it's going to be absolutely terrible for performance. See here...
https://www.sqlservercentral.com/articles/hidden-rbar-counting-with-recursive-ctes
--Jeff Moden
Change is inevitable... Change for the better is not.
December 16, 2022 at 5:35 pm
As expected!!!! On my desk it takes only 14 seconds to process 131,000 rows. Hardly not a statement that will be executed several times every minute or several times every hour. Then the values should be stored in a table and therefore only be execute ones using 14 seconds. But it is an absolutely terrible solution if, instead, a solution has to be made for the last number, another for the second last number, one for the third last, .... Solutions that will give very different times with that solution, as you suggest. The recursive solution will use the same time for all solutions and can even handle an unknown number of values in the string. Hopefully, your proposal will never need to be changed, so the same change should be made for all the similar solutions. Redundant code is as dangerous as redundant data.
December 16, 2022 at 6:20 pm
The new STRING_SPLIT with the 3rd operand is great for this. The old STRING_SPLIT without it is not. Even MS has stated that the order of the output is NOT guaranteed in any way, shape, or form. It would be better to use multiple CHARINDEXes to get the location of the 3rd string or use DelimitedSplit8K if you don't really have SQL 2022.
As long as last value is the third value they could install this function (with ordinal):
IF OBJECT_ID('[dbo].[STRING_SPLIT]','IF') IS NULL BEGIN
EXEC ('CREATE FUNCTION [dbo].[STRING_SPLIT] () RETURNS TABLE AS RETURN SELECT 1 X')
END
GO
ALTER FUNCTION [dbo].[STRING_SPLIT]
(
@string nvarchar(MAX),
@separator nvarchar(MAX)
)
RETURNS TABLE WITH SCHEMABINDING
AS RETURN
WITH X(N) AS (SELECT 'Table1' FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) T(C)),
Y(N) AS (SELECT 'Table2' FROM X A1, X A2, X A3, X A4, X A5, X A6, X A7, X A8) , -- Up to 16^8 = 4 billion
T(N) AS (SELECT TOP(ISNULL(LEN(@string),0)+1) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) -1 N FROM Y),
Delim(Pos) AS (SELECT t.N FROM T WHERE SUBSTRING(@string, t.N, LEN(@separator+'x')-1) LIKE @separator OR t.N = 0),
Separated(value,position) AS (SELECT SUBSTRING(@string, d.Pos + CASE WHEN Pos=0 THEN 1 ELSE LEN(@separator+'x')-1 END,
LEAD(d.Pos,1,2147483647) OVER (ORDER BY (SELECT NULL))
- CASE WHEN d.Pos=0 THEN 2-LEN(@separator+'x')
ELSE d.Pos END - (LEN(@separator+'x')-1)),
d.pos
FROM Delim d
WHERE @string IS NOT NULL)
SELECT s.value value,
ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) ordinal
FROM Separated s
WHERE s.value <> @separator;
GO
Then the following SQL will do the job:
-- Create table variable with data
DECLARE @TMP table
(
serialnumber varchar(500)
);
INSERT INTO @TMP VALUES('1232132|2343345435|34543534534534|3344562534')
,('23243|23423432432|34543534534235|3344562534');
SELECT X.value
FROM @TMP t
CROSS APPLY dbo.STRING_SPLIT(t.serialnumber,'|') X
WHERE X.ordinal = 3
December 16, 2022 at 6:32 pm
As expected!!!! On my desk it takes only 14 seconds to process 131,000 rows. Hardly not a statement that will be executed several times every minute or several times every hour. Then the values should be stored in a table and therefore only be execute ones using 14 seconds. But it is an absolutely terrible solution if, instead, a solution has to be made for the last number, another for the second last number, one for the third last, .... Solutions that will give very different times with that solution, as you suggest. The recursive solution will use the same time for all solutions and can even handle an unknown number of values in the string. Hopefully, your proposal will never need to be changed, so the same change should be made for all the similar solutions. Redundant code is as dangerous as redundant data.
Performance problem "expected". "Only 14 seconds on 131,ooo" rows.
Why not just do it the right way so that someone that looks at your code doesn't use it for something worse. It doesn't take any longer to do it right, especially in SQL Server 2022.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 18, 2022 at 9:08 am
Most people are still using a version earlier than 2022! So just using the 2022 version to solve this problem is not the right solution - for some developers it will takes year before they change to 2022.
If the value is to be used several times a minute, a solution will hopefully be prepared where the value is stored in a separate column. And then 131,000 rows per minute are unlikely to be inserted. So not a problem!!!
The task is not to find the third value, but the second last. A solution with CHARINDEX is excluded, as it is not a general solution. What about changes in the string to 5 values. It is not the solution of this kind of problems to simply work out a solution if there are 4, if there are 5, ... Redundant code is as dangerous as redundant data. As far as possible, general solutions should be prepared!
As I have previously proven, Tally is only the most efficient if a lot of data is processed and the plan is cached. Hardly two prerequisites that are met for this type of problem - as indicated earlier, the solution must probably be in another way, where the individual values are extracted at the time of insertion.
And when we get to version 2022, we can fortunately forget about Tally. So making a bad solution with Tally in 2022 soon 2023, where this outdated method is used, can hardly be called the correct solution!
December 18, 2022 at 10:23 am
Here is a slightly different approach that performs quite well given that only the two delimiter positions needed are used in the output.
😎
USE TEEST;
GO
SET NOCOUNT ON;
GO
DECLARE @TMP table
(
serialnumber varchar(500)
);
INSERT INTO @TMP VALUES('1232132|2343345435|34543534534534|3344562534')
,('23243|23423432432|34543534534235|3344562534')
,('23243|23423432432|34543534534236|3344562534')
,('23243|23423432432|34543534534237|3344562534')
,('23243|23423432432|34543534534238|3344562534')
,('23243|23423432432|34543534534239|3344562534')
;
SELECT
SUBSTRING(T.serialnumber,CI2.CI + 1,(CI3.CI - CI2.CI) -1) AS ITEM_OF_INTEREST
FROM @TMP T
CROSS APPLY (SELECT CHARINDEX(CHAR(124),T.serialnumber, 1) ) CI1(CI)
CROSS APPLY (SELECT CHARINDEX(CHAR(124),T.serialnumber,CI1.CI + 1) ) CI2(CI)
CROSS APPLY (SELECT CHARINDEX(CHAR(124),T.serialnumber,CI2.CI + 1) ) CI3(CI)
;
Result set
ITEM_OF_INTEREST
-----------------
34543534534534
34543534534235
34543534534236
34543534534237
34543534534238
34543534534239
December 18, 2022 at 11:29 am
Simple test harness to demonstrate the inefficiency of the string_split method 😉
😎
USE TEEST;
GO
SET NOCOUNT ON;
GO
DECLARE @SAMPLE_SIZE BIGINT = 1000000;
DECLARE @STR_BUCKET VARCHAR(20) = NULL;
DECLARE @TIMER TABLE (
T_TXT VARCHAR(100) NOT NULL
,T_TS DATETIME2(7) NOT NULL DEFAULT (SYSDATETIME())
,T_CPU FLOAT NOT NULL DEFAULT (@@CPU_BUSY * CONVERT(FLOAT,@@TIMETICKS,0))
,T_IDLE FLOAT NOT NULL DEFAULT (@@IDLE * CONVERT(FLOAT,@@TIMETICKS,0))
,T_IO FLOAT NOT NULL DEFAULT (@@IO_BUSY * CONVERT(FLOAT,@@TIMETICKS,0))
);
-- Uncomment the next line to skip generating the test set.
-- /*
IF OBJECT_ID('dbo.TBL_TEST_SPLIT') IS NOT NULL DROP TABLE dbo.TBL_TEST_SPLIT;
CREATE TABLE dbo.TBL_TEST_SPLIT
(
T_STR VARCHAR(250) NOT NULL
)
;
INSERT INTO @TIMER(T_TXT) VALUES ('TEST GEN');
;WITH T(N) AS (SELECT X.N FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) X(N))
, NUMS(N) AS (SELECT TOP(@SAMPLE_SIZE)
ROW_NUMBER() OVER (ORDER BY @@VERSION) AS N FROM T T1,T T2,T T3,T T4,T T5,T T6,T T7,T T8,T T9)
,TEST_STRINGS(T_STR) AS
(
SELECT
CONCAT
(
ABS(CHECKSUM(NEWID()))
,CHAR(124)
,ABS(CHECKSUM(NEWID()))
,CHAR(124)
,ABS(CHECKSUM(NEWID()))
,CHAR(124)
,ABS(CHECKSUM(NEWID()))
)
FROM NUMS NM
)
INSERT INTO dbo.TBL_TEST_SPLIT(T_STR)
SELECT
TS.T_STR
FROM TEST_STRINGS TS;
INSERT INTO @TIMER(T_TXT) VALUES ('TEST GEN');
-- */
INSERT INTO @TIMER(T_TXT) VALUES ('DRY RUN');
SELECT
@STR_BUCKET = TS.T_STR
FROM dbo.TBL_TEST_SPLIT TS;
INSERT INTO @TIMER(T_TXT) VALUES ('DRY RUN');
INSERT INTO @TIMER(T_TXT) VALUES ('STRING_SPLIT');
;WITH BASE_SPLIT AS
(
SELECT
SP.[value] AS ITEM_OF_INTEREST
,ROW_NUMBER() OVER
(
PARTITION BY TS.T_STR
ORDER BY @@VERSION) AS RID
FROM dbo.TBL_TEST_SPLIT TS
CROSS APPLY string_split(TS.T_STR,'|') SP
)
SELECT
@STR_BUCKET = BS.ITEM_OF_INTEREST
FROM BASE_SPLIT BS
WHERE BS.RID = 3;
INSERT INTO @TIMER(T_TXT) VALUES ('STRING_SPLIT');
INSERT INTO @TIMER(T_TXT) VALUES ('CHARINDEX');
SELECT
@STR_BUCKET = SUBSTRING(TS.T_STR,CI2.CI + 1,(CI3.CI - CI2.CI) -1)
FROM dbo.TBL_TEST_SPLIT TS
CROSS APPLY (SELECT CHARINDEX(CHAR(124),TS.T_STR, 1) ) CI1(CI)
CROSS APPLY (SELECT CHARINDEX(CHAR(124),TS.T_STR,CI1.CI + 1) ) CI2(CI)
CROSS APPLY (SELECT CHARINDEX(CHAR(124),TS.T_STR,CI2.CI + 1) ) CI3(CI)
;
INSERT INTO @TIMER(T_TXT) VALUES ('CHARINDEX');
SELECT
T.T_TXT
,MAX(T.T_CPU) - MIN(T.T_CPU) AS CPU
,MAX(T.T_IDLE) - MIN(T.T_IDLE) AS IDLE
,MAX(T.T_IO) - MIN(T.T_IO) AS IO
,DATEDIFF(MICROSECOND,MIN(T.T_TS),MAX(T.T_TS)) AS DURATION
FROM @TIMER T
GROUP BY T.T_TXT
ORDER BY DURATION ASC;
Result set (Gen11 i5 laptop)
T_TXT CPU IDLE IO DURATION
------------- -------- --------- ------ ---------
DRY RUN 93750 656250 0 102985
TEST GEN 1375000 10375000 0 1486256
CHARINDEX 3250000 24000000 0 3498358
STRING_SPLIT 6218750 45250000 31250 6586345
This is kind of obvious as any string-splitting method generates intermediate cardinality that is much greater than the charindex method.
December 18, 2022 at 3:52 pm
Here is a solution to get the value before the last pipe symbol, it will work on SQL Server 2012 or higher using this function: https://www.sqlservercentral.com/scripts/a-varcharmax-string_split-function-for-sql-2012-and-above-2, it is also faster than carsten.saastamoinen's solution.
SELECT REVERSE(X.value) ValueBeforeLastPipe
FROM #TMP t
CROSS APPLY dbo.STRING_SPLIT(REVERSE(t.SerialNumber),'|') X
WHERE X.ordinal = 2
Here is some code for a comparison which does the operation on 130k rows:
-- Finding second last
DROP TABLE IF EXISTS #Tmp ;
create table #Tmp
(
SerialNumber VARCHAR (500)
);
INSERT INTO #Tmp
SELECT top(130000) T.* FROM (VALUES ('1232132|2343345435|2ndLast1|3344562534'),
('23243|23423432432|2ndLast2|3344562534'),
('23243|23|42343|2432|2ndLast3|3344562534'),
('23|24|3|2342|3432432|2ndLast4|3344562534'))t(c)
,SYS.all_columns A, SYS.all_columns B;
GO
set statistics time on
DECLARE @X VARCHAR(500);
WITH Data AS
(
SELECT NEWID() AS ID,-- ID for each row
CONCAT (SerialNumber, '|') AS SerialNumber-- For stop
FROM #Tmp
),
Split AS
(
SELECT ID,
LEFT (SerialNumber, CHARINDEX ('|', SerialNumber) - 1) AS No,
1 AS Position,
RIGHT (SerialNumber, LEN (SerialNumber) - CHARINDEX ('|', SerialNumber)) AS Rest
FROM Data
UNION ALL
SELECT ID,
LEFT (Rest, CHARINDEX ('|', Rest) - 1) AS No,
Position + 1,
RIGHT(Rest, LEN (Rest) - CHARINDEX ('|', Rest))
FROM Split
WHERE LEN (Rest) > 1
),
SplitRes AS
(
SELECT *,
ROW_NUMBER() OVER(PARTITION BY ID ORDER BY Position DESC) AS RowNum
FROM Split
)
SELECT No ValueBeforeLastPipe, COUNT(*) Cound
FROM SplitRes
WHERE RowNum = 2
GROUP BY No;-- 2 could be a parameter
GO
DECLARE @X VARCHAR(500)
SELECT REVERSE(X.value) ValueBeforeLastPipe, COUNT(*) Count
FROM #TMP t
CROSS APPLY dbo.STRING_SPLIT(REVERSE(t.SerialNumber),'|') X
WHERE X.ordinal = 2
GROUP BY X.value;
On my machine carsten.saastamoinen's solution runs in 12228 ms and mine in 814 ms
Viewing 15 posts - 1 through 15 (of 46 total)
You must be logged in to reply to this topic. Login to reply