December 19, 2022 at 6:57 pm
Just as a comparison of the methods given I ran the following script for 130k rows:
-- Finding second last
DROP TABLE IF EXISTS #Tmp ;
create table #Tmp (SerialNumber VARCHAR (500));
INSERT INTO #Tmp
SELECT top(130000) T.* FROM (VALUES ('1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|21|22|23|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|21|22|23|24|25|26|27|28|29|30|31|32|33|34|35|36|37|38|39|40|41|42|43|44|45|46|47|48|49|50|51|52|53|54|55|56|57|58|59|60|61|62|63|64|65|66|67|68|69|70|71|72|73|74|75|76|77|78|79|80|81|82|83|84|85|86|87|88|89|90|91|92|93|94|95|96|97|2ndLast1|3344562534'),
('1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|21|22|23|24|25|26|27|28|29|30|31|32|33|34|35|36|37|38|39|40|41|42|43|44|45|46|47|48|49|50|51|52|53|54|55|56|57|58|59|60|61|62|63|64|65|66|67|68|69|70|71|72|73|74|75|76|77|78|79|80|81|82|83|84|85|86|87|88|89|90|91|92|93|94|23423432432|2ndLast2|3344562534'),
('1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|21|22|23|24|25|26|27|28|29|30|31|32|33|34|35|36|37|38|39|40|41|42|43|44|45|46|47|48|49|50|51|52|53|54|55|56|57|58|59|60|61|62|63|64|65|66|67|68|69|70|71|72|73|74|75|76|77|78|79|80|81|82|83|84|85|2432|2ndLast3|3344562534'),
('1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|21|22|23|24|25|26|27|28|29|30|31|32|33|34|35|36|37|38|39|40|41|42|43|44|45|46|47|48|49|50|51|52|53|54|55|56|57|58|59|60|61|62|63|64|65|66|67|68|69|70|71|72|73|74|75|76|77|30|2342|3432432|2ndLast4|3344562534'))t(c)
,SYS.all_columns A, SYS.all_columns B;
GO
set statistics time on
GO
-- carsten.saastamoinen method
DECLARE @X VARCHAR(500);
WITH Data AS
(
SELECT NEWID() AS ID,-- ID for each row
CONCAT (SerialNumber, '|') AS SerialNumber-- For stop
FROM #Tmp
),
Split AS
(
SELECT ID,
LEFT (SerialNumber, CHARINDEX ('|', SerialNumber) - 1) AS No,
1 AS Position,
RIGHT (SerialNumber, LEN (SerialNumber) - CHARINDEX ('|', SerialNumber)) AS Rest
FROM Data
UNION ALL
SELECT ID,
LEFT (Rest, CHARINDEX ('|', Rest) - 1) AS No,
Position + 1,
RIGHT(Rest, LEN (Rest) - CHARINDEX ('|', Rest))
FROM Split
WHERE LEN (Rest) > 1
),
SplitRes AS
(
SELECT *,
ROW_NUMBER() OVER(PARTITION BY ID ORDER BY Position DESC) AS RowNum
FROM Split
)
SELECT No, COUNT(*) Count
FROM SplitRes
WHERE RowNum = 2
GROUP BY No -- 2 could be a parameter
OPTION(MAXRECURSION 0);
GO
-- jonathan roberts method
DECLARE @X VARCHAR(500)
SELECT REVERSE(X.value) ValueBeforeLastPipe, COUNT(*) Count
FROM #TMP t
CROSS APPLY dbo.STRING_SPLIT(REVERSE(t.SerialNumber),'|') X
WHERE X.ordinal = 2
GROUP BY X.value
GO
-- ScottPletcher method
select ValueBeforeLastPipe, COUNT(*) Count
from #tmp
cross apply (select charindex('|', reverse(serialnumber)) as last_bar) as ca1
cross apply (select charindex('|', reverse(left(serialnumber, len(serialnumber)-last_bar))) as next_to_last_bar) as ca2
cross apply (select case when last_bar = 0 or next_to_last_bar = 0 then ''
else SUBSTRING(serialnumber, LEN(serialnumber) - last_bar - next_to_last_bar + 2, next_to_last_bar - 1) end as ValueBeforeLastPipe) as ca3
GROUP BY ValueBeforeLastPipe;
GO
set statistics time off
Time taken for each query on my machine:
carsten.saastamoinen 176 seconds
Jonathan Roberts: 8 seconds
ScottPletcher: 3 seconds
So Scott's solution is about 60 times faster than carsten.saastamoinen and about 3 times as fast as Jonathan Roberts solution.
December 19, 2022 at 7:41 pm
I will stop here! The focus on which solutions to use is the big difference. My focus will always be a correct and a general solutions. Unless performance is important, because there are a lot of rows and executions happen many times each hour, I will create a solution with CROSS APPLY. But then the overall choice of solution is wrong. The solution should be changed and the use of CROSS APPLY is removed. A table should be created into which the values are inserted when inserting the strings with the many values.
If there are few rows, the focus should be on maintenance. The examples have shown that incorrect solutions are proposed because focus is performance. But what about using of tempdb. In order for this to be included in the considerations, there must be knowledge of the maximum length of VARCHAR columns in relation to actual content. The calculated need for memory is half of the max length of variable length columns. If the length of the value exceeds this estimated length, tempdb is used. If the length is significantly less, it puts too much strain on the system. This is all about memory grant. Etc. etc. etc...
For me, the following priority is important and my priority - so we will never agree!
1. The statement returns the correct result with the current data.
2. The statement gives the correct result tomorrow. Evaluate with data that the definition allows inserted into the tables but does not currently exist.
3. The statement is maintenance friendly.
4. The performance of the statement is acceptable.
Happy Christmas! I will travel to Paris!!!
December 19, 2022 at 8:27 pm
For me, the following priority is important and my priority - so we will never agree! 1. The statement returns the correct result with the current data. 2. The statement gives the correct result tomorrow. Evaluate with data that the definition allows inserted into the tables but does not currently exist. 3. The statement is maintenance friendly. 4. The performance of the statement is acceptable.
Happy Christmas! I will travel to Paris!!!
But Scott's solution is only 6 lines of code, yours is about 25 lines, so Scott's is arguably a lot more maintainable than yours. It also runs 60 times faster and to my knowledge they all return the correct result.
December 19, 2022 at 8:30 pm
And if you want the seven last?
December 19, 2022 at 8:43 pm
No, a solution like this is hopefully only on few rows - as mentioned earlier, else change the way to solve the problem. And if my solution runs in 10 seconds - what is the problem? And hopefully only one or few times a day! The CROSS APPLY have a statement for the last, another for the second last, another for the third last and one for the first. Hopefully not any changes in the data!!!! If the data is not selected at insert time, it must be because the other values are used in other solution!
December 19, 2022 at 8:46 pm
No, a solution like this is hopefully only on few rows - as mentioned earlier, else change the way to solve the problem. And if my solution runs in 10 seconds - what is the problem? And hopefully only one or few times a day! The CROSS APPLY have a statement for the last, another for the second last, another for the third last and one for the first. Hopefully not any changes in the data!!!! If the data is not selected at insert time, it must be because the other values are used in other solution!
Actually my code was written explicitly to make it handle data changes well while also being efficient.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
December 19, 2022 at 8:50 pm
No, a solution like this is hopefully only on few rows - as mentioned earlier, else change the way to solve the problem. And if my solution runs in 10 seconds - what is the problem? And hopefully only one or few times a day! The CROSS APPLY have a statement for the last, another for the second last, another for the third last and one for the first. Hopefully not any changes in the data!!!! If the data is not selected at insert time, it must be because the other values are used in other solution!
I don't know how you know that this is only for a few rows, will only need to be run a few times a day and might later need to be changed to extract from a different position within the string?
December 19, 2022 at 8:52 pm
Hopefully 10 seconds few times a day is a problem on you server!!!!
December 19, 2022 at 8:56 pm
Else the whole solution is wrong!
December 20, 2022 at 5:09 am
Hopefully 10 seconds few times a day is a problem on you server!!!!
It's not... but the attitude that will cause everything to be written in such a fashion, or worse, is. 😉
--Jeff Moden
Change is inevitable... Change for the better is not.
December 20, 2022 at 6:56 am
I have seen - and being removed - because I told
Attitude? It is OK for me to be removed from different threads because I not agree - yes, a positive attitude but a bad attitude only having one solution and call others many different things - it has happened for me many times - because only one solution can be used - never. And I do not imply that using tally as the only solution is bad, just showing other solutions. All possibilities in the language can be used and all problems are not solved the same way!!!!! And performance is not always the only goal, but quality is for me always vary important!!!!
December 20, 2022 at 2:57 pm
This was removed by the editor as SPAM
December 21, 2022 at 12:57 am
Fascinating discussions, but what was "wrong" with my solution? I heard stuff like it is not usable except on SQL 2022, but this IS the SQL 2022 forum, is it not? So why not use the 3rd parameter, and keep the other solutions down in the pre-2022 forums, where hopefully people not using 2022 are looking anyway. I'd think we'd want to learn and use 2022's new features/advantages and share them in this forum.
December 21, 2022 at 1:06 am
Fascinating discussions, but what was "wrong" with my solution? I heard stuff like it is not usable except on SQL 2022, but this IS the SQL 2022 forum, is it not? So why not use the 3rd parameter, and keep the other solutions down in the pre-2022 forums, where hopefully people not using 2022 are looking anyway. I'd think we'd want to learn and use 2022's new features/advantages and share them in this forum.
I couldn't test it as I don't have SQL 2022 installed. Your solution looks fine though.
Viewing 15 posts - 31 through 45 (of 46 total)
You must be logged in to reply to this topic. Login to reply