get a substring value from string

Question

get a substring value from string

December 20, 2022 at 2:57 pm

This was removed by the editor as SPAM

Viewing 15 posts - 31 through 45 (of 46 total)

You must be logged in to reply to this topic. Login to reply

Jonathan AC Roberts SSCoach Points: 18361 More actions · Answer 1

Just as a comparison of the methods given I ran the following script for 130k rows:

-- Finding second last
DROP TABLE IF EXISTS #Tmp ;
create table #Tmp (SerialNumber VARCHAR (500));


INSERT INTO #Tmp 
SELECT top(130000) T.* FROM (VALUES ('1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|21|22|23|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|21|22|23|24|25|26|27|28|29|30|31|32|33|34|35|36|37|38|39|40|41|42|43|44|45|46|47|48|49|50|51|52|53|54|55|56|57|58|59|60|61|62|63|64|65|66|67|68|69|70|71|72|73|74|75|76|77|78|79|80|81|82|83|84|85|86|87|88|89|90|91|92|93|94|95|96|97|2ndLast1|3344562534'),
('1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|21|22|23|24|25|26|27|28|29|30|31|32|33|34|35|36|37|38|39|40|41|42|43|44|45|46|47|48|49|50|51|52|53|54|55|56|57|58|59|60|61|62|63|64|65|66|67|68|69|70|71|72|73|74|75|76|77|78|79|80|81|82|83|84|85|86|87|88|89|90|91|92|93|94|23423432432|2ndLast2|3344562534'),
('1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|21|22|23|24|25|26|27|28|29|30|31|32|33|34|35|36|37|38|39|40|41|42|43|44|45|46|47|48|49|50|51|52|53|54|55|56|57|58|59|60|61|62|63|64|65|66|67|68|69|70|71|72|73|74|75|76|77|78|79|80|81|82|83|84|85|2432|2ndLast3|3344562534'),
('1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|21|22|23|24|25|26|27|28|29|30|31|32|33|34|35|36|37|38|39|40|41|42|43|44|45|46|47|48|49|50|51|52|53|54|55|56|57|58|59|60|61|62|63|64|65|66|67|68|69|70|71|72|73|74|75|76|77|30|2342|3432432|2ndLast4|3344562534'))t(c)
,SYS.all_columns A, SYS.all_columns B;

GO
set statistics time on
GO
-- carsten.saastamoinen method
DECLARE @X VARCHAR(500);
WITH Data AS
(
    SELECT NEWID() AS ID,-- ID for each row
           CONCAT (SerialNumber, '|') AS SerialNumber-- For stop
      FROM #Tmp
),
Split AS
(
    SELECT ID,
           LEFT (SerialNumber, CHARINDEX ('|', SerialNumber) - 1) AS No,
           1 AS Position,
           RIGHT (SerialNumber, LEN (SerialNumber) - CHARINDEX ('|', SerialNumber)) AS Rest
      FROM Data
     UNION ALL
    SELECT ID,
           LEFT (Rest, CHARINDEX ('|', Rest) - 1) AS No,
           Position + 1,
           RIGHT(Rest, LEN (Rest) - CHARINDEX ('|', Rest))
      FROM Split
     WHERE LEN (Rest) > 1
),
SplitRes AS
(
    SELECT *,
           ROW_NUMBER() OVER(PARTITION BY ID ORDER BY Position DESC) AS RowNum
      FROM Split
)
SELECT No, COUNT(*) Count
  FROM SplitRes
 WHERE RowNum = 2
 GROUP BY No -- 2 could be a parameter
 OPTION(MAXRECURSION 0);
GO
-- jonathan roberts method
DECLARE @X VARCHAR(500)
SELECT REVERSE(X.value) ValueBeforeLastPipe, COUNT(*) Count
  FROM #TMP t
 CROSS APPLY dbo.STRING_SPLIT(REVERSE(t.SerialNumber),'|') X
 WHERE X.ordinal = 2
 GROUP BY X.value
GO
-- ScottPletcher method
select ValueBeforeLastPipe, COUNT(*) Count
from #tmp
cross apply (select charindex('|', reverse(serialnumber)) as last_bar) as ca1
cross apply (select charindex('|', reverse(left(serialnumber, len(serialnumber)-last_bar))) as next_to_last_bar) as ca2
cross apply (select case when last_bar = 0 or next_to_last_bar = 0 then ''
                         else SUBSTRING(serialnumber, LEN(serialnumber) - last_bar - next_to_last_bar + 2, next_to_last_bar - 1) end  as ValueBeforeLastPipe) as ca3
GROUP BY ValueBeforeLastPipe;
GO 
set statistics time off

Time taken for each query on my machine:

carsten.saastamoinen 176 seconds

Jonathan Roberts: 8 seconds

ScottPletcher: 3 seconds

So Scott's solution is about 60 times faster than carsten.saastamoinen and about 3 times as fast as Jonathan Roberts solution.

carsten.saastamoinen Old Hand Points: 341 More actions · Answer 2

I will stop here! The focus on which solutions to use is the big difference. My focus will always be a correct and a general solutions. Unless performance is important, because there are a lot of rows and executions happen many times each hour, I will create a solution with CROSS APPLY. But then the overall choice of solution is wrong. The solution should be changed and the use of CROSS APPLY is removed. A table should be created into which the values are inserted when inserting the strings with the many values.

If there are few rows, the focus should be on maintenance. The examples have shown that incorrect solutions are proposed because focus is performance. But what about using of tempdb. In order for this to be included in the considerations, there must be knowledge of the maximum length of VARCHAR columns in relation to actual content. The calculated need for memory is half of the max length of variable length columns. If the length of the value exceeds this estimated length, tempdb is used. If the length is significantly less, it puts too much strain on the system. This is all about memory grant. Etc. etc. etc...

For me, the following priority is important and my priority - so we will never agree!

1. The statement returns the correct result with the current data.

2. The statement gives the correct result tomorrow. Evaluate with data that the definition allows inserted into the tables but does not currently exist.

3. The statement is maintenance friendly.

4. The performance of the statement is acceptable.

Happy Christmas! I will travel to Paris!!!

Jonathan AC Roberts SSCoach Points: 18361 More actions · Answer 3

carsten.saastamoinen wrote:

For me, the following priority is important and my priority - so we will never agree! 1. The statement returns the correct result with the current data. 2. The statement gives the correct result tomorrow. Evaluate with data that the definition allows inserted into the tables but does not currently exist. 3. The statement is maintenance friendly. 4. The performance of the statement is acceptable.
Happy Christmas! I will travel to Paris!!!

But Scott's solution is only 6 lines of code, yours is about 25 lines, so Scott's is arguably a lot more maintainable than yours. It also runs 60 times faster and to my knowledge they all return the correct result.

carsten.saastamoinen Old Hand Points: 341 More actions · Answer 4

carsten.saastamoinen

Old Hand

Points: 341

December 19, 2022 at 8:30 pm

#4127649

And if you want the seven last?

Jonathan AC Roberts SSCoach Points: 18361 More actions · Answer 5

carsten.saastamoinen wrote:

And if you want the seven last?

And if your server is performing like a dog because you have implemented the most inefficient code you can think of?

carsten.saastamoinen Old Hand Points: 341 More actions · Answer 6

No, a solution like this is hopefully only on few rows - as mentioned earlier, else change the way to solve the problem. And if my solution runs in 10 seconds - what is the problem? And hopefully only one or few times a day! The CROSS APPLY have a statement for the last, another for the second last, another for the third last and one for the first. Hopefully not any changes in the data!!!! If the data is not selected at insert time, it must be because the other values are used in other solution!

ScottPletcher SSC Guru Points: 101119 More actions · Answer 7

carsten.saastamoinen wrote:

No, a solution like this is hopefully only on few rows - as mentioned earlier, else change the way to solve the problem. And if my solution runs in 10 seconds - what is the problem? And hopefully only one or few times a day! The CROSS APPLY have a statement for the last, another for the second last, another for the third last and one for the first. Hopefully not any changes in the data!!!! If the data is not selected at insert time, it must be because the other values are used in other solution!

Actually my code was written explicitly to make it handle data changes well while also being efficient.

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

Jonathan AC Roberts SSCoach Points: 18361 More actions · Answer 8

carsten.saastamoinen wrote:

No, a solution like this is hopefully only on few rows - as mentioned earlier, else change the way to solve the problem. And if my solution runs in 10 seconds - what is the problem? And hopefully only one or few times a day! The CROSS APPLY have a statement for the last, another for the second last, another for the third last and one for the first. Hopefully not any changes in the data!!!! If the data is not selected at insert time, it must be because the other values are used in other solution!

I don't know how you know that this is only for a few rows, will only need to be run a few times a day and might later need to be changed to extract from a different position within the string?

carsten.saastamoinen Old Hand Points: 341 More actions · Answer 9

Hopefully 10 seconds few times a day is a problem on you server!!!!

carsten.saastamoinen Old Hand Points: 341 More actions · Answer 10

carsten.saastamoinen

Old Hand

Points: 341

December 19, 2022 at 8:56 pm

#4127659

Else the whole solution is wrong!

Jeff Moden SSC Guru Points: 1004406 More actions · Answer 11

carsten.saastamoinen wrote:

Hopefully 10 seconds few times a day is a problem on you server!!!!

It's not... but the attitude that will cause everything to be written in such a fashion, or worse, is. 😉

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

carsten.saastamoinen Old Hand Points: 341 More actions · Answer 12

I have seen - and being removed - because I told

that DISTINCT do not order data - basic understanding.
that it is not better to write COUNT (primary key) or COUNT (1) instead of COUNT (*). All ends up with the same execution plan.
that FOR triggers is not executed before AFTER trigger - it's is just syntax
that the right way of counting rows in a table is not to select this information from the system tables
that the right position of the Cluster Key is not necessary last
that it is not faster to use a filtered index, because it is smaller - yes smaller, but the same information use the same number of pages - filtered or not.
that it is not the cardinality that decide the order of the columns in a multicolumn index, but the use and which other index exists.
that LEFT JOIN is not always the right join type
that all rules should not be evaluated in a program - the optimizer did not know and are able to use the rule. Data is selected many times for each insert or update. It's again performance that is the reason - but a wrong reason, because data is inserted ones and used many times. And all use the same rules - quality.
that floating point data type is not the right data type even it is faster to calculate - and not precisely
that it is not necessary to consider the length of a VARCHAR - wrong, important for data quality and performance
that it is not better using a value instead of NULL Even version 2022 have new functions where NULL is important for a right solution.
that we should not use a function to wash data when inserting, because use of a function do not execute in parallel. One customer, one .. in parallel. Quality before performance or ....? Even that I could use a staging area for washing the data and do it in parallel - one row! Because there is always an alternative for functions - yes but slower and ... All programs is not testing for 2 blanks in the middle, convert an empty string to NULL being sure that there is a value and not just an empty string, ...
...

Attitude? It is OK for me to be removed from different threads because I not agree - yes, a positive attitude but a bad attitude only having one solution and call others many different things - it has happened for me many times - because only one solution can be used - never. And I do not imply that using tally as the only solution is bad, just showing other solutions. All possibilities in the language can be used and all problems are not solved the same way!!!!! And performance is not always the only goal, but quality is for me always vary important!!!!

Ronzo Hall of Fame Points: 3917 More actions · Answer 13

Fascinating discussions, but what was "wrong" with my solution? I heard stuff like it is not usable except on SQL 2022, but this IS the SQL 2022 forum, is it not? So why not use the 3rd parameter, and keep the other solutions down in the pre-2022 forums, where hopefully people not using 2022 are looking anyway. I'd think we'd want to learn and use 2022's new features/advantages and share them in this forum.

Have Fun!
Ronzo

Jonathan AC Roberts SSCoach Points: 18361 More actions · Answer 14

Ronzo wrote:

Fascinating discussions, but what was "wrong" with my solution? I heard stuff like it is not usable except on SQL 2022, but this IS the SQL 2022 forum, is it not? So why not use the 3rd parameter, and keep the other solutions down in the pre-2022 forums, where hopefully people not using 2022 are looking anyway. I'd think we'd want to learn and use 2022's new features/advantages and share them in this forum.

I couldn't test it as I don't have SQL 2022 installed. Your solution looks fine though.