get a substring value from string

Question

get a substring value from string

December 19, 2022 at 5:08 am

This was removed by the editor as SPAM

Viewing 15 posts - 16 through 30 (of 46 total)

You must be logged in to reply to this topic. Login to reply

carsten.saastamoinen Old Hand Points: 341 More actions · Answer 1

AC Roberts: you are making one of the classic mistakes!!! You just use a function without testing for a correct solution. ORDER BY (SELECT NULL) does not guarantee order. So your ordinal value is random! A table is ALWAYS unordered data and still unordered after this sort/order.

A classic example of using code that is not understood because it is unnecessarily complex. With errors in the solution, performance means absolutely nothing.

       SELECT s.value AS value,
              ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS ordinal
FROM Separated AS s
WHERE s.value <> @separator;

Jeff Moden SSC Guru Points: 1004406 More actions · Answer 2

carsten.saastamoinen wrote:

Most people are still using a version earlier than 2022! So just using the 2022 version to solve this problem is not the right solution - for some developers it will takes year before they change to 2022.
If the value is to be used several times a minute, a solution will hopefully be prepared where the value is stored in a separate column. And then 131,000 rows per minute are unlikely to be inserted. So not a problem!!!
The task is not to find the third value, but the second last. A solution with CHARINDEX is excluded, as it is not a general solution. What about changes in the string to 5 values. It is not the solution of this kind of problems to simply work out a solution if there are 4, if there are 5, ... Redundant code is as dangerous as redundant data. As far as possible, general solutions should be prepared!
As I have previously proven, Tally is only the most efficient if a lot of data is processed and the plan is cached. Hardly two prerequisites that are met for this type of problem - as indicated earlier, the solution must probably be in another way, where the individual values are extracted at the time of insertion.
And when we get to version 2022, we can fortunately forget about Tally. So making a bad solution with Tally in 2022 soon 2023, where this outdated method is used, can hardly be called the correct solution!

An incremental rCTE is rather handily beaten even by a WHILE loop and uses 8 times more I/O than the WHILE loop.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

carsten.saastamoinen Old Hand Points: 341 More actions · Answer 3

And we can see earlier in this post how some people use tally from a function found on the net, because they have learned, that this is the only and fasted way for a solution. And more problematic, performance comes before correctness!!! If the question is about few rows we can see, that using a solution given a wrong result is used because they are told, that this is the only and right way for such a solution. If they was using a simple solution with no performance problems, they have a correct solution. As I was saying earlier, that if this statement should be executed several times each minute or hour with many rows - and more than 130K - none of the solutions should be used. A new table should be created with the value and the ordinal - not the ordinal from the tally function, which have never been an ordinal!

For me this example shows a total wrong way to look at a problem in 2022, where performance is the only goal for the solution. Try to use all the suggested statements on 10 or 100 rows - without a cached plan for any of the solution and find out, that maintenance and correctness is more important than performance! And remember to use both compile and execution time when evaluating!

Jeff Moden SSC Guru Points: 1004406 More actions · Answer 4

carsten.saastamoinen wrote:

And we can see earlier in this post how some people use tally from a function found on the net, because they have learned, that this is the only and fasted way for a solution. And more problematic, performance comes before correctness!!! If the question is about few rows we can see, that using a solution given a wrong result is used because they are told, that this is the only and right way for such a solution. If they was using a simple solution with no performance problems, they have a correct solution. As I was saying earlier, that if this statement should be executed several times each minute or hour with many rows - and more than 130K - none of the solutions should be used. A new table should be created with the value and the ordinal - not the ordinal from the tally function, which have never been an ordinal!
For me this example shows a total wrong way to look at a problem in 2022, where performance is the only goal for the solution. Try to use all the suggested statements on 10 or 100 rows - without a cached plan for any of the solution and find out, that maintenance and correctness is more important than performance! And remember to use both compile and execution time when evaluating!

And as I'm saying, just because something has low usage doesn't mean that it should be low performing.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jonathan AC Roberts SSCoach Points: 18361 More actions · Answer 5

carsten.saastamoinen wrote:

AC Roberts: you are making one of the classic mistakes!!! You just use a function without testing for a correct solution. ORDER BY (SELECT NULL) does not guarantee order. So your ordinal value is random! A table is ALWAYS unordered data and still unordered after this sort/order.
A classic example of using code that is not understood because it is unnecessarily complex. With errors in the solution, performance means absolutely nothing.
SELECT s.value AS value,
       ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS ordinal
  FROM Separated AS s
 WHERE s.value <> @separator;

ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) does not do any sort. The order the the value column is taken from the separated string is the order the values appear in the string. There are no errors in the solution.

carsten.saastamoinen Old Hand Points: 341 More actions · Answer 6

This is not correct! It is right that there is no sort, but no sort is not the same as data is in the same order as in the string. Please show me, where I can find this in the documentation ! Data inserted into a table without a Clustered Index is not read in the same order as inserted. It sounds like the claim, that DISTINCT sort the result. It is random and without any guarantee.

Jonathan AC Roberts SSCoach Points: 18361 More actions · Answer 7

carsten.saastamoinen wrote:

This is not correct! It is right that there is no sort, but no sort is not the same as data is in the same order as in the string. Please show me, where I can find this in the documentation ! Data inserted into a table without a Clustered Index is not read in the same order as inserted. It sounds like the claim, that DISTINCT sort the result. It is random and without any guarantee.

I think for it to return it in an order that is different from the order of the string it would have to do the operation in parallel and then gathering the streams in a different order. The extraction of the values in the string cannot be done in parallel as the next value depends on the previous value so not something that can be parallelised. As you have said "So your ordinal value is random! ", perhaps you could supply an example of the function not working? It should be easy if the ordinal is random.

carsten.saastamoinen Old Hand Points: 341 More actions · Answer 8

If the optimizer choose to use tempdb, data is not necessary read in the same order as inserted - the principle of using a heap. In this situation with few rows, the data will suddenly be read in the same order as inserted. But what happens, if we have many rows? Where is the border for using tempdb. It is not a rule with SELECT NULL, that data is read in same order as inserted, so we can not use this in my opinion if developing correct solutions. When is the result correct and when is it wrong. Data in a table is always unordered and only by specifying ORDER BY, the order can be used.

I have added 3 scripts. Two of the scripts is for making activity on the server - query and 3. I also insert many rows for being sure that it shows wat is expected. I insert 15,000 rows. The ROW_NUMBER and the inserted row number in the column Txt should have the same value if your conclusion is right. Execute number 1 from query1. Then number 2, 3 and 4 at the same time from each of the files - be fast starting the scripts ;). And test with number 5 from query 1.

This reply was modified 2 years ago by carsten.saastamoinen.

Attachments:

You must be logged in to view attached files.

Jonathan AC Roberts SSCoach Points: 18361 More actions · Answer 9

carsten.saastamoinen wrote:

If the optimizer choose to use tempdb, data is not necessary read in the same order as inserted - the principle of using a heap. In this situation with few rows, the data will suddenly be read in the same order as inserted. But what happens, if we have many rows? Where is the border for using tempdb. It is not a rule with SELECT NULL, that data is read in same order as inserted, so we can not use this in my opinion if developing correct solutions. When is the result correct and when is it wrong. Data in a table is always unordered and only by specifying ORDER BY, the order can be used.
I have added 3 scripts. Two of the scripts is for making activity on the server - query and 3. I also insert many rows for being sure that it shows wat is expected. I insert 15,000 rows. The ROW_NUMBER and the inserted row number in the column Txt should have the same value if your conclusion is right. Execute number 1 from query1. Then number 2, 3 and 4 at the same time from each of the files - be fast starting the scripts ;). And test with number 5 from query 1.

You have not added any scripts.

Jonathan AC Roberts SSCoach Points: 18361 More actions · Answer 10

carsten.saastamoinen wrote:

If the optimizer choose to use tempdb, data is not necessary read in the same order as inserted - the principle of using a heap. In this situation with few rows, the data will suddenly be read in the same order as inserted. But what happens, if we have many rows? Where is the border for using tempdb. It is not a rule with SELECT NULL, that data is read in same order as inserted, so we can not use this in my opinion if developing correct solutions. When is the result correct and when is it wrong. Data in a table is always unordered and only by specifying ORDER BY, the order can be used.
I have added 3 scripts. Two of the scripts is for making activity on the server - query and 3. I also insert many rows for being sure that it shows wat is expected. I insert 15,000 rows. The ROW_NUMBER and the inserted row number in the column Txt should have the same value if your conclusion is right. Execute number 1 from query1. Then number 2, 3 and 4 at the same time from each of the files - be fast starting the scripts ;). And test with number 5 from query 1.

I don't think those scripts are anything similar to the function to serve any purpose as a demonstration, they seem drop and recreate tables during a query.

Can you cause the function [dbo].[STRING_SPLIT] to come up with invalid results?

I've changed the function to remove the order by (select null) to remove any doubt:

IF OBJECT_ID('[dbo].[STRING_SPLIT]','IF') IS NULL BEGIN
    EXEC ('CREATE FUNCTION [dbo].[STRING_SPLIT]() RETURNS TABLE AS RETURN SELECT 1 X') 
END
GO
ALTER FUNCTION [dbo].[STRING_SPLIT]
(
    @string    nvarchar(MAX), 
    @separator nvarchar(MAX)
)
RETURNS TABLE WITH SCHEMABINDING 
AS RETURN
   WITH X(N) AS (SELECT 'Table1' FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) T(C)),
        Y(N) AS (SELECT 'Table2' FROM X A1, X A2, X A3, X A4, X A5, X A6, X A7, X A8) , -- Up to 16^8 = 4 billion
        T(N) AS (SELECT TOP(ISNULL(LEN(@string),0)+1) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))-1 N FROM Y),
        Delim(Pos) AS (SELECT TOP(2147483647) t.N FROM T WHERE SUBSTRING(@string, t.N, LEN(@separator+'x')-1) LIKE @separator OR t.N = 0 ORDER BY t.N),
        Separated(value,ordinal) AS (SELECT TOP(2147483647) SUBSTRING(@string, d.Pos + CASE WHEN Pos=0 THEN 1 ELSE LEN(@separator+'x')-1 END, 
                                                       LEAD(d.Pos, 1, 2147483647) OVER (ORDER BY d.Pos) 
                                                       - CASE WHEN d.Pos = 0 THEN 2-LEN(@separator+'x') 
                                                              ELSE d.Pos END - (LEN(@separator+'x')-1)),
                                             ROW_NUMBER() OVER (ORDER BY d.Pos) ordinal
                                        FROM Delim d
                                       WHERE @string IS NOT NULL
                                       ORDER BY d.Pos)
       SELECT s.value value,
              ordinal
         FROM Separated s
        WHERE s.value <> @separator;

carsten.saastamoinen Old Hand Points: 341 More actions · Answer 11

No, because I do not have a lot of users on my Laptop! But the scripts shows that even when executing with only 3 users, data are not read in the same order as written. And I will not use more time on it, because it is the way SQL Server works: For me it is important not to make a test and use this for showing that a functionality works in a specific way, but know how the space is used, .... The script 2 and 3 are used for use space for other tables, remove this tables and see, that the table t1 then - surely - use this space. When data is read the tables IAM is used and therefore not in the same order as written.

Eirikur Eiriksson SSC Guru Points: 182915 More actions · Answer 12

carsten.saastamoinen wrote:

No, because I do not have a lot of users on my Laptop! But the scripts shows that even when executing with only 3 users, data are not read in the same order as written. And I will not use more time on it, because it is the way SQL Server works: For me it is important not to make a test and use this for showing that a functionality works in a specific way, but know how the space is used, .... The script 2 and 3 are used for use space for other tables, remove this tables and see, that the table t1 then - surely - use this space. When data is read the tables IAM is used and therefore not in the same order as written.

Now put your money where your mouth is and run your code in the test harness I posted previously 😉

😎

By the way, you mentioned tempdb and your laptop, what are your specs and configurations?

carsten.saastamoinen Old Hand Points: 341 More actions · Answer 13

carsten.saastamoinen

Old Hand

Points: 341

December 19, 2022 at 6:13 pm

#4127626

20 core (8 + 8 + 4) and 64GB RAM, 2 SSD disk.

Eirikur Eiriksson SSC Guru Points: 182915 More actions · Answer 14

carsten.saastamoinen wrote:

20 core (8 + 8 + 4) and 64GB RAM, 2 SSD disk.

And the SQL Server instance specs?

😎