December 18, 2022 at 5:00 pm
AC Roberts: you are making one of the classic mistakes!!! You just use a function without testing for a correct solution. ORDER BY (SELECT NULL) does not guarantee order. So your ordinal value is random! A table is ALWAYS unordered data and still unordered after this sort/order.
A classic example of using code that is not understood because it is unnecessarily complex. With errors in the solution, performance means absolutely nothing.
SELECT s.value AS value,
ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS ordinal
FROM Separated AS s
WHERE s.value <> @separator;
December 18, 2022 at 9:32 pm
Most people are still using a version earlier than 2022! So just using the 2022 version to solve this problem is not the right solution - for some developers it will takes year before they change to 2022.
If the value is to be used several times a minute, a solution will hopefully be prepared where the value is stored in a separate column. And then 131,000 rows per minute are unlikely to be inserted. So not a problem!!!
The task is not to find the third value, but the second last. A solution with CHARINDEX is excluded, as it is not a general solution. What about changes in the string to 5 values. It is not the solution of this kind of problems to simply work out a solution if there are 4, if there are 5, ... Redundant code is as dangerous as redundant data. As far as possible, general solutions should be prepared!
As I have previously proven, Tally is only the most efficient if a lot of data is processed and the plan is cached. Hardly two prerequisites that are met for this type of problem - as indicated earlier, the solution must probably be in another way, where the individual values are extracted at the time of insertion.
And when we get to version 2022, we can fortunately forget about Tally. So making a bad solution with Tally in 2022 soon 2023, where this outdated method is used, can hardly be called the correct solution!
An incremental rCTE is rather handily beaten even by a WHILE loop and uses 8 times more I/O than the WHILE loop.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 18, 2022 at 10:13 pm
And we can see earlier in this post how some people use tally from a function found on the net, because they have learned, that this is the only and fasted way for a solution. And more problematic, performance comes before correctness!!! If the question is about few rows we can see, that using a solution given a wrong result is used because they are told, that this is the only and right way for such a solution. If they was using a simple solution with no performance problems, they have a correct solution. As I was saying earlier, that if this statement should be executed several times each minute or hour with many rows - and more than 130K - none of the solutions should be used. A new table should be created with the value and the ordinal - not the ordinal from the tally function, which have never been an ordinal!
For me this example shows a total wrong way to look at a problem in 2022, where performance is the only goal for the solution. Try to use all the suggested statements on 10 or 100 rows - without a cached plan for any of the solution and find out, that maintenance and correctness is more important than performance! And remember to use both compile and execution time when evaluating!
December 18, 2022 at 10:46 pm
And we can see earlier in this post how some people use tally from a function found on the net, because they have learned, that this is the only and fasted way for a solution. And more problematic, performance comes before correctness!!! If the question is about few rows we can see, that using a solution given a wrong result is used because they are told, that this is the only and right way for such a solution. If they was using a simple solution with no performance problems, they have a correct solution. As I was saying earlier, that if this statement should be executed several times each minute or hour with many rows - and more than 130K - none of the solutions should be used. A new table should be created with the value and the ordinal - not the ordinal from the tally function, which have never been an ordinal!
For me this example shows a total wrong way to look at a problem in 2022, where performance is the only goal for the solution. Try to use all the suggested statements on 10 or 100 rows - without a cached plan for any of the solution and find out, that maintenance and correctness is more important than performance! And remember to use both compile and execution time when evaluating!
And as I'm saying, just because something has low usage doesn't mean that it should be low performing.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 19, 2022 at 12:21 am
AC Roberts: you are making one of the classic mistakes!!! You just use a function without testing for a correct solution. ORDER BY (SELECT NULL) does not guarantee order. So your ordinal value is random! A table is ALWAYS unordered data and still unordered after this sort/order.
A classic example of using code that is not understood because it is unnecessarily complex. With errors in the solution, performance means absolutely nothing.
SELECT s.value AS value,
ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS ordinal
FROM Separated AS s
WHERE s.value <> @separator;
ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
does not do any sort. The order the the value column is taken from the separated string is the order the values appear in the string. There are no errors in the solution.
December 19, 2022 at 4:21 am
This is not correct! It is right that there is no sort, but no sort is not the same as data is in the same order as in the string. Please show me, where I can find this in the documentation ! Data inserted into a table without a Clustered Index is not read in the same order as inserted. It sounds like the claim, that DISTINCT sort the result. It is random and without any guarantee.
December 19, 2022 at 5:08 am
This was removed by the editor as SPAM
December 19, 2022 at 9:48 am
This is not correct! It is right that there is no sort, but no sort is not the same as data is in the same order as in the string. Please show me, where I can find this in the documentation ! Data inserted into a table without a Clustered Index is not read in the same order as inserted. It sounds like the claim, that DISTINCT sort the result. It is random and without any guarantee.
I think for it to return it in an order that is different from the order of the string it would have to do the operation in parallel and then gathering the streams in a different order. The extraction of the values in the string cannot be done in parallel as the next value depends on the previous value so not something that can be parallelised. As you have said "So your ordinal value is random! ", perhaps you could supply an example of the function not working? It should be easy if the ordinal is random.
December 19, 2022 at 10:58 am
If the optimizer choose to use tempdb, data is not necessary read in the same order as inserted - the principle of using a heap. In this situation with few rows, the data will suddenly be read in the same order as inserted. But what happens, if we have many rows? Where is the border for using tempdb. It is not a rule with SELECT NULL, that data is read in same order as inserted, so we can not use this in my opinion if developing correct solutions. When is the result correct and when is it wrong. Data in a table is always unordered and only by specifying ORDER BY, the order can be used.
I have added 3 scripts. Two of the scripts is for making activity on the server - query and 3. I also insert many rows for being sure that it shows wat is expected. I insert 15,000 rows. The ROW_NUMBER and the inserted row number in the column Txt should have the same value if your conclusion is right. Execute number 1 from query1. Then number 2, 3 and 4 at the same time from each of the files - be fast starting the scripts ;). And test with number 5 from query 1.
December 19, 2022 at 1:06 pm
If the optimizer choose to use tempdb, data is not necessary read in the same order as inserted - the principle of using a heap. In this situation with few rows, the data will suddenly be read in the same order as inserted. But what happens, if we have many rows? Where is the border for using tempdb. It is not a rule with SELECT NULL, that data is read in same order as inserted, so we can not use this in my opinion if developing correct solutions. When is the result correct and when is it wrong. Data in a table is always unordered and only by specifying ORDER BY, the order can be used.
I have added 3 scripts. Two of the scripts is for making activity on the server - query and 3. I also insert many rows for being sure that it shows wat is expected. I insert 15,000 rows. The ROW_NUMBER and the inserted row number in the column Txt should have the same value if your conclusion is right. Execute number 1 from query1. Then number 2, 3 and 4 at the same time from each of the files - be fast starting the scripts ;). And test with number 5 from query 1.
You have not added any scripts.
December 19, 2022 at 4:15 pm
If the optimizer choose to use tempdb, data is not necessary read in the same order as inserted - the principle of using a heap. In this situation with few rows, the data will suddenly be read in the same order as inserted. But what happens, if we have many rows? Where is the border for using tempdb. It is not a rule with SELECT NULL, that data is read in same order as inserted, so we can not use this in my opinion if developing correct solutions. When is the result correct and when is it wrong. Data in a table is always unordered and only by specifying ORDER BY, the order can be used.
I have added 3 scripts. Two of the scripts is for making activity on the server - query and 3. I also insert many rows for being sure that it shows wat is expected. I insert 15,000 rows. The ROW_NUMBER and the inserted row number in the column Txt should have the same value if your conclusion is right. Execute number 1 from query1. Then number 2, 3 and 4 at the same time from each of the files - be fast starting the scripts ;). And test with number 5 from query 1.
I don't think those scripts are anything similar to the function to serve any purpose as a demonstration, they seem drop and recreate tables during a query.
Can you cause the function [dbo].[STRING_SPLIT] to come up with invalid results?
I've changed the function to remove the order by (select null) to remove any doubt:
IF OBJECT_ID('[dbo].[STRING_SPLIT]','IF') IS NULL BEGIN
EXEC ('CREATE FUNCTION [dbo].[STRING_SPLIT]() RETURNS TABLE AS RETURN SELECT 1 X')
END
GO
ALTER FUNCTION [dbo].[STRING_SPLIT]
(
@string nvarchar(MAX),
@separator nvarchar(MAX)
)
RETURNS TABLE WITH SCHEMABINDING
AS RETURN
WITH X(N) AS (SELECT 'Table1' FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) T(C)),
Y(N) AS (SELECT 'Table2' FROM X A1, X A2, X A3, X A4, X A5, X A6, X A7, X A8) , -- Up to 16^8 = 4 billion
T(N) AS (SELECT TOP(ISNULL(LEN(@string),0)+1) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))-1 N FROM Y),
Delim(Pos) AS (SELECT TOP(2147483647) t.N FROM T WHERE SUBSTRING(@string, t.N, LEN(@separator+'x')-1) LIKE @separator OR t.N = 0 ORDER BY t.N),
Separated(value,ordinal) AS (SELECT TOP(2147483647) SUBSTRING(@string, d.Pos + CASE WHEN Pos=0 THEN 1 ELSE LEN(@separator+'x')-1 END,
LEAD(d.Pos, 1, 2147483647) OVER (ORDER BY d.Pos)
- CASE WHEN d.Pos = 0 THEN 2-LEN(@separator+'x')
ELSE d.Pos END - (LEN(@separator+'x')-1)),
ROW_NUMBER() OVER (ORDER BY d.Pos) ordinal
FROM Delim d
WHERE @string IS NOT NULL
ORDER BY d.Pos)
SELECT s.value value,
ordinal
FROM Separated s
WHERE s.value <> @separator;
December 19, 2022 at 5:03 pm
No, because I do not have a lot of users on my Laptop! But the scripts shows that even when executing with only 3 users, data are not read in the same order as written. And I will not use more time on it, because it is the way SQL Server works: For me it is important not to make a test and use this for showing that a functionality works in a specific way, but know how the space is used, .... The script 2 and 3 are used for use space for other tables, remove this tables and see, that the table t1 then - surely - use this space. When data is read the tables IAM is used and therefore not in the same order as written.
December 19, 2022 at 5:35 pm
No, because I do not have a lot of users on my Laptop! But the scripts shows that even when executing with only 3 users, data are not read in the same order as written. And I will not use more time on it, because it is the way SQL Server works: For me it is important not to make a test and use this for showing that a functionality works in a specific way, but know how the space is used, .... The script 2 and 3 are used for use space for other tables, remove this tables and see, that the table t1 then - surely - use this space. When data is read the tables IAM is used and therefore not in the same order as written.
Now put your money where your mouth is and run your code in the test harness I posted previously 😉
😎
By the way, you mentioned tempdb and your laptop, what are your specs and configurations?
December 19, 2022 at 6:13 pm
20 core (8 + 8 + 4) and 64GB RAM, 2 SSD disk.
Viewing 15 posts - 16 through 30 (of 46 total)
You must be logged in to reply to this topic. Login to reply