July 22, 2020 at 2:36 pm
No, it is NOT right! The function is created beforehand, so the comparison is only between the execution of the two queries! If the function doesn't have an execution plan in the plan cache, it must be compiled - schemabinding or not!
July 22, 2020 at 3:27 pm
Ok, so is the plan cache cleared of the tally function between each execution? Or is it evicted after some timeout period, or what?
Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können ("No one shall expel us from the paradise that Cantor created for us" - Hilbert)
July 22, 2020 at 3:57 pm
No, the plans for procedures, functions, etc. are evicted when the plan cache is not big enough to hold all of them - do you have a server that can keep every execution plan in cache? Plans are thrown out depending on how often the procedure or function is used. But maybe you use the tally function in so many queries that its execution plan is always in cache!
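To see whether a plan for the function is actually sitting in cache, something like the following should work (the '%fnTally%' filter is just an illustrative guess at the object name - adjust it to your function):
-- Sketch: look for a cached plan whose text mentions the tally function.
-- sys.dm_exec_cached_plans and sys.dm_exec_sql_text are standard DMVs.
SELECT cp.usecounts, cp.size_in_bytes, st.text
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
WHERE st.text LIKE '%fnTally%';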
July 22, 2020 at 4:41 pm
Aha, ok, yes - afaik the cache is clearable on local instances. I work mostly on Azure SQL these days and the DBCC statements to clear the cache don't apply/work there (maybe yet?). The tally-based approach uses fewer resources if the plan is re-used, yes?
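For what it's worth, a minimal sketch of both variants (DBCC FREEPROCCACHE requires ALTER SERVER STATE permission on a box instance; the database-scoped form is the one Azure SQL Database supports):
-- On-premises/box instance: clears the entire instance's plan cache
DBCC FREEPROCCACHE;
-- Azure SQL Database: clears the plan cache for the current database only
ALTER DATABASE SCOPED CONFIGURATION CLEAR PROCEDURE_CACHE;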
Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können ("No one shall expel us from the paradise that Cantor created for us" - Hilbert)
July 22, 2020 at 5:28 pm
The execution plan for fnTally (for example) is incredibly trivial and takes virtually no time to recompile if needed.
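For reference, here's a minimal sketch of a cascading-CTE tally function in the style of fnTally (the real fnTally is Jeff Moden's published version; this approximation only exists to show why the plan is trivial - no table access, just cross-joined constants):
CREATE FUNCTION dbo.fnTally_Sketch (@ZeroOrOne bit, @MaxN bigint)
RETURNS TABLE WITH SCHEMABINDING
AS
RETURN
WITH E1(N) AS ( -- 10 constant rows
SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS v(N)
),
E4(N) AS (SELECT 1 FROM E1 a, E1 b, E1 c, E1 d), -- 10,000 rows
E8(N) AS (SELECT 1 FROM E4 a, E4 b) -- up to 100,000,000 rows; TOP stops it early
SELECT N = 0 WHERE @ZeroOrOne = 0 -- optional leading zero
UNION ALL
SELECT TOP (@MaxN) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM E8;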
--Jeff Moden
Change is inevitable... Change for the better is not.
July 22, 2020 at 5:30 pm
In Azure, we also have limited resources!
July 22, 2020 at 8:54 pm
Okay ... time to throw my hat in the ring... I would not recommend storing the data in the database this way, as it's just not a proper representation of a real-world requirement. However, that said, a simple single-character splitting function can be paired with some simple string manipulation to achieve the desired result, and the performance appears to be great...
First, the table as originally specified, plus the splitting function:
USE PERSONAL;
GO
IF OBJECT_ID(N'dbo.TIDPrefix', N'U') IS NOT NULL
BEGIN;
DROP TABLE dbo.TIDPrefix;
END;
GO
CREATE TABLE dbo.TIDPrefix (
TypeID int,
TIDPrefix varchar(20) NOT NULL PRIMARY KEY CLUSTERED
);
GO
INSERT INTO dbo.TIDPrefix (TypeID, TIDPrefix)
SELECT 13, 'A[CL]' UNION ALL
SELECT 13, 'B[HQUW]' UNION ALL
SELECT 13, 'D[C]' UNION ALL
SELECT 13, 'G[AHPWZ]' UNION ALL
SELECT 13, 'I[ST]' UNION ALL
SELECT 13, 'N[Y]' UNION ALL
SELECT 13, 'P[BPS]' UNION ALL
SELECT 13, 'V[AR]' UNION ALL
SELECT 13, 'XP' UNION ALL
SELECT 13, 'Z';
GO
IF OBJECT_ID(N'dbo.fnSplit1', N'IF') IS NOT NULL
BEGIN;
DROP FUNCTION dbo.fnSplit1;
END;
GO
-- Splits @String into one row per character, with its position
CREATE FUNCTION dbo.fnSplit1 (
@String varchar(8000)
)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN
WITH N1 AS ( -- 10 constant rows
SELECT 1 AS N UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
),
N4 AS ( -- 10 x 10 x 10 x 10 = up to 10,000 row numbers, capped at LEN(@String)
SELECT TOP (LEN(@String))
ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS N
FROM N1 AS A, N1 AS B, N1 AS C, N1 AS D
)
SELECT TOP 100 PERCENT -- note: TOP 100 PERCENT + ORDER BY carries no ordering guarantee in an iTVF
N AS ItemNumber,
SUBSTRING(@String, N, 1) AS Item
FROM N4
WHERE LEN(@String) > 0
ORDER BY N;
GO
And the test code:
USE PERSONAL;
GO
WITH RAW_DATA AS (
SELECT
TP.TIDPrefix,
CHARINDEX('[', TP.TIDPrefix) AS BPOS,
CASE
WHEN CHARINDEX('[', TP.TIDPrefix) > 0 THEN LEFT(TP.TIDPrefix, CHARINDEX('[', TP.TIDPrefix) - 1)
ELSE TP.TIDPrefix
END AS Prefix,
CASE
WHEN CHARINDEX('[', TP.TIDPrefix) > 0 THEN SUBSTRING(TP.TIDPrefix, CHARINDEX('[', TP.TIDPrefix) + 1, LEN(TP.TIDPrefix) - (CHARINDEX('[', TP.TIDPrefix) + 1))
ELSE ''
END AS PostFix
FROM dbo.TIDPrefix AS TP
)
SELECT
RD.TIDPrefix,
ISNULL(S.ItemNumber, 1) AS ItemNumber,
RD.Prefix + ISNULL(S.Item, '') AS ResultString
FROM RAW_DATA AS RD
OUTER APPLY dbo.fnSplit1(RD.PostFix) AS S
ORDER BY RD.TIDPrefix, S.ItemNumber;
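If I've traced the logic correctly, the output (abbreviated) should look like this:
TIDPrefix  ItemNumber  ResultString
A[CL]      1           AC
A[CL]      2           AL
B[HQUW]    1           BH
B[HQUW]    2           BQ
...
XP         1           XP
Z          1           Z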
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
July 23, 2020 at 6:56 pm
Not to be left out, here's my take on a possible solution: 🙂
DROP TABLE IF EXISTS #testdata;
CREATE TABLE #testdata -- Source data structure as specified by OP...
(
TypeId INT DEFAULT(13),
TIDPrefix VARCHAR(100)
);
INSERT INTO #testdata (TIDPrefix)
VALUES
-- Testdata provided by the OP...
('A[CL]'),
('B[HQUW]'),
('D[C]'),
('G[AHPWZ]'),
('I[ST]'),
('N[Y]'),
('P[BPS]'),
('V[AR]'),
('XP'),
('Z'),
--- Some additional unusual/irregular data...
('A[DLT]Z'),
('AL[MSB]ZT'),
('AX[NFB][ZT'),
('ATL[SUZ[ZT'),
('AW[[CUT'),
('AZ[[YOQ]')
;
-- My take on a possible solution...
WITH
DigitsZero2Nine AS (
SELECT Digit FROM
(VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) a(Digit)
),
NumbersOne2OneHundred AS ( -- Used to index the character string to explode (limited to 100 because of the defined variable length)
SELECT zn1.Digit * 10 + zn2.Digit + 1 AS Number
FROM DigitsZero2Nine zn1, DigitsZero2Nine zn2
),
SourceData AS (
SELECT
x.TypeId,
x.TIDPrefix,
CASE WHEN 0 IN (CHARINDEX('[',x.TIDPrefix,1),CHARINDEX(']',x.TIDPrefix,1)) THEN x.TIDPrefix ELSE LEFT(x.TIDPrefix,CHARINDEX('[',x.TIDPrefix,1)-1) END AS Prefix,
CASE WHEN 0 IN (CHARINDEX('[',x.TIDPrefix,1),CHARINDEX(']',x.TIDPrefix,1)) THEN '' ELSE SUBSTRING(x.TIDPrefix,CHARINDEX('[',x.TIDPrefix,1)+1,CHARINDEX(']',x.TIDPrefix,1)-CHARINDEX('[',x.TIDPrefix,1)-1) END AS Xplode,
CASE WHEN 0 IN (CHARINDEX('[',x.TIDPrefix,1),CHARINDEX(']',x.TIDPrefix,1),LEN(x.TIDPrefix)) THEN '' ELSE RIGHT(x.TIDPrefix,LEN(x.TIDPrefix)-CHARINDEX(']',x.TIDPrefix,1)) END AS Suffix
FROM #testdata x)
SELECT
sd.TypeId,
sd.TIDPrefix,
sd.Prefix,
sd.Xplode,
sd.Suffix,
n100.Number AS XplodeCharIndex,
SUBSTRING(sd.Xplode,n100.Number,1) AS XplodeChar,
sd.Prefix + SUBSTRING(sd.Xplode,n100.Number,1) + sd.Suffix AS TIDCombined
FROM SourceData sd
CROSS JOIN NumbersOne2OneHundred n100
WHERE n100.Number <= LEN(sd.Xplode)
OR (n100.Number = 1 AND LEN(sd.Xplode) = 0) -- keep rows that have nothing to explode
ORDER BY
TIDPrefix,
XplodeCharIndex
July 23, 2020 at 7:05 pm
Awesome!
July 23, 2020 at 8:05 pm
If the parse and compile time is 20 ms, how long does a recompile take?
July 23, 2020 at 9:18 pm
Hey there, Carsten. I really appreciate that you have limited resources in Azure. It's taking me a while (had to write some code to create some decent test data) and I'll be back with more but I have confirmed your compile time observation.
--Jeff Moden
Change is inevitable... Change for the better is not.
July 24, 2020 at 10:50 am
"I really appreciate that you have limited resources in Azure". What a comment !!! I am not writing anything about that I have limited resources, but that there are also limited resources, when we use Azure! If we use vcore, 2 cores has 10.2 GB, 4 Cores 20.4 but double price, .... 80 cores 396 GB, but with a price that is 20 times bigger than 4 cores and 40 times bigger than 2 cores. And similarly for the other Azure models. I hope not, and I'm convinced, that dba's are not just allowed to upgrade indefinitely - it must be part of a professional dba's job, not just throwing money out the window.
And SQL Server works the same way whether it's on my local PC or in Azure. And I do not think it is practical to turn Azure resources up and down, up and down, ..., up and down several times during the day. So that's not the solution to the problem - only for the marketing department at MS!
For your information, the test was made on my PC with 8 cores (2 quad-core i7s) and 64 GB RAM!
July 24, 2020 at 10:04 pm
Yep. Understood. And I agree... more cores and RAM aren't going to change the problem with compile times. But, so far, no one has played with more than about 6 rows on this problem. In my world, we work with some pretty decent-sized batches of more than a million rows. Using rCTEs on such things is a complete and totally unnecessary waste of resources and time... especially since a 23-26 second recompile pales in comparison to the actual runs that we do.
--Jeff Moden
Change is inevitable... Change for the better is not.
July 25, 2020 at 12:20 pm
This script prints start/stop timestamps before and after running two queries. Query 1 counts rows generated by the tally function; Query 2 uses a recursive CTE. With n = 10000, Q1 takes 0 ms and Q2 takes about 100 ms.
declare
@n int=10000,
@tally_start datetime2=sysutcdatetime();
print (cast(@tally_start as varchar));
select count(*) from dbo.fnTally(1, @n);
declare
@tally_end datetime2=sysutcdatetime();
print (cast(@tally_end as varchar));
print ('tally elapsed time: '+cast(datediff(ms, @tally_start, @tally_end) as varchar));
declare
@cte_start datetime2=sysutcdatetime();
print (cast(@cte_start as varchar));
;With NumberSequence( Number ) as
(
Select 1 as Number
union all
Select Number + 1
from NumberSequence
where Number < @n
)
Select count(*) From NumberSequence Option (MaxRecursion 32767);
declare
@cte_end datetime2=sysutcdatetime();
print (cast(@cte_end as varchar));
print ('cte elapsed time: '+cast(datediff(ms, @cte_start, @cte_end) as varchar));
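As a cross-check (assuming the same dbo.fnTally is available), SQL Server can report the parse/compile and execution times itself:
SET STATISTICS TIME ON;
SELECT COUNT(*) FROM dbo.fnTally(1, 10000); -- times appear in the Messages tab
SET STATISTICS TIME OFF;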
Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können ("No one shall expel us from the paradise that Cantor created for us" - Hilbert)
July 25, 2020 at 12:53 pm
Sorry Steve for the slightly short and 'sour' remark, but you obviously did not understand the message. Tally is not a panacea for solving these kinds of tasks. If many rows are to be created, tally is the best solution, but if only a few rows are to be created, the recursive solution is best. Whether few or many rows are to be created does not necessarily depend on the other tables in the database. I have been working with wind turbine data - more than 500,000,000 rows - but I still need to form a series of numbers, used to look at hours, days, months, ... If this data is to be displayed in a graph on a screen, maybe only 20-40 values will be shown, so as not to lose track.
The premise of this long discussion says nothing about the number of rows to start from - maybe it is 10, maybe it is 1,000,000. So one solution can be better than the other when that knowledge is present. But that can only be decided once this is known!
In my world, many problems can be solved in different ways. It is important to make a solution that always works - so if things change, it will still work. But choose the best one (CPU time, elapsed time, ...) depending on how the 'world' looks right now - and it is not a choice among only one possible solution.