November 6, 2017 at 8:28 am
Alan.B - Thursday, November 2, 2017 6:32 PMI fixed Chris' Test harness code and my own solution which uses NGrams8K. It was faster. If you understand how NGrams8K works then you know that I essentially re-wrote Chris's query with one notable modification: The CHECKSUM around my Position removes two implicit conversions to int that you'll see in the execution plan for Chris' solution. Execution plans attached. Note that I also removed the rowid.
SELECT XCHar
FROM #ListOfStrings los
CROSS APPLY
(
SELECT TOP (1) token --, position
FROM dbo.ngrams8k(los.name,1)
WHERE CHARINDEX(token,SUBSTRING(los.name,1,CHECKSUM(position-1))) > 0
ORDER BY position
) ngEng(XCHar);
IF 1 = 1
BEGIN
SET NOCOUNT ON;
IF OBJECT_ID('tempdb..#ListOfStrings') IS NOT NULL DROP TABLE #ListOfStrings;SELECT TOP(100000)
[RowID] = ROW_NUMBER() OVER(ORDER BY (SELECT NULL)),
[name] = CAST(NEWID() AS VARCHAR(36))
INTO #ListOfStrings
FROM SYS.COLUMNS a, SYS.COLUMNS b
END;--PRINT char(10)+'== John '+REPLICATE('=',50);
SET STATISTICS IO, TIME ON;
PRINT char(10)+'== EE '+REPLICATE('========================================',2);
SET STATISTICS IO, TIME ON;
WITH T(N) AS (SELECT X.N FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) X(N)),
POS_DIST AS
(
SELECT
LOS.RowID
,NUMS.N
,SUBSTRING(LOS.[name],NUMS.N,1) AS XCHAR
,CHARINDEX(SUBSTRING(LOS.[name],NUMS.N,1),LOS.[name],NUMS.N + 1) - NUMS.N AS DISTANCE
FROM #ListOfStrings LOS
CROSS APPLY
(
SELECT TOP(LEN(LOS.[name])) ROW_NUMBER() OVER (ORDER BY @@VERSION) AS N
FROM T T1,T T22,T T3,T T4
) NUMS(N)
WHERE CHARINDEX(SUBSTRING(LOS.[name],NUMS.N,1),LOS.[name],NUMS.N + 1) - NUMS.N > 0
),
FIND_MATCH AS
(
SELECT
PD.RowID
,ROW_NUMBER() OVER (PARTITION BY PD.RowID ORDER BY PD.N + PD.DISTANCE) XRID
,PD.XCHAR
FROM POS_DIST PD
)
SELECT --FM.RowID,
FM.XCHAR
FROM FIND_MATCH FM
WHERE FM.XRID = 1
ORDER BY FM.RowID;
SET STATISTICS IO, TIME OFF;PRINT char(10)+'== Chris3 '+REPLICATE('========================================',2);
SET STATISTICS IO, TIME ON;WITH Tally -- Chris3
AS (SELECT n = 0 FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) d (n))
SELECT -- c.RowID,
XCHAR = SUBSTRING(c.name, x.n, 1)
FROM #ListOfStrings c
CROSS APPLY
(
SELECT TOP(1) n
FROM (
SELECT TOP(LEN(c.name)) n = ROW_NUMBER() OVER (ORDER BY @@VERSION)
FROM Tally a, Tally b, Tally c
) t
WHERE CHARINDEX(SUBSTRING(c.name, t.n, 1), LEFT(c.name, n-1)) > 0
ORDER BY n
) x
ORDER BY c.RowID;
SET STATISTICS IO, TIME OFF;PRINT char(10)+'== Jason '+REPLICATE('========================================',2);
SET STATISTICS IO, TIME ON;
WITH
cte_psd AS
(
SELECT
los.RowID,
bin_val = CAST(p.pos AS BINARY(4)) + CAST(sv.sub_val AS BINARY(2)),
drnk = DENSE_RANK() OVER (PARTITION BY los.RowID, sv.sub_val ORDER BY p.pos)
FROM #ListOfStrings los
CROSS APPLY
(
SELECT TOP (LEN(los.name))
rn =ROW_NUMBER() OVER (ORDER BY n2.n)
FROM ( VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1) ) n1(n)
CROSS APPLY ( VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) n2 (n) ) p (pos)
CROSS APPLY ( VALUES (SUBSTRING(los.name, p.pos, 1)) ) sv (sub_val)
)
SELECT --psd.RowID,
CAST(SUBSTRING(MIN(psd.bin_val), 5, 2) AS VARCHAR(2))
FROM cte_psd psd
WHERE psd.drnk > 1
GROUP BY psd.RowID
ORDER BY psd.RowID;
SET STATISTICS IO, TIME OFF;PRINT char(10)+'== Alan '+REPLICATE('========================================',2);
SET STATISTICS IO, TIME ON;
SELECT XCHar
FROM #ListOfStrings los
CROSS APPLY
(
SELECT TOP (1) token --, position
FROM dbo.ngrams8k(los.name,1)
WHERE CHARINDEX(token,SUBSTRING(los.name,1,CHECKSUM(position-1))) > 0
ORDER BY position
) ngEng(XCHar);SET STATISTICS IO, TIME OFF;
Results
== EE ================================================================================
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server parse and compile time:
CPU time = 393 ms, elapsed time = 393 ms.
Table '#ListOfStrings______________________________________________________________________________________________________000000000013'. Scan count 9, logical reads 705, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 7283, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.SQL Server Execution Times:
CPU time = 7735 ms, elapsed time = 9586 ms.== Chris3 ================================================================================
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 3 ms.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 135, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#ListOfStrings______________________________________________________________________________________________________000000000013'. Scan count 9, logical reads 705, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.SQL Server Execution Times:
CPU time = 407 ms, elapsed time = 579 ms.
== Jason ================================================================================
SQL Server parse and compile time:
CPU time = 74 ms, elapsed time = 74 ms.
Table '#ListOfStrings______________________________________________________________________________________________________000000000013'. Scan count 9, logical reads 705, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.SQL Server Execution Times:
CPU time = 8703 ms, elapsed time = 1534 ms.== Alan ================================================================================
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 5 ms.
<NO WORKTABLE HERE>
Table '#ListOfStrings______________________________________________________________________________________________________000000000013'. Scan count 9, logical reads 705, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.SQL Server Execution Times:
CPU time = 347 ms, elapsed time = 462 ms.As you can see, my solution is worktable free. While typing this up, however, I discovered a problem with Chris' solution - he includes ORDER BY c.RowID which adds overhead. Removing that removed the worktable on his. Here's the updated performance tests comparing Chris' and my solution. I ran them with serial and parallel execution plans:
PRINT char(10)+'== Chris3 (serial)'+REPLICATE('========================================',2);
GO
DECLARE @st datetime = getdate(), @x char(1);
WITH Tally -- Chris3
AS (SELECT n = 0 FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) d (n))
SELECT -- c.RowID,
@x = SUBSTRING(c.name, x.n, 1)
FROM #ListOfStrings c
CROSS APPLY
(
SELECT TOP(1) n
FROM (
SELECT TOP(LEN(c.name)) n = ROW_NUMBER() OVER (ORDER BY @@VERSION)
FROM Tally a, Tally b, Tally c
) t
WHERE CHARINDEX(SUBSTRING(c.name, t.n, 1), LEFT(c.name, n-1)) > 0
ORDER BY n
) x
OPTION (maxdop 1);
PRINT datediff(ms,@st,getdate());
GO 3PRINT char(10)+'== Alan (serial)'+REPLICATE('========================================',2);
GO
DECLARE @st datetime = getdate(), @x char(1);
SELECT @x = XCHar
FROM #ListOfStrings los
CROSS APPLY
(
SELECT TOP (1) token --, position
FROM dbo.ngrams8k(los.name,1)
WHERE CHARINDEX(token,SUBSTRING(los.name,1,CHECKSUM(position-1))) > 0
ORDER BY position
) ngEng(XCHar)
OPTION (maxdop 1);
PRINT datediff(ms,@st,getdate());
GO 3PRINT char(10)+'== Chris3 (parallel)'+REPLICATE('========================================',2);
GO
DECLARE @st datetime = getdate(), @x char(1);
WITH Tally -- Chris3
AS (SELECT n = 0 FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) d (n))
SELECT -- c.RowID,
@x = SUBSTRING(c.name, x.n, 1)
FROM #ListOfStrings c
CROSS APPLY
(
SELECT TOP(1) n
FROM (
SELECT TOP(LEN(c.name)) n = ROW_NUMBER() OVER (ORDER BY @@VERSION)
FROM Tally a, Tally b, Tally c
) t
WHERE CHARINDEX(SUBSTRING(c.name, t.n, 1), LEFT(c.name, n-1)) > 0
ORDER BY n
) x
OPTION (querytraceon 8649);
PRINT datediff(ms,@st,getdate());
GO 3PRINT char(10)+'== Alan (parallel)'+REPLICATE('========================================',2);
GO
DECLARE @st datetime = getdate(), @x char(1);
SELECT @x = XCHar
FROM #ListOfStrings los
CROSS APPLY
(
SELECT TOP (1) token --, position
FROM dbo.ngrams8k(los.name,1)
WHERE CHARINDEX(token,SUBSTRING(los.name,1,CHECKSUM(position-1))) > 0
ORDER BY position
) ngEng(XCHar)
OPTION (querytraceon 8649);
PRINT datediff(ms,@st,getdate());
GO 3Results:
== Chris3 (serial)================================================================================
Beginning execution loop
174
170
163
Batch execution completed 3 times.== Alan (serial)================================================================================
Beginning execution loop
174
170
170
Batch execution completed 3 times.== Chris3 (parallel)================================================================================
Beginning execution loop
66
56
63
Batch execution completed 3 times.== Alan (parallel)================================================================================
Beginning execution loop
53
56
60
Batch execution completed 3 times.
Both identical. Ngrams just makes things simpler π
Certainly does π Thanks for this Alan, some improvements in here for my test harness.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
November 12, 2017 at 3:32 pm
By the logic of the request you should be looking not for the next occurence, but for the previous one.
Using John's harness I quickly baked this solution:WITH Tally AS (
SELECT ROW_NUMBER() OVER (ORDER BY object_id) AS n
FROM sys.objects
)
SELECT
c.name,
c.object_id,
y.[FIRST RECURRING CHARACTER],
y.[FIRST RECURRING POSITION]
FROM sys.all_columns c
CROSS APPLY (
SELECT TOP 1
[FIRST RECURRING CHARACTER] = SUBSTRING(c.name,n,1),
[FIRST RECURRING POSITION] = n
FROM Tally t
WHERE n > 1 and t.n <= LEN(c.name)
and charindex(SUBSTRING(c.name,n,1), SUBSTRING(c.name,1, n-1), 1) > 0
ORDER BY n
) y;
Then I turned the page and found that Chris has already posted pretty much the same solution.
Appropriate logic made the queries easy winners.
_____________
Code for TallyGenerator
Viewing 2 posts - 46 through 46 (of 46 total)
You must be logged in to reply to this topic. Login to reply