October 30, 2017 at 4:27 pm
I think that maybe looking at the 2nd character of a two character match may make optimization of all this a bit easier.
DECLARE @STR VARCHAR(100) = 'AdbiicdffA'
;
SELECT TOP 1
NULLIF(SUBSTRING(@Str,N+1,1),'')
FROM dbo.Tally t
WHERE t.N <= LEN(@Str)
AND (SUBSTRING(@Str,t.N,1) = SUBSTRING(@Str,t.N+1,1) OR t.N = LEN(@Str))
ORDER BY t.N
;
I have NOT done any performance testing, yet. It's just a logical first-blush on my part.
--Jeff Moden
Change is inevitable... Change for the better is not.
October 31, 2017 at 1:32 am
Jeff Moden - Monday, October 30, 2017 4:27 PMI think that maybe looking at the 2nd character of a two character match may make optimization of all this a bit easier.
DECLARE @STR VARCHAR(100) = 'AdbiicdffA'
;
SELECT TOP 1
NULLIF(SUBSTRING(@Str,N+1,1),'')
FROM dbo.Tally t
WHERE t.N <= LEN(@Str)
AND (SUBSTRING(@Str,t.N,1) = SUBSTRING(@Str,t.N+1,1) OR t.N = LEN(@Str))
ORDER BY t.N
;I have NOT done any performance testing, yet. It's just a logical first-blush on my part.
Hey @jeff - I just tested your query, I got something unexpected. The below query should return A but it returned NULL.
First solve the problem then write the code !
October 31, 2017 at 3:38 am
John Mitchell-245523 - Monday, October 30, 2017 10:38 AMGoodness, yes - I tried it with an on-the-fly table. Mine slowed down significantly and yours sped up a lot. I can't really explain that.I changed the tally CTE to this:
WITH Tally AS (
SELECT ROW_NUMBER() OVER (ORDER BY x.m) AS n
FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) x(m)
CROSS JOIN (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) y(r)
)
and here are the times:Table 'Worktable'. Scan count 0, logical reads 217882, physical reads 0, read-ahead reads 187, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'syscolpars'. Scan count 1, logical reads 40400, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'syscolpars'. Scan count 100, logical reads 4700, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.SQL Server Execution Times:
CPU time = 1687 ms, elapsed time = 2331 ms.Table 'Worktable'. Scan count 17121, logical reads 50131, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'syscolpars'. Scan count 1, logical reads 404, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'syscolpars'. Scan count 1, logical reads 47, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.SQL Server Execution Times:
CPU time = 485 ms, elapsed time = 747 ms.John
Hey John, a quick investigation shows a filter immediately upstream of the CI scan of sysschobjs. The filter is object-level security:
has_access('CO',[Database].[sys].[sysschobjs].[id] as [o].[id])=(1)
and it operates against every single row.
Replace the reference to sysobjects with an inline tally table using VALUES and the two queries run in just about the same time.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
October 31, 2017 at 4:16 am
Chris
Good spot. I saw that when I looked in the execution plan. What still puzzles me, though, is that my code runs about three times as fast with the sys.objects tally CTE than it does with the virtual one! I've tried it on SQL Server 2008 R2 and 2014 and it's broadly similar. I even get a memory grant warning in the sys.objects execution plan but it still runs faster!
John
October 31, 2017 at 4:27 am
John Mitchell-245523 - Tuesday, October 31, 2017 4:16 AMChrisGood spot. I saw that when I looked in the execution plan. What still puzzles me, though, is that my code runs about three times as fast with the sys.objects tally CTE than it does with the virtual one! I've tried it on SQL Server 2008 R2 and 2014 and it's broadly similar. I even get a memory grant warning in the sys.objects execution plan but it still runs faster!
John
Your query invokes a table spool. The fast sysobjects plan stores one pull of the rows in the table spool and replays them as required for the join. My query pulls the rows as required from the table, hitting the filter every time.
With the inline tally version, it's the #ListOfStrings table rather than the tally table which is stored and replayed, and it appears to be less efficient.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
October 31, 2017 at 8:02 am
TheCTEGuy - Tuesday, October 31, 2017 1:32 AMJeff Moden - Monday, October 30, 2017 4:27 PMI think that maybe looking at the 2nd character of a two character match may make optimization of all this a bit easier.
DECLARE @STR VARCHAR(100) = 'AdbiicdffA'
;
SELECT TOP 1
NULLIF(SUBSTRING(@Str,N+1,1),'')
FROM dbo.Tally t
WHERE t.N <= LEN(@Str)
AND (SUBSTRING(@Str,t.N,1) = SUBSTRING(@Str,t.N+1,1) OR t.N = LEN(@Str))
ORDER BY t.N
;I have NOT done any performance testing, yet. It's just a logical first-blush on my part.
Hey @jeff - I just tested your query, I got something unexpected. The below query should return A but it returned NULL.
Please don't post pictures. Post code so that I can copy'n'paste.
Also, when you say "the first repeated character", I thought you meant the characters where supposed to be adjacent to each other so, no... my code won't do the job for what you want.
Shifting gears on this, why do you need to do this? Consider it to be a friendly mutual education. π
--Jeff Moden
Change is inevitable... Change for the better is not.
October 31, 2017 at 9:16 am
WITH cte AS (
SELECT t.N,SUBSTRING(@Str,N,1) AS [C],
ROW_NUMBER() OVER (PARTITION BY SUBSTRING(@Str,N,1) ORDER BY t.N DESC) AS [RowNum]
FROM dbo.Tally t
WHERE t.N <= LEN(@Str))
SELECT TOP(1) C
FROM cte
WHERE RowNum = 2
ORDER BY N ASC;
Far away is close at hand in the images of elsewhere.
Anon.
October 31, 2017 at 9:33 am
David Burrows - Tuesday, October 31, 2017 9:16 AM
WITH cte AS (
SELECT t.N,SUBSTRING(@Str,N,1) AS [C],
ROW_NUMBER() OVER (PARTITION BY SUBSTRING(@Str,N,1) ORDER BY t.N DESC) AS [RowNum]
FROM dbo.Tally t
WHERE t.N <= LEN(@Str))
SELECT TOP(1) C
FROM cte
WHERE RowNum = 2
ORDER BY N ASC;
Haven't tested it but that looks to be the ticket.
I hope the OP comes back with why they need to do it.
--Jeff Moden
Change is inevitable... Change for the better is not.
October 31, 2017 at 10:50 am
Haven't tested it but that looks to be the ticket.
Except it will not handle repeats of 3 or more, this should though
WITH cte AS (
SELECT t.N,SUBSTRING(@Str,N,1) AS [C],
ROW_NUMBER() OVER (PARTITION BY SUBSTRING(@Str,N,1) ORDER BY t.N DESC) AS [RowNum]
FROM dbo.Tally t
WHERE t.N <= LEN(@Str))
SELECT TOP(1) C
FROM cte
WHERE RowNum >= 2
ORDER BY N ASC;
I hope the OP comes back with why they need to do it.
Yes, would be good to know
Far away is close at hand in the images of elsewhere.
Anon.
October 31, 2017 at 6:12 pm
Just because it looks like an interesting challenge... and because I think I understood the question differently...
Are you looking for the 1st character to be duplicated OR the character in the 1st duplication?
DECLARE @STR VARCHAR(8000) = 'abcdefghijklmnopqrstponmlkjihgfedcba';
Is the correct answer "a" or "p"?... "a" is the is the character with a duplicate... but... "p" is actually the 1st character to be duplicated...
October 31, 2017 at 6:22 pm
Here are my solutions... One to fit either answer to my previous question...
DECLARE @STR VARCHAR(8000) = 'abcdefghijklmnopqrstponmlkjihgfedcba';
-- if "a" is the correct answer...
WITH
cte_n1 (n) AS (SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) n (n)),
cte_n2 (n) AS (SELECT 1 FROM cte_n1 a CROSS JOIN cte_n1 b),
cte_Tally (n) AS (
SELECT TOP(LEN(@str))
ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM
cte_n2 a CROSS JOIN cte_n2 b
)
SELECT TOP 1
position = MIN(t.n),
cv.char_val
FROM
cte_Tally t
CROSS APPLY ( VALUES (SUBSTRING(@str, t.n, 1)) ) cv (char_val)
GROUP BY
cv.char_val
HAVING
COUNT(1) > 1
ORDER BY
MIN(t.n);
GO
position char_val
-------------------- --------
1 a
...
DECLARE @STR VARCHAR(8000) = 'abcdefghijklmnopqrstponmlkjihgfedcba';
-- if "p" is the correct answer...
WITH
cte_n1 (n) AS (SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) n (n)),
cte_n2 (n) AS (SELECT 1 FROM cte_n1 a CROSS JOIN cte_n1 b),
cte_Tally (n) AS (
SELECT TOP(LEN(@str))
ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM
cte_n2 a CROSS JOIN cte_n2 b
),
cte_check_dupes AS (
SELECT
position = t.n,
cv.char_val,
dupe = dense_rank() OVER (PARTITION BY cv.char_val ORDER BY t.n)
FROM
cte_Tally t
CROSS APPLY ( VALUES (SUBSTRING(@str, t.n, 1)) ) cv (char_val)
)
SELECT TOP 1
fst_dup_loc = cd.position,
cd.char_val
FROM
cte_check_dupes cd
WHERE
cd.dupe > 1
ORDER BY
cd.position;
GO
fst_dup_loc char_val
-------------------- --------
21 p
November 1, 2017 at 2:14 am
Untested and just for fun
π
USE TEEST;
GO
IF object_id('tempdb..#ListOfStrings') IS NOT NULL DROP TABLE #ListOfStrings;
SELECT ROW_NUMBER() OVER (ORDER BY @@VERSION) AS ID,X.y
INTO #ListOfStrings
FROM (VALUES
('Ad'),
('fyf Adbicdefghijkiqtfjoen gfviuyhtbnh2sghszxjhxcdkflg,gmdcjshja\zhgatwevwnfdm,gflgotitheb fyf'),
('Adbicdefghijkiqtfjoen gfviuyhtbnhsghs3zxjhxcdkflg,gmdcjshja\zhgatwevwnfdm,gflgotitheb'),
('fyf Adbicdefghijkiqtfjoen gfviuyhtbnhs4ghszxjhxcdkflg,gmdcjshja\zhgatwevwnfdm,gflgotitheb fyf'),
('Adbicdefghijkiqtfjoen gfviuyhtbnhsghszx5jhxcdkflg,gmdcjshja\zhgatwevwnfdm,gflgotitheb'),
('Adbbicdefghijkiqtfjoen gfviuyhtbnhsghszxjhx7cdkflg,gmdcjshja\zhgatwevwnfdm,gflgotitheb'),
('fyf Adbicdefghijkiqtfjoen gfviuyhtbnhsghszx8jhxcdkflg,gmdcjshja\zhgatwevwnfdm,gflgotitheb fyf'),
('Adbicdefghijkiqtfjoen gfviuyhtbnhsghszxjhxcd9kflg,gmdcjshja\zhgatwevwnfdm,gflgotitheb'),
('Adbicdefghijkiqtfjoen gfviuyhtbnhsghs1zxjhxcdkflg,gmdcj1shja\zhgatwevwnfdm,gflgotitheb'),
('fyf Adbicdefghijkiqtfjoen gfviuyhtbnh2sghszxjhxcdkflg,gm2dcjshja\zhgatwevwnfdm,gflgotitheb fyf'),
('Adbicdefghijkiqtfjoen gfviuyhtbnhsghs3zxjhxcdkflg,gmdcjsh3ja\zhgatwevwnfdm,gflgotitheb'),
('fyf Adbicdefghijkiqtfjoen gfviuyhtbnhs4ghszxjhxcdkflg,gmdc4jshja\zhgatwevwnfdm,gflgotitheb fyf'),
('Adbicdefghijkiqtfjoen gfviuyhtbnhsghszx5jhxcdkflg,gmdcjshja5\zhgatwevwnfdm,gflgotitheb'),
('Adbicdefghijkiqtfjoen gfviuyhtbnhsghszxj6hxcdkflg,gmdcjshja\6zhgatwevwnfdm,gflgotitheb'),
('fyf Adbicdefghijkiqtfjoen gfviuyhtbnhsghs7zxjhxcdkflg,gmdcjsh7ja\zhgatwevwnfdm,gflgotitheb fyf'),
('Adbicdefghijkiqtfjoen gfviuyhtbnhsghszxjhx7cdkflg,gmdcjshja\zh8gatwevwnfdm,gflgotitheb'),
('fyf Adbicdefghijkiqtfjoen gfviuyhtbnhsghszx8jhxcdkflg,gmdcjshja9\zhgatwevwnfdm,gflgotitheb fyf'),
('Adbicdefghijkiqtfjoen gfviuyhtbnhsghszxjhxcd9kflg,gmdcjshja\zhga10twevwnfdm,gflgotitheb'),
('AdbicdA'),
('f1yf Adbicdefghijkiqtfjoen gfviuyhtbnh2sghszxjhxcdkflg,gmdcjshja\zhgatwevwnfdm,gflgotitheb fyf'),
('Ad2bicdefghijkiqtfjoen gfviuyhtbnhsghs3zxjhxcdkflg,gmdcjshja\zhgatwevwnfdm,gflgotitheb'),
('fyf3 Adbicdefghijkiqtfjoen gfviuyhtbnhs4ghszxjhxcdkflg,gmdcjshja\zhgatwevwnfdm,gflgotitheb fyf'),
('Adbi4cdefghijkiqtfjoen gfviuyhtbnhsghszx5jhxcdkflg,gmdcjshja\zhgatwevwnfdm,gflgotitheb'),
('Adbic5defghijkiqtfjoen gfviuyhtbnhsghszxj6hxcdkflg,gmdcjshja\zhgatwevwnfdm,gflgotitheb'),
('fyf Ad6bicdefghijkiqtfjoen gfviuyhtbnhsghs7zxjhxcdkflg,gmdcjshja\zhgatwevwnfdm,gflgotitheb fyf'),
('Adbicde7fghijkiqtfjoen gfviuyhtbnhsghszxjhx7cdkflg,gmdcjshja\zhgatwevwnfdm,gflgotitheb'),
('fyf Adbi8cdefghijkiqtfjoen gfviuyhtbnhsghszx8jhxcdkflg,gmdcjshja\zhgatwevwnfdm,gflgotitheb fyf'),
('Adbicdefg9hijkiqtfjoen gfviuyhtbnhsghszxjhxcd9kflg,gmdcjshja\zhgatwevwnfdm,gflgotitheb'),
('Adbicdefgh1ijkiqtfjoen gfviuyhtbnhsghs1zxjhxcdkflg,gmdcj1shja\zhgatwevwnfdm,gflgotitheb'),
('fyf Adbicde2fghijkiqtfjoen gfviuyhtbnh2sghszxjhxcdkflg,gm2dcjshja\zhgatwevwnfdm,gflgotitheb fyf'),
('Adbicdefghij3kiqtfjoen gfviuyhtbnhsghs3zxjhxcdkflg,gmdcjsh3ja\zhgatwevwnfdm,gflgotitheb'),
('fyf Adbicdefg4hijkiqtfjoen gfviuyhtbnhs4ghszxjhxcdkflg,gmdc4jshja\zhgatwevwnfdm,gflgotitheb fyf'),
('Adbicdefghijki5qtfjoen gfviuyhtbnhsghszx5jhxcdkflg,gmdcjshja5\zhgatwevwnfdm,gflgotitheb'),
('Adbicdefghijkiq6tfjoen gfviuyhtbnhsghszxj6hxcdkflg,gmdcjshja\6zhgatwevwnfdm,gflgotitheb'),
('fyf Adbicdefghij7kiqtfjoen gfviuyhtbnhsghs7zxjhxcdkflg,gmdcjsh7ja\zhgatwevwnfdm,gflgotitheb fyf'),
('Adbicdefghijkiqtf8joen gfviuyhtbnhsghszxjhx7cdkflg,gmdcjshja\zh8gatwevwnfdm,gflgotitheb'),
('fyf Adbicdefghijki9qtfjoen gfviuyhtbnhsghszx8jhxcdkflg,gmdcjshja9\zhgatwevwnfdm,gflgotitheb fyf'),
('Adbicdefghijkiqtfjo0en gfviuyhtbnhsghszxjhxcd9kflg,gmdcjshja\zhga10twevwnfdm,gflgotitheb')
)X(y)
-- Inline Tally
SET STATISTICS IO, TIME ON;
;WITH T(N) AS (SELECT X.N FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0))X(N))
,POS_DIST AS
(
SELECT
LOS.ID
,NUMS.N
,SUBSTRING(LOS.Y,NUMS.N,1) AS XCHAR
,CHARINDEX(SUBSTRING(LOS.Y,NUMS.N,1),LOS.y,NUMS.N + 1) - NUMS.N AS DISTANCE
FROM #ListOfStrings LOS
CROSS APPLY
(
SELECT TOP(LEN(LOS.y)) ROW_NUMBER() OVER (ORDER BY @@VERSION) AS N
FROM T T1,T T22,T T3,T T4
) NUMS(N)
WHERE CHARINDEX(SUBSTRING(LOS.Y,NUMS.N,1),LOS.y,NUMS.N + 1) - NUMS.N > 0
)
,FIND_MATCH AS
(
SELECT
PD.ID
,ROW_NUMBER() OVER (PARTITION BY PD.ID ORDER BY PD.N + PD.DISTANCE) XRID
,PD.XCHAR
FROM POS_DIST PD
)
SELECT
FM.ID
,FM.XCHAR
FROM FIND_MATCH FM
WHERE FM.XRID = 1;
SET STATISTICS IO, TIME OFF
The statistics(37 row(s) affected)
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#ListOfStrings'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 15 ms, elapsed time = 29 ms.
November 1, 2017 at 2:38 am
Jason A. Long - Tuesday, October 31, 2017 6:12 PMJust because it looks like an interesting challenge... and because I think I understood the question differently...
Are you looking for the 1st character to be duplicated OR the character in the 1st duplication?
DECLARE @STR VARCHAR(8000) = 'abcdefghijklmnopqrstponmlkjihgfedcba';
Is the correct answer "a" or "p"?... "a" is the is the character with a duplicate... but... "p" is actually the 1st character to be duplicated...
P should be returned coz it duplicated first.
First solve the problem then write the code !
November 1, 2017 at 2:42 am
Jeff Moden - Tuesday, October 31, 2017 8:02 AMTheCTEGuy - Tuesday, October 31, 2017 1:32 AMJeff Moden - Monday, October 30, 2017 4:27 PMI think that maybe looking at the 2nd character of a two character match may make optimization of all this a bit easier.
DECLARE @STR VARCHAR(100) = 'AdbiicdffA'
;
SELECT TOP 1
NULLIF(SUBSTRING(@Str,N+1,1),'')
FROM dbo.Tally t
WHERE t.N <= LEN(@Str)
AND (SUBSTRING(@Str,t.N,1) = SUBSTRING(@Str,t.N+1,1) OR t.N = LEN(@Str))
ORDER BY t.N
;I have NOT done any performance testing, yet. It's just a logical first-blush on my part.
Hey @jeff - I just tested your query, I got something unexpected. The below query should return A but it returned NULL.
Please don't post pictures. Post code so that I can copy'n'paste.
Also, when you say "the first repeated character", I thought you meant the characters where supposed to be adjacent to each other so, no... my code won't do the job for what you want.
Shifting gears on this, why do you need to do this? Consider it to be a friendly mutual education. π
@jeff - This is yet another interview question. I am just curious to how it could have been down in most optimized way. Hope this helps.
Thanks π
First solve the problem then write the code !
November 1, 2017 at 2:50 am
== John =============================================================
(39 rows affected)
Table 'Worktable'. Scan count 1, logical reads 2079, physical reads 0, ...
Table '#ListOfStrings_000000002824'. Scan count 1, logical reads 1, ...
SQL Server Execution Times:
CPU time = 31 ms, elapsed time = 32 ms.
== Chris ============================================================
(39 rows affected)
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, ...
Table '#ListOfStrings_000000002824'. Scan count 1, logical reads 1, ...
SQL Server Execution Times:
CPU time = 16 ms, elapsed time = 19 ms.
== EE ===============================================================
(39 rows affected)
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, ...
Table '#ListOfStrings_000000002824'. Scan count 1, logical reads 1, ...
SQL Server Execution Times:
CPU time = 16 ms, elapsed time = 11 ms.
Here's the code, I've tweaked them all so they generate the exact same result set:PRINT ''
PRINT '== John ========================================================================================================================================================';
SET STATISTICS IO, TIME ON;
WITH Tally -- John
AS (SELECT CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS INT) AS n FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) d (n), (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) e (n), (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) f (n)),
Characters AS (
SELECT
c.RowID
--, c.Name
, SUBSTRING(c.name,n,1) AS CHR
, t.n
FROM Tally t
JOIN #ListOfStrings c
ON t.n <= LEN(c.name)
)
, Repeats AS (
SELECT
RowID
--, Name
, CHR
, ROW_NUMBER() OVER (PARTITION BY RowID, CHR ORDER BY n) AS RowNo
, n AS CharacterSeq
FROM Characters
)
, Recurrences AS (
SELECT
RowID
, CHR
, ROW_NUMBER() OVER (PARTITION BY RowID ORDER BY CharacterSeq) AS RecurSeq
FROM Repeats
WHERE RowNo = 2 -- first recurrence
)
SELECT
RowID
, CHR
FROM Recurrences
WHERE RecurSeq = 1
ORDER BY RowID;
SET STATISTICS IO, TIME OFF;
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
Viewing 15 posts - 16 through 30 (of 46 total)
You must be logged in to reply to this topic. Login to reply