October 30, 2017 at 1:55 am
I have a query that finds the first recurring character in a string.
For eg : if @STR = 'ABCCDA' then i should C as answer as it recurred first and not A(it also recurred but C recurred before it). If there are no recurring charcters then it should return either NULL or none.
I am currently using tally table (in other words a number table) and inner join.
Is there a bettre way to do the same ?
DECLARE @STR VARCHAR(50) = 'Adbcdefghijki'
;WITH CTE AS(
SELECT SUBSTRING(@str,n,1) CHR,ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS RN FROM dbo.tally
WHERE LEN(@str) >= N
)
SELECT TOP 1 CASE WHEN A.CHR IS NOT NULL THEN A.CHR ELSE 'NONE' END AS FIRST_RECURRING_CHARACTER FROM CTE A
INNER JOIN CTE B
ON A.CHR = B.CHR
WHERE A.RN > B.RN
ORDER BY A.RN
GO
First solve the problem then write the code !
October 30, 2017 at 5:00 am
Try this , tweak the script to suit your needs.
--declare variable, type: table
DECLARE @MyTable TABLE (Input NVARCHAR(30))
--insert sample values
INSERT INTO @MyTable (Input)
VALUES ('abccda')
--here CTE begins:
;WITH CTE AS
(
--initial query
SELECT Input, CONVERT(VARCHAR(1),LEFT(Input,1)) AS Letter, RIGHT(Input, LEN(Input)-1) AS Remainder
FROM @MyTable
WHERE LEN(Input)>1
--recursive part
UNION ALL
--recursive query
SELECT Input, CONVERT(VARCHAR(1),LEFT(Remainder,1)) AS Letter,
RIGHT(Remainder, LEN(Remainder)-1) AS Remainder
FROM CTE
WHERE LEN(Remainder)>0
)
SELECT Input, Letter, COUNT(Letter) AS CountOfLetter
FROM CTE
GROUP BY Input, Letter
HAVING COUNT(Letter)>1
ORDER by 2
SQL 2000/2005/2008/2012 DBA - MCTS/MCITP
October 30, 2017 at 5:07 am
Kenny Jozi - Monday, October 30, 2017 5:00 AMTry this , tweak the script to suit your needs.
--declare variable, type: table
DECLARE @MyTable TABLE (Input NVARCHAR(30))--insert sample values
INSERT INTO @MyTable (Input)
VALUES ('abccda')--here CTE begins:
;WITH CTE AS
(
--initial query
SELECT Input, CONVERT(VARCHAR(1),LEFT(Input,1)) AS Letter, RIGHT(Input, LEN(Input)-1) AS Remainder
FROM @MyTable
WHERE LEN(Input)>1
--recursive part
UNION ALL
--recursive query
SELECT Input, CONVERT(VARCHAR(1),LEFT(Remainder,1)) AS Letter,
RIGHT(Remainder, LEN(Remainder)-1) AS Remainder
FROM CTE
WHERE LEN(Remainder)>0
)
SELECT Input, Letter, COUNT(Letter) AS CountOfLetter
FROM CTE
GROUP BY Input, Letter
HAVING COUNT(Letter)>1
ORDER by 2
Your query is also giving me 2 scan counts. And also its using RECURSIVE CTE which are not a good( Jeff once told me) when we talk about performance.
Any other way ?
First solve the problem then write the code !
October 30, 2017 at 5:22 am
This should perform better than either a self-join or a recursive CTE. The first CTE isn't part of the solution - I just used it because I don't have a Tally table on my server. Whichever solution you choose, make sure that if you turn it into a function, you make it an inline table-valued function and not a scalar-valued function.
DECLARE @STR VARCHAR(50) = 'Adbcdefghijki';
WITH Tally AS (
SELECT ROW_NUMBER() OVER (ORDER BY object_id) AS n
FROM sys.objects
)
, Characters AS (
SELECT
SUBSTRING(@str,n,1) AS CHR
, n
FROM Tally
WHERE n <= LEN(@str)
)
, Repeats AS (
SELECT
CHR
, ROW_NUMBER() OVER (PARTITION BY CHR ORDER BY n) AS RowNo
, n AS CharacterSeq
FROM Characters
)
, Recurrences AS (
SELECT
CHR
, RowNo
, CharacterSeq
, ROW_NUMBER() OVER (ORDER BY CharacterSeq) AS RecurSeq
FROM Repeats
WHERE RowNo = 2 -- first recurrence
)
SELECT
CHR
, CharacterSeq
FROM Recurrences
WHERE RecurSeq = 1;
John
October 30, 2017 at 5:29 am
John Mitchell-245523 - Monday, October 30, 2017 5:22 AMThis should perform better than either a self-join or a recursive CTE. The first CTE isn't part of the solution - I just used it because I don't have a Tally table on my server. Whichever solution you choose, make sure that if you turn it into a function, you make it an inline table-valued function and not a scalar-valued function.
DECLARE @STR VARCHAR(50) = 'Adbcdefghijki';
WITH Tally AS (
SELECT ROW_NUMBER() OVER (ORDER BY object_id) AS n
FROM sys.objects
)
, Characters AS (
SELECT
SUBSTRING(@str,n,1) AS CHR
, n
FROM Tally
WHERE n <= LEN(@str)
)
, Repeats AS (
SELECT
CHR
, ROW_NUMBER() OVER (PARTITION BY CHR ORDER BY n) AS RowNo
, n AS CharacterSeq
FROM Characters
)
, Recurrences AS (
SELECT
CHR
, RowNo
, CharacterSeq
, ROW_NUMBER() OVER (ORDER BY CharacterSeq) AS RecurSeq
FROM Repeats
WHERE RowNo = 2 -- first recurrence
)
SELECT
CHR
, CharacterSeq
FROM Recurrences
WHERE RecurSeq = 1;John
Thats good @john-2 - Actually I dont know why I was after RANK() func .... anyways good one!
First solve the problem then write the code !
October 30, 2017 at 6:30 am
Here's a compact alternative:
DECLARE @STR VARCHAR(100) = 'AdbicdA'
SELECT TOP(1) [CHR] = SUBSTRING(@STR,n,1)
FROM dbo.Tally t
WHERE LEN(@STR) >= n
ORDER BY ISNULL(NULLIF(CHARINDEX(SUBSTRING(@STR,n,1),@STR,n+1),0),99999)
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
October 30, 2017 at 6:35 am
@chris-2 the code you posted seems to be giving wrong result
First solve the problem then write the code !
October 30, 2017 at 7:17 am
TheCTEGuy - Monday, October 30, 2017 6:35 AM@chris-2 the code you posted seems to be giving wrong result
Thanks - not enough testing. Here's a fix:
SELECT TOP(1) [FIRST_RECURRING_CHARACTER] = SUBSTRING(@STR,n,1)
FROM dbo.Tally t
CROSS APPLY (SELECT NextPos = CHARINDEX(SUBSTRING(@STR,n,1),@STR,n+1)) x
WHERE LEN(@STR) >= n AND x.NextPos > 0
ORDER BY x.NextPos
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
October 30, 2017 at 8:40 am
Chris
Your code is certainly more compact, but I couldn't find any way to make it faster than mine. I don't know whether that's to do with all the string manipulations that yours does, or the way I extrapolated it all out in order to return the first duplicated character in the sys.all_columns view of a largeish database. Anyway, here are the results:-- John
WITH Tally AS (
SELECT ROW_NUMBER() OVER (ORDER BY object_id) AS n
FROM sys.objects
)
, Characters AS (
SELECT
c.object_id
, c.column_id
, c.name
, SUBSTRING(c.name,n,1) AS CHR
, t.n
FROM Tally t
JOIN sys.all_columns c
ON t.n <= LEN(c.name)
)
, Repeats AS (
SELECT
object_id
, column_id
, name
, CHR
, ROW_NUMBER() OVER (PARTITION BY object_id, column_id, CHR ORDER BY n) AS RowNo
, n AS CharacterSeq
FROM Characters
)
, Recurrences AS (
SELECT
name
, CHR
, CharacterSeq
, ROW_NUMBER() OVER (PARTITION BY object_id, column_id ORDER BY CharacterSeq) AS RecurSeq
FROM Repeats
WHERE RowNo = 2 -- first recurrence
)
SELECT
name
, CHR
, CharacterSeq
FROM Recurrences
WHERE RecurSeq = 1;
--Table 'Worktable'. Scan count 8562, logical reads 17887, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
--Table 'sysschobjs'. Scan count 1, logical reads 46, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
--Table 'syscolpars'. Scan count 1, logical reads 404, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
--Table 'syscolpars'. Scan count 1, logical reads 47, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
--SQL Server Execution Times:
-- CPU time = 391 ms, elapsed time = 872 ms.
-- Chris
WITH Tally AS (
SELECT ROW_NUMBER() OVER (ORDER BY object_id) AS n
FROM sys.objects
)
SELECT
c.name
, c.object_id
, y.FIRST_RECURRING_CHARACTER
FROM sys.all_columns c
CROSS APPLY (
SELECT TOP(1) [FIRST_RECURRING_CHARACTER] = SUBSTRING(c.name,n,1)
FROM Tally t
CROSS APPLY (SELECT NextPos = CHARINDEX(SUBSTRING(c.name,n,1),c.name,n+1)) x
WHERE LEN(c.name) >= n AND x.NextPos > 0
ORDER BY x.NextPos
) y;
--Table 'Worktable'. Scan count 17121, logical reads 50128, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
--Table 'sysschobjs'. Scan count 3852, logical reads 142524, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
--Table 'syscolpars'. Scan count 1, logical reads 404, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
--Table 'syscolpars'. Scan count 1, logical reads 47, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
--SQL Server Execution Times:
-- CPU time = 2516 ms, elapsed time = 3469 ms.
October 30, 2017 at 9:14 am
IF object_id('tempdb..#ListOfStrings') IS NOT NULL DROP TABLE #ListOfStrings;
SELECT *
INTO #ListOfStrings
FROM (VALUES
('Ad'),
('fyf Adbicdefghijkiqtfjoen gfviuyhtbnh2sghszxjhxcdkflg,gmdcjshja\zhgatwevwnfdm,gflgotitheb fyf'),
('Adbicdefghijkiqtfjoen gfviuyhtbnhsghs3zxjhxcdkflg,gmdcjshja\zhgatwevwnfdm,gflgotitheb'),
('fyf Adbicdefghijkiqtfjoen gfviuyhtbnhs4ghszxjhxcdkflg,gmdcjshja\zhgatwevwnfdm,gflgotitheb fyf'),
('Adbicdefghijkiqtfjoen gfviuyhtbnhsghszx5jhxcdkflg,gmdcjshja\zhgatwevwnfdm,gflgotitheb'),
('Adbicdefghijkiqtfjoen gfviuyhtbnhsghszxj6hxcdkflg,gmdcjshja\zhgatwevwnfdm,gflgotitheb'),
('fyf Adbicdefghijkiqtfjoen gfviuyhtbnhsghs7zxjhxcdkflg,gmdcjshja\zhgatwevwnfdm,gflgotitheb fyf'),
('Adbicdefghijkiqtfjoen gfviuyhtbnhsghszxjhx7cdkflg,gmdcjshja\zhgatwevwnfdm,gflgotitheb'),
('fyf Adbicdefghijkiqtfjoen gfviuyhtbnhsghszx8jhxcdkflg,gmdcjshja\zhgatwevwnfdm,gflgotitheb fyf'),
('Adbicdefghijkiqtfjoen gfviuyhtbnhsghszxjhxcd9kflg,gmdcjshja\zhgatwevwnfdm,gflgotitheb'),
('Adbicdefghijkiqtfjoen gfviuyhtbnhsghs1zxjhxcdkflg,gmdcj1shja\zhgatwevwnfdm,gflgotitheb'),
('fyf Adbicdefghijkiqtfjoen gfviuyhtbnh2sghszxjhxcdkflg,gm2dcjshja\zhgatwevwnfdm,gflgotitheb fyf'),
('Adbicdefghijkiqtfjoen gfviuyhtbnhsghs3zxjhxcdkflg,gmdcjsh3ja\zhgatwevwnfdm,gflgotitheb'),
('fyf Adbicdefghijkiqtfjoen gfviuyhtbnhs4ghszxjhxcdkflg,gmdc4jshja\zhgatwevwnfdm,gflgotitheb fyf'),
('Adbicdefghijkiqtfjoen gfviuyhtbnhsghszx5jhxcdkflg,gmdcjshja5\zhgatwevwnfdm,gflgotitheb'),
('Adbicdefghijkiqtfjoen gfviuyhtbnhsghszxj6hxcdkflg,gmdcjshja\6zhgatwevwnfdm,gflgotitheb'),
('fyf Adbicdefghijkiqtfjoen gfviuyhtbnhsghs7zxjhxcdkflg,gmdcjsh7ja\zhgatwevwnfdm,gflgotitheb fyf'),
('Adbicdefghijkiqtfjoen gfviuyhtbnhsghszxjhx7cdkflg,gmdcjshja\zh8gatwevwnfdm,gflgotitheb'),
('fyf Adbicdefghijkiqtfjoen gfviuyhtbnhsghszx8jhxcdkflg,gmdcjshja9\zhgatwevwnfdm,gflgotitheb fyf'),
('Adbicdefghijkiqtfjoen gfviuyhtbnhsghszxjhxcd9kflg,gmdcjshja\zhga10twevwnfdm,gflgotitheb'),
('AdbicdA'),
('f1yf Adbicdefghijkiqtfjoen gfviuyhtbnh2sghszxjhxcdkflg,gmdcjshja\zhgatwevwnfdm,gflgotitheb fyf'),
('Ad2bicdefghijkiqtfjoen gfviuyhtbnhsghs3zxjhxcdkflg,gmdcjshja\zhgatwevwnfdm,gflgotitheb'),
('fyf3 Adbicdefghijkiqtfjoen gfviuyhtbnhs4ghszxjhxcdkflg,gmdcjshja\zhgatwevwnfdm,gflgotitheb fyf'),
('Adbi4cdefghijkiqtfjoen gfviuyhtbnhsghszx5jhxcdkflg,gmdcjshja\zhgatwevwnfdm,gflgotitheb'),
('Adbic5defghijkiqtfjoen gfviuyhtbnhsghszxj6hxcdkflg,gmdcjshja\zhgatwevwnfdm,gflgotitheb'),
('fyf Ad6bicdefghijkiqtfjoen gfviuyhtbnhsghs7zxjhxcdkflg,gmdcjshja\zhgatwevwnfdm,gflgotitheb fyf'),
('Adbicde7fghijkiqtfjoen gfviuyhtbnhsghszxjhx7cdkflg,gmdcjshja\zhgatwevwnfdm,gflgotitheb'),
('fyf Adbi8cdefghijkiqtfjoen gfviuyhtbnhsghszx8jhxcdkflg,gmdcjshja\zhgatwevwnfdm,gflgotitheb fyf'),
('Adbicdefg9hijkiqtfjoen gfviuyhtbnhsghszxjhxcd9kflg,gmdcjshja\zhgatwevwnfdm,gflgotitheb'),
('Adbicdefgh1ijkiqtfjoen gfviuyhtbnhsghs1zxjhxcdkflg,gmdcj1shja\zhgatwevwnfdm,gflgotitheb'),
('fyf Adbicde2fghijkiqtfjoen gfviuyhtbnh2sghszxjhxcdkflg,gm2dcjshja\zhgatwevwnfdm,gflgotitheb fyf'),
('Adbicdefghij3kiqtfjoen gfviuyhtbnhsghs3zxjhxcdkflg,gmdcjsh3ja\zhgatwevwnfdm,gflgotitheb'),
('fyf Adbicdefg4hijkiqtfjoen gfviuyhtbnhs4ghszxjhxcdkflg,gmdc4jshja\zhgatwevwnfdm,gflgotitheb fyf'),
('Adbicdefghijki5qtfjoen gfviuyhtbnhsghszx5jhxcdkflg,gmdcjshja5\zhgatwevwnfdm,gflgotitheb'),
('Adbicdefghijkiq6tfjoen gfviuyhtbnhsghszxj6hxcdkflg,gmdcjshja\6zhgatwevwnfdm,gflgotitheb'),
('fyf Adbicdefghij7kiqtfjoen gfviuyhtbnhsghs7zxjhxcdkflg,gmdcjsh7ja\zhgatwevwnfdm,gflgotitheb fyf'),
('Adbicdefghijkiqtf8joen gfviuyhtbnhsghszxjhx7cdkflg,gmdcjshja\zh8gatwevwnfdm,gflgotitheb'),
('fyf Adbicdefghijki9qtfjoen gfviuyhtbnhsghszx8jhxcdkflg,gmdcjshja9\zhgatwevwnfdm,gflgotitheb fyf'),
('Adbicdefghijkiqtfjo0en gfviuyhtbnhsghszxjhxcd9kflg,gmdcjshja\zhga10twevwnfdm,gflgotitheb')
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
October 30, 2017 at 9:50 am
Chris
Yes, you appear to have changed my PARTITION BY clauses. The first one should be PARTITION BY name, CHR; the second should be PARTITION BY name. Of course, this relies on names being unique, which they are in your data set.
John
October 30, 2017 at 9:56 am
John Mitchell-245523 - Monday, October 30, 2017 9:50 AMChrisYes, you appear to have changed my PARTITION BY clauses. The first one should be PARTITION BY name, CHR; the second should be PARTITION BY name. Of course, this relies on names being unique, which they are in your data set.
John
Found some time to get back onto it and instead of using the name as a uniquifier I'm putting in a RowID, it's slendererer fewer characters. Sorry about that.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
October 30, 2017 at 10:17 am
PRINT '== Chris ========================================================================================================================================================';
SET STATISTICS IO, TIME ON;
WITH Tally AS (SELECT CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS INT) AS n FROM sys.objects)
SELECT
c.name
, y.FIRST_RECURRING_CHARACTER
FROM #ListOfStrings c
CROSS APPLY (
SELECT TOP(1) [FIRST_RECURRING_CHARACTER] = SUBSTRING(c.name,n,1)
FROM Tally t
CROSS APPLY (SELECT NextPos = CHARINDEX(SUBSTRING(c.name,n,1),c.name,n+1)) x
WHERE LEN(c.name) >= n AND x.NextPos > 0
ORDER BY x.NextPos
) y
ORDER BY NAME;
SET STATISTICS IO, TIME OFF;
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
October 30, 2017 at 10:38 am
Goodness, yes - I tried it with an on-the-fly table. Mine slowed down significantly and yours sped up a lot. I can't really explain that.
I changed the tally CTE to this:WITH Tally AS (
SELECT ROW_NUMBER() OVER (ORDER BY x.m) AS n
FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) x(m)
CROSS JOIN (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) y(r)
)
and here are the times:Table 'Worktable'. Scan count 0, logical reads 217882, physical reads 0, read-ahead reads 187, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'syscolpars'. Scan count 1, logical reads 40400, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'syscolpars'. Scan count 100, logical reads 4700, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 1687 ms, elapsed time = 2331 ms.
Table 'Worktable'. Scan count 17121, logical reads 50131, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'syscolpars'. Scan count 1, logical reads 404, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'syscolpars'. Scan count 1, logical reads 47, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 485 ms, elapsed time = 747 ms.
John
October 30, 2017 at 10:46 am
John Mitchell-245523 - Monday, October 30, 2017 10:38 AMGoodness, yes - I tried it with an on-the-fly table. Mine slowed down significantly and yours sped up a lot. I can't really explain that.I changed the tally CTE to this:
WITH Tally AS (
SELECT ROW_NUMBER() OVER (ORDER BY x.m) AS n
FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) x(m)
CROSS JOIN (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) y(r)
)
and here are the times:Table 'Worktable'. Scan count 0, logical reads 217882, physical reads 0, read-ahead reads 187, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'syscolpars'. Scan count 1, logical reads 40400, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'syscolpars'. Scan count 100, logical reads 4700, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.SQL Server Execution Times:
CPU time = 1687 ms, elapsed time = 2331 ms.Table 'Worktable'. Scan count 17121, logical reads 50131, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'syscolpars'. Scan count 1, logical reads 404, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'syscolpars'. Scan count 1, logical reads 47, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.SQL Server Execution Times:
CPU time = 485 ms, elapsed time = 747 ms.John
I guess the plan of one of the queries (or possibly both) is sensitive to small changes. Might be worth spending more time on this.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
Viewing 15 posts - 1 through 15 (of 46 total)
You must be logged in to reply to this topic. Login to reply