How to find the first recurring character of a string.

  • I think that maybe looking at the 2nd character of a two character match may make optimization of all this a bit easier. 


    DECLARE @STR VARCHAR(100) = 'AdbiicdffA'
    ;
     SELECT TOP 1
            NULLIF(SUBSTRING(@Str,N+1,1),'')
       FROM dbo.Tally t
      WHERE t.N <= LEN(@Str)
        AND (SUBSTRING(@Str,t.N,1) = SUBSTRING(@Str,t.N+1,1) OR t.N = LEN(@Str))
      ORDER BY t.N
    ;

    I have NOT done any performance testing, yet.  It's just a logical first-blush on my part.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden - Monday, October 30, 2017 4:27 PM

    I think that maybe looking at the 2nd character of a two character match may make optimization of all this a bit easier. 


    DECLARE @STR VARCHAR(100) = 'AdbiicdffA'
    ;
     SELECT TOP 1
            NULLIF(SUBSTRING(@Str,N+1,1),'')
       FROM dbo.Tally t
      WHERE t.N <= LEN(@Str)
        AND (SUBSTRING(@Str,t.N,1) = SUBSTRING(@Str,t.N+1,1) OR t.N = LEN(@Str))
      ORDER BY t.N
    ;

    I have NOT done any performance testing, yet.  It's just a logical first-blush on my part.

    Hey @jeff - I just tested your query, I got something unexpected. The below query should return A but it returned NULL.

    First solve the problem then write the code !

  • John Mitchell-245523 - Monday, October 30, 2017 10:38 AM

    Goodness, yes - I tried it with an on-the-fly table.  Mine slowed down significantly and yours sped up a lot.  I can't really explain that.

    I changed the tally CTE to this:
    WITH Tally AS (
      SELECT ROW_NUMBER() OVER (ORDER BY x.m) AS n
      FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) x(m)
        CROSS JOIN (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) y(r)
      )

    and here are the times:
    Table 'Worktable'. Scan count 0, logical reads 217882, physical reads 0, read-ahead reads 187, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'syscolpars'. Scan count 1, logical reads 40400, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'syscolpars'. Scan count 100, logical reads 4700, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:
     CPU time = 1687 ms, elapsed time = 2331 ms.

    Table 'Worktable'. Scan count 17121, logical reads 50131, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'syscolpars'. Scan count 1, logical reads 404, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'syscolpars'. Scan count 1, logical reads 47, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:
     CPU time = 485 ms, elapsed time = 747 ms.

    John

    Hey John, a quick investigation shows a filter immediately upstream of the CI scan of sysschobjs. The filter is object-level security:
    has_access('CO',[Database].[sys].[sysschobjs].[id] as [o].[id])=(1)
    and it operates against every single row.
    Replace the reference to sysobjects with an inline tally table using VALUES and the two queries run in just about the same time.

    β€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Chris

    Good spot.  I saw that when I looked in the execution plan.  What still puzzles me, though, is that my code runs about three times as fast with the sys.objects tally CTE than it does with the virtual one!  I've tried it on SQL Server 2008 R2 and 2014 and it's broadly similar.  I even get a memory grant warning in the sys.objects execution plan but it still runs faster!

    John

  • John Mitchell-245523 - Tuesday, October 31, 2017 4:16 AM

    Chris

    Good spot.  I saw that when I looked in the execution plan.  What still puzzles me, though, is that my code runs about three times as fast with the sys.objects tally CTE than it does with the virtual one!  I've tried it on SQL Server 2008 R2 and 2014 and it's broadly similar.  I even get a memory grant warning in the sys.objects execution plan but it still runs faster!

    John

    Your query invokes a table spool. The fast sysobjects plan stores one pull of the rows in the table spool and replays them as required for the join. My query pulls the rows as required from the table, hitting the filter every time.

    With the inline tally version, it's the #ListOfStrings table rather than the tally table which is stored and replayed, and it appears to be less efficient.

    β€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • TheCTEGuy - Tuesday, October 31, 2017 1:32 AM

    Jeff Moden - Monday, October 30, 2017 4:27 PM

    I think that maybe looking at the 2nd character of a two character match may make optimization of all this a bit easier. 


    DECLARE @STR VARCHAR(100) = 'AdbiicdffA'
    ;
     SELECT TOP 1
            NULLIF(SUBSTRING(@Str,N+1,1),'')
       FROM dbo.Tally t
      WHERE t.N <= LEN(@Str)
        AND (SUBSTRING(@Str,t.N,1) = SUBSTRING(@Str,t.N+1,1) OR t.N = LEN(@Str))
      ORDER BY t.N
    ;

    I have NOT done any performance testing, yet.  It's just a logical first-blush on my part.

    Hey @jeff - I just tested your query, I got something unexpected. The below query should return A but it returned NULL.

    Please don't post pictures.  Post code so that I can copy'n'paste.

    Also, when you say "the first repeated character", I thought you meant the characters where supposed to be adjacent to each other so, no... my code won't do the job for what you want.

    Shifting gears on this, why do you need to do this?  Consider it to be a friendly mutual education. πŸ˜‰

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)


  • WITH cte AS (
     SELECT t.N,SUBSTRING(@Str,N,1) AS [C],
       ROW_NUMBER() OVER (PARTITION BY SUBSTRING(@Str,N,1) ORDER BY t.N DESC) AS [RowNum]
     FROM dbo.Tally t
     WHERE t.N <= LEN(@Str))
    SELECT TOP(1) C
    FROM cte
    WHERE RowNum = 2
    ORDER BY N ASC;

    Far away is close at hand in the images of elsewhere.
    Anon.

  • David Burrows - Tuesday, October 31, 2017 9:16 AM


    WITH cte AS (
     SELECT t.N,SUBSTRING(@Str,N,1) AS [C],
       ROW_NUMBER() OVER (PARTITION BY SUBSTRING(@Str,N,1) ORDER BY t.N DESC) AS [RowNum]
     FROM dbo.Tally t
     WHERE t.N <= LEN(@Str))
    SELECT TOP(1) C
    FROM cte
    WHERE RowNum = 2
    ORDER BY N ASC;

    Haven't tested it but that looks to be the ticket.

    I hope the OP comes back with why they need to do it.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Haven't tested it but that looks to be the ticket.

    Except it will not handle repeats of 3 or more, this should though


    WITH cte AS (
        SELECT t.N,SUBSTRING(@Str,N,1) AS [C],
               ROW_NUMBER() OVER (PARTITION BY SUBSTRING(@Str,N,1) ORDER BY t.N DESC) AS [RowNum]
        FROM dbo.Tally t
        WHERE t.N <= LEN(@Str))
    SELECT TOP(1) C
    FROM cte
    WHERE RowNum >= 2
    ORDER BY N ASC;

    I hope the OP comes back with why they need to do it.

    Yes, would be good to know

    Far away is close at hand in the images of elsewhere.
    Anon.

  • Just because it looks like an interesting challenge... and because I think I understood the question differently...
    Are you looking for the 1st character to be duplicated OR the character in the 1st duplication?

    DECLARE @STR VARCHAR(8000) = 'abcdefghijklmnopqrstponmlkjihgfedcba';

    Is the correct answer "a" or "p"?... "a" is the is the character with a duplicate... but... "p" is actually the 1st character to be duplicated...

  • Here are my solutions... One to fit either answer to my previous question...

    DECLARE @STR VARCHAR(8000) = 'abcdefghijklmnopqrstponmlkjihgfedcba';

    -- if "a" is the correct answer...
    WITH
        cte_n1 (n) AS (SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) n (n)),
        cte_n2 (n) AS (SELECT 1 FROM cte_n1 a CROSS JOIN cte_n1 b),
        cte_Tally (n) AS (
            SELECT TOP(LEN(@str))
                ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
            FROM
                cte_n2 a CROSS JOIN cte_n2 b
            )
    SELECT TOP 1
        position = MIN(t.n),
        cv.char_val
    FROM
        cte_Tally t
        CROSS APPLY ( VALUES (SUBSTRING(@str, t.n, 1)) ) cv (char_val)
    GROUP BY
        cv.char_val
    HAVING
        COUNT(1) > 1
    ORDER BY
        MIN(t.n);
    GO

    position     char_val
    -------------------- --------
    1       a

    ...

    DECLARE @STR VARCHAR(8000) = 'abcdefghijklmnopqrstponmlkjihgfedcba';

    -- if "p" is the correct answer...
    WITH
        cte_n1 (n) AS (SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) n (n)),
        cte_n2 (n) AS (SELECT 1 FROM cte_n1 a CROSS JOIN cte_n1 b),
        cte_Tally (n) AS (
            SELECT TOP(LEN(@str))
                ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
            FROM
                cte_n2 a CROSS JOIN cte_n2 b
            ),
        cte_check_dupes AS (
            SELECT
                position = t.n,
                cv.char_val,
                dupe = dense_rank() OVER (PARTITION BY cv.char_val ORDER BY t.n)
            FROM
                cte_Tally t
                CROSS APPLY ( VALUES (SUBSTRING(@str, t.n, 1)) ) cv (char_val)
            )
    SELECT TOP 1
        fst_dup_loc = cd.position,
        cd.char_val
    FROM
        cte_check_dupes cd
    WHERE
        cd.dupe > 1
    ORDER BY
        cd.position;
    GO

    fst_dup_loc    char_val
    -------------------- --------
    21       p

  • Untested and just for fun
    😎

    USE TEEST;
    GO

    IF object_id('tempdb..#ListOfStrings') IS NOT NULL DROP TABLE #ListOfStrings;
    SELECT ROW_NUMBER() OVER (ORDER BY @@VERSION) AS ID,X.y
    INTO #ListOfStrings
    FROM (VALUES
    ('Ad'),
    ('fyf Adbicdefghijkiqtfjoen gfviuyhtbnh2sghszxjhxcdkflg,gmdcjshja\zhgatwevwnfdm,gflgotitheb fyf'),
    ('Adbicdefghijkiqtfjoen gfviuyhtbnhsghs3zxjhxcdkflg,gmdcjshja\zhgatwevwnfdm,gflgotitheb'),
    ('fyf Adbicdefghijkiqtfjoen gfviuyhtbnhs4ghszxjhxcdkflg,gmdcjshja\zhgatwevwnfdm,gflgotitheb fyf'),
    ('Adbicdefghijkiqtfjoen gfviuyhtbnhsghszx5jhxcdkflg,gmdcjshja\zhgatwevwnfdm,gflgotitheb'),
    ('Adbbicdefghijkiqtfjoen gfviuyhtbnhsghszxjhx7cdkflg,gmdcjshja\zhgatwevwnfdm,gflgotitheb'),
    ('fyf Adbicdefghijkiqtfjoen gfviuyhtbnhsghszx8jhxcdkflg,gmdcjshja\zhgatwevwnfdm,gflgotitheb fyf'),
    ('Adbicdefghijkiqtfjoen gfviuyhtbnhsghszxjhxcd9kflg,gmdcjshja\zhgatwevwnfdm,gflgotitheb'),
    ('Adbicdefghijkiqtfjoen gfviuyhtbnhsghs1zxjhxcdkflg,gmdcj1shja\zhgatwevwnfdm,gflgotitheb'),
    ('fyf Adbicdefghijkiqtfjoen gfviuyhtbnh2sghszxjhxcdkflg,gm2dcjshja\zhgatwevwnfdm,gflgotitheb fyf'),
    ('Adbicdefghijkiqtfjoen gfviuyhtbnhsghs3zxjhxcdkflg,gmdcjsh3ja\zhgatwevwnfdm,gflgotitheb'),
    ('fyf Adbicdefghijkiqtfjoen gfviuyhtbnhs4ghszxjhxcdkflg,gmdc4jshja\zhgatwevwnfdm,gflgotitheb fyf'),
    ('Adbicdefghijkiqtfjoen gfviuyhtbnhsghszx5jhxcdkflg,gmdcjshja5\zhgatwevwnfdm,gflgotitheb'),
    ('Adbicdefghijkiqtfjoen gfviuyhtbnhsghszxj6hxcdkflg,gmdcjshja\6zhgatwevwnfdm,gflgotitheb'),
    ('fyf Adbicdefghijkiqtfjoen gfviuyhtbnhsghs7zxjhxcdkflg,gmdcjsh7ja\zhgatwevwnfdm,gflgotitheb fyf'),
    ('Adbicdefghijkiqtfjoen gfviuyhtbnhsghszxjhx7cdkflg,gmdcjshja\zh8gatwevwnfdm,gflgotitheb'),
    ('fyf Adbicdefghijkiqtfjoen gfviuyhtbnhsghszx8jhxcdkflg,gmdcjshja9\zhgatwevwnfdm,gflgotitheb fyf'),
    ('Adbicdefghijkiqtfjoen gfviuyhtbnhsghszxjhxcd9kflg,gmdcjshja\zhga10twevwnfdm,gflgotitheb'),
    ('AdbicdA'),
    ('f1yf Adbicdefghijkiqtfjoen gfviuyhtbnh2sghszxjhxcdkflg,gmdcjshja\zhgatwevwnfdm,gflgotitheb fyf'),
    ('Ad2bicdefghijkiqtfjoen gfviuyhtbnhsghs3zxjhxcdkflg,gmdcjshja\zhgatwevwnfdm,gflgotitheb'),
    ('fyf3 Adbicdefghijkiqtfjoen gfviuyhtbnhs4ghszxjhxcdkflg,gmdcjshja\zhgatwevwnfdm,gflgotitheb fyf'),
    ('Adbi4cdefghijkiqtfjoen gfviuyhtbnhsghszx5jhxcdkflg,gmdcjshja\zhgatwevwnfdm,gflgotitheb'),
    ('Adbic5defghijkiqtfjoen gfviuyhtbnhsghszxj6hxcdkflg,gmdcjshja\zhgatwevwnfdm,gflgotitheb'),
    ('fyf Ad6bicdefghijkiqtfjoen gfviuyhtbnhsghs7zxjhxcdkflg,gmdcjshja\zhgatwevwnfdm,gflgotitheb fyf'),
    ('Adbicde7fghijkiqtfjoen gfviuyhtbnhsghszxjhx7cdkflg,gmdcjshja\zhgatwevwnfdm,gflgotitheb'),
    ('fyf Adbi8cdefghijkiqtfjoen gfviuyhtbnhsghszx8jhxcdkflg,gmdcjshja\zhgatwevwnfdm,gflgotitheb fyf'),
    ('Adbicdefg9hijkiqtfjoen gfviuyhtbnhsghszxjhxcd9kflg,gmdcjshja\zhgatwevwnfdm,gflgotitheb'),
    ('Adbicdefgh1ijkiqtfjoen gfviuyhtbnhsghs1zxjhxcdkflg,gmdcj1shja\zhgatwevwnfdm,gflgotitheb'),
    ('fyf Adbicde2fghijkiqtfjoen gfviuyhtbnh2sghszxjhxcdkflg,gm2dcjshja\zhgatwevwnfdm,gflgotitheb fyf'),
    ('Adbicdefghij3kiqtfjoen gfviuyhtbnhsghs3zxjhxcdkflg,gmdcjsh3ja\zhgatwevwnfdm,gflgotitheb'),
    ('fyf Adbicdefg4hijkiqtfjoen gfviuyhtbnhs4ghszxjhxcdkflg,gmdc4jshja\zhgatwevwnfdm,gflgotitheb fyf'),
    ('Adbicdefghijki5qtfjoen gfviuyhtbnhsghszx5jhxcdkflg,gmdcjshja5\zhgatwevwnfdm,gflgotitheb'),
    ('Adbicdefghijkiq6tfjoen gfviuyhtbnhsghszxj6hxcdkflg,gmdcjshja\6zhgatwevwnfdm,gflgotitheb'),
    ('fyf Adbicdefghij7kiqtfjoen gfviuyhtbnhsghs7zxjhxcdkflg,gmdcjsh7ja\zhgatwevwnfdm,gflgotitheb fyf'),
    ('Adbicdefghijkiqtf8joen gfviuyhtbnhsghszxjhx7cdkflg,gmdcjshja\zh8gatwevwnfdm,gflgotitheb'),
    ('fyf Adbicdefghijki9qtfjoen gfviuyhtbnhsghszx8jhxcdkflg,gmdcjshja9\zhgatwevwnfdm,gflgotitheb fyf'),
    ('Adbicdefghijkiqtfjo0en gfviuyhtbnhsghszxjhxcd9kflg,gmdcjshja\zhga10twevwnfdm,gflgotitheb')
    )X(y)

    -- Inline Tally
    SET STATISTICS IO, TIME ON;
    ;WITH T(N) AS (SELECT X.N FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0))X(N))
    ,POS_DIST AS
    (
      SELECT
       LOS.ID
       ,NUMS.N
       ,SUBSTRING(LOS.Y,NUMS.N,1) AS XCHAR
       ,CHARINDEX(SUBSTRING(LOS.Y,NUMS.N,1),LOS.y,NUMS.N + 1) - NUMS.N AS DISTANCE
      FROM #ListOfStrings LOS
      CROSS APPLY
      (
       SELECT TOP(LEN(LOS.y)) ROW_NUMBER() OVER (ORDER BY @@VERSION) AS N
       FROM T T1,T T22,T T3,T T4
      ) NUMS(N)
      WHERE CHARINDEX(SUBSTRING(LOS.Y,NUMS.N,1),LOS.y,NUMS.N + 1) - NUMS.N > 0
    )
    ,FIND_MATCH AS
    (
    SELECT
      PD.ID
     ,ROW_NUMBER() OVER (PARTITION BY PD.ID ORDER BY PD.N + PD.DISTANCE) XRID
     ,PD.XCHAR
    FROM  POS_DIST  PD
    )
    SELECT
      FM.ID
     ,FM.XCHAR
    FROM  FIND_MATCH FM
    WHERE FM.XRID = 1;

    SET STATISTICS IO, TIME OFF

    The statistics
    (37 row(s) affected)
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table '#ListOfStrings'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:
     CPU time = 15 ms, elapsed time = 29 ms.

  • Jason A. Long - Tuesday, October 31, 2017 6:12 PM

    Just because it looks like an interesting challenge... and because I think I understood the question differently...
    Are you looking for the 1st character to be duplicated OR the character in the 1st duplication?

    DECLARE @STR VARCHAR(8000) = 'abcdefghijklmnopqrstponmlkjihgfedcba';

    Is the correct answer "a" or "p"?... "a" is the is the character with a duplicate... but... "p" is actually the 1st character to be duplicated...

    P should be returned coz it duplicated first.

    First solve the problem then write the code !

  • Jeff Moden - Tuesday, October 31, 2017 8:02 AM

    TheCTEGuy - Tuesday, October 31, 2017 1:32 AM

    Jeff Moden - Monday, October 30, 2017 4:27 PM

    I think that maybe looking at the 2nd character of a two character match may make optimization of all this a bit easier. 


    DECLARE @STR VARCHAR(100) = 'AdbiicdffA'
    ;
     SELECT TOP 1
            NULLIF(SUBSTRING(@Str,N+1,1),'')
       FROM dbo.Tally t
      WHERE t.N <= LEN(@Str)
        AND (SUBSTRING(@Str,t.N,1) = SUBSTRING(@Str,t.N+1,1) OR t.N = LEN(@Str))
      ORDER BY t.N
    ;

    I have NOT done any performance testing, yet.  It's just a logical first-blush on my part.

    Hey @jeff - I just tested your query, I got something unexpected. The below query should return A but it returned NULL.

    Please don't post pictures.  Post code so that I can copy'n'paste.

    Also, when you say "the first repeated character", I thought you meant the characters where supposed to be adjacent to each other so, no... my code won't do the job for what you want.

    Shifting gears on this, why do you need to do this?  Consider it to be a friendly mutual education. πŸ˜‰

    @jeff - This is yet another interview question. I am just curious to how it could have been down in most optimized way. Hope this helps.

    Thanks πŸ™‚

    First solve the problem then write the code !

  • Nice job, Eirikur. It's the fastest by a useful margin:


    == John =============================================================

    (39 rows affected)

    Table 'Worktable'. Scan count 1, logical reads 2079, physical reads 0, ...

    Table '#ListOfStrings_000000002824'. Scan count 1, logical reads 1, ...

    SQL Server Execution Times:

    CPU time = 31 ms, elapsed time = 32 ms.

    == Chris ============================================================

    (39 rows affected)

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, ...

    Table '#ListOfStrings_000000002824'. Scan count 1, logical reads 1, ...

    SQL Server Execution Times:

    CPU time = 16 ms, elapsed time = 19 ms.

    == EE ===============================================================

    (39 rows affected)

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, ...

    Table '#ListOfStrings_000000002824'. Scan count 1, logical reads 1, ...

    SQL Server Execution Times:

    CPU time = 16 ms, elapsed time = 11 ms.


    Here's the code, I've tweaked them all so they generate the exact same result set:
    PRINT ''
    PRINT '== John ========================================================================================================================================================';
    SET STATISTICS IO, TIME ON;
    WITH Tally -- John
     AS (SELECT CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS INT) AS n FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) d (n), (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) e (n), (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) f (n)),
    Characters AS (
      SELECT
           c.RowID
     --,   c.Name
        ,   SUBSTRING(c.name,n,1) AS CHR
        ,   t.n
      FROM Tally t
      JOIN #ListOfStrings c
      ON t.n <= LEN(c.name)
      )
    , Repeats AS (
      SELECT
           RowID
     --,   Name   
     ,   CHR
        ,   ROW_NUMBER() OVER (PARTITION BY RowID, CHR ORDER BY n) AS RowNo
        ,   n AS CharacterSeq
      FROM Characters
      )
    , Recurrences AS (
      SELECT
            RowID   
      ,   CHR
      ,   ROW_NUMBER() OVER (PARTITION BY RowID ORDER BY CharacterSeq) AS RecurSeq
      FROM Repeats
      WHERE RowNo = 2 -- first recurrence
      )
    SELECT
        RowID
    ,   CHR
    FROM Recurrences
    WHERE RecurSeq = 1
    ORDER BY RowID;
    SET STATISTICS IO, TIME OFF;

    PRINT ''
    PRINT '== Chris ========================================================================================================================================================';
    SET STATISTICS IO, TIME ON;
    WITH Tally -- Chris
     AS (SELECT CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS INT) AS n FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) d (n), (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) e (n), (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) f (n))
    SELECT
         c.RowID
    ,    y.XCHAR
    FROM #ListOfStrings c
    CROSS APPLY (
        SELECT TOP(1) [XCHAR] = SUBSTRING(c.name,n,1)
        FROM Tally t
        CROSS APPLY (SELECT NextPos = CHARINDEX(SUBSTRING(c.name,n,1),c.name,n+1)) x
        WHERE LEN(c.name) >= n AND x.NextPos > 0
        ORDER BY x.NextPos
        ) y
    ORDER BY c.RowID;
    SET STATISTICS IO, TIME OFF;
    PRINT ''
    PRINT '== EE ========================================================================================================================================================';
    SET STATISTICS IO, TIME ON;
    ;WITH T(N) AS (SELECT X.N FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0))X(N))
     ,POS_DIST AS
     (
       SELECT
        LOS.RowID
       ,NUMS.N
       ,SUBSTRING(LOS.[name],NUMS.N,1) AS XCHAR
       ,CHARINDEX(SUBSTRING(LOS.[name],NUMS.N,1),LOS.[name],NUMS.N + 1) - NUMS.N AS DISTANCE
       FROM #ListOfStrings  LOS
       CROSS APPLY
       (
        SELECT TOP(LEN(LOS.[name])) ROW_NUMBER() OVER (ORDER BY @@VERSION) AS N
        FROM T T1,T T22,T T3,T T4
       ) NUMS(N)
       WHERE CHARINDEX(SUBSTRING(LOS.[name],NUMS.N,1),LOS.[name],NUMS.N + 1) - NUMS.N > 0
    )
    ,FIND_MATCH AS
    (
    SELECT
      PD.RowID
     ,ROW_NUMBER() OVER (PARTITION BY PD.RowID ORDER BY PD.N + PD.DISTANCE) XRID
     ,PD.XCHAR
    FROM  POS_DIST  PD
    )
    SELECT
      FM.RowID
     ,FM.XCHAR
    FROM  FIND_MATCH  FM
    WHERE FM.XRID = 1
    ORDER BY FM.RowID;
    SET STATISTICS IO, TIME OFF;
    β€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Viewing 15 posts - 16 through 30 (of 46 total)

You must be logged in to reply to this topic. Login to reply