How to compare strings by letters in certain places

  • Hello All,

    I need help to compare strings, please. For example, I have two strings with various words: one has six letters long words and the other - with five. I need to find out the words which have certain letters in certain positions. For example I want to see Betty for five letters and Bertha for six, if there is a request to find all the words / strings with same letters in the positions 1,2 and 4. I need some basic help, which I will then try to extend, please.

    So far I have done only the following and obviously far away from what I need. Any help will be truly appreciated:

    DECLARE @six_letters NVARCHAR(MAX) = 'hurrah, buzzed, Bertha, patter, mettle, holler, collie, yuppie, rabble, tallow, brooks, tassel, meddle, callus, peddle, sobbed, kneels, getter, brooms, dulled, maggot, webbed'  ;
    DECLARE @five_letters NVARCHAR(MAX) = ' floor, beTtY, carry, trees, sleep, tells';

    SELECT UPPER(LTRIM(value)) AS six_legth
    FROM STRING_SPLIT(@six_letters, ',')
    WHERE UPPER(LEFT(LTRIM(value),2)) IN (
    SELECT UPPER(LEFT(LTRIM(value),2))
    FROM STRING_SPLIT(@five_letters, ',')
    )
    ORDER BY 1;


    SELECT UPPER(LTRIM(value)) AS five_length
    FROM STRING_SPLIT(@five_letters, ',')
    WHERE UPPER(LEFT(LTRIM(value),2)) IN (
    SELECT UPPER(LEFT(LTRIM(value),2))
    FROM STRING_SPLIT(@six_letters, ',')
    )
    ORDER BY 1;
  • Based on the example data provided, what output are you hoping for?

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • DECLARE @six_letters NVARCHAR(MAX) = 'hurrah, buzzed, Bertha, patter, mettle, holler, collie, yuppie, rabble, tallow, brooks, tassel, meddle, callus, peddle, sobbed, kneels, getter, brooms, dulled, maggot, webbed'  ;
    DECLARE @five_letters NVARCHAR(MAX) = ' floor, beTtY, carry, trees, sleep, tells';

    declare @input_letter_pairs table (letter nchar(1),
    position int,
    unique(letter, position));

    insert @input_letter_pairs(letter, position) values
    ('b', 1),
    ('e', 2);

    ;with six_letters_cte(ItemNumber, Item, n, split_letter) as (
    select dsl1.*, t.n, substring(dsl1_trim.item_trim, t.n, 1) split_letter
    from
    dbo.DelimitedSplitN4K(trim(@six_letters), ',') dsl1
    cross apply
    (select iif(left(dsl1.Item, 1)=' ', right(dsl1.Item, len(dsl1.Item)-1), dsl1.Item) item_trim ) dsl1_trim
    cross apply
    fnTally(1, len(dsl1_trim.item_trim)) t)
    select * /*count(*) matched_pair_count*/
    from
    six_letters_cte slc
    join
    @input_letter_pairs ilp on slc.n=ilp.position
    and slc.split_letter=ilp.letter;

    • This reply was modified 4 years, 4 months ago by  Steve Collins.
    • This reply was modified 4 years, 4 months ago by  Steve Collins.
    • This reply was modified 4 years, 4 months ago by  Steve Collins.

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

  • I would expect to see Bertha and Betty (doesn't matter the order nor if it is all lower / UPPER case).

    Thank you.

  • Here's for six letters.  It returns Bertha.  To do for five letters would follow a similar pattern.

    DECLARE @six_letters NVARCHAR(MAX) = 'hurrah, buzzed, Bertha, patter, mettle, holler, collie, yuppie, rabble, tallow, brooks, tassel, meddle, callus, peddle, sobbed, kneels, getter, brooms, dulled, maggot, webbed'  ;

    declare @input_letter_pairs table (letter nchar(1),
    position int,
    unique(letter, position));

    insert @input_letter_pairs(letter, position) values
    ('b', 1),
    ('e', 2);

    ;with
    six_letters_cte(ItemNumber, Item, n, split_letter) as (
    select dsl1.*, t.n, substring(dsl1_trim.item_trim, t.n, 1) split_letter
    from
    dbo.DelimitedSplitN4K(trim(@six_letters), ',') dsl1
    cross apply
    (select iif(left(dsl1.Item, 1)=' ', right(dsl1.Item, len(dsl1.Item)-1), dsl1.Item) item_trim ) dsl1_trim
    cross apply
    dbo.fnTally(1, len(dsl1_trim.item_trim)) t)
    select ItemNumber, Item
    from
    six_letters_cte slc
    join
    @input_letter_pairs ilp on slc.n=ilp.position
    and slc.split_letter=ilp.letter
    group by
    ItemNumber, Item
    having count(*)>1;

    • This reply was modified 4 years, 4 months ago by  Steve Collins.
    • This reply was modified 4 years, 4 months ago by  Steve Collins.

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

  • Thank you, Steve

     

    Sadly none of your examples works for me from the Studio - getting various error messages.

     

  • Maybe you do not have an fnTally function?  If so that's easily remedied.  Have a look at this thread.  What are the specific messages?

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

  • BOR15K wrote:

    I need some basic help, which I will then try to extend, please.

    Since very few people ever come back to tell us, I'll make you a deal...  Tell us what the extended use is for and I'll show you the solution I just wrote that might already be extended to do what you ultimately need it for.  😉

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • BOR15K wrote:

    Thank you, Steve

    Sadly none of your examples works for me from the Studio - getting various error messages.

    The code probably worked just fine for Steve.  When you get such messages, you should post them so we can help you figure things out. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Take this for a spin...


    DECLARE
    @array_6 varchar(MAX) = 'hurrah, buzzed, Bertha, patter, mettle, holler, collie, yuppie, rabble, tallow, brooks, tassel, meddle, callus, peddle, sobbed, kneels, getter, brooms, dulled, maggot, webbed',
    @array_5 varchar(MAX) = ' floor, beTtY, carry, trees, sleep, tells',
    @search_positions varchar(12) = '1,2,4';

    WITH
    cte_a6 AS (
    SELECT
    val_6 = CONVERT(char(6), TRIM(ss6.value)),
    match_on = STRING_AGG(lp6.letter_in_pos, '') WITHIN GROUP (ORDER BY t6.n)
    FROM
    STRING_SPLIT(@array_6, ',') ss6
    CROSS APPLY (VALUES (1),(2),(3),(4),(5),(6) ) t6 (n)
    CROSS APPLY ( VALUES (SUBSTRING(TRIM(ss6.value), t6.n, 1)) ) lp6 (letter_in_pos)
    WHERE
    t6.n IN (SELECT TRY_CONVERT(int, sp.value) FROM STRING_SPLIT(@search_positions, ',') sp)
    GROUP BY
    ss6.value
    ),
    cte_a5 AS (
    SELECT
    val_5 = CONVERT(char(5), TRIM(ss5.value)),
    match_on = STRING_AGG(lp5.letter_in_pos, '') WITHIN GROUP (ORDER BY t5.n)
    FROM
    STRING_SPLIT(@array_5, ',') ss5
    CROSS APPLY (VALUES (1),(2),(3),(4),(5) ) t5 (n)
    CROSS APPLY ( VALUES (SUBSTRING(TRIM(ss5.value), t5.n, 1)) ) lp5 (letter_in_pos)
    WHERE
    t5.n IN (SELECT TRY_CONVERT(int, sp.value) FROM STRING_SPLIT(@search_positions, ',') sp)
    GROUP BY
    ss5.value
    )
    SELECT
    a6.val_6,
    --a6.match_on,
    a5.val_5
    --a5.match_on
    FROM
    cte_a6 a6
    JOIN cte_a5 a5
    ON a6.match_on = a5.match_on;

    Returns...

    val_6  val_5
    ------ -----
    Bertha beTtY

    • This reply was modified 4 years, 4 months ago by  Jason A. Long.
    • This reply was modified 4 years, 4 months ago by  Jason A. Long.
  • Jason A. Long wrote:

    Take this for a spin...


    DECLARE
    @array_6 varchar(MAX) = 'hurrah, buzzed, Bertha, patter, mettle, holler, collie, yuppie, rabble, tallow, brooks, tassel, meddle, callus, peddle, sobbed, kneels, getter, brooms, dulled, maggot, webbed',
    @array_5 varchar(MAX) = ' floor, beTtY, carry, trees, sleep, tells',
    @search_positions varchar(12) = '1,2,4';

    WITH
    cte_a6 AS (
    SELECT
    val_6 = CONVERT(char(6), TRIM(ss6.value)),
    match_on = STRING_AGG(lp6.letter_in_pos, '') WITHIN GROUP (ORDER BY t6.n)
    FROM
    STRING_SPLIT(@array_6, ',') ss6
    CROSS APPLY (VALUES (1),(2),(3),(4),(5),(6) ) t6 (n)
    CROSS APPLY ( VALUES (SUBSTRING(TRIM(ss6.value), t6.n, 1)) ) lp6 (letter_in_pos)
    WHERE
    t6.n IN (SELECT TRY_CONVERT(int, sp.value) FROM STRING_SPLIT(@search_positions, ',') sp)
    GROUP BY
    ss6.value
    ),
    cte_a5 AS (
    SELECT
    val_5 = CONVERT(char(5), TRIM(ss5.value)),
    match_on = STRING_AGG(lp5.letter_in_pos, '') WITHIN GROUP (ORDER BY t5.n)
    FROM
    STRING_SPLIT(@array_5, ',') ss5
    CROSS APPLY (VALUES (1),(2),(3),(4),(5) ) t5 (n)
    CROSS APPLY ( VALUES (SUBSTRING(TRIM(ss5.value), t5.n, 1)) ) lp5 (letter_in_pos)
    WHERE
    t5.n IN (SELECT TRY_CONVERT(int, sp.value) FROM STRING_SPLIT(@search_positions, ',') sp)
    GROUP BY
    ss5.value
    )
    SELECT
    a6.val_6,
    --a6.match_on,
    a5.val_5
    --a5.match_on
    FROM
    cte_a6 a6
    JOIN cte_a5 a5
    ON a6.match_on = a5.match_on;

    Returns...

    val_6  val_5
    ------ -----
    Bertha beTtY

    Heh... it's amazing how similar that is to the code I wrote (still holding out on that because I really am curious what the end game is for this very interesting problem).  We even used similar column names.  You used "Match_On" and I used "MatchOn".

    One difference is that I don't have a limit on the word widths and I don't use a numeric sequence.  I also used a single CTE for the "positions" whereas you used one in each WHERE clause... both will produce similar execution plans though because a CTE is re-executed every time it's called.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden wrote:

    BOR15K wrote:

    Thank you, Steve

    Sadly none of your examples works for me from the Studio - getting various error messages.

    The code probably worked just fine for Steve.  When you get such messages, you should post them so we can help you figure things out. 😉

    Yes, thank you for the benefit of the doubt here.  I'm afraid I've made a mess of it though.  I was playing around with splitters and I've updated the code several times with different ones.  At one point I changed it to string_split and that would've caused errors.  It should've been (the whole time) dbo.DelimitedSplitN4K.  Anyway, Jason's code looks great and it's doesn't use a tally tvf or an ordinal splitter so it's more appropriate

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

  • Thank you All!

    Steve, apologies - I had no intention to offend you. I have looked into fnTally / the link you have kindly provided.

     

    Much obliged.

  • Jeff Moden wrote:

    Heh... it's amazing how similar that is to the code I wrote (still holding out on that because I really am curious what the end game is for this very interesting problem).  We even used similar column names.  You used "Match_On" and I used "MatchOn".

    One difference is that I don't have a limit on the word widths and I don't use a numeric sequence.  I also used a single CTE for the "positions" whereas you used one in each WHERE clause... both will produce similar execution plans though because a CTE is re-executed every time it's called.

    Jeff - Considering the fact that that I'm consistently impressed by the solutions that you come up with, I'm forced to take that as a compliment. 😀  I hope you decide to post your solution no matter how the OP responds. I always seem to find some new nugget of gold in your solutions.

    Anyway, your comment regarding the limit on word length got me thinking... I initially did it that way because the OP was pretty clear about the word lengths in the criteria so I figured that would be a safe move to make at the time. That said, your comment did shake something loose and it occurred to me that I didn't need the inline tally at all!!!

    It, in fact, occurred to me that using a sequential tally to produce rows for every character and then filtering it later with another list of numbers was just plain stupid! Why not just use the list of numbers all by itself?

    It eliminates the word length limits, the unneeded tally rows, the tally itself and the WHERE clause... and it makes for a cleaner bit of code and faster compile times.

     

    DECLARE
    @array_6 varchar(MAX) = 'hurrah, buzzed, Bertha, patter, mettle, holler, collie, yuppie, rabble, tallow, brooks, tassel, meddle, callus, peddle, sobbed, kneels, getter, brooms, dulled, maggot, webbed',
    @array_5 varchar(MAX) = ' floor, beTtY, carry, trees, sleep, tells',
    @search_positions varchar(12) = '1,2,4';

    WITH
    cte_a6 AS (
    SELECT
    val_6 = CONVERT(varchar(20), TRIM(ss6.value)),
    match_on = STRING_AGG(lp6.letter_in_pos, '') WITHIN GROUP (ORDER BY sp6.n)
    FROM
    STRING_SPLIT(@array_6, ',') ss6
    CROSS APPLY (SELECT TRY_CONVERT(int, sp.value) FROM STRING_SPLIT(@search_positions, ',') sp) sp6 (n)
    CROSS APPLY ( VALUES (SUBSTRING(TRIM(ss6.value), sp6.n, 1)) ) lp6 (letter_in_pos)
    GROUP BY
    ss6.value
    ),
    cte_a5 AS (
    SELECT
    val_5 = CONVERT(varchar(20), TRIM(ss5.value)),
    match_on = STRING_AGG(lp5.letter_in_pos, '') WITHIN GROUP (ORDER BY sp5.n)
    FROM
    STRING_SPLIT(@array_5, ',') ss5
    CROSS APPLY (SELECT TRY_CONVERT(int, sp.value) FROM STRING_SPLIT(@search_positions, ',') sp) sp5 (n)
    CROSS APPLY ( VALUES (SUBSTRING(TRIM(ss5.value), sp5.n, 1)) ) lp5 (letter_in_pos)
    GROUP BY
    ss5.value
    )
    SELECT
    a6.val_6,
    --a6.match_on,
    a5.val_5
    --a5.match_on
    FROM
    cte_a6 a6
    JOIN cte_a5 a5
    ON a6.match_on = a5.match_on;
    GO

     

    • This reply was modified 4 years, 4 months ago by  Jason A. Long.
  • BOR15K wrote:

    Thank you All!

    Steve, apologies - I had no intention to offend you. I have looked into fnTally / the link you have kindly provided.

    Much obliged.

    Thank you for the kind words!  I think I slightly misunderstood the requirements as well.  Jason's code made it clear.  Please keep at it with the questions if there's anything additional.  You said you wanted to extend it yourself and I respect that 🙂

    But so... it's not over yet!  Or I hope it's not.  What would lead you to investigate a set based solution to a problem of this nature?  Apparently, Jeff has some code he references " both will produce similar execution plans..." but besides mine I only see one set of code.  Also, I have an alternate approach I'm trying as well and we'll see.  If it works then I want a speed test against these guys.

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

Viewing 15 posts - 1 through 15 (of 40 total)

You must be logged in to reply to this topic. Login to reply