August 14, 2020 at 11:09 am
Hello All,
I need help to compare strings, please. For example, I have two strings with various words: one has six letters long words and the other - with five. I need to find out the words which have certain letters in certain positions. For example I want to see Betty for five letters and Bertha for six, if there is a request to find all the words / strings with same letters in the positions 1,2 and 4. I need some basic help, which I will then try to extend, please.
So far I have done only the following and obviously far away from what I need. Any help will be truly appreciated:
DECLARE @six_letters NVARCHAR(MAX) = 'hurrah, buzzed, Bertha, patter, mettle, holler, collie, yuppie, rabble, tallow, brooks, tassel, meddle, callus, peddle, sobbed, kneels, getter, brooms, dulled, maggot, webbed' ;
DECLARE @five_letters NVARCHAR(MAX) = ' floor, beTtY, carry, trees, sleep, tells';
SELECT UPPER(LTRIM(value)) AS six_legth
FROM STRING_SPLIT(@six_letters, ',')
WHERE UPPER(LEFT(LTRIM(value),2)) IN (
SELECT UPPER(LEFT(LTRIM(value),2))
FROM STRING_SPLIT(@five_letters, ',')
)
ORDER BY 1;
SELECT UPPER(LTRIM(value)) AS five_length
FROM STRING_SPLIT(@five_letters, ',')
WHERE UPPER(LEFT(LTRIM(value),2)) IN (
SELECT UPPER(LEFT(LTRIM(value),2))
FROM STRING_SPLIT(@six_letters, ',')
)
ORDER BY 1;
August 14, 2020 at 2:22 pm
Based on the example data provided, what output are you hoping for?
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel
August 14, 2020 at 2:52 pm
DECLARE @six_letters NVARCHAR(MAX) = 'hurrah, buzzed, Bertha, patter, mettle, holler, collie, yuppie, rabble, tallow, brooks, tassel, meddle, callus, peddle, sobbed, kneels, getter, brooms, dulled, maggot, webbed' ;
DECLARE @five_letters NVARCHAR(MAX) = ' floor, beTtY, carry, trees, sleep, tells';
declare @input_letter_pairs table (letter nchar(1),
position int,
unique(letter, position));
insert @input_letter_pairs(letter, position) values
('b', 1),
('e', 2);
;with six_letters_cte(ItemNumber, Item, n, split_letter) as (
select dsl1.*, t.n, substring(dsl1_trim.item_trim, t.n, 1) split_letter
from
dbo.DelimitedSplitN4K(trim(@six_letters), ',') dsl1
cross apply
(select iif(left(dsl1.Item, 1)=' ', right(dsl1.Item, len(dsl1.Item)-1), dsl1.Item) item_trim ) dsl1_trim
cross apply
fnTally(1, len(dsl1_trim.item_trim)) t)
select * /*count(*) matched_pair_count*/
from
six_letters_cte slc
join
@input_letter_pairs ilp on slc.n=ilp.position
and slc.split_letter=ilp.letter;
Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können
August 14, 2020 at 8:11 pm
I would expect to see Bertha and Betty (doesn't matter the order nor if it is all lower / UPPER case).
Thank you.
August 14, 2020 at 9:37 pm
Here's for six letters. It returns Bertha. To do for five letters would follow a similar pattern.
DECLARE @six_letters NVARCHAR(MAX) = 'hurrah, buzzed, Bertha, patter, mettle, holler, collie, yuppie, rabble, tallow, brooks, tassel, meddle, callus, peddle, sobbed, kneels, getter, brooms, dulled, maggot, webbed' ;
declare @input_letter_pairs table (letter nchar(1),
position int,
unique(letter, position));
insert @input_letter_pairs(letter, position) values
('b', 1),
('e', 2);
;with
six_letters_cte(ItemNumber, Item, n, split_letter) as (
select dsl1.*, t.n, substring(dsl1_trim.item_trim, t.n, 1) split_letter
from
dbo.DelimitedSplitN4K(trim(@six_letters), ',') dsl1
cross apply
(select iif(left(dsl1.Item, 1)=' ', right(dsl1.Item, len(dsl1.Item)-1), dsl1.Item) item_trim ) dsl1_trim
cross apply
dbo.fnTally(1, len(dsl1_trim.item_trim)) t)
select ItemNumber, Item
from
six_letters_cte slc
join
@input_letter_pairs ilp on slc.n=ilp.position
and slc.split_letter=ilp.letter
group by
ItemNumber, Item
having count(*)>1;
Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können
August 14, 2020 at 9:48 pm
Thank you, Steve
Sadly none of your examples works for me from the Studio - getting various error messages.
August 15, 2020 at 2:07 am
I need some basic help, which I will then try to extend, please.
Since very few people ever come back to tell us, I'll make you a deal... Tell us what the extended use is for and I'll show you the solution I just wrote that might already be extended to do what you ultimately need it for. 😉
--Jeff Moden
Change is inevitable... Change for the better is not.
August 15, 2020 at 2:09 am
Thank you, Steve
Sadly none of your examples works for me from the Studio - getting various error messages.
The code probably worked just fine for Steve. When you get such messages, you should post them so we can help you figure things out. 😉
--Jeff Moden
Change is inevitable... Change for the better is not.
August 15, 2020 at 3:46 am
Take this for a spin...
DECLARE
@array_6 varchar(MAX) = 'hurrah, buzzed, Bertha, patter, mettle, holler, collie, yuppie, rabble, tallow, brooks, tassel, meddle, callus, peddle, sobbed, kneels, getter, brooms, dulled, maggot, webbed',
@array_5 varchar(MAX) = ' floor, beTtY, carry, trees, sleep, tells',
@search_positions varchar(12) = '1,2,4';
WITH
cte_a6 AS (
SELECT
val_6 = CONVERT(char(6), TRIM(ss6.value)),
match_on = STRING_AGG(lp6.letter_in_pos, '') WITHIN GROUP (ORDER BY t6.n)
FROM
STRING_SPLIT(@array_6, ',') ss6
CROSS APPLY (VALUES (1),(2),(3),(4),(5),(6) ) t6 (n)
CROSS APPLY ( VALUES (SUBSTRING(TRIM(ss6.value), t6.n, 1)) ) lp6 (letter_in_pos)
WHERE
t6.n IN (SELECT TRY_CONVERT(int, sp.value) FROM STRING_SPLIT(@search_positions, ',') sp)
GROUP BY
ss6.value
),
cte_a5 AS (
SELECT
val_5 = CONVERT(char(5), TRIM(ss5.value)),
match_on = STRING_AGG(lp5.letter_in_pos, '') WITHIN GROUP (ORDER BY t5.n)
FROM
STRING_SPLIT(@array_5, ',') ss5
CROSS APPLY (VALUES (1),(2),(3),(4),(5) ) t5 (n)
CROSS APPLY ( VALUES (SUBSTRING(TRIM(ss5.value), t5.n, 1)) ) lp5 (letter_in_pos)
WHERE
t5.n IN (SELECT TRY_CONVERT(int, sp.value) FROM STRING_SPLIT(@search_positions, ',') sp)
GROUP BY
ss5.value
)
SELECT
a6.val_6,
--a6.match_on,
a5.val_5
--a5.match_on
FROM
cte_a6 a6
JOIN cte_a5 a5
ON a6.match_on = a5.match_on;
Returns...
val_6 val_5
------ -----
Bertha beTtY
August 15, 2020 at 5:12 am
Take this for a spin...
DECLARE
@array_6 varchar(MAX) = 'hurrah, buzzed, Bertha, patter, mettle, holler, collie, yuppie, rabble, tallow, brooks, tassel, meddle, callus, peddle, sobbed, kneels, getter, brooms, dulled, maggot, webbed',
@array_5 varchar(MAX) = ' floor, beTtY, carry, trees, sleep, tells',
@search_positions varchar(12) = '1,2,4';
WITH
cte_a6 AS (
SELECT
val_6 = CONVERT(char(6), TRIM(ss6.value)),
match_on = STRING_AGG(lp6.letter_in_pos, '') WITHIN GROUP (ORDER BY t6.n)
FROM
STRING_SPLIT(@array_6, ',') ss6
CROSS APPLY (VALUES (1),(2),(3),(4),(5),(6) ) t6 (n)
CROSS APPLY ( VALUES (SUBSTRING(TRIM(ss6.value), t6.n, 1)) ) lp6 (letter_in_pos)
WHERE
t6.n IN (SELECT TRY_CONVERT(int, sp.value) FROM STRING_SPLIT(@search_positions, ',') sp)
GROUP BY
ss6.value
),
cte_a5 AS (
SELECT
val_5 = CONVERT(char(5), TRIM(ss5.value)),
match_on = STRING_AGG(lp5.letter_in_pos, '') WITHIN GROUP (ORDER BY t5.n)
FROM
STRING_SPLIT(@array_5, ',') ss5
CROSS APPLY (VALUES (1),(2),(3),(4),(5) ) t5 (n)
CROSS APPLY ( VALUES (SUBSTRING(TRIM(ss5.value), t5.n, 1)) ) lp5 (letter_in_pos)
WHERE
t5.n IN (SELECT TRY_CONVERT(int, sp.value) FROM STRING_SPLIT(@search_positions, ',') sp)
GROUP BY
ss5.value
)
SELECT
a6.val_6,
--a6.match_on,
a5.val_5
--a5.match_on
FROM
cte_a6 a6
JOIN cte_a5 a5
ON a6.match_on = a5.match_on;Returns...
val_6 val_5
------ -----
Bertha beTtY
Heh... it's amazing how similar that is to the code I wrote (still holding out on that because I really am curious what the end game is for this very interesting problem). We even used similar column names. You used "Match_On" and I used "MatchOn".
One difference is that I don't have a limit on the word widths and I don't use a numeric sequence. I also used a single CTE for the "positions" whereas you used one in each WHERE clause... both will produce similar execution plans though because a CTE is re-executed every time it's called.
--Jeff Moden
Change is inevitable... Change for the better is not.
August 15, 2020 at 12:12 pm
BOR15K wrote:Thank you, Steve
Sadly none of your examples works for me from the Studio - getting various error messages.
The code probably worked just fine for Steve. When you get such messages, you should post them so we can help you figure things out. 😉
Yes, thank you for the benefit of the doubt here. I'm afraid I've made a mess of it though. I was playing around with splitters and I've updated the code several times with different ones. At one point I changed it to string_split and that would've caused errors. It should've been (the whole time) dbo.DelimitedSplitN4K. Anyway, Jason's code looks great and it's doesn't use a tally tvf or an ordinal splitter so it's more appropriate
Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können
August 15, 2020 at 12:16 pm
Thank you All!
Steve, apologies - I had no intention to offend you. I have looked into fnTally / the link you have kindly provided.
Much obliged.
August 15, 2020 at 1:54 pm
Heh... it's amazing how similar that is to the code I wrote (still holding out on that because I really am curious what the end game is for this very interesting problem). We even used similar column names. You used "Match_On" and I used "MatchOn".
One difference is that I don't have a limit on the word widths and I don't use a numeric sequence. I also used a single CTE for the "positions" whereas you used one in each WHERE clause... both will produce similar execution plans though because a CTE is re-executed every time it's called.
Jeff - Considering the fact that that I'm consistently impressed by the solutions that you come up with, I'm forced to take that as a compliment. 😀 I hope you decide to post your solution no matter how the OP responds. I always seem to find some new nugget of gold in your solutions.
Anyway, your comment regarding the limit on word length got me thinking... I initially did it that way because the OP was pretty clear about the word lengths in the criteria so I figured that would be a safe move to make at the time. That said, your comment did shake something loose and it occurred to me that I didn't need the inline tally at all!!!
It, in fact, occurred to me that using a sequential tally to produce rows for every character and then filtering it later with another list of numbers was just plain stupid! Why not just use the list of numbers all by itself?
It eliminates the word length limits, the unneeded tally rows, the tally itself and the WHERE clause... and it makes for a cleaner bit of code and faster compile times.
DECLARE
@array_6 varchar(MAX) = 'hurrah, buzzed, Bertha, patter, mettle, holler, collie, yuppie, rabble, tallow, brooks, tassel, meddle, callus, peddle, sobbed, kneels, getter, brooms, dulled, maggot, webbed',
@array_5 varchar(MAX) = ' floor, beTtY, carry, trees, sleep, tells',
@search_positions varchar(12) = '1,2,4';
WITH
cte_a6 AS (
SELECT
val_6 = CONVERT(varchar(20), TRIM(ss6.value)),
match_on = STRING_AGG(lp6.letter_in_pos, '') WITHIN GROUP (ORDER BY sp6.n)
FROM
STRING_SPLIT(@array_6, ',') ss6
CROSS APPLY (SELECT TRY_CONVERT(int, sp.value) FROM STRING_SPLIT(@search_positions, ',') sp) sp6 (n)
CROSS APPLY ( VALUES (SUBSTRING(TRIM(ss6.value), sp6.n, 1)) ) lp6 (letter_in_pos)
GROUP BY
ss6.value
),
cte_a5 AS (
SELECT
val_5 = CONVERT(varchar(20), TRIM(ss5.value)),
match_on = STRING_AGG(lp5.letter_in_pos, '') WITHIN GROUP (ORDER BY sp5.n)
FROM
STRING_SPLIT(@array_5, ',') ss5
CROSS APPLY (SELECT TRY_CONVERT(int, sp.value) FROM STRING_SPLIT(@search_positions, ',') sp) sp5 (n)
CROSS APPLY ( VALUES (SUBSTRING(TRIM(ss5.value), sp5.n, 1)) ) lp5 (letter_in_pos)
GROUP BY
ss5.value
)
SELECT
a6.val_6,
--a6.match_on,
a5.val_5
--a5.match_on
FROM
cte_a6 a6
JOIN cte_a5 a5
ON a6.match_on = a5.match_on;
GO
August 15, 2020 at 1:57 pm
Thank you All!
Steve, apologies - I had no intention to offend you. I have looked into fnTally / the link you have kindly provided.
Much obliged.
Thank you for the kind words! I think I slightly misunderstood the requirements as well. Jason's code made it clear. Please keep at it with the questions if there's anything additional. You said you wanted to extend it yourself and I respect that 🙂
But so... it's not over yet! Or I hope it's not. What would lead you to investigate a set based solution to a problem of this nature? Apparently, Jeff has some code he references " both will produce similar execution plans..." but besides mine I only see one set of code. Also, I have an alternate approach I'm trying as well and we'll see. If it works then I want a speed test against these guys.
Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können
Viewing 15 posts - 1 through 15 (of 40 total)
You must be logged in to reply to this topic. Login to reply