identifying different patterns

Question

Post reply

identifying different patterns

Abhijit More

SSCertifiable

Points: 6897
More actions
May 20, 2014 at 8:17 am

#293475

declare @string varchar(500) = 'Carborator Break Engine Oil'
declare @tblPattern table (ID INT IDENTITY, Pattern VARCHAR(500), PatternIdentifier INT)
---Single Word Pattern
insert@tblPattern(Pattern, PatternIdentifier)
SELECTWord, 0
FROM(SELECT @string AS phrase) p
CROSS APPLY (
SELECT CAST('<a>'+REPLACE(phrase,' ','</a><a>')+'</a>'
---+ CASE WHEN CHARINDEX(' ', phrase)> 0 THEN '<a>'+ phrase +'</a>' ELSE '' END
AS XML) xml1
) t1
CROSS APPLY (SELECT n.value('.','varchar(255)') AS word FROM xml1.nodes('a') x(n) ) t2
select * from @tblPattern
HOW CAN I GENERATE DIFFERENT PATTERNS FROM ABOVE STRING?
EXPECTED OUTPUT:
Carborator
Break
Engine
Oil
Carborator Break
Carborator Break Engine
Carborator Break Engine Oil
Engine Oil
Break Engine Oil
Abhijit - http://abhijitmore.wordpress.com

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply

Alan Burstein SSC Guru Points: 61141 More actions · Answer 1

Using the splitter function referenced in my signature you could do this.

declare @string varchar(500) = 'Carborator Break Engine Oil';

declare @tblPattern table (ID INT IDENTITY, Pattern VARCHAR(500), PatternIdentifier INT);

;WITH split1 AS

(

SELECT s.*

FROM (values (@string)) t(c)

CROSS APPLY dbo.DelimitedSplit8K(t.c,' ') s

),

combos AS

(

SELECTrn = ROW_NUMBER() OVER (order by (select null)),

[c4] = s1.Item+' '+coalesce(s2.Item,'')+' '+coalesce(s3.Item,'')+' '+coalesce(s4.Item,''),

[c3] = s1.Item+' '+coalesce(s2.Item,'')+' '+coalesce(s3.Item,''),

[c2] = s1.Item+' '+coalesce(s2.Item,'')

FROM split1 s1

LEFT JOIN split1 s2

ON s2.ItemNumber = s1.ItemNumber+1

LEFT JOIN split1 s3

ON s3.ItemNumber = s2.ItemNumber+1

LEFT JOIN split1 s4

ON s4.ItemNumber = s3.ItemNumber+1

)

SELECT item FROM split1 UNION ALL

SELECT [c4] FROM combos WHERE rn < 4 UNION ALL

SELECT [c3] FROM combos WHERE rn = 1 UNION ALL

SELECT [c2] FROM combos WHERE rn = 1;

This will work when you have 4 words (three spaces)... this would need to be tweaked based on your requirement.

"I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

-- Itzik Ben-Gan 2001

mickyT SSChampion Points: 10360 More actions · Answer 2

Hi

Here's another method using the same splitter and with the same conditions as Alan's. I thought a cube may give you the results you need

declare @string varchar(500) = 'Carborator Break Engine Oil';

with split as (

select itemnumber, item

from dbo.DelimitedSplit8K(@string,' ')

),

pivoted as (

select max(case when itemnumber = 1 then item end) word1,

max(case when itemnumber = 2 then item end) word2,

max(case when itemnumber = 3 then item end) word3,

max(case when itemnumber = 4 then item end) word4

from split s

),

concatcube as (

select ltrim(concat(word1, isnull(' ' + word2,''), isnull(' ' + word3,''), isnull(' ' + word4,''))) pattern,

case when word1 is null then '-' else '#' end +

case when word2 is null then '-' else '#' end +

case when word3 is null then '-' else '#' end +

case when word4 is null then '-' else '#' end testPattern

from pivoted p

group by cube(word1, word2, word3, word4)

),

keeppatterns as (

select testPattern

from (VALUES

('#---'),

('-#--'),

('--#-'),

('---#'),

('##--'),

('###-'),

('####'),

('-###'),

('--##')

) tp(testPattern)

)

select pattern

from concatcube c

inner join keeppatterns k on c.testPattern = k.testPattern

order by pattern;

It's a tad verbose, but I wanted to show the breakdown of the steps.

Edit: removed a surplus replace from an earlier version

Alan Burstein SSC Guru Points: 61141 More actions · Answer 3

mickyT (5/20/2014)
Hi
Here's another method using the same splitter and with the same conditions as Alan's. I though cube may give you the results you need
declare @string varchar(500) = 'Carborator Break Engine Oil';
with split as (
select itemnumber, item
from dbo.DelimitedSplit8K(@string,' ')
),
pivoted as (
select max(case when itemnumber = 1 then item end) word1,
max(case when itemnumber = 2 then item end) word2,
max(case when itemnumber = 3 then item end) word3,
max(case when itemnumber = 4 then item end) word4
from split s
),
concatcube as (
select ltrim(concat(word1, isnull(' ' + word2,''), isnull(' ' + word3,''), isnull(' ' + word4,''))) pattern,
case when word1 is null then '-' else '#' end +
case when word2 is null then '-' else '#' end +
case when word3 is null then '-' else '#' end +
case when word4 is null then '-' else '#' end testPattern
from pivoted p
group by cube(word1, word2, word3, word4)
),
keeppatterns as (
select testPattern
from (VALUES
('#---'),
('-#--'),
('--#-'),
('---#'),
('##--'),
('###-'),
('####'),
('-###'),
('--##')
) tp(testPattern)
)
select replace(pattern,'-','') pattern
from concatcube c
inner join keeppatterns k on c.testPattern = k.testPattern
order by pattern;
It's a tad verbose, but I wanted to show the breakdown of the steps.

This is excellent Micky. Brilliant use of cube too. Well done!

"I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

-- Itzik Ben-Gan 2001

mickyT SSChampion Points: 10360 More actions · Answer 4

Thanks Alan :blush: and then I see my spelling mistake

Abhijit More SSCertifiable Points: 6897 More actions · Answer 5

thanks guys you saved me!

addition to this I am also searching how many times the word has been repeated within the same column

below is the script

;WITH

CTE

AS(SELECTdistinct pattern

FROM @phrase w

CROSS APPLY dbo.WordSplitter(w.DESC_COL) p

WHERE LEN(ltrim(rtrim(pattern))) > 1 and pattern != ''

)

,CTE2

AS

(SELECTpattern,

(SELECT COUNT(1) FROM @phrase WHERE DESC_COL LIKE '%' + pattern + '%') Occurance

FROMCTE

GROUPBY pattern

)

SELECT * FROM cte2

the problem here is it works very well with small amount of data and I have huge amount of data in desc (approx. 7 million) out of which i need to extract this word occurrence

Abhijit - http://abhijitmore.wordpress.com

mickyT SSChampion Points: 10360 More actions · Answer 6

Abhijit More (5/22/2014)
thanks guys you saved me!
addition to this I am also searching how many times the word has been repeated within the same column
below is the script
;WITH
CTE
AS(SELECTdistinct pattern
FROM @phrase w
CROSS APPLY dbo.WordSplitter(w.DESC_COL) p
WHERE LEN(ltrim(rtrim(pattern))) > 1 and pattern != ''
)
,CTE2
AS
(SELECTpattern,
(SELECT COUNT(1) FROM @phrase WHERE DESC_COL LIKE '%' + pattern + '%') Occurance
FROMCTE
GROUPBY pattern
)
SELECT * FROM cte2
the problem here is it works very well with small amount of data and I have huge amount of data in desc (approx. 7 million) out of which i need to extract this word occurrence

Not a problem, however to help you on the query above we will need some sample data. You are selecting from @phrase which we can't see. I assume that you are working from a real table. Post the DDL for that and some example data.

One thing, are you using the delimitedSplit8K that Alan pointed you at? There are other splitters out there that are really slow.

thava SSCrazy Points: 2275 More actions · Answer 7

this is my contribution

declare @string varchar(500) = 'Carborator Break Engine Oil';

with split as (

SELECT itemnumber, item, COUNT(*) OVER () AS cnt

from dbo.DelimitedSplit8K(@string,' ')

),Recursive_cte AS(

SELECT itemnumber, item FROM [split]

UNION ALL

SELECT s.itemnumber, s.item+' '+r.item

FROM [split] s, Recursive_cte r

WHERE s.cnt<=4 AND r.item NOT LIKE '%'+s.item+'%'

)

SELECT * FROM Recursive_cte

Every rule in a world of bits and bytes, can be bend or eventually be broken
MyBlog About Common dialog control
A Visualizer for viewing SqlCommand object script [/url]

identifying different patterns

Cookies on SQLServerCentral