identifying different patterns

  • declare @string varchar(500) = 'Carborator Break Engine Oil'

    declare @tblPattern table (ID INT IDENTITY, Pattern VARCHAR(500), PatternIdentifier INT)

    ---Single Word Pattern

    insert@tblPattern(Pattern, PatternIdentifier)

    SELECTWord, 0

    FROM(SELECT @string AS phrase) p

    CROSS APPLY (

    SELECT CAST('<a>'+REPLACE(phrase,' ','</a><a>')+'</a>'

    ---+ CASE WHEN CHARINDEX(' ', phrase)> 0 THEN '<a>'+ phrase +'</a>' ELSE '' END

    AS XML) xml1

    ) t1

    CROSS APPLY (SELECT n.value('.','varchar(255)') AS word FROM xml1.nodes('a') x(n) ) t2

    select * from @tblPattern

    HOW CAN I GENERATE DIFFERENT PATTERNS FROM ABOVE STRING?

    EXPECTED OUTPUT:

    Carborator

    Break

    Engine

    Oil

    Carborator Break

    Carborator Break Engine

    Carborator Break Engine Oil

    Engine Oil

    Break Engine Oil

    Abhijit - http://abhijitmore.wordpress.com

  • Using the splitter function referenced in my signature you could do this.

    declare @string varchar(500) = 'Carborator Break Engine Oil';

    declare @tblPattern table (ID INT IDENTITY, Pattern VARCHAR(500), PatternIdentifier INT);

    ;WITH split1 AS

    (

    SELECT s.*

    FROM (values (@string)) t(c)

    CROSS APPLY dbo.DelimitedSplit8K(t.c,' ') s

    ),

    combos AS

    (

    SELECTrn = ROW_NUMBER() OVER (order by (select null)),

    [c4] = s1.Item+' '+coalesce(s2.Item,'')+' '+coalesce(s3.Item,'')+' '+coalesce(s4.Item,''),

    [c3] = s1.Item+' '+coalesce(s2.Item,'')+' '+coalesce(s3.Item,''),

    [c2] = s1.Item+' '+coalesce(s2.Item,'')

    FROM split1 s1

    LEFT JOIN split1 s2

    ON s2.ItemNumber = s1.ItemNumber+1

    LEFT JOIN split1 s3

    ON s3.ItemNumber = s2.ItemNumber+1

    LEFT JOIN split1 s4

    ON s4.ItemNumber = s3.ItemNumber+1

    )

    SELECT item FROM split1 UNION ALL

    SELECT [c4] FROM combos WHERE rn < 4 UNION ALL

    SELECT [c3] FROM combos WHERE rn = 1 UNION ALL

    SELECT [c2] FROM combos WHERE rn = 1;

    This will work when you have 4 words (three spaces)... this would need to be tweaked based on your requirement.

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Hi

    Here's another method using the same splitter and with the same conditions as Alan's. I thought a cube may give you the results you need

    declare @string varchar(500) = 'Carborator Break Engine Oil';

    with split as (

    select itemnumber, item

    from dbo.DelimitedSplit8K(@string,' ')

    ),

    pivoted as (

    select max(case when itemnumber = 1 then item end) word1,

    max(case when itemnumber = 2 then item end) word2,

    max(case when itemnumber = 3 then item end) word3,

    max(case when itemnumber = 4 then item end) word4

    from split s

    ),

    concatcube as (

    select ltrim(concat(word1, isnull(' ' + word2,''), isnull(' ' + word3,''), isnull(' ' + word4,''))) pattern,

    case when word1 is null then '-' else '#' end +

    case when word2 is null then '-' else '#' end +

    case when word3 is null then '-' else '#' end +

    case when word4 is null then '-' else '#' end testPattern

    from pivoted p

    group by cube(word1, word2, word3, word4)

    ),

    keeppatterns as (

    select testPattern

    from (VALUES

    ('#---'),

    ('-#--'),

    ('--#-'),

    ('---#'),

    ('##--'),

    ('###-'),

    ('####'),

    ('-###'),

    ('--##')

    ) tp(testPattern)

    )

    select pattern

    from concatcube c

    inner join keeppatterns k on c.testPattern = k.testPattern

    order by pattern;

    It's a tad verbose, but I wanted to show the breakdown of the steps.

    Edit: removed a surplus replace from an earlier version

  • mickyT (5/20/2014)


    Hi

    Here's another method using the same splitter and with the same conditions as Alan's. I though cube may give you the results you need

    declare @string varchar(500) = 'Carborator Break Engine Oil';

    with split as (

    select itemnumber, item

    from dbo.DelimitedSplit8K(@string,' ')

    ),

    pivoted as (

    select max(case when itemnumber = 1 then item end) word1,

    max(case when itemnumber = 2 then item end) word2,

    max(case when itemnumber = 3 then item end) word3,

    max(case when itemnumber = 4 then item end) word4

    from split s

    ),

    concatcube as (

    select ltrim(concat(word1, isnull(' ' + word2,''), isnull(' ' + word3,''), isnull(' ' + word4,''))) pattern,

    case when word1 is null then '-' else '#' end +

    case when word2 is null then '-' else '#' end +

    case when word3 is null then '-' else '#' end +

    case when word4 is null then '-' else '#' end testPattern

    from pivoted p

    group by cube(word1, word2, word3, word4)

    ),

    keeppatterns as (

    select testPattern

    from (VALUES

    ('#---'),

    ('-#--'),

    ('--#-'),

    ('---#'),

    ('##--'),

    ('###-'),

    ('####'),

    ('-###'),

    ('--##')

    ) tp(testPattern)

    )

    select replace(pattern,'-','') pattern

    from concatcube c

    inner join keeppatterns k on c.testPattern = k.testPattern

    order by pattern;

    It's a tad verbose, but I wanted to show the breakdown of the steps.

    This is excellent Micky. Brilliant use of cube too. Well done!

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Thanks Alan :blush: and then I see my spelling mistake 🙂

  • thanks guys you saved me!

    addition to this I am also searching how many times the word has been repeated within the same column

    below is the script

    ;WITH

    CTE

    AS(SELECTdistinct pattern

    FROM @phrase w

    CROSS APPLY dbo.WordSplitter(w.DESC_COL) p

    WHERE LEN(ltrim(rtrim(pattern))) > 1 and pattern != ''

    )

    ,CTE2

    AS

    (SELECTpattern,

    (SELECT COUNT(1) FROM @phrase WHERE DESC_COL LIKE '%' + pattern + '%') Occurance

    FROMCTE

    GROUPBY pattern

    )

    SELECT * FROM cte2

    the problem here is it works very well with small amount of data and I have huge amount of data in desc (approx. 7 million) out of which i need to extract this word occurrence 🙁

    Abhijit - http://abhijitmore.wordpress.com

  • Abhijit More (5/22/2014)


    thanks guys you saved me!

    addition to this I am also searching how many times the word has been repeated within the same column

    below is the script

    ;WITH

    CTE

    AS(SELECTdistinct pattern

    FROM @phrase w

    CROSS APPLY dbo.WordSplitter(w.DESC_COL) p

    WHERE LEN(ltrim(rtrim(pattern))) > 1 and pattern != ''

    )

    ,CTE2

    AS

    (SELECTpattern,

    (SELECT COUNT(1) FROM @phrase WHERE DESC_COL LIKE '%' + pattern + '%') Occurance

    FROMCTE

    GROUPBY pattern

    )

    SELECT * FROM cte2

    the problem here is it works very well with small amount of data and I have huge amount of data in desc (approx. 7 million) out of which i need to extract this word occurrence 🙁

    Not a problem, however to help you on the query above we will need some sample data. You are selecting from @phrase which we can't see. I assume that you are working from a real table. Post the DDL for that and some example data.

    One thing, are you using the delimitedSplit8K that Alan pointed you at? There are other splitters out there that are really slow.

  • this is my contribution

    declare @string varchar(500) = 'Carborator Break Engine Oil';

    with split as (

    SELECT itemnumber, item, COUNT(*) OVER () AS cnt

    from dbo.DelimitedSplit8K(@string,' ')

    ),Recursive_cte AS(

    SELECT itemnumber, item FROM [split]

    UNION ALL

    SELECT s.itemnumber, s.item+' '+r.item

    FROM [split] s, Recursive_cte r

    WHERE s.cnt<=4 AND r.item NOT LIKE '%'+s.item+'%'

    )

    SELECT * FROM Recursive_cte

    Every rule in a world of bits and bytes, can be bend or eventually be broken
    MyBlog About Common dialog control
    A Visualizer for viewing SqlCommand object script [/url]

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply