Determine which pattern comes first in a string<!-- 864 --><!-- 864 -->

  • dwain.c (8/24/2014)


    I confess to a bit of lack of understanding here (meaning in what you expect your output to be), but since you mentioned PatternSplitLoop (presumably the one from my article) I felt obliged to reply in some fashion.

    Are you looking for a solution something like this? This lists either his, hers or mine, whichever appears first (as a word) in the string. A word being any string bounded by non-letters.

    WITH cteText(string) AS (

    SELECT 'oh_look_at_this.thing.his\hers_stuff\mine.craft\yours_protein' UNION ALL

    SELECT 'i.am.a._thing.hers\his_things\mine.refrigerator\yours_potato' UNION ALL

    SELECT 'path_like.things_mine\hers.some_else\his_garbage\yours_sneakers' UNION ALL

    SELECT 'more_stuff.yours\mine\hers\his_falafel'

    )

    SELECT string, item

    FROM

    (

    SELECT string, item, rn=ROW_NUMBER() OVER (PARTITION BY string ORDER BY ItemNumber)

    FROM cteText a

    CROSS APPLY dbo.PatternSplitCM(string, '[a-z]') b

    WHERE [Matched]=1 AND item IN ('his', 'hers', 'mine')

    ) a

    WHERE rn=1;

    Two points here though:

    - If you're referring to the PatternSplitLoop function I think you are, use PatternSplitCM as it is much faster.

    - PatternSplitCM will probably suffer the same performance degradation using VARCHAR(MAX) that DelimitedSplit8K has.

    Actually, yeah, I was sort of confused. I had gone looking for PatternSplitCM because I remember it quite fondly from my previous experience using it to clean out undesirable characters from phone numbers, and found PatternSplitLoop. Can't recall if it was from the same article I read or not, but was kind of rushing and maybe just missed SplitCM at the bottom here. Either way, thank you. I'll give it a run at some point today and report back. Though Louis' solution worked quite fast with just a little tinkering to get the it up to specs, I should probably give this a shot.

    Thanks

  • Just having a quick fun with this problem, this code probably has more of an entertainment value than being an actual solution (couldn't help it):-D

    😎

    USE tempdb;

    GO

    WITH cteText(string) AS (

    SELECT 'oh_look_at_this.thing.his\hers_stuff\mine.craft\yours_protein' UNION ALL

    SELECT 'i.am.a._thing.hers\his_things\mine.refrigerator\yours_potato' UNION ALL

    SELECT 'path_like.things_mine\hers.some_else\his_garbage\yours_sneakers' UNION ALL

    SELECT 'more_stuff.yours\mine\hers\his_falafel'

    )

    ,cteMatchString(MSTR) AS (SELECT MSTR FROM (VALUES ('his'),('hers'),('mine'),('yours')) AS X(MSTR))

    ,ctePrefixes(PRFX) AS (SELECT PRFX FROM (VALUES ('.'),('_'),('\')) AS X(PRFX))

    ,ctePRFX_PAT(PMSTR,MSTR) AS (SELECT P.PRFX + M.MSTR AS PMSTR,MSTR FROM cteMatchString M CROSS JOIN ctePrefixes P)

    ,FINAL_DATA AS

    (

    SELECT

    CT.string

    ,CM.MSTR

    ,ROW_NUMBER() OVER (PARTITION BY CT.string ORDER BY CHARINDEX(CM.PMSTR,CT.string,1)) AS POS_ORDER

    FROM cteText CT

    OUTER APPLY ctePRFX_PAT CM

    WHERE CHARINDEX(CM.PMSTR,CT.string,1) > 0

    )

    SELECT

    FD.string

    ,FD.MSTR

    ,FD.POS_ORDER

    FROM FINAL_DATA FD

    Results

    string MSTR POS_ORDER

    --------------------------------------------------------------- ----- ----------

    i.am.a._thing.hers\his_things\mine.refrigerator\yours_potato hers 1

    i.am.a._thing.hers\his_things\mine.refrigerator\yours_potato his 2

    i.am.a._thing.hers\his_things\mine.refrigerator\yours_potato mine 3

    i.am.a._thing.hers\his_things\mine.refrigerator\yours_potato yours 4

    more_stuff.yours\mine\hers\his_falafel yours 1

    more_stuff.yours\mine\hers\his_falafel mine 2

    more_stuff.yours\mine\hers\his_falafel hers 3

    more_stuff.yours\mine\hers\his_falafel his 4

    oh_look_at_this.thing.his\hers_stuff\mine.craft\yours_protein his 1

    oh_look_at_this.thing.his\hers_stuff\mine.craft\yours_protein hers 2

    oh_look_at_this.thing.his\hers_stuff\mine.craft\yours_protein mine 3

    oh_look_at_this.thing.his\hers_stuff\mine.craft\yours_protein yours 4

    path_like.things_mine\hers.some_else\his_garbage\yours_sneakers mine 1

    path_like.things_mine\hers.some_else\his_garbage\yours_sneakers hers 2

  • Eirikur Eiriksson (8/25/2014)


    Just having a quick fun with this problem, this code probably has more of an entertainment value than being an actual solution (couldn't help it):-D

    😎

    USE tempdb;

    GO

    WITH cteText(string) AS (

    SELECT 'oh_look_at_this.thing.his\hers_stuff\mine.craft\yours_protein' UNION ALL

    SELECT 'i.am.a._thing.hers\his_things\mine.refrigerator\yours_potato' UNION ALL

    SELECT 'path_like.things_mine\hers.some_else\his_garbage\yours_sneakers' UNION ALL

    SELECT 'more_stuff.yours\mine\hers\his_falafel'

    )

    ,cteMatchString(MSTR) AS (SELECT MSTR FROM (VALUES ('his'),('hers'),('mine'),('yours')) AS X(MSTR))

    ,ctePrefixes(PRFX) AS (SELECT PRFX FROM (VALUES ('.'),('_'),('\')) AS X(PRFX))

    ,ctePRFX_PAT(PMSTR,MSTR) AS (SELECT P.PRFX + M.MSTR AS PMSTR,MSTR FROM cteMatchString M CROSS JOIN ctePrefixes P)

    ,FINAL_DATA AS

    (

    SELECT

    CT.string

    ,CM.MSTR

    ,ROW_NUMBER() OVER (PARTITION BY CT.string ORDER BY CHARINDEX(CM.PMSTR,CT.string,1)) AS POS_ORDER

    FROM cteText CT

    OUTER APPLY ctePRFX_PAT CM

    WHERE CHARINDEX(CM.PMSTR,CT.string,1) > 0

    )

    SELECT

    FD.string

    ,FD.MSTR

    ,FD.POS_ORDER

    FROM FINAL_DATA FD

    Results

    string MSTR POS_ORDER

    --------------------------------------------------------------- ----- ----------

    i.am.a._thing.hers\his_things\mine.refrigerator\yours_potato hers 1

    i.am.a._thing.hers\his_things\mine.refrigerator\yours_potato his 2

    i.am.a._thing.hers\his_things\mine.refrigerator\yours_potato mine 3

    i.am.a._thing.hers\his_things\mine.refrigerator\yours_potato yours 4

    more_stuff.yours\mine\hers\his_falafel yours 1

    more_stuff.yours\mine\hers\his_falafel mine 2

    more_stuff.yours\mine\hers\his_falafel hers 3

    more_stuff.yours\mine\hers\his_falafel his 4

    oh_look_at_this.thing.his\hers_stuff\mine.craft\yours_protein his 1

    oh_look_at_this.thing.his\hers_stuff\mine.craft\yours_protein hers 2

    oh_look_at_this.thing.his\hers_stuff\mine.craft\yours_protein mine 3

    oh_look_at_this.thing.his\hers_stuff\mine.craft\yours_protein yours 4

    path_like.things_mine\hers.some_else\his_garbage\yours_sneakers mine 1

    path_like.things_mine\hers.some_else\his_garbage\yours_sneakers hers 2

    That's pretty cool. I like the position column. I don't think you need the extra step with the prefixes, though. Their position (at least in my actual data) isn't a very reliable indicator of anything.

    Thanks

  • sqldriver (8/25/2014)


    That's pretty cool. I like the position column. I don't think you need the extra step with the prefixes, though. Their position (at least in my actual data) isn't a very reliable indicator of anything.

    Just demonstrating a method of filtering out the false positives, i.e. "this"-->"his". As I stated before, it is not a perfect solution but could be a base for one though. The weakness would be a recurrence of a pattern within the "string".

    😎

  • Eirikur Eiriksson (8/25/2014)


    sqldriver (8/25/2014)


    That's pretty cool. I like the position column. I don't think you need the extra step with the prefixes, though. Their position (at least in my actual data) isn't a very reliable indicator of anything.

    Just demonstrating a method of filtering out the false positives, i.e. "this"-->"his". As I stated before, it is not a perfect solution but could be a base for one though. The weakness would be a recurrence of a pattern within the "string".

    😎

    Ah, I see. In my attempt to make this easy for everyone to work with, I made it more complicated. Actual values that I'm searching for are like .b. and .ib., and not terribly duplicate-prone within each string.

    On the plus side, you've won a fabulous MUSK OX!

Viewing 5 posts - 16 through 19 (of 19 total)

You must be logged in to reply to this topic. Login to reply