Determine which pattern comes first in a string

Question

Determine which pattern comes first in a string

Viewing 5 posts - 16 through 19 (of 19 total)

You must be logged in to reply to this topic. Login to reply

sqldriver SSChampion Points: 10440 More actions · Answer 1

dwain.c (8/24/2014)
I confess to a bit of lack of understanding here (meaning in what you expect your output to be), but since you mentioned PatternSplitLoop (presumably the one from my article) I felt obliged to reply in some fashion.
Are you looking for a solution something like this? This lists either his, hers or mine, whichever appears first (as a word) in the string. A word being any string bounded by non-letters.
WITH cteText(string) AS (
SELECT 'oh_look_at_this.thing.his\hers_stuff\mine.craft\yours_protein' UNION ALL
SELECT 'i.am.a._thing.hers\his_things\mine.refrigerator\yours_potato' UNION ALL
SELECT 'path_like.things_mine\hers.some_else\his_garbage\yours_sneakers' UNION ALL
SELECT 'more_stuff.yours\mine\hers\his_falafel'
)
SELECT string, item
FROM
(
SELECT string, item, rn=ROW_NUMBER() OVER (PARTITION BY string ORDER BY ItemNumber)
FROM cteText a
CROSS APPLY dbo.PatternSplitCM(string, '[a-z]') b
WHERE [Matched]=1 AND item IN ('his', 'hers', 'mine')
) a
WHERE rn=1;
Two points here though:
- If you're referring to the PatternSplitLoop function I think you are, use PatternSplitCM as it is much faster.
- PatternSplitCM will probably suffer the same performance degradation using VARCHAR(MAX) that DelimitedSplit8K has.

Actually, yeah, I was sort of confused. I had gone looking for PatternSplitCM because I remember it quite fondly from my previous experience using it to clean out undesirable characters from phone numbers, and found PatternSplitLoop. Can't recall if it was from the same article I read or not, but was kind of rushing and maybe just missed SplitCM at the bottom here. Either way, thank you. I'll give it a run at some point today and report back. Though Louis' solution worked quite fast with just a little tinkering to get the it up to specs, I should probably give this a shot.

Thanks

Eirikur Eiriksson SSC Guru Points: 182919 More actions · Answer 2

Just having a quick fun with this problem, this code probably has more of an entertainment value than being an actual solution (couldn't help it):-D

😎

USE tempdb;

GO

WITH cteText(string) AS (

SELECT 'oh_look_at_this.thing.his\hers_stuff\mine.craft\yours_protein' UNION ALL

SELECT 'i.am.a._thing.hers\his_things\mine.refrigerator\yours_potato' UNION ALL

SELECT 'path_like.things_mine\hers.some_else\his_garbage\yours_sneakers' UNION ALL

SELECT 'more_stuff.yours\mine\hers\his_falafel'

)

,cteMatchString(MSTR) AS (SELECT MSTR FROM (VALUES ('his'),('hers'),('mine'),('yours')) AS X(MSTR))

,ctePrefixes(PRFX) AS (SELECT PRFX FROM (VALUES ('.'),('_'),('\')) AS X(PRFX))

,ctePRFX_PAT(PMSTR,MSTR) AS (SELECT P.PRFX + M.MSTR AS PMSTR,MSTR FROM cteMatchString M CROSS JOIN ctePrefixes P)

,FINAL_DATA AS

(

SELECT

CT.string

,CM.MSTR

,ROW_NUMBER() OVER (PARTITION BY CT.string ORDER BY CHARINDEX(CM.PMSTR,CT.string,1)) AS POS_ORDER

FROM cteText CT

OUTER APPLY ctePRFX_PAT CM

WHERE CHARINDEX(CM.PMSTR,CT.string,1) > 0

)

SELECT

FD.string

,FD.MSTR

,FD.POS_ORDER

FROM FINAL_DATA FD

Results

string MSTR POS_ORDER

--------------------------------------------------------------- ----- ----------

i.am.a._thing.hers\his_things\mine.refrigerator\yours_potato hers 1

i.am.a._thing.hers\his_things\mine.refrigerator\yours_potato his 2

i.am.a._thing.hers\his_things\mine.refrigerator\yours_potato mine 3

i.am.a._thing.hers\his_things\mine.refrigerator\yours_potato yours 4

more_stuff.yours\mine\hers\his_falafel yours 1

more_stuff.yours\mine\hers\his_falafel mine 2

more_stuff.yours\mine\hers\his_falafel hers 3

more_stuff.yours\mine\hers\his_falafel his 4

oh_look_at_this.thing.his\hers_stuff\mine.craft\yours_protein his 1

oh_look_at_this.thing.his\hers_stuff\mine.craft\yours_protein hers 2

oh_look_at_this.thing.his\hers_stuff\mine.craft\yours_protein mine 3

oh_look_at_this.thing.his\hers_stuff\mine.craft\yours_protein yours 4

path_like.things_mine\hers.some_else\his_garbage\yours_sneakers mine 1

path_like.things_mine\hers.some_else\his_garbage\yours_sneakers hers 2

sqldriver SSChampion Points: 10440 More actions · Answer 3

Eirikur Eiriksson (8/25/2014)
Just having a quick fun with this problem, this code probably has more of an entertainment value than being an actual solution (couldn't help it):-D
😎
USE tempdb;
GO
WITH cteText(string) AS (
SELECT 'oh_look_at_this.thing.his\hers_stuff\mine.craft\yours_protein' UNION ALL
SELECT 'i.am.a._thing.hers\his_things\mine.refrigerator\yours_potato' UNION ALL
SELECT 'path_like.things_mine\hers.some_else\his_garbage\yours_sneakers' UNION ALL
SELECT 'more_stuff.yours\mine\hers\his_falafel'
)
,cteMatchString(MSTR) AS (SELECT MSTR FROM (VALUES ('his'),('hers'),('mine'),('yours')) AS X(MSTR))
,ctePrefixes(PRFX) AS (SELECT PRFX FROM (VALUES ('.'),('_'),('\')) AS X(PRFX))
,ctePRFX_PAT(PMSTR,MSTR) AS (SELECT P.PRFX + M.MSTR AS PMSTR,MSTR FROM cteMatchString M CROSS JOIN ctePrefixes P)
,FINAL_DATA AS
(
SELECT
CT.string
,CM.MSTR
,ROW_NUMBER() OVER (PARTITION BY CT.string ORDER BY CHARINDEX(CM.PMSTR,CT.string,1)) AS POS_ORDER
FROM cteText CT
OUTER APPLY ctePRFX_PAT CM
WHERE CHARINDEX(CM.PMSTR,CT.string,1) > 0
)
SELECT
FD.string
,FD.MSTR
,FD.POS_ORDER
FROM FINAL_DATA FD
Results
string MSTR POS_ORDER
--------------------------------------------------------------- ----- ----------
i.am.a._thing.hers\his_things\mine.refrigerator\yours_potato hers 1
i.am.a._thing.hers\his_things\mine.refrigerator\yours_potato his 2
i.am.a._thing.hers\his_things\mine.refrigerator\yours_potato mine 3
i.am.a._thing.hers\his_things\mine.refrigerator\yours_potato yours 4
more_stuff.yours\mine\hers\his_falafel yours 1
more_stuff.yours\mine\hers\his_falafel mine 2
more_stuff.yours\mine\hers\his_falafel hers 3
more_stuff.yours\mine\hers\his_falafel his 4
oh_look_at_this.thing.his\hers_stuff\mine.craft\yours_protein his 1
oh_look_at_this.thing.his\hers_stuff\mine.craft\yours_protein hers 2
oh_look_at_this.thing.his\hers_stuff\mine.craft\yours_protein mine 3
oh_look_at_this.thing.his\hers_stuff\mine.craft\yours_protein yours 4
path_like.things_mine\hers.some_else\his_garbage\yours_sneakers mine 1
path_like.things_mine\hers.some_else\his_garbage\yours_sneakers hers 2

That's pretty cool. I like the position column. I don't think you need the extra step with the prefixes, though. Their position (at least in my actual data) isn't a very reliable indicator of anything.

Thanks

Eirikur Eiriksson SSC Guru Points: 182919 More actions · Answer 4

sqldriver (8/25/2014)
That's pretty cool. I like the position column. I don't think you need the extra step with the prefixes, though. Their position (at least in my actual data) isn't a very reliable indicator of anything.

Just demonstrating a method of filtering out the false positives, i.e. "this"-->"his". As I stated before, it is not a perfect solution but could be a base for one though. The weakness would be a recurrence of a pattern within the "string".

😎

sqldriver SSChampion Points: 10440 More actions · Answer 5

Eirikur Eiriksson (8/25/2014)
sqldriver (8/25/2014)
That's pretty cool. I like the position column. I don't think you need the extra step with the prefixes, though. Their position (at least in my actual data) isn't a very reliable indicator of anything.
Just demonstrating a method of filtering out the false positives, i.e. "this"-->"his". As I stated before, it is not a perfect solution but could be a base for one though. The weakness would be a recurrence of a pattern within the "string".
😎

Ah, I see. In my attempt to make this easy for everyone to work with, I made it more complicated. Actual values that I'm searching for are like .b. and .ib., and not terribly duplicate-prone within each string.

On the plus side, you've won a fabulous MUSK OX!