August 25, 2014 at 8:11 am
dwain.c (8/24/2014)
I confess to a bit of lack of understanding here (meaning in what you expect your output to be), but since you mentioned PatternSplitLoop (presumably the one from my article) I felt obliged to reply in some fashion.Are you looking for a solution something like this? This lists either his, hers or mine, whichever appears first (as a word) in the string. A word being any string bounded by non-letters.
WITH cteText(string) AS (
SELECT 'oh_look_at_this.thing.his\hers_stuff\mine.craft\yours_protein' UNION ALL
SELECT 'i.am.a._thing.hers\his_things\mine.refrigerator\yours_potato' UNION ALL
SELECT 'path_like.things_mine\hers.some_else\his_garbage\yours_sneakers' UNION ALL
SELECT 'more_stuff.yours\mine\hers\his_falafel'
)
SELECT string, item
FROM
(
SELECT string, item, rn=ROW_NUMBER() OVER (PARTITION BY string ORDER BY ItemNumber)
FROM cteText a
CROSS APPLY dbo.PatternSplitCM(string, '[a-z]') b
WHERE [Matched]=1 AND item IN ('his', 'hers', 'mine')
) a
WHERE rn=1;
Two points here though:
- If you're referring to the PatternSplitLoop function I think you are, use PatternSplitCM as it is much faster.
- PatternSplitCM will probably suffer the same performance degradation using VARCHAR(MAX) that DelimitedSplit8K has.
Actually, yeah, I was sort of confused. I had gone looking for PatternSplitCM because I remember it quite fondly from my previous experience using it to clean out undesirable characters from phone numbers, and found PatternSplitLoop. Can't recall if it was from the same article I read or not, but was kind of rushing and maybe just missed SplitCM at the bottom here. Either way, thank you. I'll give it a run at some point today and report back. Though Louis' solution worked quite fast with just a little tinkering to get the it up to specs, I should probably give this a shot.
Thanks
August 25, 2014 at 10:59 am
Just having a quick fun with this problem, this code probably has more of an entertainment value than being an actual solution (couldn't help it):-D
😎
USE tempdb;
GO
WITH cteText(string) AS (
SELECT 'oh_look_at_this.thing.his\hers_stuff\mine.craft\yours_protein' UNION ALL
SELECT 'i.am.a._thing.hers\his_things\mine.refrigerator\yours_potato' UNION ALL
SELECT 'path_like.things_mine\hers.some_else\his_garbage\yours_sneakers' UNION ALL
SELECT 'more_stuff.yours\mine\hers\his_falafel'
)
,cteMatchString(MSTR) AS (SELECT MSTR FROM (VALUES ('his'),('hers'),('mine'),('yours')) AS X(MSTR))
,ctePrefixes(PRFX) AS (SELECT PRFX FROM (VALUES ('.'),('_'),('\')) AS X(PRFX))
,ctePRFX_PAT(PMSTR,MSTR) AS (SELECT P.PRFX + M.MSTR AS PMSTR,MSTR FROM cteMatchString M CROSS JOIN ctePrefixes P)
,FINAL_DATA AS
(
SELECT
CT.string
,CM.MSTR
,ROW_NUMBER() OVER (PARTITION BY CT.string ORDER BY CHARINDEX(CM.PMSTR,CT.string,1)) AS POS_ORDER
FROM cteText CT
OUTER APPLY ctePRFX_PAT CM
WHERE CHARINDEX(CM.PMSTR,CT.string,1) > 0
)
SELECT
FD.string
,FD.MSTR
,FD.POS_ORDER
FROM FINAL_DATA FD
Results
string MSTR POS_ORDER
--------------------------------------------------------------- ----- ----------
i.am.a._thing.hers\his_things\mine.refrigerator\yours_potato hers 1
i.am.a._thing.hers\his_things\mine.refrigerator\yours_potato his 2
i.am.a._thing.hers\his_things\mine.refrigerator\yours_potato mine 3
i.am.a._thing.hers\his_things\mine.refrigerator\yours_potato yours 4
more_stuff.yours\mine\hers\his_falafel yours 1
more_stuff.yours\mine\hers\his_falafel mine 2
more_stuff.yours\mine\hers\his_falafel hers 3
more_stuff.yours\mine\hers\his_falafel his 4
oh_look_at_this.thing.his\hers_stuff\mine.craft\yours_protein his 1
oh_look_at_this.thing.his\hers_stuff\mine.craft\yours_protein hers 2
oh_look_at_this.thing.his\hers_stuff\mine.craft\yours_protein mine 3
oh_look_at_this.thing.his\hers_stuff\mine.craft\yours_protein yours 4
path_like.things_mine\hers.some_else\his_garbage\yours_sneakers mine 1
path_like.things_mine\hers.some_else\his_garbage\yours_sneakers hers 2
August 25, 2014 at 12:21 pm
Eirikur Eiriksson (8/25/2014)
Just having a quick fun with this problem, this code probably has more of an entertainment value than being an actual solution (couldn't help it):-D😎
USE tempdb;
GO
WITH cteText(string) AS (
SELECT 'oh_look_at_this.thing.his\hers_stuff\mine.craft\yours_protein' UNION ALL
SELECT 'i.am.a._thing.hers\his_things\mine.refrigerator\yours_potato' UNION ALL
SELECT 'path_like.things_mine\hers.some_else\his_garbage\yours_sneakers' UNION ALL
SELECT 'more_stuff.yours\mine\hers\his_falafel'
)
,cteMatchString(MSTR) AS (SELECT MSTR FROM (VALUES ('his'),('hers'),('mine'),('yours')) AS X(MSTR))
,ctePrefixes(PRFX) AS (SELECT PRFX FROM (VALUES ('.'),('_'),('\')) AS X(PRFX))
,ctePRFX_PAT(PMSTR,MSTR) AS (SELECT P.PRFX + M.MSTR AS PMSTR,MSTR FROM cteMatchString M CROSS JOIN ctePrefixes P)
,FINAL_DATA AS
(
SELECT
CT.string
,CM.MSTR
,ROW_NUMBER() OVER (PARTITION BY CT.string ORDER BY CHARINDEX(CM.PMSTR,CT.string,1)) AS POS_ORDER
FROM cteText CT
OUTER APPLY ctePRFX_PAT CM
WHERE CHARINDEX(CM.PMSTR,CT.string,1) > 0
)
SELECT
FD.string
,FD.MSTR
,FD.POS_ORDER
FROM FINAL_DATA FD
Results
string MSTR POS_ORDER
--------------------------------------------------------------- ----- ----------
i.am.a._thing.hers\his_things\mine.refrigerator\yours_potato hers 1
i.am.a._thing.hers\his_things\mine.refrigerator\yours_potato his 2
i.am.a._thing.hers\his_things\mine.refrigerator\yours_potato mine 3
i.am.a._thing.hers\his_things\mine.refrigerator\yours_potato yours 4
more_stuff.yours\mine\hers\his_falafel yours 1
more_stuff.yours\mine\hers\his_falafel mine 2
more_stuff.yours\mine\hers\his_falafel hers 3
more_stuff.yours\mine\hers\his_falafel his 4
oh_look_at_this.thing.his\hers_stuff\mine.craft\yours_protein his 1
oh_look_at_this.thing.his\hers_stuff\mine.craft\yours_protein hers 2
oh_look_at_this.thing.his\hers_stuff\mine.craft\yours_protein mine 3
oh_look_at_this.thing.his\hers_stuff\mine.craft\yours_protein yours 4
path_like.things_mine\hers.some_else\his_garbage\yours_sneakers mine 1
path_like.things_mine\hers.some_else\his_garbage\yours_sneakers hers 2
That's pretty cool. I like the position column. I don't think you need the extra step with the prefixes, though. Their position (at least in my actual data) isn't a very reliable indicator of anything.
Thanks
August 25, 2014 at 12:33 pm
sqldriver (8/25/2014)
That's pretty cool. I like the position column. I don't think you need the extra step with the prefixes, though. Their position (at least in my actual data) isn't a very reliable indicator of anything.
Just demonstrating a method of filtering out the false positives, i.e. "this"-->"his". As I stated before, it is not a perfect solution but could be a base for one though. The weakness would be a recurrence of a pattern within the "string".
😎
August 25, 2014 at 12:43 pm
Eirikur Eiriksson (8/25/2014)
sqldriver (8/25/2014)
That's pretty cool. I like the position column. I don't think you need the extra step with the prefixes, though. Their position (at least in my actual data) isn't a very reliable indicator of anything.Just demonstrating a method of filtering out the false positives, i.e. "this"-->"his". As I stated before, it is not a perfect solution but could be a base for one though. The weakness would be a recurrence of a pattern within the "string".
😎
Ah, I see. In my attempt to make this easy for everyone to work with, I made it more complicated. Actual values that I'm searching for are like .b. and .ib., and not terribly duplicate-prone within each string.
On the plus side, you've won a fabulous MUSK OX!
Viewing 5 posts - 16 through 19 (of 19 total)
You must be logged in to reply to this topic. Login to reply