Find second occurence of a word

Question

Find second occurence of a word

PSB

SSChampion

Points: 13195
More actions
April 23, 2012 at 12:25 pm

#273092

Hi,
I have a table named ABC. It has a field named Y and Z.
Y='ABCDEFG (9105) CDD ABCDEFG (1690) ENGINEERING'
I want to find the second instance of Word ABCDEFG from the field and then delete everyhing after that
e.g
Y='ABCDEFG (9105) CDD ABCDEFG (1690) ENGINEERING'
becomes
Y='ABCDEFG (9105) CDD'
Thanks,
PSB

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply

Lowell SSC Guru Points: 323518 More actions · Answer 1

PSB (4/23/2012)
Hi,
I have a table named ABC. It has a field named Y and Z.
Y='ABCDEFG (9105) CDD ABCDEFG (1690) ENGINEERING'
I want to find the second instance of Word ABCDEFG from the field and then delete everyhing after that
e.g
Y='ABCDEFG (9105) CDD ABCDEFG (1690) ENGINEERING'
becomes
Y='ABCDEFG (9105) CDD'
Thanks,
PSB

for specific strings, you can use a function like CHARINDEX2:

if you don't know which word is repeating, ie the first string in each row, that'll be a little harder.

UPDATE ABC

SET Y = CASE

WHEN dbo.CHARINDEX2('ABCDEFG', Y, 2) > 0

THEN SUBSTRING(Y,1,dbo.CHARINDEX2('ABCDEFG', Y, 2))

ELSE Y

END

/*

Example:

SELECT dbo.CHARINDEX2('a', 'abbabba', 3)

returns the location of the third occurrence of 'a'

which is 7

*/

CREATE FUNCTION CHARINDEX2(

@TargetStr varchar(8000),

@SearchedStr varchar(8000),

@Occurrence int)

RETURNS int

AS

BEGIN

DECLARE @pos int, @counter int, @ret int

SET @pos = CHARINDEX(@TargetStr, @SearchedStr)

SET @counter = 1

IF @Occurrence = 1

SET @ret = @pos

ELSE

BEGIN

WHILE (@counter < @Occurrence)

BEGIN

SELECT @ret = CHARINDEX(@TargetStr, @SearchedStr, @pos + 1)

SET @counter = @counter + 1

SET @pos = @ret

END

RETURN(@ret)

END

GO

Lowell

--help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

drew.allen SSC Guru Points: 76989 More actions · Answer 2

CHARINDEX() has an option parameter (start_location) that can be used to skip the first occurence of the specified string. If you want an example, you'll have to provide more information including sample data and expected results per the Forum Etiquette[/url]

Drew

J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA

Geoff A SSChampion Points: 11511 More actions · Answer 3

Declare @test-2 varchar(100)

select @test-2 = 'ABCDEFG (9105) CDD ABCDEFG (1690) ENGINEERING'

select LEFT(@test,patindex('%ABCDEFG%',substring(@test,patindex('%ABCDEFG%',@test)+1,LEN(@test))))

GSquared SSC Guru Points: 260824 More actions · Answer 4

If you don't know what string will appear twice, or if one will at all, try something like this:

DECLARE @String VARCHAR(100) = 'ABCDEFG (9105) CDD ABCDEFG (1690) ENGINEERING' ;

;

WITH Seeds(Seed)

AS (SELECT R

FROM ( VALUES ( 1), ( 1), ( 1), ( 1), ( 1), ( 1), ( 1), ( 1), ( 1), ( 1) ) AS V (R)),

Numbers(Number)

AS (SELECT ROW_NUMBER() OVER (ORDER BY S1.Seed)

FROM Seeds AS S1

CROSS JOIN Seeds AS S2),

Parser(Parsed, Seq)

AS (SELECT SUBSTRING(@String + ' ', Number, CHARINDEX(' ', @String + ' ', Number) - Number),

Number

FROM Numbers

WHERE Number <= LEN(@String)

AND SUBSTRING(' ' + @String, Number, 1) = ' ')

SELECT (

SELECT Parsed + ' '

FROM Parser

WHERE Seq < ISNULL((SELECT MIN(P2.Seq)

FROM Parser AS P1

INNER JOIN Parser AS P2

ON P1.Parsed = P2.Parsed

AND P1.Seq < P2.Seq), (SELECT MAX(Seq)

FROM Parser))

ORDER BY Parser.Seq

FOR XML PATH(''),

TYPE).value('.[1]', 'varchar(100)') ;

Note that instead of a string variable, you can use a column and set this up to be a Cross/Outer Apply to a table.

The first two CTEs just define a Numbers table in RAM instead of on disk. It's just a table of integer numbers between 1 and 100 (in this case). By adding more Cross Applies, you can achieve an arbitrary length that will be adequate to your maximum string size.

There are some slightly faster string parsing methods, but they only matter if the strings are quite long. The best, in that case, is a CLR function. You can Bing/Google "SQL string parser" and find articles on that subject. This one will work if your string values are reasonably short.

The final piece first selects the first substring that repeats itself, whatever that substring is. If there isn't one, it selects the final substring (MAX(Seq) + 1) and thus includes all pieces, using a simple IsNull construct. You can use Coalesce instead of IsNull if ANSI SQL is important to you. Then it takes those pieces, and uses a FOR XML trick to concatenate them back together. This part is usually very efficient. I specifically include converting the XML value back to varchar in order to make sure it correctly handles things like ampersands in the original string.

Does that help?

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

GSquared SSC Guru Points: 260824 More actions · Answer 5

Just to be sure, I tested my version with your original example string, and with:

DECLARE @String VARCHAR(100) = 'ABCDEFG (9105) CDD NOT REPEATING & (1690) ENGINEERING' ;

Nothing in that string repeats, so it returns the whole string.

Also a correction: I tried turning this into a simple Cross Apply without encapsulating it into a UDF first, and no joy. Can't reference itself recursively, of course.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

PSB SSChampion Points: 13195 More actions · Answer 6

I tried to use

Declare @test-2 varchar(100)

select @test-2 = 'ABCDEFG (9105) CDD ABCDEFG (1690) ENGINEERING'

select LEFT(@test,patindex('%ABCDEFG%',substring(@test,patindex('%ABCDEFG%',@test)+1,LEN(@test))))

If I have only the

'ABCDEFG (9105) CDD' then it returns blank .

'ABCDEFG (9105) CDD ABCDEFG (1690) ENGINEERING'

returned ''ABCDEFG (9105) CDD' which is correct

'ABCDEFG (9105) CDD' retunned blank

Thanks,

PSB

GSquared SSC Guru Points: 260824 More actions · Answer 7

You'll have to use a CASE statement that first tests if the PatIndex value is greater than zero, and if so use that, and if not, just return the string as-is.

Again, that will only work if ABCDEFG is a constant in the data and is the only repeating pattern you're looking for.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

Dwain Camps SSC Guru Points: 86968 More actions · Answer 8

This may also work for you.

DECLARE @word VARCHAR(MAX)

SET @word = 'ABCDEFG'

DECLARE @t TABLE (words VARCHAR(max))

INSERT INTO @t

SELECT 'ABCDEFG (9105) CDD ABCDEFG (1690) ENGINEERING'

UNION ALL SELECT 'ABCDEQG (9105) CDD ABCDEFG (1690) ENGINEERING'

SELECT CASE CHARINDEX(@word, SUBSTRING(words, 1+CHARINDEX(@word, words, 1), LEN(words)), 1)

WHEN 0 THEN words

ELSE SUBSTRING(words, 1, CHARINDEX(@word, SUBSTRING(words, 1+CHARINDEX(@word, words, 1), LEN(words)), 1)-1)

END As words

FROM @t

Edit: Replaced my silly, earlier solution, even though it also worked.

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

Dwain Camps SSC Guru Points: 86968 More actions · Answer 9

This solution is a little more general in that it can remove any first word if it repeats within the string, instead of the one word (@word) set up in my prior suggestion.

DECLARE @t TABLE (words VARCHAR(max))

INSERT INTO @t

SELECT 'ABCDEFG (9105) CDD ABCDEFG (1690) ENGINEERING'

UNION ALL SELECT 'ABCDEQG (9105) CDD ABCDEFG (1690) ENGINEERING'

;WITH cte AS (

SELECT SUBSTRING(words, 1, CHARINDEX(' ', words)) As word, words

FROM @t)

SELECT CASE CHARINDEX(word, SUBSTRING(words, 1+CHARINDEX(word, words, 1), LEN(words)), 1)

WHEN 0 THEN words

ELSE SUBSTRING(words, 1, CHARINDEX(word, SUBSTRING(words, 1+CHARINDEX(word, words, 1), LEN(words)), 1)-1)

END As words

FROM cte

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St