October 28, 2007 at 9:44 pm
I'm trying to extract multiple strings that exist between two different delimiters....
It's almost but not quit HTML, just custom tags that are being used.
this same functionality would be ideal for searching for a lot of different things.
in this case, the sample data is from the public resource Gutenberg Dictionary...some words have more than one definition. definitions are between "<def>" tags and there is often more than one.
without the tally table, for the first definition, i was able to use this SQL:
[font="Courier New"]
declare
@pre varchar(10),
@post varchar(10),
@pr int,
@po int,
@st int
set @pre = '<def>'
set @post = '</def>'
set @pr = len(@pre)
set @po = len(@post)
set @st = 1
SELECT top 30 substring(DefinitionAsHTML,charindex(@pre,DefinitionAsHTML,@st)+ @pr,charindex(@post,DefinitionAsHTML,@st)- charindex(@pre,DefinitionAsHTML,@st) -@pr ) As Word ,
*
FROM #tmp
WHERE charindex(@pre ,DefinitionAsHTML,@st) > 0
AND charindex(@post ,DefinitionAsHTML,@st) > 0[/font]
the only way i could think of processing records with more than one "<def>" tag,was to add a column to the table,a dn update the column with the charindex of the </def> tag, adn then and use the same function above, using the column value for the @st value...
this is a little repetitive because some words have more than 20 definitions. i was 3 definitions in when i realized there's gotta be a better way.
can anyone rub my nose in how to adapt this to work witht he Tally table to get all the definitions in a select?
sample data is too wide to post and is wider than a varchar(8000), but you can copy from this link
Lowell
October 29, 2007 at 5:03 am
Lowell,
I just tried the below query and its giving what exactly you require....
SELECTT1.N, T2.NumDef,
SUBSTRING( T2.DefinitionAsHTML, ( CASE T1.N WHEN 1 THEN 1 ELSE T1.N + @pr END ),
( CASE CHARINDEX( @post, T2.DefinitionAsHTML, T1.N + @pr )
WHEN 0 THEN LEN( T2.DefinitionAsHTML ) - T1.N + @pr
ELSE CHARINDEX( @post, T2.DefinitionAsHTML, T1.N + @pr ) - ( T1.N + @pr )
END ) ) AS String
FROMdbo.Tally T1
CROSS APPLY #tmp T2
WHEREN <= LEN( T2.DefinitionAsHTML )
AND SUBSTRING( T2.DefinitionAsHTML, T1.N, @pr ) = @pr
--Ramesh
October 29, 2007 at 5:21 am
Thanks Ramesh; i'm looking it over now; i must have got a copy paste error, as it returns an erro based on something in the data;
Conversion failed when converting the varchar value ' B' to data type int.
I'll keep tweaking it to see if i can get some results (and make some sense out of it in my own mind)
Lowell
October 29, 2007 at 6:48 am
Ramesh that was spot on... the only thing that was missing from your sample was a single character due to my poorly named variables:
WHEREN <= LEN( T2.DefinitionAsHTML )AND SUBSTRING( T2.DefinitionAsHTML, T1.N, @pr ) = @pre
that's the peice i was puzzling over...how was your code finding the starting string.
This will go a long way to helping me understand the tally concept, as it's a real example to me and not an abtract one with someone elses project.
Thanks
Lowell
October 29, 2007 at 6:56 am
Lowell (10/29/2007)
Ramesh that was spot on... the only thing that was missing from your sample was a single character due to my poorly named variables:WHEREN <= LEN( T2.DefinitionAsHTML )AND SUBSTRING( T2.DefinitionAsHTML, T1.N, @pr ) = @pre
that's the peice i was puzzling over...how was your code finding the starting string.
This will go a long way to helping me understand the tally concept, as it's a real example to me and not an abtract one with someone elses project.
Thanks
I didn't knew i missed a char over there;)....I'd copied it from SSMS:)
I glad it helped solve the problem....:D
--Ramesh
Viewing 5 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply