June 3, 2010 at 11:53 pm
Hi All,
I have a column in one of SQL2000 table which has dhtml data stored froom an old ASP application. As we are migrating to n SQL and .net we want to extract the current data from Production table. Any one of you aware how I can extract the text from the dhtml code stored as text in the SQL.
Please refer to sample data attached.
Thanks,
Nikhil
June 7, 2010 at 11:36 am
Nikhil,
What text exactly do you want to 'extract' from this? The entire database column? If that is the case, you could simply read the column data as text and write it out to your new database. Or are you looking to extract certain pieces of what would be the text displayed in the browser when this 'page' was viewed? If so, which pieces?
Rob Schripsema
Propack, Inc.
June 7, 2010 at 2:26 pm
Please follow the following steps:
1. Get data from the database using the ASP code
2. REmove the HTML tags from the Data retrieved
June 7, 2010 at 3:50 pm
Looks like you have some info from the CodeProject site on using Regular Expressions or straight parsing algorithms to extract the text you're looking for. My recommendation would be to write a CLR function (using VB or C#) in SQL that will strip off the HTML tags (per the CodeProject article) and return just the text. Then you can call that function from a SELECT statement, pass the DHTML column through the function, and write the output to a new table or wherever you need to go with it.
Not sure if that's the kind of help you were looking for here, but it looks like you're most of the way there already.;-)
Rob Schripsema
Propack, Inc.
June 7, 2010 at 6:38 pm
I dont think CLR could be used here as an approach. Though it is one of the options.
1. The enviornment currently used in legacy system where ASP and Sql server 2000 is used.
2. SQL server 2000 as per my understanding doesnt all SQLCLR programming , besides it will require .Net scripting skills I am not sure if resources could be available.
3. Besides to use SQLCLR you have to evaluate in terms of memory usage and performance as it works directly with SQL Server. In case of any memory leak your database could also crash. So before using it compartively lot of testing is needed
June 8, 2010 at 6:25 am
Just give it a try...
-- To get the list of Tags removed....
Declare @vTagNameTable Table(Tag varchar(max))
-- My HTML
Declare @v-2 varchar(max)
Set @v-2 = '<html DIR="LTR" xmlns:tool="http://www.microsoft.com/tooltip" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:ddue="http://ddue.schemas.microsoft.com/authoring/2003/5" xmlns:MSHelp="http://msdn.microsoft.com/mshelp"><A>Atif Sheikh < Atif
<Bold>Sheikh</Bold> </HTML> Atif <><><>>'
Declare @vCharI1 int -- Used to locate '<'
Declare @vCharI2 int -- Used to locate '>' to get Tag Name
Declare @vCharICheck int -- Used to Check for Tag Name
Declare @vEndTagName varchar(1000)
Declare @vStartPos int
Set @vStartPos = 0
While 1=1
begin
Set @vCharI1 = CharIndex('<',@v,@vStartPos)
if @vCharI1 > 0
begin
-- Get the Tag Name, if it is a HTML Tag
Set @vCharI2 = CharIndex('>',@v,@vCharI1 + 1)
if @vCharI2 > 0
begin
-- Check <>
if @vCharI2 = @vCharI1 + 1
begin
Set @vStartPos = @vCharI2 + 1
end
else
begin
Set @vCharICheck = CharIndex('<',@v,@vCharI1 + 1)
if (@vCharICheck < @vCharI2)
begin
-- Last < found was not start of tag. Just < character / sign
if @vCharICheck > 0
Set @vStartPos = @vCharICheck
else
begin
-- @vCharICheck = 0
-- Its a Last TAG...
Set @vEndTagName = SubString(@v,@vCharI1,(@vCharI2-@vCharI1) + 1)
if Not Exists(Select 1 from @vTagNameTable where Tag = @vEndTagName)
begin
Insert into @vTagNameTable
Select @vEndTagName
Set @v-2 = Replace(@v,@vEndTagName,'')
end
Set @vStartPos = @vCharI2
end
if @vStartPos = Len(@v)
begin
Select @vStartPos , Len(@v),@vCharICheck,@vCharI1
print 'Over Here....'
BREAK
end
end
else
begin
-- ITS A TAG....
Set @vEndTagName = SubString(@v,@vCharI1,(@vCharI2-@vCharI1) + 1)
if Not Exists(Select 1 from @vTagNameTable where Tag = @vEndTagName)
begin
Insert into @vTagNameTable
Select @vEndTagName
Set @v-2 = Replace(@v,@vEndTagName,'')
end
--Set @vStartPos = @vCharI2 + 1
Set @vStartPos = @vCharI1
if @vStartPos > Len(@v)
begin
print 'Here....'
BREAK
end
end
end
end
else
BREAK -- No Tag
end
else
BREAK
end
Select * from @vTagNameTable
Select @v-2 as StringWithoutTags
June 9, 2010 at 10:57 pm
I used the below script and I am not getting clean text.
I want to extract the whole text which a browser would display for my dhtml code. The below script is giving some unrecognised characters. I am running short of time for other things, so was not able to look into the issue and posts. Thanks for all the replies Guys!
USE [OPINIKHIL]
GO
/****** Object: StoredProcedure [dbo].[udf_StripHTML] Script Date: 06/10/2010 10:22:42 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE procedure [dbo].[udf_StripHTML]
(@HTMLText varchar(8000))
--RETURNS varchar(8000)
AS
BEGIN
DECLARE @Start int
DECLARE @End int
DECLARE @Length int
-- Replace the HTML entity & with the '&' character (this needs to be done first, as
-- '&' might be double encoded as '&')
SET @Start = CHARINDEX('&am;p;', @HTMLText)
SET @End = @Start + 4
SET @Length = (@End - @Start) + 1
WHILE (@Start > 0 AND @End > 0 AND @Length > 0)
BEGIN
SET @HTMLText = STUFF(@HTMLText, @Start, @Length, '&')
SET @Start = CHARINDEX('&', @HTMLText)
SET @End = @Start + 4
SET @Length = (@End - @Start) + 1
END
-- Replace the HTML entity < with the '<' character
SET @Start = CHARINDEX('<', @HTMLText)
SET @End = @Start + 3
SET @Length = (@End - @Start) + 1
WHILE (@Start > 0 AND @End > 0 AND @Length > 0)
BEGIN
SET @HTMLText = STUFF(@HTMLText, @Start, @Length, '<')
SET @Start = CHARINDEX('<', @HTMLText)
SET @End = @Start + 3
SET @Length = (@End - @Start) + 1
END
-- Replace the HTML entity > with the '>' character
SET @Start = CHARINDEX('>', @HTMLText)
SET @End = @Start + 3
SET @Length = (@End - @Start) + 1
WHILE (@Start > 0 AND @End > 0 AND @Length > 0)
BEGIN
SET @HTMLText = STUFF(@HTMLText, @Start, @Length, '>')
SET @Start = CHARINDEX('>', @HTMLText)
SET @End = @Start + 3
SET @Length = (@End - @Start) + 1
END
-- Replace the HTML entity & with the '&' character
SET @Start = CHARINDEX('&', @HTMLText)
SET @End = @Start + 4
SET @Length = (@End - @Start) + 1
WHILE (@Start > 0 AND @End > 0 AND @Length > 0)
BEGIN
SET @HTMLText = STUFF(@HTMLText, @Start, @Length, '&')
SET @Start = CHARINDEX('&', @HTMLText)
SET @End = @Start + 4
SET @Length = (@End - @Start) + 1
END
-- Replace the HTML entity with the ' ' character
SET @Start = CHARINDEX('', @HTMLText)
SET @End = @Start + 5
SET @Length = (@End - @Start) + 1
WHILE (@Start > 0 AND @End > 0 AND @Length > 0)
BEGIN
SET @HTMLText = STUFF(@HTMLText, @Start, @Length, ' ')
SET @Start = CHARINDEX('', @HTMLText)
SET @End = @Start + 5
SET @Length = (@End - @Start) + 1
END
-- Replace any
tags with a newline
SET @Start = CHARINDEX('
', @HTMLText)
SET @End = @Start + 3
SET @Length = (@End - @Start) + 1
WHILE (@Start > 0 AND @End > 0 AND @Length > 0)
BEGIN
SET @HTMLText = STUFF(@HTMLText, @Start, @Length, CHAR(13) + CHAR(10))
SET @Start = CHARINDEX('
', @HTMLText)
SET @End = @Start + 3
SET @Length = (@End - @Start) + 1
END
-- Replace any
tags with a newline
SET @Start = CHARINDEX('
', @HTMLText)
SET @End = @Start + 4
SET @Length = (@End - @Start) + 1
WHILE (@Start > 0 AND @End > 0 AND @Length > 0)
BEGIN
SET @HTMLText = STUFF(@HTMLText, @Start, @Length, 'CHAR(13) + CHAR(10)')
SET @Start = CHARINDEX('
', @HTMLText)
SET @End = @Start + 4
SET @Length = (@End - @Start) + 1
END
-- Replace any
tags with a newline
SET @Start = CHARINDEX('
', @HTMLText)
SET @End = @Start + 5
SET @Length = (@End - @Start) + 1
WHILE (@Start > 0 AND @End > 0 AND @Length > 0)
BEGIN
SET @HTMLText = STUFF(@HTMLText, @Start, @Length, 'CHAR(13) + CHAR(10)')
SET @Start = CHARINDEX('
', @HTMLText)
SET @End = @Start + 5
SET @Length = (@End - @Start) + 1
END
-- Remove anything between <whatever> tags
SET @Start = CHARINDEX('<', @HTMLText)
SET @End = CHARINDEX('>', @HTMLText, CHARINDEX('<', @HTMLText))
SET @Length = (@End - @Start) + 1
WHILE (@Start > 0 AND @End > 0 AND @Length > 0)
BEGIN
SET @HTMLText = STUFF(@HTMLText, @Start, @Length, '')
SET @Start = CHARINDEX('<', @HTMLText)
SET @End = CHARINDEX('>', @HTMLText, CHARINDEX('<', @HTMLText))
SET @Length = (@End - @Start) + 1
END
--select LTRIM(RTRIM(@HTMLText))
insert into nikhil values( LTRIM(RTRIM(@HTMLText)))
--RETURN LTRIM(RTRIM(@HTMLText))
END
Viewing 7 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply