November 19, 2004 at 10:56 am
Does anyone know how to return the number of occurances of a string from a TEXT field. Thanks in advance.
November 19, 2004 at 1:17 pm
An easy way to do this with a varchar(8000) is to do something like:
declare @instring varchar(8000)
set @instring = 'abcabcabc'
declare @teststring varchar(100)
set @teststring = 'abc'
SELECT (LEN(@instring) - LEN(REPLACE(@instring, @teststring, ''))) / LEN(@teststring)
Unfortunately, REPLACE won't work on a TEXT column... If your data is less than 8001 characters, you can use SUBSTRING to get around that:
create table #a(blah text)
insert #a values ('abcabcabc')
declare @teststring varchar(100)
set @teststring = 'abc'
SELECT (LEN(SUBSTRING(blah, 1, 8000)) - DATALENGTH(REPLACE(SUBSTRING(blah, 1, 8000), @teststring, ''))) / LEN(@teststring)
FROM #a
If your data is greater than 8000 characters but you know an approximate max, one solution might be to just add these together, changing the range on the SUBSTRING:
SELECT ((LEN(SUBSTRING(blah, 1, 8000)) - DATALENGTH(REPLACE(SUBSTRING(blah, 1, 8000), @teststring, ''))) / LEN(@teststring))
+ ((LEN(SUBSTRING(blah, 8001, 8000)) - DATALENGTH(REPLACE(SUBSTRING(blah, 8001, 8000), @teststring, ''))) / LEN(@teststring))
+ ((LEN(SUBSTRING(blah, 16001, 8000)) - DATALENGTH(REPLACE(SUBSTRING(blah, 16001, 8000), @teststring, ''))) / LEN(@teststring))
...
FROM #a
I don't know if that's the most efficient way to handle this, but it is at least somewhat workable
--
Adam Machanic
whoisactive
November 19, 2004 at 4:52 pm
If you are using SQL 2000 fortunately both charindex and substring can work with a text datatype value and text can be used as a data for inputs in a function. So you can do this and get what you want.
/*-////////////////////////////////////////////////////////////////////////////////////////////////////////
* Created By: James Travis
* Created On: 11/19/2004
* Inputs: 2 (text, varchar(8000)) Output: Integer
* Purpose: Returns the number of occurances of a user defined string in the text data
* based on user input in @varFind
///////////////////////////////////////////////////////////////////////////////////////////////////////-*/
CREATE FUNCTION dbo.udf_cntintxt (@varText text, @varFind varchar(8000))
RETURNS int
AS
BEGIN
declare @cnt int,
@pos int,
@pre int
set @cnt = 0
set @pos = 0
set @pre = 0
while @pos < datalength(@varText)
begin
set @pre = IsNull(NullIf(@pos,0),1)
set @pos = charindex(@varFind, Substring(@varText, @pos, (datalength(@varText) - @pos) + 1))
if @pos > 0
begin
set @cnt = @cnt + 1
set @pos = (@pos - 1) + @pre + len(@varFind)
end
else
set @pos = datalength(@varText)
end
return(@cnt)
END
November 20, 2004 at 5:42 pm
CHARINDEX will not search beyond 8000 characters in SQL Server 2000... So that function doesn't quite work as expected:
create table testsearch(textcol text)
go
declare @bigstring varchar(8000)
set @bigstring = replicate('0', 8000)
declare @bigstring2 varchar(8000)
set @bigstring2 = replicate('1', 8000)
exec ('insert testsearch values(''' + @bigstring + @bigstring2 + ''')')
go
select dbo.udf_cntintxt(textcol, '1'), substring(textcol, 8000, 2)
from testsearch
go
drop table testsearch
go
--
Adam Machanic
whoisactive
November 22, 2004 at 8:37 pm
I thought about this some more and realized that the solution I posted won't work if the search string starts, e.g. at character 7999 and ends at 8001... I believe the following solves that problem. This relies upon a sequence table, Numbers, that has a column Number populated with every number between 1 and the length of the longest string you'd like to search:
create table testsearch(textcol text)
go
declare @bigstring varchar(8000)
set @bigstring = replicate('0', 8000)
declare @bigstring2 varchar(8000)
set @bigstring2 = replicate('1', 8000)
exec ('insert testsearch values(''' + @bigstring + @bigstring2 + ''')')
go
declare @searchstring varchar(20)
set @searchstring = '1'
select count(*)
from testsearch x
join Numbers nStart on nStart.Number <= datalength(x.textcol)
where substring(x.textcol, nstart.number, len(@searchstring)) = @searchstring
go
drop table testsearch
go
However, this has another, more subtle issue: What should be returned when searching for 'bab' in the string 'babab'? Regular expressions engines will generally return 1; this method will return 2.
--
Adam Machanic
whoisactive
November 23, 2004 at 2:53 am
Thanks Adam. Losts of food for thought here. In the end I decided to split my text field into 3 varchar(8000) fields using substring.
Like your last post I used a bit of overlap:
search_blockA = substring(@newXML,0,7000),
search_blockB = substring(@newXML,6980,7020),
search_blockC = substring(@newXML,13980,7020)
Then I searched each search_block individually using simple "search" functions (combination of replace / len / divide ).
Of course as you say this isn't perfect, if the overlapping puts a word in both blocks, but I can live with this.
Thanks for your input.
Viewing 6 posts - 1 through 5 (of 5 total)
You must be logged in to reply to this topic. Login to reply