September 22, 2006 at 8:20 am
I have a string and I would like to return only the first 10 words of the string and also some times the string might contain 8 words. IN that case I need to retun those 8 words.
How can I do that.
Thanks.
September 22, 2006 at 8:53 am
I found an user defined function and it worked really good.
Thanks.
September 22, 2006 at 8:59 am
Please post the final solution so that the users who find this post on search engines may get an answer.
September 22, 2006 at 9:52 am
for 2005 Server
create a CLR function as [C#] using .NET Regex Class
using System;
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
using System.Text.RegularExpressions;
public partial class UserDefinedFunctions
{
[Microsoft.SqlServer.Server.SqlFunction]
public static String ShowMatch(String s, String match_pattern)
{
// Put your code here
Regex r = new Regex(match_pattern, RegexOptions.None | RegexOptions.Multiline);
return r.Match(s).ToString();
}
};
after building an assembly for the function, use the function to retrieve the words out of the InputString using Regular Expression pattern:
select InputString, dbo.ShowMatch(OriginalString, '^(\b\w+\x20+){8,10}') as EightToTenWordsFromTheStart
from MyTable
September 22, 2006 at 10:01 am
correction to my prev post:
regex needs t be enhanced to take care of the situation when there are only 8 words, and one needs to get them all:
^(\b\w+(\x20+|$)){10}|^(\b\w+(\x20+|$)){8}
select InputString, dbo.ShowMatch(InputString,
'^(\b\w+(\x20+|$)){10}|^(\b\w+(\x20+|$)){8}') as EightToTenWordsFromTheStart
from MyTable
handles cases:
Input: one two trhee four five six seven eight nine ten eleven
returns: one two trhee four five six seven eight nine ten
Input: one two trhee four five six seven eight
returns: one two trhee four five six seven eight
September 22, 2006 at 10:02 am
Great, thanx for the help.
September 22, 2006 at 6:22 pm
Nice function, Serqei... how many rows per second are you able to process with it?
--Jeff Moden
Change is inevitable... Change for the better is not.
September 22, 2006 at 11:38 pm
Remi... remember my response when you asked about DTS??? This is a perfect example.
Ok... now for the rest of us humans If you...
... then this is for you
First, if after all this time, you STILL haven't created a Tally table, now's the time to do it
--===== Create and populate the Tally table on the fly
SELECT TOP 11000 --equates to more than 30 years of dates
IDENTITY(INT,1,1) AS N
INTO dbo.Tally
FROM Master.dbo.SysColumns sc1,
Master.dbo.SysColumns sc2
--===== Add a Primary Key to maximize performance
ALTER TABLE dbo.Tally
ADD CONSTRAINT PK_Tally_N PRIMARY KEY CLUSTERED (N)
--===== Allow the general public to use it
GRANT SELECT ON dbo.Tally TO PUBLIC
... then, create this function (does a little over 10k rows per second... a bit slow but tolerable because it's about 10 times faster than a WHILE loop ) ...
CREATE FUNCTION dbo.fnLeftWords
/******************************************************************************
This function accepts a string of characters and returns the left "x" number
of words using the provided delimiter. Using "0" for the desired number of
words returns the full string.
******************************************************************************/
(
@String VARCHAR(8000), --The string to return the left "x" words from
@DesiredWords INT, --The number of words to return from @String
@Delimiter VARCHAR(1) --The single character that separates words
)
RETURNS VARCHAR(8000)
AS
BEGIN
--===== Declare local variables
DECLARE @Work VARCHAR(8000), --Holds modified input string
@Posit INT, --Position of a delimiter in the string
@Count INT --Number of delimiters that have occurred
SET @Count = 0 --No delimiters have occurred, yet
--===== Convert the input with something not likely to be a delimiter.
-- and we're going to count delimiters to find words.
SET @Work = REPLACE(@String,@Delimiter,CHAR(1))+CHAR(1)
--===== Using the Tally table to replace a loop, find the position
-- of the delimiter after the desired word
SELECT @Count = @Count+1,
@Posit = CASE
WHEN @Count = @DesiredWords
THEN N
ELSE @Posit
END
FROM dbo.Tally
WHERE SUBSTRING(@Work,N,1) = CHAR(1)
AND N < LEN(@Work)
--===== Return all but the last delimiter after the number of desired words
RETURN (SELECT SUBSTRING(@String,1,ISNULL(@Posit-1,8000)))
END
GO
...and use it...
SET NOCOUNT ON
SELECT dbo.fnLeftWords(NULL,10,' ')
SELECT dbo.fnLeftWords('',10,' ')
SELECT dbo.fnLeftWords('one',10,' ')
SELECT dbo.fnLeftWords('one two',10,' ')
SELECT dbo.fnLeftWords('one two three',10,' ')
SELECT dbo.fnLeftWords('one two three four',10,' ')
SELECT dbo.fnLeftWords('one two three four five',10,' ')
SELECT dbo.fnLeftWords('one two three four five six',10,' ')
SELECT dbo.fnLeftWords('one two three four five six seven',10,' ')
SELECT dbo.fnLeftWords('one two three four five six seven eight',10,' ')
SELECT dbo.fnLeftWords('one two three four five six seven eight nine',10,' ')
SELECT dbo.fnLeftWords('one two three four five six seven eight nine ten',10,' ')
SELECT dbo.fnLeftWords('one two three four five six seven eight nine ten eleven',10,' ')
SELECT dbo.fnLeftWords('one two three four five six seven eight nine ten eleven twelve',10,' ')
...or...
SELECT dbo.fnLeftWords(acolumn,#ofwords,somedelimiter)
FROM yourtable
--Jeff Moden
Change is inevitable... Change for the better is not.
September 23, 2006 at 9:18 pm
hi Jeff, hope u r doing fine.
I just run a search in a table of 250k recs on a varchar(4000) fld for a regex pattern of medium complexity using a function similar to what I posted. On a table w/o indices it completed the search and returned 10000 matches in 16 sec which is apprx 15k recs/sec. Good enough for me. Easier to write too, btw.
Regards,
Sergei
September 23, 2006 at 9:28 pm
Jeff if u can spare a minute, please explain in a few words what did u mean by
***Remi... remember my response when you asked about DTS??? This is a perfect example.**
Thanks
Sergei
September 24, 2006 at 2:00 pm
Outstanding... thank you for the feedback. I always wondered how the performance of a CLR would compare.
Yep, you are correct... the CLR is easier to write AND has the utility of being able to use other RegEx. I just wanted to make sure that folks who don't have 2005 or may don't have "C" could do the same thing if the needed to...
--Jeff Moden
Change is inevitable... Change for the better is not.
September 24, 2006 at 2:16 pm
Sure... although I have to eat a bit of crow on this one
A frequent poster by the name of Serqiy and I took exception to someone using DTS to do very simple ETL job with a slight hook in it and Remi asked why. My explanation was that a lot of people use other tools/applications, like VBS via DTS for example, to overcome their lack of knowledge in SQL Server 2000 (they either can't or won't take the time to figure out how do it in SQL). (BTW... not saying that you have a lack of knowledge here, read on). Many times, those work arounds perform in a much less than optimal fashion and for THAT particular problem, DTS was simply not the answer.
Here's where I bite into that crow a bit... the "C" function you wrote to allow RexEx without a bunch of hoopla is superb (don't get a big head on me, now ) especially in light of the fact that it beats the native T-SQL solution. Your CLR is definitely an exception to the rule so far as what I've seen folks write (not necessarily all CLR's either), so far. Nice job!
--Jeff Moden
Change is inevitable... Change for the better is not.
September 24, 2006 at 3:33 pm
Good enough, you say?
Go to Google and start search by some long string.
Number of searched pages is definetely more than 10000, and result will be returned in less than a second.
I've got strict request to bring search result from 1 mill. rows in less than 5 second.
10000 in 15 sec is not good enough anymore.
BTW. Why do you need SQL Server? What's wrong with file system?
_____________
Code for TallyGenerator
September 24, 2006 at 4:02 pm
What's the best way to use the file system to do such a quick search?
--Jeff Moden
Change is inevitable... Change for the better is not.
September 24, 2006 at 6:01 pm
Which search do you mean?
Like Google?
There is no way. It must be proper database solution. 95% done on "storing strings" part, and 5% on "searching and retrieving" part.
Like 10000 per 15 sec.?
Read file into memory and do the search from application code.
This is what Sergei's code is actually doing.
_____________
Code for TallyGenerator
Viewing 15 posts - 1 through 15 (of 39 total)
You must be logged in to reply to this topic. Login to reply