words of a string

  • I have a string and I would like to return only the first 10 words of the string and also some times the string might contain 8 words. IN that case I need to retun those 8 words.

    How can I do that.

     

    Thanks.

     

  • I found an user defined function and it worked really good.

     

    Thanks.

  • Please post the final solution so that the users who find this post on search engines may get an answer.

  • for 2005 Server

    create a CLR function as [C#] using .NET Regex Class

    using System;

    using System.Data;

    using System.Data.SqlClient;

    using System.Data.SqlTypes;

    using Microsoft.SqlServer.Server;

    using System.Text.RegularExpressions;

    public partial class UserDefinedFunctions

    {

      

        [Microsoft.SqlServer.Server.SqlFunction]

        public static String ShowMatch(String s, String match_pattern)

        {

            // Put your code here

            Regex r = new Regex(match_pattern, RegexOptions.None | RegexOptions.Multiline);

            return r.Match(s).ToString();

        }

    };

    after building an assembly for the function, use the function to retrieve the words out of the InputString using Regular Expression pattern:

    select InputString, dbo.ShowMatch(OriginalString, '^(\b\w+\x20+){8,10}') as EightToTenWordsFromTheStart

    from MyTable

  • correction to my prev post:

    regex needs t be enhanced to take care of the situation when there are only 8 words, and one needs to get them all:

    ^(\b\w+(\x20+|$)){10}|^(\b\w+(\x20+|$)){8}

    select InputString, dbo.ShowMatch(InputString,

    '^(\b\w+(\x20+|$)){10}|^(\b\w+(\x20+|$)){8}') as EightToTenWordsFromTheStart

    from MyTable

    handles cases:

    Input: one two trhee four five six seven eight nine ten eleven

    returns: one two trhee four five six seven eight nine ten

    Input:  one two trhee four five six seven eight

    returns: one two trhee four five six seven eight

  • Great, thanx for the help.

  • Nice function, Serqei... how many rows per second are you able to process with it?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Remi... remember my response when you asked about DTS??? This is a perfect example.

    Ok... now for the rest of us humans   If you...

    1. Don't have SQL Server 2005 or...
    2. Don't have a clue what a CLR is or...
    3. Don't have a DBA that will let you use CLR's or...
    4. Don't have C or...
    5. Don't know how to program in C or...
    6. Don't have a clue what rational expressions are or...
    7. Don't want to have to keep track of special external functions during upgrades to SQL Server or...
    8. ...whatever...

    ... then this is for you

    First, if after all this time, you STILL haven't created a Tally table, now's the time to do it

    --===== Create and populate the Tally table on the fly

     SELECT TOP 11000 --equates to more than 30 years of dates

            IDENTITY(INT,1,1) AS N

       INTO dbo.Tally

       FROM Master.dbo.SysColumns sc1,

            Master.dbo.SysColumns sc2

    --===== Add a Primary Key to maximize performance

      ALTER TABLE dbo.Tally

            ADD CONSTRAINT PK_Tally_N PRIMARY KEY CLUSTERED (N)

    --===== Allow the general public to use it

      GRANT SELECT ON dbo.Tally TO PUBLIC

    ... then, create this function (does a little over 10k rows per second... a bit slow but tolerable because it's about 10 times faster than a WHILE loop ) ...

     CREATE FUNCTION dbo.fnLeftWords

    /******************************************************************************

     This function accepts a string of characters and returns the left "x" number

     of words using the provided delimiter.  Using "0" for the desired number of

     words returns the full string.

    ******************************************************************************/

            (

             @String VARCHAR(8000), --The string to return the left "x" words from

             @DesiredWords INT,     --The number of words to return from @String

             @Delimiter VARCHAR(1)  --The single character that separates words

            )

    RETURNS VARCHAR(8000)

         AS

      BEGIN

            --===== Declare local variables

            DECLARE @Work VARCHAR(8000), --Holds modified input string

                    @Posit INT,          --Position of a delimiter in the string

                    @Count INT           --Number of delimiters that have occurred

                SET @Count = 0           --No delimiters have occurred, yet

            --===== Convert the input with something not likely to be a delimiter.

                 -- and we're going to count delimiters to find words.

                SET @Work = REPLACE(@String,@Delimiter,CHAR(1))+CHAR(1)

               

            --===== Using the Tally table to replace a loop, find the position

                 -- of the delimiter after the desired word

             SELECT @Count = @Count+1,

                    @Posit = CASE

                                 WHEN @Count = @DesiredWords

                                 THEN N

                                 ELSE @Posit

                             END

               FROM dbo.Tally

              WHERE SUBSTRING(@Work,N,1) = CHAR(1)

                AND N < LEN(@Work)

            --===== Return all but the last delimiter after the number of desired words

             RETURN (SELECT SUBSTRING(@String,1,ISNULL(@Posit-1,8000)))

        END

    GO

    ...and use it...

    SET NOCOUNT ON

    SELECT dbo.fnLeftWords(NULL,10,' ')

    SELECT dbo.fnLeftWords('',10,' ')

    SELECT dbo.fnLeftWords('one',10,' ')

    SELECT dbo.fnLeftWords('one two',10,' ')

    SELECT dbo.fnLeftWords('one two three',10,' ')

    SELECT dbo.fnLeftWords('one two three four',10,' ')

    SELECT dbo.fnLeftWords('one two three four five',10,' ')

    SELECT dbo.fnLeftWords('one two three four five six',10,' ')

    SELECT dbo.fnLeftWords('one two three four five six seven',10,' ')

    SELECT dbo.fnLeftWords('one two three four five six seven eight',10,' ')

    SELECT dbo.fnLeftWords('one two three four five six seven eight nine',10,' ')

    SELECT dbo.fnLeftWords('one two three four five six seven eight nine ten',10,' ')

    SELECT dbo.fnLeftWords('one two three four five six seven eight nine ten eleven',10,' ')

    SELECT dbo.fnLeftWords('one two three four five six seven eight nine ten eleven twelve',10,' ')

    ...or...

     SELECT dbo.fnLeftWords(acolumn,#ofwords,somedelimiter)

       FROM yourtable

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • hi Jeff, hope u r doing fine.

    I just run a search in a table of 250k recs  on a varchar(4000) fld for a regex pattern of medium complexity using a function similar to what I posted. On a table w/o indices it  completed the search and returned 10000 matches in 16 sec which is apprx 15k recs/sec. Good enough for me. Easier to write too, btw. 

    Regards,

    Sergei

     

  • Jeff if u can spare a minute, please explain in a few words what did u mean by

    ***Remi... remember my response when you asked about DTS??? This is a perfect example.**

    Thanks

    Sergei

  • Outstanding... thank you for the feedback.  I always wondered how the performance of a CLR would compare.

    Yep, you are correct... the CLR is easier to write AND has the utility of being able to use other RegEx.  I just wanted to make sure that folks who don't have 2005 or may don't have "C" could do the same thing if the needed to...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Sure... although I have to eat a bit of crow on this one

    A frequent poster by the name of Serqiy and I took exception to someone using DTS to do very simple ETL job with a slight hook in it and Remi asked why.  My explanation was that a lot of people use other tools/applications, like VBS via DTS for example, to overcome their lack of knowledge in SQL Server 2000 (they either can't or won't take the time to figure out how do it in SQL).  (BTW... not saying that you have a lack of knowledge here, read on).  Many times, those work arounds perform in a much less than optimal fashion and for THAT particular problem, DTS was simply not the answer.

    Here's where I bite into that crow a bit... the "C" function you wrote to allow RexEx without a bunch of hoopla is superb (don't get a big head on me, now ) especially in light of the fact that it beats the native T-SQL solution.  Your CLR is definitely an exception to the rule so far as what I've seen folks write (not necessarily all CLR's either), so far.  Nice job!

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Good enough, you say?

    Go to Google and start search by some long string.

    Number of searched pages is definetely more than 10000, and result will be returned in less than a second.

    I've got strict request to bring search result from 1 mill. rows in less than 5 second.

    10000 in 15 sec is not good enough anymore.

    BTW. Why do you need SQL Server? What's wrong with file system?

    _____________
    Code for TallyGenerator

  • What's the best way to use the file system to do such a quick search?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Which search do you mean?

    Like Google?

    There is no way. It must be proper database solution. 95% done on "storing strings" part, and 5% on "searching and retrieving" part.

    Like 10000 per 15 sec.?

    Read file into memory and do the search from application code.

    This is what Sergei's code is actually doing.

     

    _____________
    Code for TallyGenerator

Viewing 15 posts - 1 through 15 (of 39 total)

You must be logged in to reply to this topic. Login to reply