February 27, 2008 at 6:50 am
I'm currently trying to write a query that will extract the first two words from a string. I'm using charindex and substrings and it's getting quite messy! Has anyone ever written a function may be to extract a specific number of words i.e. text around spaces please? It would need to be able to cope with data where there are different lengths and number of words such as:
This is the first line
Second line
Word
This is the fourth line
and if I was asking it to return a string containing the first two words the results would be:
This is
Second line
Word
This is
Thanks.
February 27, 2008 at 6:59 am
I know that by using regular expressions, the \b is a word boundary; it might be a space/tab/Crlf, or it might be punctuation...period comma exclamation semicolon etc.
using regualr expressions, you'd be able to grab x number of words, based on word boundaries.
that would be the best way to go, but I've only installed the extended stored procedures from the SSC Toolkit on SQL2000.
let me install on 2005 and test a few expressions.
Lowell
February 27, 2008 at 7:06 am
Lowell - if you're on 2005 - use CLR instead to create the REGEX expressions instead. The XP is nice in 2000, but you can do better in 2005.
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
February 27, 2008 at 7:16 am
This may look a little funky, but it does the job 🙂
declare @x table ( words varchar(50) not null )
insert @x
select 'This is the first line' union all
select 'Second line' union all
select 'Word' union all
select 'This is the fourth line'
selectcase when charindex(' ', words, charindex(' ', words) + 1) = 0
then words
else left(words, charindex(' ', words, charindex(' ', words) + 1))
end
from @x
go
--------------------------------------------------
This is
Second line
Word
This is
(4 row(s) affected)
/Kenneth
February 27, 2008 at 7:22 am
martin.griffiths (2/27/2008)
I'm currently trying to write a query that will extract the first two words from a string. I'm using charindex and substrings and it's getting quite messy! Has anyone ever written a function may be to extract a specific number of words i.e. text around spaces please? It would need to be able to cope with data where there are different lengths and number of words such as:This is the first line
Second line
Word
This is the fourth line
and if I was asking it to return a string containing the first two words the results would be:
This is
Second line
Word
This is
Thanks.
Just curious... what will this be used for? In other words, why do you need to do this?
--Jeff Moden
Change is inevitable... Change for the better is not.
February 27, 2008 at 7:30 am
I need to do this to parse out data required by an agency we send address details too to place the contents of our address line 1 to their separate Flat No, house number, and street name fields. So I'm basically going to try to select the first two words if address line 1 starts with Flat in to Flat No, if it begins with a number then place the number in to house number field, and then put the remainder (or the else) in to the street name field. Not entirely 100% reliable but will do. Kenneth's response above will work nicely if I'm only looking for two words but will not work with instances like my fourth line where there's multiple spaces. Would prefer a function though that I can re-use taking in a string and a number representing the number of words required. Like the way it works though thanks Kenneth.
February 27, 2008 at 7:32 am
I found a function that may be more along the lines of what you are looking for. This function requires two inputs and the number of words you would like returned. I did not write the code and will give credit to the programmer. His link is http://www.tom-muck.com/blog/index.cfm?newsid=72
CREATE FUNCTION udf_GetNumberOfWords (
@stringToSplit varchar(8000),
@numberOfWords int
)
RETURNS varchar(8000) AS
BEGIN
DECLARE @currentword varchar(8000)
DECLARE @returnstring varchar(8000)
DECLARE @wordcount int
SET @wordcount = 0
SET @returnstring = ''
SET @currentword = ''
SET @stringToSplit = ltrim(rtrim(@stringToSplit))
Declare @index int
WHILE @wordcount 0
BEGIN
Select @index = CHARINDEX(' ', @stringToSplit)
if @index = 0
BEGIN
SELECT @currentword = ltrim(rtrim(@stringToSplit))
SELECT @wordcount = @numberOfWords
END
else
BEGIN
IF (len(@stringToSplit) - @index > 0) BEGIN
SELECT @currentword = ltrim(rtrim(LEFT(@stringToSplit, @index-1)))--the new shortened string
SELECT @stringToSplit = RIGHT(@stringToSplit,LEN(@stringToSplit) - @index) -- the rest
END
END
SELECT @returnstring = @returnstring + ' ' + @currentword
SELECT @wordcount = @wordcount + 1
END
SET @returnstring = LTRIM(@returnstring)
RETURN @returnstring
END
The function can be called like this: (2 is the number of words to return)
select dbo.udf_GetNumberOfWords(mycolumn,2)
from mytable
February 27, 2008 at 7:38 am
Matt Miller (2/27/2008)
Lowell - if you're on 2005 - use CLR instead to create the REGEX expressions instead. The XP is nice in 2000, but you can do better in 2005.
agreed Matt; I think I'm waiting for someone to put together a package of "CLR must have functions", they way someone did for the Toolkit.
Regular expressions, with specific calls for common string manipulations (strip html for example) is something I'm playing with now, but i get distracted easily.
Lowell
February 27, 2008 at 7:40 am
I have the start to one. I need to spiffy them up and I will see what I can do about posting them.
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
February 27, 2008 at 8:02 am
That function's worked a dream thanks. I added a little while loop in to take care of the multiple spaces and it looks really good now. Many thanks.
February 27, 2008 at 8:32 am
Just for fun, another way to skin the cat.
This specific example relies on a numbers table named 'nums' with a column named 'n' that has numbers at least up to the number of charachters in the longest string.
It's also possible to do a numtable 'on the fly' as a derived table.
No considerations to performance, though 😉
create table #x ( words varchar(50) not null )
insert #x
select 'This is the first line' union all
select 'Second line'union all
select 'Word' union all
select 'This is the fourth line'
go
declare @numWords int
set @numWords = 3 -- set the 'number of words here'
selectleft(z.words, z.n) as shortString
from (
selecty.words, y.n, row_number() over (partition by y.words order by y.words, y.n) as b
from (
selectx.words, n.n, substring(x.words, n.n, 1) as letter
from#x x
joinnums n
onn.n <= len(x.words)
) y
where y.letter = ' '
) z
wherez.b = @numWords
union all
selectwords
from#x
wherelen(words) - len(replace(words, ' ', '')) < @numWords
go
shortString
------------------------
This is the
This is the
Second line
Word
(4 row(s) affected)
/Kenneth
February 27, 2008 at 8:50 am
Ya beat me to it with the numbers table... well done...
--Jeff Moden
Change is inevitable... Change for the better is not.
February 27, 2008 at 8:56 am
Nice. I like this one because you are not using while loops :D.
February 27, 2008 at 8:56 am
Thanks everyone. You've been a great help here!
March 2, 2012 at 10:52 am
I agree that this is a nice function, but it only works with this correction:
Instead of
While @wordcount 0
there should be
WHILE @wordcount < @numberOfWords AND len(@stringToSplit) > 0
like in the original code posted on the external page.
CREATE FUNCTION fnGetNumberOfWords (
@stringToSplit varchar(8000),
@numberOfWords int
)
RETURNS varchar(8000) AS
BEGIN
DECLARE @currentword varchar(8000)
DECLARE @returnstring varchar(8000)
DECLARE @wordcount int
SET @wordcount = 0
SET @returnstring = ''
SET @currentword = ''
SET @stringToSplit = ltrim(rtrim(@stringToSplit))
Declare @index int
WHILE @wordcount < @numberOfWords AND len(@stringToSplit) > 0
BEGIN
Select @index = CHARINDEX(' ', @stringToSplit)
if @index = 0
BEGIN
SELECT @currentword = ltrim(rtrim(@stringToSplit))
SELECT @wordcount = @numberOfWords
END
else
BEGIN
IF (len(@stringToSplit) - @index > 0) BEGIN
SELECT @currentword = ltrim(rtrim(LEFT(@stringToSplit, @index-1)))--the new shortened string
SELECT @stringToSplit = RIGHT(@stringToSplit,LEN(@stringToSplit) - @index) -- the rest
END
END
SELECT @returnstring = @returnstring + ' ' + @currentword
SELECT @wordcount = @wordcount + 1
END
SET @returnstring = LTRIM(@returnstring)
RETURN @returnstring
Viewing 15 posts - 1 through 15 (of 16 total)
You must be logged in to reply to this topic. Login to reply