November 7, 2016 at 5:24 pm
Comments posted to this topic are about the item Function - Word count in a sentence
November 8, 2016 at 7:42 am
declare @var1 varchar(max) = 'It''s now or never I ain''t gonna live forever'
select len(@var1)-len(replace(@var1,' ','')) + 1
November 11, 2016 at 12:43 am
Thank you for script.
But it's need to SqlServer 2012 because of lag function.
why not just split & count?
CREATE FUNCTION dbo.kg_ifn_StrToTable
(
@List NVARCHAR(MAX),
@Delimiter NVARCHAR(1)
)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN
(
SELECT Sonuc = y.i.value('(./text())[1]', 'nvarchar(MAX)')
FROM
(
SELECT x = CONVERT(XML, '<i>'
+ REPLACE(@List, @Delimiter, '</i><i>')
+ '</i>').query('.')
) AS a CROSS APPLY x.nodes('i') AS y(i)
);
GO
Select COUNT(1) from dbo.kg_ifn_StrToTable('It''s now or never I ain''t gonna live forever', ' ')
-- 300.000 word / 4 sec
And it's running on SqlServer 2008
November 11, 2016 at 2:06 am
SQL Guy 1 (11/8/2016)
declare @var1 varchar(max) = 'It''s now or never I ain''t gonna live forever'select len(@var1)-len(replace(@var1,' ','')) + 1
My thoughts exactly. But you probably need to boil down double, triple, etc. spaces to a single space first. Something along the lines of:
Replace (Replace (Replace (@Source, @CharacterToRemove + @CharacterToRemove, @CharacterToRemove + Char (8)), Char (8) + @CharacterToRemove, ''), Char (8), '')
November 11, 2016 at 3:28 am
julian.fletcher (11/11/2016)
SQL Guy 1 (11/8/2016)
declare @var1 varchar(max) = 'It''s now or never I ain''t gonna live forever'select len(@var1)-len(replace(@var1,' ','')) + 1
My thoughts exactly. But you probably need to boil down double, triple, etc. spaces to a single space first. Something along the lines of:
Replace (Replace (Replace (@Source, @CharacterToRemove + @CharacterToRemove, @CharacterToRemove + Char (8)), Char (8) + @CharacterToRemove, ''), Char (8), '')
I'd be tempted to extend that to all non alphanumeric characters or apostrophies so it handles punctuation and multiple lines.
Still would have problems if it needed handle something like an IP address, url or formatted phone numbers but you'd need to define the definition of a word a bit better in relation to such things.
November 12, 2016 at 12:07 pm
Same Function - compatible with 2008:
CREATE FUNCTION dbo.udf_WordCount_2008
(
--declare
@STR VARCHAR(8000)
--= 'It''s now or never I ain''t gonna live forever'
)
RETURNS TABLE AS RETURN
WITH Tally (n) AS
(
SELECT TOP (LEN(@str)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0)) a(n)
CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) b(n)
CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) c(n)
CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) d(n)
)
, BreakChar as
(
SELECT SUBSTRING(@str , n , 1) [Char] , N
FROM Tally
)
, Analize as
(
SELECT a.* , b.Char PrevChar
FROM BreakChar a
JOIN BreakChar b
on a.n = b.n+1
)
SELECT WordCount = COUNT(1) + 1
FROM Analize
WHERE [Char] != PrevChar
AND PrevChar = ' '
November 14, 2016 at 8:44 am
hkravitz (11/12/2016)
Same Function - compatible with 2008:
Thanks for the 2008 version.
January 15, 2017 at 7:50 am
Excellent. Thanks.
Sometimes you need to use the function in a query, so that it needs to return INT instead of a set.
This is a little amendment.
CREATE FUNCTION [dbo].[fnWordCount] (@str VARCHAR(8000) )
RETURNS INT AS
BEGIN
DECLARE @WordCount INT=0;
WITH Tally (n) AS
( SELECT TOP (LEN(@str)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM(VALUES (0),(0),(0),(0),(0),(0),(0),(0)) a(n)
CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) b(n)
CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) c(n)
CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) d(n)
),
BreakChar as
(
SELECT SUBSTRING(@str , n , 1) [Char] , N
FROM Tally
),
Analize as
(
SELECT [Char], LAG([Char],1) OVER (ORDER BY N) PrevChar
FROM BreakChar
)
SELECT @WordCount = COUNT(1) + 1
FROM Analize
WHERE [Char] != PrevChar AND PrevChar = ' ';
RETURN @WordCount;
END
Igor Micev,My blog: www.igormicev.com
January 15, 2017 at 10:41 am
Ercüment Eşkar - Friday, November 11, 2016 12:43 AMThank you for script.But it's need to SqlServer 2012 because of lag function.why not just split & count?CREATE FUNCTION dbo.kg_ifn_StrToTable( @List NVARCHAR(MAX), @Delimiter NVARCHAR(1))RETURNS TABLEWITH SCHEMABINDINGAS RETURN ( SELECT Sonuc = y.i.value('(./text())[1]', 'nvarchar(MAX)') FROM ( SELECT x = CONVERT(XML, '<i>' + REPLACE(@List, @Delimiter, '</i><i>') + '</i>').query('.') ) AS a CROSS APPLY x.nodes('i') AS y(i) );GOSelect COUNT(1) from dbo.kg_ifn_StrToTable('It''s now or never I ain''t gonna live forever', ' ') -- 300.000 word / 4 sec
And it's running on SqlServer 2008
The XML method of splitting is quite slow compared to other methods especially if you don't actually need blobs. Please see the following article on that.
http://www.sqlservercentral.com/articles/Tally+Table/72993/
If you want to save yourself a trip to the article, here's one of the performance charts from the article with split counts in the range of what you might expect in a sentence. The XML splitter is as bad as a WHILE loop, mostly because of the concatenation that takes place in the XML Splitter.
For splits with more elements but still in the range of VARCHAR(8000) (and, yes, there's a separate splitter for NVARCHAR(4000), please see the following chart.
The skinny Black lines in both charts represent a relatively newer version of the DelimitedSplit8K function and that's before the additional improvements that someone else made and certainly before the improvement for 2012 that Eirikur Erricson posted in the following article.
http://www.sqlservercentral.com/Authors/Articles/Eirikur_Eiriksson/1346039/
--Jeff Moden
Change is inevitable... Change for the better is not.
January 15, 2017 at 10:59 am
Igor Micev - Sunday, January 15, 2017 7:50 AMExcellent. Thanks.Sometimes you need to use the function in a query, so that it needs to return INT instead of a set.
This is a little amendment.
CREATE FUNCTION [dbo].[fnWordCount] (@str VARCHAR(8000) )
RETURNS INT AS
BEGIN
DECLARE @WordCount INT=0;
WITH Tally (n) AS
( SELECT TOP (LEN(@str)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM(VALUES (0),(0),(0),(0),(0),(0),(0),(0)) a(n)
CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) b(n)
CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) c(n)
CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) d(n)
),
BreakChar as
(
SELECT SUBSTRING(@str , n , 1) [Char] , N
FROM Tally
),
Analize as
(
SELECT [Char], LAG([Char],1) OVER (ORDER BY N) PrevChar
FROM BreakChar
)
SELECT @WordCount = COUNT(1) + 1
FROM Analize
WHERE [Char] != PrevChar AND PrevChar = ' ';
RETURN @WordCount;
END
Hi,
Turning the function into a scalar function effects the performance, you can still use the function (inline) in a select statement using a cross apply clause and use any calculation you are trying to achieve.
Viewing 10 posts - 1 through 9 (of 9 total)
You must be logged in to reply to this topic. Login to reply