December 3, 2004 at 2:39 pm
In a table there is field "txt1". In one record, this field contains "Profit:The return received on a business undertaking after all operating expenses have been met".
In another record, the field contains "Profit: The return received on an investment after all charges have been paid. Profit is a good thing."
Notice the word "profit" is used once in the first record and twice in the second record.
Here is the question: Can I write a query that will count the instances of the word "profit" and give me a result set like...
Record Count
1 1
2 2
December 4, 2004 at 1:26 pm
Do a search in the script section here. I'm pretty sure you'll find something useful there.
--
Frank Kalis
Microsoft SQL Server MVP
Webmaster: http://www.insidesql.org/blogs
My blog: http://www.insidesql.org/blogs/frankkalis/[/url]
January 9, 2005 at 7:46 pm
The LEN(Replace.... finds the instances - even if it is used in another word (i.e. profitability).
How can I get only EXACT matches??
Any ideas
January 11, 2005 at 8:00 am
I'm not a huge fan of the LEN(REPLACE) method (it fails for strings over 8000 characters); instead, I'd recommend:
First create a table of numbers:
SELECT IDENTITY(int, 1, 1) AS Number
INTO #Numbers
FROM pubs..authors a,
pubs..authors b,
pubs..authors c
Then:
SELECT YourTable.PK, COUNT(*) AS NumberOfProfit
FROM #Numbers N
CROSS JOIN YourTable
WHERE
SUBSTRING(YourTable.Col, N.Number, LEN('profit')) = 'profit'
AND PATINDEX('%[a-z0-9]%', SUBSTRING(YourTable.Col, N.Number + LEN('profit'), 1)) = 0
GROUP BY YourTable.PK
... This query assumes that any non-alphanumeric character is either punctuation or white space. You might want to modify it if that's not the case...
--
Adam Machanic
whoisactive
January 11, 2005 at 8:37 am
By the way, that query only looks at characters AFTER the word 'profit' ...
So it would find a match (if it were a word) on 'unprofit'.
If that's a problem, change it to:
SELECT YourTable.PK, COUNT(*) AS NumberOfProfit
FROM #Numbers N
CROSS JOIN YourTable
WHERE
SUBSTRING(YourTable.Col, N.Number, LEN('profit')) = 'profit'
AND PATINDEX('%[a-z0-9]%', SUBSTRING(YourTable.Col, N.Number + LEN('profit'), 1)) = 0
AND PATINDEX('%[a-z0-9]%', SUBSTRING(YourTable.Col, N.Number - 1, 1)) = 0
GROUP BY YourTable.PK
--
Adam Machanic
whoisactive
January 11, 2005 at 1:04 pm
I turned this into a UDF, which you can find here:
http://sqljunkies.com/WebLog/amachanic/articles/CountSubstring.aspx
--
Adam Machanic
whoisactive
Viewing 7 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply