February 16, 2007 at 1:45 pm
Comments posted here are about the content posted at http://www.sqlservercentral.com/columnists/mAhmadi/2875.asp
April 11, 2007 at 10:11 am
Hi,
I'm a bit of a newbie - so I've seen SQL has the ability to do Full Text Searching. I've never used it but why did you not use it? Just curious as I'll need to implement something like this soon.
Thanks
April 11, 2007 at 2:53 pm
If you want to refine the search, you can put all articles into an n-dimensional vector space, where n is the number of distinct keywords in your keyword table. Each entry of your keyword log (LogEntry_Keyword) can then be understood as a vector. Vectors that are close in Euclidean distance will then presumably contain related content. Have fun.
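A rough sketch of the vector-space idea, written in Python rather than T-SQL so it runs on its own. The entries, keywords, and function names are made up for illustration and are not taken from the article; each entry becomes a 0/1 vector over the distinct keywords, and other entries are ranked by Euclidean distance.

```python
import math

def keyword_vector(entry_keywords, vocabulary):
    """Return a 0/1 vector with one dimension per distinct keyword."""
    present = set(entry_keywords)
    return [1 if kw in present else 0 for kw in vocabulary]

def euclidean_distance(a, b):
    """Straight-line distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical log entries and their keywords (illustration only).
entries = {
    1: ["sql", "index", "performance"],
    2: ["sql", "index", "rebuild"],
    3: ["backup", "restore"],
}

# One dimension per distinct keyword across all entries.
vocabulary = sorted({kw for kws in entries.values() for kw in kws})
vectors = {eid: keyword_vector(kws, vocabulary) for eid, kws in entries.items()}

def nearest(entry_id):
    """IDs of all other entries, closest first."""
    return sorted(
        (e for e in vectors if e != entry_id),
        key=lambda e: euclidean_distance(vectors[entry_id], vectors[e]),
    )
```

With these sample entries, nearest(1) ranks entry 2 ahead of entry 3, because entries 1 and 2 share two keywords and entry 3 shares none.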
April 13, 2007 at 1:05 pm
Don't get the impression that the methodology described here is by any means a substitute for full text search - it certainly is not. In the article we are simply defining a keyword as the text between space characters and we are in control of what tags we use to describe a log entry. In a real-world scenario you are more likely going to need the abilities of the full text engine's parser (word breaker) as well as the efficiency and versatility of full text querying. This is in fact a self-contained solution to the problem of keyword search, but it is a bare-bones solution. I use it mainly for keeping track of information that can be described with a handful of keywords.
Mike
February 20, 2008 at 10:40 pm
This is an interesting idea, and I think I get the author's point - why use the FTS "sledgehammer" for simpler tasks that don't require all the extra functionality? One thing that might add value to the tokenization/matching the author presents when compared to FTS is the ability to do approximate matching: phonetic, edit distance, n-gram, or common substring matching (or maybe some combination of these). You can actually get very good performance and good accuracy matches from a set-based n-gram solution.
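To make the n-gram suggestion concrete, here is a minimal sketch in Python (not T-SQL), assuming trigrams and a Jaccard-style overlap score. The function names and the padding scheme are my own choices for illustration. In a database, the same idea would store one row per n-gram and count matching rows with a join.

```python
def ngrams(word, n=3):
    """Set of character n-grams of word, padded with spaces at both ends."""
    padded = " " * (n - 1) + word.lower() + " " * (n - 1)
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}

def similarity(a, b, n=3):
    """Shared n-grams divided by all distinct n-grams (0.0 to 1.0)."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    union = ga | gb
    return len(ga & gb) / len(union) if union else 0.0
```

A search for "index" would then score "indexes" well above an unrelated word such as "backup", which shares no trigrams with it.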
February 21, 2008 at 6:01 am
This is a brilliant concept, and I'm amazed I have not seen it anywhere else!
There are some known issues with full-text search:
- when searching, you need to look for "Words", you cannot use arbitrary substrings
- you cannot (easily?) "partition/order" a full-text index by a key, eg "UserID" or "ClientID" - in a shared-tenant architecture (SaaS environment with multiple/many clients in a single DB) this can be a very serious issue!
- Administration/Maintenance is very painful in SQL Server 2000 and earlier (have not tried 2005 but reputedly much better)
If, instead of using a trigger to do the "tokenizing" in this solution, you used a scheduled job, along with a trigger that maintains an "UpdateRequired" flag of some sort on the record for the job to look at, you would basically be building your own "text search light" system, suitable for all sorts of uses...
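A sketch of the flag-plus-job idea, using Python's built-in sqlite3 module so it can be run anywhere. The schema, the trigger name trgFlagLogEntry, and the function run_tokenizer_job are all assumptions for illustration; on SQL Server the trigger would be written in T-SQL and the job would be run by SQL Server Agent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE LogEntry (
    id INTEGER PRIMARY KEY,
    tags TEXT NOT NULL,
    UpdateRequired INTEGER NOT NULL DEFAULT 1  -- new rows start flagged
);
CREATE TABLE LogEntry_Keyword (
    entry_id INTEGER NOT NULL,
    keyword TEXT NOT NULL,
    PRIMARY KEY (entry_id, keyword)
);
-- The trigger only raises the flag; it does no tokenizing itself.
CREATE TRIGGER trgFlagLogEntry AFTER UPDATE OF tags ON LogEntry
BEGIN
    UPDATE LogEntry SET UpdateRequired = 1 WHERE id = NEW.id;
END;
""")

def run_tokenizer_job(conn):
    """Re-tokenize every flagged row, clear its flag, return rows processed."""
    rows = conn.execute(
        "SELECT id, tags FROM LogEntry WHERE UpdateRequired = 1"
    ).fetchall()
    for entry_id, tags in rows:
        # Replace the old keyword rows for this entry with fresh ones.
        conn.execute("DELETE FROM LogEntry_Keyword WHERE entry_id = ?", (entry_id,))
        conn.executemany(
            "INSERT INTO LogEntry_Keyword (entry_id, keyword) VALUES (?, ?)",
            [(entry_id, kw) for kw in set(tags.lower().split())],
        )
        conn.execute("UPDATE LogEntry SET UpdateRequired = 0 WHERE id = ?", (entry_id,))
    conn.commit()
    return len(rows)
```

The point of the split is that inserts and updates stay cheap (one flag write), while the expensive tokenizing happens in batches whenever the job runs. The cost is that searches can miss changes made since the last run.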
It does have major disadvantages of course:
- will use much more space for the tokens than full-text search would
- will be less efficient when tokenizing
- will be less powerful when tokenizing (no word root identification etc)
- will probably be slower when returning matches on the entire table (but faster on a subset selected by a key that you specify)
All in all a great option to keep in mind though I think - does anyone see other major disadvantages (or advantages) that I am missing?
Thanks,
Tao
http://poorsql.com for T-SQL formatting: free as in speech, free as in beer, free to run in SSMS or on your version control server - free however you want it.
February 22, 2008 at 10:19 am
There appears to be a bug in the search stored procedure ... it was only searching on the first term given to it in the search string.
It was also eroding the string by one character on every loop, so it would break once the padding string was totally eroded away.
To solve the problems change:
The line:
SET @kws = SUBSTRING(@kw, CHARINDEX(' ', @kws) + 1, LEN(@kws) - CHARINDEX(' ', @kws) - 1)
To this (remembering to remove the -1 at the end):
SET @kws = SUBSTRING(@kws, CHARINDEX(' ', @kws) + 1, LEN(@kws) - CHARINDEX(' ', @kws))
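To see why the -1 matters, here is a small Python sketch that mimics the length arithmetic of that SET line. The helpers charindex and substring are stand-ins I wrote to imitate the 1-based T-SQL functions; they are not the stored procedure itself. Python's len also counts trailing spaces, which T-SQL's LEN does not, so the sample strings have none.

```python
def charindex(needle, haystack):
    """Like T-SQL CHARINDEX: 1-based position of needle, or 0 if absent."""
    return haystack.find(needle) + 1

def substring(text, start, length):
    """Like T-SQL SUBSTRING for start >= 1 (1-based start, fixed length).

    A negative length yields an empty string here; T-SQL raises an error.
    """
    return text[start - 1:start - 1 + max(length, 0)]

def rest_after_first_word(kws, fixed=True):
    """Drop the first space-delimited word from kws, as the SET line does.

    fixed=True uses LEN - CHARINDEX; fixed=False keeps the extra -1.
    """
    pos = charindex(" ", kws)
    length = len(kws) - pos - (0 if fixed else 1)
    return substring(kws, pos + 1, length)
```

For 'alpha beta gamma', the corrected length returns 'beta gamma', while the version with -1 returns 'beta gamm'. Applying the -1 version again to 'beta gamm' gives 'gam', so one more character is lost on each pass.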
November 10, 2008 at 3:16 pm
Fascinating concept.
Which versions of SQL Server is this intended for? I am getting an error from SQL Query Analyzer 2000 when I try to execute the trigger section on a database running on SQL Server 2005:
Server: Msg 207, Level 16, State 1, Procedure trgInsertLogEntry, Line 21
Invalid column name 'tags'.
Points to line:
SET @tags = (SELECT tags FROM INSERTED)
Where did table "INSERTED" get made?