June 10, 2010 at 4:27 pm
Dear All,
For a Web site I am currently developing, I am facing a problem for which I need real SQL experts...
Background:
This web site is about various types of information. Each type of information refers to an "owner" and is linked to some kind of "items". Since this web site is meant to be used in different languages, I also consider using the "double metaphone" algorithm (see [/url]).
This resulted in creating a table called tbl_freetext with the following definition.
owner_id (bigint)
owner_type (int)
content_id (bigint)
content_type (int)
hashcode (int)
meta_primary (int)
meta_alternate (int)
where hashcode is the "hashcode" of a word to be indexed, the meta_primary and meta_alternate are the "metaphone translations" of the word to be indexed.
I need to offer the user of the web site the possibility to search via keywords. Also, they have the possibility to select the following options:
- exact match (at least one word): look for {owner_id, owner_type, content_id} for which at least one of the entered searched words appears
- exact match (all the words): look for {owner_id, owner_type, content_id} for which ALL the entered keywords exist.
(also, I offer the variant: sound-like... In that case, the algorithm will use the metaphone entries, rather than the hashcodes).
The table is going to grow very fast. This means that I need to find out a very powerful algorithm.
Keyword search is a subset of the actual query, since in addition to the keywords, a user may also define other search criteria. The other criteria are used with the content of another table tbl_info (in fact, there are several other tables). This means that the problem described here below only returns records that will be joint to the rest of the query.
Problem definition:
Searching for the entries that match at least one word is not an issue, but I cannot find any elegant and performant algorithm to look for entries that match all the words.
I need a stored procedure (or table-valued function) with the following constraints:
input: @in_content_type int, @in_hash_list nvarchar(1024) where @in_hash_list is a string that contains a series of hashcodes (delimited) that correspond to the searched words.
output: DISTINCT owner_id, owner_type, content_id
In order to process the @in_hash_list, I am using a "table-valued function" that returns a table built based on the @in_hash_list. The function is defined as follows:
FUNCTION dbo.fct_ListToTableInteger (@list as varchar(8000), @delim as varchar(10)) RETURNS @listTable table (value int)
I don't know how to build such Stored Procedure (or table-valued function) that would answer my needs. A solution would be to dynamically build a string, based on the number of keywords and execute the string
I suppose I am not the first one who needs this kind of functionality but I could not find any acceptable solution...
Many thanks in advance for your help.
Didier
June 10, 2010 at 5:05 pm
Have you consider a "Full text search" instead of re-inventing a wheel?
Marin Kostadinovic
______________________________
Database Administrator/Architect
dimm-is.com
freemerlin.com
June 10, 2010 at 5:19 pm
In fact I did however, since my web site is multi-lingual, I need to provide a means of "phonetic" searching across several languages. Correct me if I am wrong but full-text indexing/searching does not provide this functionality.
For example, let's consider French where a similar sound might be written in different ways (e.g. "è" can be written: (è, ê, ei, ai, ~et, eh, he) , or "in" can be written: (in, ain, ein). Therefore, I found the "double metaphone" or now even better "metaphone 3" for which I adapt the initial spelling before obtaining the "metaphones".
But you are right. If the full-text indexing may give me such a similar functionality, I will reconsider...
Thanks for your answer.
June 10, 2010 at 5:27 pm
I can spend some time on different ways to accomplish a "full text search" but there is a nice article on the SQLServerCentral called "A Google-like Full Text Search[/url]", check it out, that might be a solution for your problem.
Cheers!
Marin Kostadinovic
______________________________
Database Administrator/Architect
dimm-is.com
freemerlin.com
Viewing 4 posts - 1 through 3 (of 3 total)
You must be logged in to reply to this topic. Login to reply