A Google-like Full Text Search

  • outatime (6/1/2009)


    I'm using this in my current project and it's working great. Thanks for taking time to put together a nice article we can all follow!

    My site is used to search through a large technical publication. Everything has been working great, but a few days ago I typed in "orthostatic" and it threw an error.

    I stepped through the code, and after the following:

        AstNode root;
        root = _compiler.Parse("orthostatic");

    "root" is coming back null, so when I pass it to:

        string myString;
        myString = SearchGrammar.ConvertQuery(root, SearchGrammar.TermType.Inflectional);

    the switch statement in the ConvertQuery method throws the error.

    Before I do a "quick" workaround for nulls, I wanted to see if you had any insight on what might be causing this.

    I've been using the search for quite a while, and this is the only term that has caused an issue so far.

    Btw, I'm using version 1.0 of the Irony.dll.

    Thanks!

    I would bet you dollars to donuts the parser is picking out the keyword "or" from the front of the word ("or" + "thostatic"). We had a similar issue previously that I believe was addressed on the message board here. Roman has also updated Irony and the search grammar over at the CodePlex Irony project website. You might want to check that out, as his update might resolve the issue as well.

    Thanks

    Michael
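    The rule a fix needs can be illustrated outside Irony. The sketch below is not Irony's scanner and uses no Irony API; KeywordTokenizer and Tokenize are hypothetical names. It shows the word-boundary rule: a token counts as the AND/OR operator only when the whole whitespace-delimited token equals the keyword (case-insensitively, as the grammar apparently treats "or"), so "orthostatic", "orange", and "andes" stay ordinary terms. It assumes whitespace-separated terms and ignores quoted phrases and parentheses.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical standalone tokenizer (not Irony): illustrates whole-word keyword matching.
public static class KeywordTokenizer
{
    // Operator keywords, compared case-insensitively against WHOLE tokens only.
    private static readonly HashSet<string> Operators =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "AND", "OR" };

    public static List<(string Text, bool IsOperator)> Tokenize(string query)
    {
        var tokens = new List<(string Text, bool IsOperator)>();

        // \S+ grabs maximal runs of non-whitespace, so "orthostatic" is one token
        // and is never split into the keyword "or" plus a remainder.
        foreach (Match m in Regex.Matches(query, @"\S+"))
        {
            tokens.Add((m.Value, Operators.Contains(m.Value)));
        }
        return tokens;
    }
}
```

    For "orthostatic OR andes" this yields three tokens, and only the middle one is flagged as an operator.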

  • james.spibey (3/30/2009)


    Hi Mike,

    Thanks for responding. Unfortunately that didn't seem to do it. I notice that there is an explicit check for the "-" character in the "AndExpression" case statement in ConvertQuery, and also a reference to the ExcludeOperator in the definition of the AndOperator. Does something need to be added to these as well to make it recognise the "NOT" keyword? Or could it be a stopword issue?

    Thanks

    James

    Yes, the "-" character is explicitly checked for in the AndExpression case. Stopwords shouldn't be a factor until after the query reaches SQL Server in the form of an iFTS query. Adding a new keyword (like "NOT") to the grammar might require some refactoring of the grammar itself.
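    As an alternative to refactoring the grammar, one hypothetical workaround is to rewrite the query text before it reaches the parser, turning "NOT term" into "-term" so that it presumably goes through the same path as the "-" exclude operator. NotRewriter.RewriteNot is an illustrative name, not part of SearchGrammar. The sketch assumes NOT is matched case-insensitively (like the grammar's AND and OR apparently are), which means the word "not" can no longer be searched on its own; it splits on whitespace and does not handle NOT inside quoted phrases.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical pre-parse rewrite (not part of SearchGrammar): "apple NOT banana" -> "apple -banana".
public static class NotRewriter
{
    public static string RewriteNot(string query)
    {
        // A null separator array splits on any whitespace; empty entries are dropped.
        string[] words = query.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
        var output = new List<string>();

        for (int i = 0; i < words.Length; i++)
        {
            // Only a whole-word NOT counts, so "nothing" is left alone.
            bool isNot = string.Equals(words[i], "NOT", StringComparison.OrdinalIgnoreCase);
            if (isNot && i + 1 < words.Length)
            {
                output.Add("-" + words[i + 1]);
                i++; // skip the term we just consumed
            }
            else
            {
                // Ordinary term, or a trailing NOT with nothing to negate: keep as-is.
                output.Add(words[i]);
            }
        }
        return string.Join(" ", output);
    }
}
```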

  • rivantsov (6/11/2009)


    Hi folks

    This is Roman, the Irony developer.

    Just wanted to let everybody know that I've refactored SearchGrammar to work with the latest version of Irony and added it to the sample grammars in the download. The original grammar and conversion method have been tweaked a bit, as some of the old functionality in Irony (node bubbling) has been refactored into a similar but different mechanism. As a result, the output tree is different from Mike's version, but the query conversion result should be the same.

    I haven't tried the new grammar against a database, so I can't guarantee it is 100% bug-free. Any comments or test runs against a real SQL database would be appreciated.

    Thank you,

    Roman

    Hi Roman, thanks! I'll download and test soon.

    Michael

  • cghersi (7/15/2009)


    Dear Michael,

    First of all, thank you very much for your articles; I always find them very interesting.

    I'm writing about your article on the FTS features of SQL Server (Google-like search).

    I have this problem: I would like to emulate the FTS feature inside a C# application, without calling my SQL Server DB.

    That is, I simply want a routine that takes a text string and a search string as input and returns true or false depending on whether the search string matches the text string.

    Do you know anything about this issue?

    Perhaps someone has already solved my problem...

    Please let me know; I'm struggling with this dilemma!

    My email address is: cghersi@foldier.com

    Thank you very much!

    Cristiano

    Hi Cristiano,

    iFTS is actually a SQL Server feature, and duplicating its functionality would require a huge investment of time and effort on your part. There is a .NET project (Lucene.Net) that provides full-text search functionality from within a .NET program: http://incubator.apache.org/projects/lucene.net.html. I haven't used this myself, but Hilary Cotter has recommended Lucene (Java version) in the past. If you don't need full-text search functionality, but just simple string comparison matching, the task gets a lot easier.

    Thanks

    Michael
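    For the "simple string comparison matching" case, a naive in-memory matcher can be quite short. In this sketch (SimpleMatcher.Matches is a hypothetical name), each whitespace-separated search term must appear in the text as a whole word, and each "-term" must not appear; words are compared case-insensitively. It does none of what iFTS does (no stemming or inflectional forms, thesaurus, stopwords, phrases, OR, or ranking) and assumes search terms are plain words without internal punctuation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical naive matcher: a plain AND of required words plus "-word" exclusions.
public static class SimpleMatcher
{
    public static bool Matches(string text, string search)
    {
        // Distinct words in the text (runs of letters, digits, underscore), case-insensitive.
        var words = new HashSet<string>(
            Regex.Matches(text, @"\w+").Cast<Match>().Select(m => m.Value),
            StringComparer.OrdinalIgnoreCase);

        // A null separator array splits on any whitespace.
        foreach (string term in search.Split((char[])null, StringSplitOptions.RemoveEmptyEntries))
        {
            if (term == "-") continue; // a bare "-" names no term
            bool exclude = term[0] == '-';
            string word = exclude ? term.Substring(1) : term;

            // Required word missing, or excluded word present: no match.
            if (words.Contains(word) == exclude) return false;
        }
        return true;
    }
}
```

    For example, "quick -fox" does not match "The quick brown fox", while "QUICK -dog" does.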

  • derek (7/24/2009)


    Perhaps I am losing my mind, but the conclusion says the code can be downloaded from the link above. However, the only link I can find is to http://www.codeplex.com/irony, not the source for the article itself.

    Thanks

    Derek

    When it was first published the link was at the top of the article (as well as the bottom). The link at the top isn't there any more, but the link at the bottom still appears to be active (it's below the article).

    Thanks

    Michael

  • Confucius247 (10/7/2008)


    Google is not defined by its syntax, which few users use, but by the relevancy of its results.

    Relevancy is not defined by such comments, but by whether you understand ranking in FTS.

  • Thanks for the excellent article. I spent a week or so writing a mammoth T-SQL scalar function to perform these exact Google-operator interpretations, and then once I'd finished, a colleague sent me the link to your article. 😛 Quite spooky, really.

    Cheers

    David


  • Mike C (10/15/2008)


    According to Ivan this line is causing the issue:

    Term.Priority = Terminal.LowestPriority;

    If you remove this he says it should fix the problem. He says there's a deeper issue that he needs to address concerning this operation, but this simple fix should resolve the issue for this simple grammar.

    Thanks

    Mike C

    I've just downloaded the latest version of Irony (31155), and I have the same issue reported last year, where terms that start with "or" (e.g. orange) and "and" (e.g. andes) get confused by the parser. Is there a fix for this latest version? The fix you mentioned above won't work, because that line of code doesn't exist anymore.

    Thanks!

    Aa

  • Mike C (11/9/2008)


    Hi panteluke,

    I think you need to modify the grammar to handle international characters. Right now I use a pretty narrow definition for terminals: basically letters A-Z, a-z, numbers, and a few punctuation symbols. Try modifying the grammar to include additional Greek characters in the definition for terminals.

    If that doesn't work, I would recommend checking with Roman to make sure this version of Irony handles international/Unicode characters properly. My guess is that Irony can handle Unicode/international characters, but check the CodePlex site to verify that; I just didn't include them as part of the definition of a terminal in this example. The Irony website is at http://www.codeplex.com/irony. The full source and binaries for the most recent version of Irony are available there.

    Thanks

    Mike C

    Mine won't allow French characters - where do I modify the terminal definition? I can't find the appropriate code. Thanks!

  • As for international characters, Irony does support them. You can add Unicode categories to the identifier terminal definition; look at the C# sample grammar. C# in fact allows national characters in identifiers! I didn't know that until I started on the C# sample grammar and had to read the C# spec carefully.

    As for the issue of confusing the "OR" operator with the prefix in "orthogonal": I'm looking into it, and the fix will be there in a couple of days. Note that this will be in the latest "source" version, so you should use "SearchGrammar" from the samples.

    In general, please feel free to post questions like these on Irony's discussions page; it is easier for me to track them there. Rest assured I'm here to answer all your questions and to help with any issues.

    Thanks for your interest in Irony!
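    Both the "add Greek/French characters" and the "add Unicode categories" suggestions amount to widening the set of characters a term may contain. The sketch below is standalone and does not use Irony's own terminal classes (check the C# sample grammar for how your Irony build exposes Unicode categories); TermChars is a hypothetical name. It uses the .NET CharUnicodeInfo.GetUnicodeCategory method to accept any Unicode letter, combining mark, or decimal digit instead of only A-Z, a-z, and 0-9, which covers accented French letters and Greek. Whether to also admit punctuation such as apostrophes or hyphens is left open.

```csharp
using System.Globalization;

// Hypothetical character test for search terms: Unicode-aware instead of ASCII-only.
public static class TermChars
{
    public static bool IsTermChar(char c)
    {
        switch (CharUnicodeInfo.GetUnicodeCategory(c))
        {
            // Letters in any script (Latin with accents, Greek, Cyrillic, ...).
            case UnicodeCategory.UppercaseLetter:
            case UnicodeCategory.LowercaseLetter:
            case UnicodeCategory.TitlecaseLetter:
            case UnicodeCategory.ModifierLetter:
            case UnicodeCategory.OtherLetter:
            // Combining accents, e.g. "e" followed by U+0301 in decomposed text.
            case UnicodeCategory.NonSpacingMark:
            case UnicodeCategory.SpacingCombiningMark:
            // Digits 0-9 and their equivalents in other scripts.
            case UnicodeCategory.DecimalDigitNumber:
                return true;
            default:
                return false;
        }
    }

    // A term is a non-empty run of term characters.
    public static bool IsTerm(string s)
    {
        if (string.IsNullOrEmpty(s)) return false;
        foreach (char c in s)
        {
            if (!IsTermChar(c)) return false;
        }
        return true;
    }
}
```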
