Full-Text Indexing & Searches

  • Greetings, all.

    I have not used full-text indexing. If we have a title column (nvarchar(1000)) & a description column (nvarchar(max)), and we expect that many web users will be querying these by words contained in the title or description, would using a full-text index be a good performance choice? Or is using LIKE good enough for this (since other search fields will be integer keys, etc.)? Is full-text indexing intended more for when searching entire documents?

    Thanks,

    Randy

  • The issue with using LIKE is that it will do a complete table or index scan checking every row, unless you are looking for a title or description that starts with certain characters.

    LIKE '%Abs%' -- always scans

    LIKE 'Abs%' -- can do a seek, given an appropriate index.

    (For more detailed explanations about this, you can search 'SARGABILITY'.)

    Alternatively you could parse those strings into individual words and create a search table that would avoid the scan problem, but that is probably a lot more work than just using the full-text index functionality.

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • As with any index, it will depend partly on how many records you are likely to have. If you only have about 10 records in the table, then LIKE is probably going to be fine. If you have thousands, you will probably begin to see a difference using the full text search functions. Make up a test script, and see where you land on the spectrum. Test it once in a while, as you will no doubt be adding new data. Eventually, you will want to go to full text indexing. Probably just a matter of time.

  • Alternatively you could parse those strings into individual words and create a search table that would avoid the scan problem, but that is probably a lot more work than just using the full-text index functionality.

    If you have a permanent tally table available you could let an indexed view do a lot of the work for you.

    First the tally table:

    WITH DummyRows(V) AS

    (

    SELECT 1 FROM (VALUES -- 100 Dummy Rows

    ($),($),($),($),($),($),($),($),($),($),($),($),($),($),($),($),($),($),($),($),

    ($),($),($),($),($),($),($),($),($),($),($),($),($),($),($),($),($),($),($),($),

    ($),($),($),($),($),($),($),($),($),($),($),($),($),($),($),($),($),($),($),($),

    ($),($),($),($),($),($),($),($),($),($),($),($),($),($),($),($),($),($),($),($),

    ($),($),($),($),($),($),($),($),($),($),($),($),($),($),($),($),($),($),($),($)) t(N)

    )

    INSERT dbo.tally

    SELECT TOP (100000)

    ROW_NUMBER() OVER (ORDER BY (SELECT NULL))-1 -- Up to 100,000,000 rows:

    FROM DummyRows a CROSS JOIN DummyRows b CROSS JOIN DummyRows c CROSS JOIN DummyRows d;

    ALTER TABLE dbo.tally ADD CONSTRAINT pk_tally PRIMARY KEY(N) WITH FILLFACTOR = 100;

    ALTER TABLE dbo.tally ADD CONSTRAINT uq_tally UNIQUE(N);

    Now for some sample data:

    CREATE TABLE dbo.articles

    (

    articleID int identity NOT NULL,

    articleTitle nvarchar(1000) NOT NULL,

    articleDescription nvarchar(4000) NOT NULL

    );

    INSERT dbo.articles(articleTitle, articleDescription)

    VALUES

    ('An Article',

    'Video provides a powerful way to help you prove your point. When you click Online Video,

    you can paste in the embed code for the video you want to add. You can also type a keyword

    to search online for the video that best fits your document.'),

    ('Some other Article','To make your document look professionally produced, Word provides

    header, footer, cover page, and text box designs that complement each other. For example,

    you can add a matching cover page, header, and sidebar. Click Insert and then choose the

    elements you want from the different galleries.');

    Now we're going to take the logic from Jeff Moden's DelimitedSplit8K[/url] and split the articleDescription column using spaces as a delimiter. Note that the permanent tally table is required because cteTally tables can't be included in an indexed view. Before doing the split we're going to replace punctuation and whitespace characters in the articleDescription with spaces.

    Here's the view:

    CREATE VIEW dbo.indexedWordSplit_articles WITH SCHEMABINDING AS

    SELECT

    w.articleID,

    w.articleTitle,

    position = t.N+1,

    word = CAST(RTRIM(LTRIM(SUBSTRING(

    w.articleDescription,

    t.N+1,

    ISNULL(NULLIF(CHARINDEX(N' ',

    REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(

    w.articleDescription,'.',' '),',',' '),';',' '),'?',' '),'!',' '),':',' '),

    CHAR(10),' '),CHAR(13),' '),CHAR(9),' '),t.N+1),0

    )-t.N-1,4000)))) AS nvarchar(100))

    FROM dbo.tally t

    CROSS JOIN dbo.articles w

    WHERE N <= (ISNULL(DATALENGTH(w.articleDescription),0))

    AND (SUBSTRING(w.articleDescription,t.N,1) = ' ' OR N = 0)

    AND CAST(RTRIM(LTRIM(SUBSTRING(

    w.articleDescription,

    t.N+1,

    ISNULL(NULLIF(CHARINDEX(N' ',

    REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(

    w.articleDescription,'.',' '),',',' '),';',' '),'?',' '),'!',' '),':',' '),

    CHAR(10),' '),CHAR(13),' '),CHAR(9),' '),t.N+1),0

    )-t.N-1,4000)))) AS nvarchar(100)) <> ' ';

    Next we add a clustered index to do the "split":

    -- a clustered index that does the "split"

    CREATE UNIQUE CLUSTERED INDEX cl_indexedWordSplit_articles

    ON dbo.indexedWordSplit_articles(articleID, position);

    Now we can add nonclustered indexes as needed. This one will be helpful for my example:

    CREATE NONCLUSTERED INDEX nc_indexedWordSplit_articles_wordSearch

    ON dbo.indexedWordSplit_articles(word, articleID) INCLUDE (articleTitle);

    Note that this solution is great for the kind of select statements I'm about to demonstrate but will have an impact on data modification against the base tables.

    Now, consider this query:

    -- Routine for word searching

    DECLARE @SearchString nvarchar(1000) = 'video';

    SELECT articleID, @SearchString

    FROM dbo.indexedWordSplit_articles WITH (NOEXPAND) -- Index hints should always be used with caution; this one is critical.

    WHERE word = @SearchString

    GROUP BY articleID;

    We just searched the aforementioned Article column using this type of logic:

    SELECT articleID

    FROM articles

    WHERE articleDescription LIKE '%'+@SearchString+'%'

    But we did so in a way where the optimizer leverages an index seek against a nonclustered index. :w00t:

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • The article Searching for Strings in SQL Server Databases" could also be interesting, see

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply