August 8, 2016 at 5:47 pm
Greetings, all.
I have not used full-text indexing. If we have a title column (nvarchar(1000)) & a description column (nvarchar(max)), and we expect that many web users will be querying these by words contained in the title or description, would using a full-text index be a good performance choice? Or is using LIKE good enough for this (since other search fields will be integer keys, etc.)? Is full-text indexing intended more for when searching entire documents?
Thanks,
Randy
August 8, 2016 at 7:32 pm
The issue with using LIKE is that it will do a complete table or index scan checking every row, unless you are looking for a title or description that starts with certain characters.
LIKE '%Abs%' -- always scans
LIKE 'Abs%' -- can do a seek, given an appropriate index.
(For more detailed explanations about this, you can search 'SARGABILITY'.)
Alternatively you could parse those strings into individual words and create a search table that would avoid the scan problem, but that is probably a lot more work than just using the full-text index functionality.
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
August 9, 2016 at 8:15 am
As with any index, it will depend partly on how many records you are likely to have. If you only have about 10 records in the table, then LIKE is probably going to be fine. If you have thousands, you will probably begin to see a difference using the full text search functions. Make up a test script, and see where you land on the spectrum. Test it once in a while, as you will no doubt be adding new data. Eventually, you will want to go to full text indexing. Probably just a matter of time.
August 9, 2016 at 9:34 pm
Alternatively you could parse those strings into individual words and create a search table that would avoid the scan problem, but that is probably a lot more work than just using the full-text index functionality.
If you have a permanent tally table available you could let an indexed view do a lot of the work for you.
First the tally table:
WITH DummyRows(V) AS
(
SELECT 1 FROM (VALUES -- 100 Dummy Rows
($),($),($),($),($),($),($),($),($),($),($),($),($),($),($),($),($),($),($),($),
($),($),($),($),($),($),($),($),($),($),($),($),($),($),($),($),($),($),($),($),
($),($),($),($),($),($),($),($),($),($),($),($),($),($),($),($),($),($),($),($),
($),($),($),($),($),($),($),($),($),($),($),($),($),($),($),($),($),($),($),($),
($),($),($),($),($),($),($),($),($),($),($),($),($),($),($),($),($),($),($),($)) t(N)
)
INSERT dbo.tally
SELECT TOP (100000)
ROW_NUMBER() OVER (ORDER BY (SELECT NULL))-1 -- Up to 100,000,000 rows:
FROM DummyRows a CROSS JOIN DummyRows b CROSS JOIN DummyRows c CROSS JOIN DummyRows d;
ALTER TABLE dbo.tally ADD CONSTRAINT pk_tally PRIMARY KEY(N) WITH FILLFACTOR = 100;
ALTER TABLE dbo.tally ADD CONSTRAINT uq_tally UNIQUE(N);
Now for some sample data:
CREATE TABLE dbo.articles
(
articleID int identity NOT NULL,
articleTitle nvarchar(1000) NOT NULL,
articleDescription nvarchar(4000) NOT NULL
);
INSERT dbo.articles(articleTitle, articleDescription)
VALUES
('An Article',
'Video provides a powerful way to help you prove your point. When you click Online Video,
you can paste in the embed code for the video you want to add. You can also type a keyword
to search online for the video that best fits your document.'),
('Some other Article','To make your document look professionally produced, Word provides
header, footer, cover page, and text box designs that complement each other. For example,
you can add a matching cover page, header, and sidebar. Click Insert and then choose the
elements you want from the different galleries.');
Now we're going to take the logic from Jeff Moden's DelimitedSplit8K[/url] and split the articleDescription column using spaces as a delimiter. Note that the permanent tally table is required because cteTally tables can't be included in an indexed view. Before doing the split we're going to replace punctuation and whitespace characters in the articleDescription with spaces.
Here's the view:
CREATE VIEW dbo.indexedWordSplit_articles WITH SCHEMABINDING AS
SELECT
w.articleID,
w.articleTitle,
position = t.N+1,
word = CAST(RTRIM(LTRIM(SUBSTRING(
w.articleDescription,
t.N+1,
ISNULL(NULLIF(CHARINDEX(N' ',
REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
w.articleDescription,'.',' '),',',' '),';',' '),'?',' '),'!',' '),':',' '),
CHAR(10),' '),CHAR(13),' '),CHAR(9),' '),t.N+1),0
)-t.N-1,4000)))) AS nvarchar(100))
FROM dbo.tally t
CROSS JOIN dbo.articles w
WHERE N <= (ISNULL(DATALENGTH(w.articleDescription),0))
AND (SUBSTRING(w.articleDescription,t.N,1) = ' ' OR N = 0)
AND CAST(RTRIM(LTRIM(SUBSTRING(
w.articleDescription,
t.N+1,
ISNULL(NULLIF(CHARINDEX(N' ',
REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
w.articleDescription,'.',' '),',',' '),';',' '),'?',' '),'!',' '),':',' '),
CHAR(10),' '),CHAR(13),' '),CHAR(9),' '),t.N+1),0
)-t.N-1,4000)))) AS nvarchar(100)) <> ' ';
Next we add a clustered index to do the "split":
-- a clustered index that does the "split"
CREATE UNIQUE CLUSTERED INDEX cl_indexedWordSplit_articles
ON dbo.indexedWordSplit_articles(articleID, position);
Now we can add nonclustered indexes as needed. This one will be helpful for my example:
CREATE NONCLUSTERED INDEX nc_indexedWordSplit_articles_wordSearch
ON dbo.indexedWordSplit_articles(word, articleID) INCLUDE (articleTitle);
Note that this solution is great for the kind of select statements I'm about to demonstrate but will have an impact on data modification against the base tables.
Now, consider this query:
-- Routine for word searching
DECLARE @SearchString nvarchar(1000) = 'video';
SELECT articleID, @SearchString
FROM dbo.indexedWordSplit_articles WITH (NOEXPAND) -- Index hints should always be used with caution; this one is critical.
WHERE word = @SearchString
GROUP BY articleID;
We just searched the aforementioned Article column using this type of logic:
SELECT articleID
FROM articles
WHERE articleDescription LIKE '%'+@SearchString+'%'
But we did so in a way where the optimizer leverages an index seek against a nonclustered index. :w00t:
-- Itzik Ben-Gan 2001
August 10, 2016 at 12:28 am
Viewing 5 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply