January 9, 2012 at 9:36 am
Ninja's_RGR'us (1/9/2012)
Bumping is always frowned upon here (unless the next day or something like that).I didn't think this was done neither did Paul.
Very sorry then. First time I ever bumped a thread here.
SQL Kiwi (1/9/2012)
aurato (1/9/2012)
This post cleared up basically all of my remaining confusion. Thank you very much.You're welcome.
aurato (1/9/2012)
The LIKE query without the '-' works fine. Thanks again. I'm going to retry my procedure and omit the hyphen to see if that's what was causing the whole thing.I'm thinking the issue with the hyphen (or dash) character has to do with interpreting it in a culture-aware fashion. I'll post back if I get something more concrete on this. By the way, unless you really need Unicode with full language and culture aware comparisons, I would consider using VARCHAR instead of NVARCHAR - SQL collations typically compare and sort several times faster than Unicode collations. Faster still are the binary collations, but these are not suitable unless your LIKE comparisons can tolerate a strict byte-by-byte comparison (e.g. 'q' = 'Q'). Obviously, changing data type and/or collation would be a big job, given the size of the data, but if it is appropriate for you, it could be a big performance win. Another idea would be to hash words to integers, and seek on integer hash values.
I probably don't need Unicode. But I have a boss who insists upon it. One day it might be useful, who knows. The other suggestions are very interesting and I'll look into them if for nothing else educational purposes.
Full-text indexing on Azure would be great though. It'd save me about 30 gigs of lookup tables to make website searches reasonable.
January 9, 2012 at 9:43 am
aurato (1/9/2012)
I probably don't need Unicode. But I have a boss who insists upon it. One day it might be useful, who knows. The other suggestions are very interesting and I'll look into them if for nothing else educational purposes.
I should be clear about it: collation is (somewhat) independent of Unicode versus ANSI encoding. It is possible to use binary, SQL, and Windows collations with both, and the difference can be astonishing (20x or so).
January 9, 2012 at 11:19 am
Aaron Aardvark (1/9/2012)
aurato (1/9/2012)
I probably don't need Unicode. But I have a boss who insists upon it. One day it might be useful, who knows. The other suggestions are very interesting and I'll look into them if for nothing else educational purposes.I should be clear about it: collation is (somewhat) independent of Unicode versus ANSI encoding. It is possible to use binary, SQL, and Windows collations with both, and the difference can be astonishing (20x or so).
I appear to have a SQL collation (SQL_Latin1_General_CP1_CI_AS) which I guess is the default.
January 9, 2012 at 12:00 pm
This is the procedure I'm trying to use. I attempted this to join to a field that already had a '%' for the LIKE in it. It didn't improve performance very much.
Note: DelimitedSplit8K is a copy-paste from Jeff Moden's article on it[/url].
CREATE PROCEDURE [dbo].[DescriptionSearchLike]
(
@Description nvarchar(4000)
)
AS
BEGIN
DECLARE @Keywords TABLE
(
Id int not null,
Keyword nvarchar(4000) not null
)
INSERT INTO @Keywords (Id, Keyword)
SELECT Id, Keyword + '%' FROM DelimitedSplit8K(@Description, ' ')
SELECT Manufacturer, PartNumber, Description
FROM Items
WHERE Id IN
(
SELECT TOP 1000 L.ItemId
FROM @Keywords K
INNER JOIN DescriptionKeywordLookup L
ON L.Keyword LIKE K.Keyword
GROUP BY L.ItemId
ORDER BY COUNT(*) DESC
)
END
GO
EDIT: I feel the need to point out the Count(*) and grouping etc happens very quickly with an =
EDIT2: DescriptionSearch, which is this procedure with the LIKE replaced by an =, finishes this query in about 2 seconds, as opposed to about 50 seconds in DescriptionSearchLike
EDIT3: Replacing the table variable with a #temptable seemed to hurt performance by quite a lot.
EDIT4: I forgot query plans again. I'll add them soon.
Viewing 4 posts - 16 through 18 (of 18 total)
You must be logged in to reply to this topic. Login to reply