June 27, 2008 at 7:53 am
Hi, In my job we need to create a search system on nearly 60 million people. The search must be done on a field of Full_Name which contains the complete name of a person.
We have done a little search system which transforms each element of name into a number and then returns only the records that are more relevant to the particular search. Everything goes well with 11 millions but when I jump to the whole search universe the search time can get to a maximum of 5 minutes.
Additionaly I have tried already Full-Text Index Search, but it isn't as fast as required.
I.E: A sear for a name with Full-Text Index can last 1 minute on 11 million people. Meanwhile our process can last up to 25 seconds on the same 11 million people.
I've been thinking that there must be a way to order the data but I haven't figured out on how to do it.
Any idea would be great.
Thanks.
PS: Currently we are running the system with 11 millions on a central server and 8 millions on 6 desktop servers.
PS: All the system is running on SQL Server 2000
June 27, 2008 at 8:27 am
I assume the column has a non clustered index defined?
June 27, 2008 at 9:58 am
Yes, It has one
June 27, 2008 at 10:04 am
Is this a LIKE search?
I'm not sure you'll get faster than full-text. I suppose you could try some type of hash, but that might not work better. Perhaps breaking into first and last and searching two indexes separately might be faster, but that's a lot of data.
I'd try pinging some of the FTS people
http://jtkane.spaces.live.com/blog/cns!33DA9EBF0ACF8558!264.entry
June 27, 2008 at 10:23 am
It's a freetext search
A search query is similar to one of this
SELECT TOP 100 *
FROM dbo.People
WHERE freetext(Full_Name, 'sanders koernic john')
The problem with this one is that I'm getting the sanders, the koernic an the john separate without an order of relevance (average time: 17 seconds)
Another thing I tried was something like this
SELECT TOP 100 *
FROM dbo.People
WHERE freetext(Full_Name, 'sanders')
AND freetext(Full_Name, 'koernic')
AND freetext(Full_Name, 'john')
average time 27 seconds
This is only an example of what I'm trying to do.
Thank for answering
I didn't understand what do you mean with: "Perhaps breaking into first and last and searching two indexes separately might be faster, but that's a lot of data."
June 27, 2008 at 3:53 pm
Have you partitioned the table?
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
June 27, 2008 at 3:56 pm
What do you mean exactly?
I'm thinking that maybe you talk about "slicing" (horizontally) in several tables but remaining on the same server?
June 27, 2008 at 4:00 pm
Yes.
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
June 27, 2008 at 6:50 pm
Just a reminder that the freetext parameter needs to be defined as national varchar, so use
SELECT TOP 100 *
FROM dbo.People
WHERE freetext(Full_Name, N'sanders koernic john')
As an alternative, have you looked at Google's BIGTABLE Solution?
There are also some open source projects underway that support search of text with extremely large volumes.
SQL = Scarcely Qualifies as a Language
June 30, 2008 at 2:54 pm
Thanks for the replies, I'm trying to figure out the way to organize the data to achieve the speed needs.
Also thanks for posting the Google's BIGTABLE, currently I'm reading, and viewing some interesting papers about it.
Viewing 10 posts - 1 through 9 (of 9 total)
You must be logged in to reply to this topic. Login to reply