July 16, 2007 at 6:46 am
Hmmm. will think a bit more. Do you have some sample data please (as insert statements if possible)?
The reason I suggested Full Text is that it does automatically what you're trying to do by hand. Create lists of words and occurences and allows quick and easy searching
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
July 16, 2007 at 7:06 am
In this example I have 1 x url http://www.anywebsite.com and on that page it has two words, 'hello' and 'goodbye'
If I do a SQL query for both words i.e. AND, it should return just this page, not any other pages that may be listed with either of these pages! Thanks.
CREATE
USE
[SearchEngine]
GO
/****** Object: Table [dbo].[tblOccurrence] Script Date: 07/16/2007 14:05:54 ******/
SET
ANSI_NULLS ON
GO
SET
QUOTED_IDENTIFIER ON
GO
CREATE
TABLE [dbo].[tblOccurrence](
[occurrence_id] [int]
IDENTITY(1,1) NOT NULL,
[word_id] [int]
NOT NULL,
[page_id] [int]
NOT NULL,
[datetime] [datetime]
NOT NULL,
CONSTRAINT [PK_tblOccurrence] PRIMARY KEY CLUSTERED
(
[occurrence_id]
ASC
)
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
)
ON [PRIMARY]
CREATE
TABLE [dbo].[tblWord](
[word_id] [int]
IDENTITY(1,1) NOT NULL,
[word] [nvarchar]
(50) COLLATE Latin1_General_CI_AS NOT NULL,
[datetime] [datetime]
NOT NULL,
CONSTRAINT [PK_tblWord] PRIMARY KEY CLUSTERED
(
[word_id]
ASC
)
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
)
ON [PRIMARY]
CREATE
TABLE [dbo].[tblPage](
[page_id] [int]
IDENTITY(1,1) NOT NULL,
(50) COLLATE Latin1_General_CI_AS NOT NULL,
[datetime] [datetime]
NOT NULL,
CONSTRAINT [PK_tblPage] PRIMARY KEY CLUSTERED
(
[page_id]
ASC
)
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
)
ON [PRIMARY]
DATA
INSERT INTO [SearchEngine].[dbo].[tblWord]
([word])
VALUES
(<'hello', nvarchar(50),>
INSERT
INTO [SearchEngine].[dbo].[tblWord]
([word])
VALUES
(<'goodbye', nvarchar(50),>
INSERT
INTO [SearchEngine].[dbo].[tblOccurrence]
([word_id]
,[page_id])
VALUES
(<1, int,>
,<1, int,>
INSERT
INTO [SearchEngine].[dbo].[tblOccurrence]
([word_id]
,[page_id])
VALUES
(<2, int,>
,<1, int,>
INSERT
INTO [SearchEngine].[dbo].[tblPage]
VALUES
(<'http://www.anywebsite.com', nvarchar(50),>
July 16, 2007 at 8:24 am
I had to write something once that had to search either for words in any order inside a string, or the exact string. What I had to search was the entire library of tapes for one of the largest Radio companies in the US.... i.e. millions of records, many millions of words to search.
(there are good reasons why I did not use full text search for this which I am not going into)
What I ended up doing was create a table of words, one record for each word in each broadcast.... With that could do an index on each word so no need for a like search.
To find the record(s) I needed in the search would search the list of words, then count that search result to see if it matched the count of how many words the user had entered to search for. Eliminate those that had a count that was less than it, and voila had the "anywhere" search result. If they wanted to search the exact string, I simply ran a where clause on the potential resultset from an "anywhere" search with the exact string.
In SQL 2005 this would have been a lot easier since I could have used varchar(max) to store the string instead of text.
I put some limits on the resultset, i.e. max of 200 returned, but they could further refine the query after it ran the first time to eliminate it. The query could also limit to what type of tapes, what year the recording was made and a host of other information.
The search never took more than 4 seconds no matter how complicated a search they wanted to do.
July 16, 2007 at 8:02 pm
sorry mate if i didn't clearly understood the problem.. i guess i was barking at the wrong tree.. heheheh btw,that solution that i proposed is well tested.. it is used for some of our search functionality with multiple parameters anyway, i think Anders Pedersen' solution is suited for your prob.. though you must ensure that the records(words) stored in the db are static.. it seems reasonable to me, and i have some thoughts of presenting that to my tech lead..
July 17, 2007 at 3:03 am
Hi Christoper,
No problem what so ever, I probably explained myself badly!
I have read Anders reply many times and still trying to get a picture of how it will work!!
Scenario
If I had a web page that has three words on it for example:
http://www.mywebsite.com with words: 'race' 'london' '2007'
I need to record the web address in my table:
tblPage
page_id = 1 (Auto increment)
url = http://www.mywebsite.com
date = Date added
I then need to record the three unique words in my table:
tblWord
word_id = 1 (Auto increment)
word = 'race' (Indexed NO duplicates allowed)
date = Date added
word_id = 2
word = 'london'
date = Date added
word_id = 3
word = '2007'
date = Date added
I then need to record the occurrence of each word:
tblOccurrence
occurrence_id = 1 (Auto Increment)
word_id = 1
page_id = 1
date = Date added
occurrence_id = 2
word_id = 2
page_id = 1
date = Date added
occurrence_id = 3
word_id = 3
page_id = 1
date = Date added
The dates are to be used to purge old indexed pages.
In this example the words only occurred once on this page, however, in reality they could appear several times.
QUERY
So if the user searches for 'race london 2007' as a complete string, I need to perform a logical AND to pull back this web address, however, doing a logical AND does not and willl not work as it is will never be true, this is my puzzle!
WILL NOT WORK: WHERE (w.word='race') AND (w.word='london') AND (w.word='2007')
Doing a logical OR will work but it will return the page even if the user just searches for teh word 'race'
Does that make sence, or am I coming about this from the wrong angle?
July 17, 2007 at 7:25 am
Actualy the words are not unique. They would be unique for a mathing parent ID. And yes my solution was for fairly static data, new records where added, so I would after a new record was added add the words to the words table.
In pseudo code.... my tables looked something like this:
Table: Main Record
RecordID
StringToSearchIn
Bunch of other columns used to limit the search (for example we had TypeOfRecording, Source, TapeType, Author, Actors etc. Most of these where FK'ed out).
Table: Words
RecordID (FK'ed to above)
Word
(I filtered out common words like 'a', 'the', etc, both in the Words table and from the search string passed to me)
If I was searching for records that had the following words in it in any order: "London, Race, 2008" I would search for RecordIDs in the Words table that had those words in it. If all 3 where found it would return that recordID to the front end, with the entire StringToSearchIn.
If I was searching for "The London 10K Race in 2008" I would search for records that had "London, 10K, Race, 2008", whatever records that had those 4 words in it would then be returned tothe stored procedure, which would then run select from Main Record table where StringToSearchIn like "%The London 10K Race in 2008%" to only limit the search to records that matched the exact string asked for.
As I said this worked in about 3-4 seconds, and the "server" it ran on was a normal desktop computer with enough hard drive space.
July 17, 2007 at 9:52 am
I got you now, LOL!!
I have now modified my tblPage to include a keyword field ntext(), added the words in here also as well as maintaining a unique list of words.
QUERY1
select word_id, word from tblWord where word='race' or word='london'
go
Get my unique word_id's and pass to next query...
QUERY2
select url, count(*) as occurrences from tblPage inner join tbloccurrence on tblPage.page_id = tblOccurrence.page_id
where ((word_id=2262) or (word_id=2950)) and ((keywords like '%london%') and (keywords like '%race%'))
group by url
order by occurrences desc
go
Works great, many thanks, much appreciated
Viewing 7 posts - 16 through 21 (of 21 total)
You must be logged in to reply to this topic. Login to reply