December 13, 2011 at 8:27 am
Hi Gurus,
I would like to have your advice on the following "idea"...
Background:
In order to ease the understanding, let's consider a site dedicated to house renting for example.
Owners would define their house characteristics like (price, location, environment...).
In order to enter all data, the owner is invited to use dropdown lists for some special fields (environment, house type, ...).
Some characteristics are mandatory, some others are not. Also, the owner can enter description as free text.
In order to record a house entry, I use one dedicated table (called tbl_houses) which makes links to other tables (list of characteristics).
Also free text as well as "translated characteristics" are stored in a table called tbl_description.
Objectives:
I want to let users the possibility to query the database as follows:
* they can fill in any criteria (linked to house characteristics) but they are not obliged to
* they can enter some keywords at the same time.
Situation:
Everything works ok today except that once a characteristics is set by the user and does not correspond to data previously entered by the owner, records are not retrieved,
which is normal.
Ideas:
I would like to give a score to criteria matching and sort the results by score descending.
In other words, let's take an example:
tbl_houses:
house_idbigint identity(1,1),
owner_idbigint,-- refers to tbl_owners
data_0int,
data_1int,
data_2int
tbl_description:
description_idbigint identity(1,1),
house_idbigint, -- refers to tbl_houses
description_textvarbinary(MAX)-- stores the descriptions in HTML format due to other constraints I have, as multilinguism
Today, queries are dynamically built, based on the query criteria entered by the users but they are of the following format:
SELECT H.house_id, O.owner_name, H.data_0, H.data_1, H.data_2
FROM tbl_houses H
INNER JOIN tbl_owners O ON H.owner_id = O.owner_id
INNER JOIN (
SELECT DISTINCT TD.house_id
FROM tbl_description TD
INNER JOIN CONTAINSTABLE(dbo.tbl_description, [description_text], @freetext) AS ftt
ON TD.description_id = ftt.
) S
ON H.house_id = S.house_id
WHERE
H.data_0 = @data_0_value
AND
H.data_1 = @data_1_value;
Ideal objective:
I would like the query to return the "top ranked" results at first, followed by "lower ranked" ones.
I was thinking about the following:
1) If a criteria is set, to give it a High score if matching but a lower if not
2) If free text keywords are entered, to give a High rank for descriptions matching the keywords, a lower rank for the other ones.
3) Not exclude any results but give them a lower rank.
Therefore my idea was (since I am dynamically building the SQL statement):
1) If a criteria is defined by the user to check if the data is defined at the record level.
If yes, give a score depending on the matching, otherwise if not defined by the owner, give a score = 1 (for doubt)
2) Same would apply to keywords
3) Some exclusion criteria could exist
This could be translated as follows (considering the fact that if a criteria is not defined by the owner, the data = -1 at the database level)
DECLARE @topRankdecimal;
SET @topRank = (SELECT MAX(RANK) FROM CONTAINSTABLE(dbo.tbl_description, [description], @in_freetext, LANGUAGE @in_language_LCID));
WITH query_fts AS (
SELECT DISTINCT TD.house_id, ftt.[RANK] AS FT_RANK
FROM tbl_description TD
INNER JOIN CONTAINSTABLE(dbo.tbl_description, [description_text], @freetext) AS ftt
ON TD.description_id = ftt.
)
SELECT H.house_id, 0.owner_name, H.data_0, H.data_1, H.data_2,
Score = (
CASE WHEN H.data_0 = -1 THEN 1
ELSE 100 * (CASE WHEN H.data_0 = @data_0_value THEN 1 ELSE 0 END)
END
+CASE WHEN H.data_1 = -1 THEN 1
ELSE 100 * (CASE WHEN H.data_1 = @data_1_value THEN 1 ELSE 0 END)
+CASE WHEN FTS_FT_RANK IS NULL THEN 0
ELSE 100 * CAST(FTS.FT_RANK as DECIMAL)/@topRank) END
)
FROM tbl_houses H
LEFT JOIN query_fts FTS
ON FTS.house_id = H.house_id
INNER JOIN tbl_owners O ON H.owner_id = O.owner_id
WHERE H.exclusion_criteria = @exclusion_criteria_value
ORDER BY Score DESC;
Do you know any other technique that would offer me this?
Would you have any advice?
Many thanks in advance,
December 13, 2011 at 8:54 am
If I've understood you correctly, I think you could go about your problem in the following way. I'm talking at a conceptual level, as it would take too much time to put this into your example - but I'm sure you can do that yourself. Also, you'd obviously want to replace all the "SELECT *" references.
Example: keywords are 'detached', 'centralheating'
SELECT tbl_houses.*,CASE WHEN tkw1.house_id IS NULL THEN 0 ELSE 1 END + CASE WHEN tkw2.house_id IS NULL THEN 0 ELSE 1 END AS score
FROM tbl_houses th
LEFT JOIN (SELECT * FROM tbl_keywords WHERE keyword='detached') tkw1 ON tkw1.house_id=th.house_id
LEFT JOIN (SELECT * FROM tbl_keywords WHERE keyword='centralheating') tkw2 ON tkw2.house_id=th.house_id
ORDER BY score DESC
Does that get you some of the way?
December 13, 2011 at 9:18 am
Thanks for your reply but the example you gave does not give any score if keywords are present or not and, I would like to weight the results, also by the presence of the keywords...
December 13, 2011 at 9:45 am
Hi again.
It would give you a score if the keywords were not present - the LEFT JOIN means that you get a row returned, and the CASE statement allows you to set the score for when the keyword isn't present (I've used 0, but you don't have to).
I'm not sure what else you're after - could you be clearer?
December 13, 2011 at 11:06 am
That solution would require a large amount of dynamic sql with potentially hundreds of joins to the same table. I have an idea that would work using a string splitter. I will see if I can put it together.
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
December 13, 2011 at 11:14 am
Can you put together some ddl and sample data? It sure make life a lot easier without having to create your structures and data. 😉
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
December 13, 2011 at 11:22 am
Sean Lange (12/13/2011)
Can you put together some ddl and sample data? It sure make life a lot easier without having to create your structures and data. 😉
A list of the most frequently encountered keywords would be tremendously useful too. It would be a lot simpler to match '3' to a column [Bedroom_count] than to hope to find the string "three bedrooms" in a huge pile of text 😉
For better assistance in answering your questions, please read this[/url].
Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]
December 14, 2011 at 8:02 am
Sean Lange (12/13/2011)
That solution would require a large amount of dynamic sql with potentially hundreds of joins to the same table.
Sure, my solution isn't necessarily the best by any means. But hundreds of joins? How many keywords are you going to be searching on for a house? Rightmove, the leading UK property search site, searches on fewer than 10 criteria, and many of those are non-keyword fields like price. As to the large amount of dynamic SQL: if you want to wrap this in a Stored Proc then yes, that's an issue, but it would seem to make more sense to generate this SQL in the webform, in which case there's nothing too complex about it at all.
As I say, my solution may not be brilliant - but knock it for the right reasons! 😉
December 14, 2011 at 8:26 am
integritec (12/14/2011)
Sean Lange (12/13/2011)
That solution would require a large amount of dynamic sql with potentially hundreds of joins to the same table.Sure, my solution isn't necessarily the best by any means. But hundreds of joins? How many keywords are you going to be searching on for a house? Rightmove, the leading UK property search site, searches on fewer than 10 criteria, and many of those are non-keyword fields like price. As to the large amount of dynamic SQL: if you want to wrap this in a Stored Proc then yes, that's an issue, but it would seem to make more sense to generate this SQL in the webform, in which case there's nothing too complex about it at all.
As I say, my solution may not be brilliant - but knock it for the right reasons! 😉
LOL ok so up to 10 dynamic joins. Not putting this in a stored proc is an invitation to sql injection. Pass through queries from a web form are incredibly dangerous. If you do this as a pass through you have to parameterize your query at the very least.
I don't know anything about Rightmove or what the leading UK property companies experience for the typical number of search criteria. I also don't see how that is relevant to the OP here.
I don't know anything about the webform the OP is using but if you read the OP it is entered as freetext. That means the user can type in whatever they want. If the UI allows that much text then the code has to be able to deal with it. The only way to search for all of them is to first parse their entry into words.
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
December 14, 2011 at 8:57 am
Sean Lange (12/14/2011)I don't know anything about Rightmove or what the leading UK property companies experience for the typical number of search criteria. I also don't see how that is relevant to the OP here.
Because he was talking about searching a property database. Seemed a logical comparison to draw.
As to doing 10 JOINs - maybe less than ideal, but usually far more efficient than a single function call.
December 14, 2011 at 9:03 am
integritec (12/14/2011)
Agreed on SQL injection - needs to be guarded against. Doesn't make the solution bad - you just have to sanitise your inputs.As to doing 10 JOINs - maybe less than ideal, but usually far more efficient than a single function call.
True enough. "Sanitizing" inputs is a slippery slope though. Much better to parameterize your queries. No matter the level of cleansing you just can't possibly clean it to the point that executing code from an end user is safe. It is easy to do things like inject hex or otherwise obfuscate input to circumvent any kind of cleansing mechanism.
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
Viewing 11 posts - 1 through 10 (of 10 total)
You must be logged in to reply to this topic. Login to reply