April 27, 2009 at 12:22 pm
I'm having some trouble with an issue that has troubled 2 developers with WAY more experience before me. 😛
A little background: A user can have pieces of equipment in their profile. Each piece of equipment can have up to 500 skills associated with it. Each skill can be rated 0 through 9.
The program that I am currently working on allows someone to search for someone else with a particular set of skills. So, you could be looking for someone who has rated themselves between 5 and 9 on a skill for equipment A and B. So, the only people who should show up would be ones who are rated between 5 and 9 for BOTH pieces of equipment A and B. You would send in the models, skills, and ratings range.
The (simplified) tables are set up as:
IPR_equipment (equipment_id INT, user_id INT, model_id)
IPR_equipment_to_skill (equipment_id INT, skill_id INT, rating INT)
Sending in a skill_id of 5733 and equipment with the model_ids of 1558 and 1559 and a range of 5 to 9, this is basically what the code would currently run. Table @result would already have a list of user_ids that have met other criteria.
--Get all requested skills for all requested pieces of equipment
INSERT INTO @equipment_skill_temp (model_id, skill_id)
SELECT E.model_id, S.skill_id
FROM IPR_equipment E
JOIN SKL_skill S ON 1 = 1
WHERE E.model_id IN (1558, 1559)
AND S.skill_id IN (5733)
--Loop through all skills
WHILE EXISTS (
SELECT * FROM @equipment_skill_temp
)
BEGIN
--Select first equipment / skill combination
SELECT TOP 1 @temp_model_id = model_id,
@temp_skill_id = skill_id
FROM @equipment_skill_temp
--Remove it from the list
DELETE FROM @equipment_skill_temp
WHERE model_id = @temp_model_id
AND skill_id = @temp_skill_id
--Get users who are rated between 5 and 9 for current equipment / skill
INSERT INTO @equipment_user_temp (user_id)
SELECT DISTINCT user_id
FROM IPR_equipment E
JOIN IPR_equipment_to_skill E2S ON E2S.equipment_id = E.equipment_id
WHERE ISNULL(E.model_id, -1) = @temp_model_id
AND skill_id = @temp_skill_id
AND rating BETWEEN 5 AND 9
--Remove people who didn't meet the current equipment / skill rating requirement
DELETE FROM @result
WHERE user_id NOT IN (
SELECT user_id FROM @equipment_user_temp
)
--Reset user list
DELETE FROM @equipment_user_temp
END
--Select remaining users who meet all the criteria
SELECT user_id FROM @result
This what I came up with when trying to get away from a loop.
--Get all requested skills for all requested pieces of equipment
INSERT INTO @equipment_skill_temp (model_id, skill_id)
SELECT E.model_id, S.skill_id
FROM IPR_equipment E
JOIN SKL_skill S ON 1 = 1
WHERE E.model_id IN (1558, 1559)
AND S.skill_id IN (5733)
--Get users who are rated between 5 and 9 for ANY (the issue) equipment / skill
INSERT INTO @equipment_user_temp (user_id)
SELECT DISTINCT user_id
FROM IPR_equipment E
JOIN IPR_equipment_to_skill E2S ON E2S.equipment_id = E.equipment_id
JOIN @equipment_skill_temp E2S_temp
ON E2S_temp.model_id = E.model_id
AND E2S_temp.skill_id = E2S.skill_id
WHERE rating BETWEEN 5 AND 9
--Remove people who didn't meet the current equipment / skill rating requirement
DELETE FROM @result
WHERE user_id NOT IN (
SELECT user_id FROM @equipment_user_temp
)
--Select remaining users who meet all the criteria
SELECT user_id FROM @result
The problem is that you could have a 7 on one piece of equipment and 3 on the other and still show up, where the first one would basically treat them as seperate searches and just give you the results that overlap. Previous developers that have worked here have just come up with INCREDIBLY complicated series selects that produce strings to produce other strings to get the results, and it's never even worked like that. At least my loop produces the desired results, but from what I can tell, it's not the most efficient.
This is somewhat of a result of me reading this article[/url], but I didn't want to clutter up that thread with me just not knowing enough about T-SQL. 😉
Thanks for any ideas or even just nudges in the right direction!
April 27, 2009 at 12:52 pm
Looking at the two little schema you supplied, I don't understand why the user ID is associated with a piece of equipment, rather than with skills. I would expect the skills table to look like
UserID/Equipment/Skill/Rating
This can be done, but it would be much better if you could give us a script to create a couple of temp tables with schema for the columns you need, and just a little sample data. Finally, how would you expect the output to look. That way we can supply you with tested solutions rather than just nudges, because we can see if our code produces the expected output from the input you supplied. Thanks.
Bob
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
April 27, 2009 at 1:21 pm
The reason there are 2 seperate tables is because initially, someone else set up the IPR_equipment table as (equipment_id INT, user_id INT, model_id INT, ratingsmap NVARCHAR(MAX)) where "ratingsmap" would contain all of the ratings for that equipment in one big string like "5733:5|5734:7|5735:8". Currently, "ratingsmap" is still used a few places in our system that I haven't gotten to switch over yet, so both ratings methods are running side by side right now, updating each other with triggers.
Also, equipment_id is unique in IPR_equipment, so 2 people could each have a piece of equipment with a model_id of 1558, but with unique equipment_ids. If a user removes a piece of equipment, only the 1 record in IPR_equipment is deactivated, but the 500 or so records in IPR_equipment_to_skill remain, so if they add that particular model back in, it can just reactivate the 1 record and all the ratings are still there.
I will work on getting a sample script up.
Thanks.
April 27, 2009 at 1:45 pm
Ewww....
Did the same people who wrote the first chunk of code design your DB ??
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
April 27, 2009 at 1:53 pm
Sadly, the first code block I posted was my vast improvement over the code that was there. Like I said, the original code created strings put together to make a GIANT select that took FOREVER to run. If I remember right, there was a chuck of code (maybe 30 lines long depending on how many skills were selected) for each piece of equipment. And typically, a user might put in 10 pieces of equipment with 10 skills, so then you're looking at code size with a factor of 100x... :crazy:
April 27, 2009 at 2:57 pm
Ok... This is my first attempt to provide a working example of something on a forum, so I apologize in advance if I make this more confusing that what it already was. 😛
Here's what I would consider to be valid sample data.
DECLARE @result TABLE (user_id VARCHAR(50))
DECLARE @ipr_equipment TABLE (equipment_id VARCHAR(50), user_id VARCHAR(50), model_id VARCHAR(50))
DECLARE @skl_skill TABLE (skill_id VARCHAR(50))
DECLARE @ipr_equipment_to_skill TABLE (equipment_id VARCHAR(50), skill_id VARCHAR(50), rating INT)
--
INSERT INTO @result (user_id)
SELECT 'A'
UNION
SELECT 'B'
UNION
SELECT 'C'
--
INSERT INTO @ipr_equipment (equipment_id, user_id, model_id)
SELECT 1001, 'A', '747'
UNION
SELECT 1002, 'A', 'Cessna'
UNION
SELECT 1003, 'B', '747'
UNION
SELECT 1004, 'B', 'Cessna'
UNION
SELECT 1005, 'C', '747'
--
INSERT INTO @skl_skill (skill_id)
SELECT 'Servicing'
UNION
SELECT 'Auto Flight'
--
INSERT INTO @ipr_equipment_to_skill (equipment_id, skill_id, rating)
SELECT 1001, 'Servicing', 2
UNION
SELECT 1001, 'Auto Flight', 2
UNION
SELECT 1002, 'Servicing', 4
UNION
SELECT 1002, 'Auto Flight', 5
UNION
SELECT 1003, 'Servicing', 7
UNION
SELECT 1003, 'Auto Flight', 7
UNION
SELECT 1004, 'Servicing', 9
UNION
SELECT 1004, 'Auto Flight', 9
UNION
SELECT 1005, 'Servicing', 8
UNION
SELECT 1005, 'Auto Flight', 8
--
DECLARE @equipment_skill_temp TABLE (model_id VARCHAR(50), skill_id VARCHAR(50))
DECLARE @temp_model_id VARCHAR(50), @temp_skill_id VARCHAR(50)
DECLARE @equipment_user_temp TABLE (user_id VARCHAR(50))
The current code modified to work with the test data. Only user B is returned with the given values, because it's the only one with ratings between 5 and 9 for Servicing for both 747 and Cessna.
--Get all requested skills for all requested pieces of equipment
INSERT INTO @equipment_skill_temp (model_id, skill_id)
SELECT DISTINCT model_id, skill_id
FROM @ipr_equipment E
JOIN @skl_skill S ON 1 = 1
WHERE model_id IN ('747', 'Cessna') --model_id parameters
AND S.skill_id IN ('Servicing') --skill_id parameters
--Loop through all skills
WHILE EXISTS (
SELECT * FROM @equipment_skill_temp
)
BEGIN
--Select first equipment / skill combination
SELECT TOP 1 @temp_model_id = model_id,
@temp_skill_id = skill_id
FROM @equipment_skill_temp
--
--Remove it from the list
DELETE FROM @equipment_skill_temp
WHERE model_id = @temp_model_id
AND skill_id = @temp_skill_id
--
--Get users who are rated between 5 and 9 for current equipment / skill
INSERT INTO @equipment_user_temp (user_id)
SELECT DISTINCT user_id
FROM @ipr_equipment E
JOIN @ipr_equipment_to_skill E2S ON E2S.equipment_id = E.equipment_id
WHERE ISNULL(E.model_id, -1) = @temp_model_id
AND skill_id = @temp_skill_id
AND rating BETWEEN 5 AND 9 --range parameters
--
--Remove people who didn't meet the current equipment / skill rating requirement
DELETE FROM @result
WHERE user_id NOT IN (
SELECT user_id FROM @equipment_user_temp
)
--
--Reset user list
DELETE FROM @equipment_user_temp
END
--
--Select remaining users who meet all the criteria
SELECT user_id FROM @result
What I've tried that doesn't work. This returns users B and C, because user C has one of the two pieces of equipment rated between 5 and 9 for Servicing. That user shouldn't come up, since you need to have both pieces of equipment.
--Get all requested skills for all requested pieces of equipment
INSERT INTO @equipment_skill_temp (model_id, skill_id)
SELECT DISTINCT E.model_id, S.skill_id
FROM @ipr_equipment E
JOIN @skl_skill S ON 1 = 1
WHERE model_id IN ('747', 'Cessna') --model_id parameters
AND S.skill_id IN ('Servicing') --skill_id parameters
--
--Get users who are rated between 5 and 9 for ANY (the issue) equipment / skill
INSERT INTO @equipment_user_temp (user_id)
SELECT DISTINCT user_id
FROM @ipr_equipment E
JOIN @ipr_equipment_to_skill E2S ON E2S.equipment_id = E.equipment_id
JOIN @equipment_skill_temp E2S_temp
ON E2S_temp.model_id = E.model_id
AND E2S_temp.skill_id = E2S.skill_id
WHERE rating BETWEEN 5 AND 9 --range parameters
--
--Remove people who didn't meet the current equipment / skill rating requirement
DELETE FROM @result
WHERE user_id NOT IN (
SELECT user_id FROM @equipment_user_temp
)
--
--Select remaining users who meet all the criteria
SELECT user_id FROM @result
Hope that all makes some sense.
April 27, 2009 at 2:59 pm
Thanks 🙂
Loading it up, now.
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
April 27, 2009 at 3:50 pm
I think your problem is solvable using a HAVING clause which ensures that the number of records for each user_id that is returned is equal to the number of model_id values passed to the query in the @modelList table variable. The DISTINCT keyword in the COUNT aggregate function is intended to ensure that any duplicate @ipr_equipment or @ipr_equipment_to_skill rows don't affect the results.
DECLARE @modelList TABLE (model_id varchar(20))
INSERT @modelList VALUES ('Cessna')
INSERT @modelList VALUES ('747')
DECLARE @skill_id varchar(20)
SELECT @skill_id = 'Servicing'
SELECT E.[user_id]
FROM @modelList ML
INNER JOIN @ipr_equipment E ON (ML.model_id = E.model_id)
INNER JOIN @ipr_equipment_to_skill E2S ON (E.equipment_id = E2S.equipment_id)
WHERE (E2S.skill_id = @skill_id)
AND (E2S.rating BETWEEN 5 AND 9)
GROUP BY E.[user_id]
HAVING (COUNT(DISTINCT E.model_id) = (SELECT COUNT(DISTINCT model_id) FROM @modelList))
April 27, 2009 at 4:38 pm
That seems to work. I'll have to wait till tomorrow to be able to fully test it.
A couple things to add into the twist, I only listed model_id to make things more simple, but there's actually a series_id as well. If something is series specific, model_id = -1 and series_id has a value, but if it's only model specific, it's the opposite. To take that into account, would I just need to make it
HAVING (COUNT(DISTINCT E.model_id) = (SELECT COUNT(DISTINCT model_id) FROM @modelSeriesList))
AND (COUNT(DISTINCT E.series_id) = (SELECT COUNT(DISTINCT series_id) FROM @modelList))
Also, there are chapter level skills and subchapter level skills. The subchapter levels are the ratings that are stored and are searched through by the code I've previously provided. If a chapter level skill is sent in, it looks up all the ratings for the subchapters within that chapter and then does a GROUP BY on user_id to get the average for that chapter for each user. I think I should be able to just add model_id and series_id to that GROUP BY and then GROUP BY the user_id with a SELECT around that, right?
April 27, 2009 at 9:23 pm
Sorry to leave this hanging so long. 🙂
I went the same route as Andrew, adding a @model table variable to go with @skl_skill to control the inputs to the query. The following returns the user_ids who have a rating of at least 5 for both skills for both models. This could be extended to add minimum ratings to the skills, so that you could specify different minimum ratings for different skills.
select r.user_id
from @result r
join @ipr_equipment i1 on i1.user_id = r.user_id
join @model m on m.model_id = i1.model_id -- consider only models found in @model
join @ipr_equipment_to_skill i2 on i2.equipment_id = i1.equipment_id
join @skl_skill s on s.skill_id = i2.skill_id -- consider only skills found in @skl_skill
where rating >=5
group by r.user_id
having count(*) = (select count(*) from @skl_skill CROSS JOIN @model) -- <<< IMPORTANT
It still seems odd to me that the identifier tying a user to a skill is named [equipment_id]. I would much prefer seeing a table like this, but I feel sure your schema is deeply embedded by now.
declare @sample table (user_id varchar(50), model_id, varchar(50), skill_id varchar(50), rating int)
--
What other functions does [equipment_id] serve?
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
April 28, 2009 at 2:10 pm
Thanks for your help guys! Using the COUNT(*) seems to work. I went with getting the count from the joining of the model/series list and the skill list. That seemed to be the easiest way to allow for multiples of both and be able to seperate chapter and subchapter skills.
As for the use of "equipment_id", there were some legacy things, like the "ratingsmap" I mentioned, and there was the option of linking equipment from your profile to other records in the system. Both of those are gone, but there are still a couple reasons I want to keep them as 2 separate tables.
As you mentioned, it is pretty embedded in the system, so it'd be a pretty big task to change and test (especially since I'm only 1 of 2 developers :-P).
Secondly, we currently have a third table for "custom" pieces of equipment, and that is still using the "ratingsmap" method of storing ratings. Custom equipment isn't used as much, and there is no searching or real calculations done on it, so it's not as important, but I do want to get that out of there eventually as well. I've also been kicking around the idea of combining the custom equipment with our system equipment, so the schema for the IPR_equipment table would be something like:
IPR_equipment (equipment_id INT, user_id INT, model_id INT, series_id INT, is_custom BIT, manufacturer_name NVARCHAR(256), model_name NVARCHAR(256), series_name NVARCHAR(256))
Lastly, even without including the custom equipment in the same table, if I put model_id, series_id, and user_id all in the ratings table, that table would get really large, really fast. Right now, there's just the 2 INTs for equipment_id and skill_id, a TINYINT for rating, and 2 BITs for if the skill has actually been rated and for if the skill is applicable or not. And just with our initial test group of 2000 people using our system right now, there's over 23,000,000 records, totalling 427MB. I'm not too worried about size on the disk itself, but downloading the backups from our remote hosting would SUCK if it was significanly larger.
I think all of these issues can be addressed in time, but I'd still like to keep it as separate tables for being able to remove a piece of equipment without having to remove every rating for that equipment and being able to link equipment to other records in the future if necessary.
Oh, and just to give you an idea of what I had to work with when I first tried tackling this searching program a year ago, here's a "small" chuck of the code the last guy wrote that I dug up:
--Include all series for models selected in search without series
if exists(select * from @SearchersEQP where SeriesID is null)
begin
set @MustMatchNoSeries=1
-- ANY Series of ALL Models entered without a Series
select top 1 @sql='select J0.UID from (
SELECT E.UID, S.ModelID, E.SeriesID, E.EquipmentID
FROM dbo.IPR_Equipment E (NOLOCK)
JOIN dbo.EQP_Series S on S.SeriesID=E.SeriesID
WHERE E.Active = 1
AND E.UID '+cast(@UID as nvarchar(max)) +'
AND E.SeriesID is not null
AND S.ModelID IN(
SELECT DISTINCTE.ModelID
FROM dbo.IPR_Equipment E (NOLOCK)
JOIN dbo.CSRC_CriteriaEQP C'+cast(ID as nvarchar(max)) +' (NOLOCK)
ON E.EquipmentID = C'+cast(ID as nvarchar(max)) + '.EquipmentID
AND C'+Cast(ID as nvarchar(max)) +'.EquipmentID='+cast(EquipmentID as nvarchar(max))+'
AND C'+Cast(ID as nvarchar(max)) +'.SearchID='+Cast(@SID as nvarchar(max))+')
UNION ALL
SELECT E.UID, e.ModelID, E.SeriesID, E.EquipmentID
FROM dbo.IPR_Equipment E (NOLOCK)
WHERE E.Active = 1
AND E.UID '+cast(@UID as nvarchar(max)) +'
AND e.ModelID IN(
SELECT DISTINCTE.ModelID
FROM dbo.IPR_Equipment E (NOLOCK)
JOIN dbo.CSRC_CriteriaEQP C'+cast(ID as nvarchar(max)) +' (NOLOCK)
ON E.EquipmentID = C'+cast(ID as nvarchar(max)) + '.EquipmentID
AND C'+Cast(ID as nvarchar(max)) +'.EquipmentID='+cast(EquipmentID as nvarchar(max))+'
AND C'+Cast(ID as nvarchar(max)) +'.SearchID='+Cast(@SID as nvarchar(max))+')
) J0'
from @SearchersEQP
where SeriesID is null
Order by ID
select @sql=@SQL+' JOIN (
SELECT E.UID, S.ModelID, E.SeriesID, E.EquipmentID
FROM dbo.IPR_Equipment E (NOLOCK)
JOIN dbo.EQP_Series S on S.SeriesID=E.SeriesID
WHERE E.Active = 1
AND E.UID '+cast(@UID as nvarchar(max)) +'
AND E.SeriesID is not null
AND S.ModelID IN(
SELECT DISTINCTE.ModelID
FROM dbo.IPR_Equipment E (NOLOCK)
JOIN dbo.CSRC_CriteriaEQP C'+cast(ID as nvarchar(max)) +' (NOLOCK)
ON E.EquipmentID = C'+cast(ID as nvarchar(max)) + '.EquipmentID
AND C'+Cast(ID as nvarchar(max)) +'.EquipmentID='+cast(EquipmentID as nvarchar(max))+'
AND C'+Cast(ID as nvarchar(max)) +'.SearchID='+Cast(@SID as nvarchar(max))+')
UNION ALL
SELECT E.UID, e.ModelID, E.SeriesID, E.EquipmentID
FROM dbo.IPR_Equipment E (NOLOCK)
WHERE E.Active = 1
AND E.UID '+cast(@UID as nvarchar(max)) +'
AND e.ModelID IN(
SELECT DISTINCTE.ModelID
FROM dbo.IPR_Equipment E (NOLOCK)
JOIN dbo.CSRC_CriteriaEQP C'+cast(ID as nvarchar(max)) +' (NOLOCK)
ON E.EquipmentID = C'+cast(ID as nvarchar(max)) + '.EquipmentID
AND C'+Cast(ID as nvarchar(max)) +'.EquipmentID='+cast(EquipmentID as nvarchar(max))+'
AND C'+Cast(ID as nvarchar(max)) +'.SearchID='+Cast(@SID as nvarchar(max))+')
) J'+Cast(ID as nvarchar(max)) + ' on J0.UID=J'+cast (id as nvarchar(max))+'.UID '
from @SearchersEQP
where SeriesID is null
ORDER By ID
declare @TUID table (UID INT)
INSERT @TUID(UID)
EXECUTE sp_ExecuteSQL @sql
In his defense, he was struggling mostly because he had to deal with the slowness of extracting the ratings from the "ratingsmap", but still... :hehe:
April 28, 2009 at 2:31 pm
You're welcome, blake. Good luck to you.
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
Viewing 12 posts - 1 through 11 (of 11 total)
You must be logged in to reply to this topic. Login to reply