Finding Records with ALL of Several Pieces of Criteria

  • I'm having some trouble with an issue that has troubled 2 developers with WAY more experience before me. 😛

    A little background: A user can have pieces of equipment in their profile. Each piece of equipment can have up to 500 skills associated with it. Each skill can be rated 0 through 9.

    The program that I am currently working on allows someone to search for someone else with a particular set of skills. So, you could be looking for someone who has rated themselves between 5 and 9 on a skill for equipment A and B. So, the only people who should show up would be ones who are rated between 5 and 9 for BOTH pieces of equipment A and B. You would send in the models, skills, and ratings range.

    The (simplified) tables are set up as:

    IPR_equipment (equipment_id INT, user_id INT, model_id)

    IPR_equipment_to_skill (equipment_id INT, skill_id INT, rating INT)

    Sending in a skill_id of 5733 and equipment with the model_ids of 1558 and 1559 and a range of 5 to 9, this is basically what the code would currently run. Table @result would already have a list of user_ids that have met other criteria.

    --Get all requested skills for all requested pieces of equipment

    INSERT INTO @equipment_skill_temp (model_id, skill_id)

    SELECT E.model_id, S.skill_id

    FROM IPR_equipment E

    JOIN SKL_skill S ON 1 = 1

    WHERE E.model_id IN (1558, 1559)

    AND S.skill_id IN (5733)

    --Loop through all skills

    WHILE EXISTS (

    SELECT * FROM @equipment_skill_temp

    )

    BEGIN

    --Select first equipment / skill combination

    SELECT TOP 1 @temp_model_id = model_id,

    @temp_skill_id = skill_id

    FROM @equipment_skill_temp

    --Remove it from the list

    DELETE FROM @equipment_skill_temp

    WHERE model_id = @temp_model_id

    AND skill_id = @temp_skill_id

    --Get users who are rated between 5 and 9 for current equipment / skill

    INSERT INTO @equipment_user_temp (user_id)

    SELECT DISTINCT user_id

    FROM IPR_equipment E

    JOIN IPR_equipment_to_skill E2S ON E2S.equipment_id = E.equipment_id

    WHERE ISNULL(E.model_id, -1) = @temp_model_id

    AND skill_id = @temp_skill_id

    AND rating BETWEEN 5 AND 9

    --Remove people who didn't meet the current equipment / skill rating requirement

    DELETE FROM @result

    WHERE user_id NOT IN (

    SELECT user_id FROM @equipment_user_temp

    )

    --Reset user list

    DELETE FROM @equipment_user_temp

    END

    --Select remaining users who meet all the criteria

    SELECT user_id FROM @result

    This what I came up with when trying to get away from a loop.

    --Get all requested skills for all requested pieces of equipment

    INSERT INTO @equipment_skill_temp (model_id, skill_id)

    SELECT E.model_id, S.skill_id

    FROM IPR_equipment E

    JOIN SKL_skill S ON 1 = 1

    WHERE E.model_id IN (1558, 1559)

    AND S.skill_id IN (5733)

    --Get users who are rated between 5 and 9 for ANY (the issue) equipment / skill

    INSERT INTO @equipment_user_temp (user_id)

    SELECT DISTINCT user_id

    FROM IPR_equipment E

    JOIN IPR_equipment_to_skill E2S ON E2S.equipment_id = E.equipment_id

    JOIN @equipment_skill_temp E2S_temp

    ON E2S_temp.model_id = E.model_id

    AND E2S_temp.skill_id = E2S.skill_id

    WHERE rating BETWEEN 5 AND 9

    --Remove people who didn't meet the current equipment / skill rating requirement

    DELETE FROM @result

    WHERE user_id NOT IN (

    SELECT user_id FROM @equipment_user_temp

    )

    --Select remaining users who meet all the criteria

    SELECT user_id FROM @result

    The problem is that you could have a 7 on one piece of equipment and 3 on the other and still show up, where the first one would basically treat them as seperate searches and just give you the results that overlap. Previous developers that have worked here have just come up with INCREDIBLY complicated series selects that produce strings to produce other strings to get the results, and it's never even worked like that. At least my loop produces the desired results, but from what I can tell, it's not the most efficient.

    This is somewhat of a result of me reading this article[/url], but I didn't want to clutter up that thread with me just not knowing enough about T-SQL. 😉

    Thanks for any ideas or even just nudges in the right direction!

  • Looking at the two little schema you supplied, I don't understand why the user ID is associated with a piece of equipment, rather than with skills. I would expect the skills table to look like

    UserID/Equipment/Skill/Rating

    This can be done, but it would be much better if you could give us a script to create a couple of temp tables with schema for the columns you need, and just a little sample data. Finally, how would you expect the output to look. That way we can supply you with tested solutions rather than just nudges, because we can see if our code produces the expected output from the input you supplied. Thanks.

    Bob

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • The reason there are 2 seperate tables is because initially, someone else set up the IPR_equipment table as (equipment_id INT, user_id INT, model_id INT, ratingsmap NVARCHAR(MAX)) where "ratingsmap" would contain all of the ratings for that equipment in one big string like "5733:5|5734:7|5735:8". Currently, "ratingsmap" is still used a few places in our system that I haven't gotten to switch over yet, so both ratings methods are running side by side right now, updating each other with triggers.

    Also, equipment_id is unique in IPR_equipment, so 2 people could each have a piece of equipment with a model_id of 1558, but with unique equipment_ids. If a user removes a piece of equipment, only the 1 record in IPR_equipment is deactivated, but the 500 or so records in IPR_equipment_to_skill remain, so if they add that particular model back in, it can just reactivate the 1 record and all the ratings are still there.

    I will work on getting a sample script up.

    Thanks.

  • Ewww....

    Did the same people who wrote the first chunk of code design your DB ??

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • Sadly, the first code block I posted was my vast improvement over the code that was there. Like I said, the original code created strings put together to make a GIANT select that took FOREVER to run. If I remember right, there was a chuck of code (maybe 30 lines long depending on how many skills were selected) for each piece of equipment. And typically, a user might put in 10 pieces of equipment with 10 skills, so then you're looking at code size with a factor of 100x... :crazy:

  • Ok... This is my first attempt to provide a working example of something on a forum, so I apologize in advance if I make this more confusing that what it already was. 😛

    Here's what I would consider to be valid sample data.

    • You have users A, B, and C.
    • You have models 747 and Cessna.
    • You have skills Servicing and Auto Flight.
    • User A has model 747 rated at 2 for Servicing and Auto Flight.
    • User A has model Cessna rated at 4 for Servicing and 5 for Auto Flight.
    • User B has model 747 rated at 7 for Servicing and Auto Flight.
    • User B has model Cessna rated at 9 for Servicing and Auto Flight.
    • User C has model 747 rated at 8 for Servicing and Auto Flight.

    DECLARE @result TABLE (user_id VARCHAR(50))

    DECLARE @ipr_equipment TABLE (equipment_id VARCHAR(50), user_id VARCHAR(50), model_id VARCHAR(50))

    DECLARE @skl_skill TABLE (skill_id VARCHAR(50))

    DECLARE @ipr_equipment_to_skill TABLE (equipment_id VARCHAR(50), skill_id VARCHAR(50), rating INT)

    --

    INSERT INTO @result (user_id)

    SELECT 'A'

    UNION

    SELECT 'B'

    UNION

    SELECT 'C'

    --

    INSERT INTO @ipr_equipment (equipment_id, user_id, model_id)

    SELECT 1001, 'A', '747'

    UNION

    SELECT 1002, 'A', 'Cessna'

    UNION

    SELECT 1003, 'B', '747'

    UNION

    SELECT 1004, 'B', 'Cessna'

    UNION

    SELECT 1005, 'C', '747'

    --

    INSERT INTO @skl_skill (skill_id)

    SELECT 'Servicing'

    UNION

    SELECT 'Auto Flight'

    --

    INSERT INTO @ipr_equipment_to_skill (equipment_id, skill_id, rating)

    SELECT 1001, 'Servicing', 2

    UNION

    SELECT 1001, 'Auto Flight', 2

    UNION

    SELECT 1002, 'Servicing', 4

    UNION

    SELECT 1002, 'Auto Flight', 5

    UNION

    SELECT 1003, 'Servicing', 7

    UNION

    SELECT 1003, 'Auto Flight', 7

    UNION

    SELECT 1004, 'Servicing', 9

    UNION

    SELECT 1004, 'Auto Flight', 9

    UNION

    SELECT 1005, 'Servicing', 8

    UNION

    SELECT 1005, 'Auto Flight', 8

    --

    DECLARE @equipment_skill_temp TABLE (model_id VARCHAR(50), skill_id VARCHAR(50))

    DECLARE @temp_model_id VARCHAR(50), @temp_skill_id VARCHAR(50)

    DECLARE @equipment_user_temp TABLE (user_id VARCHAR(50))

    The current code modified to work with the test data. Only user B is returned with the given values, because it's the only one with ratings between 5 and 9 for Servicing for both 747 and Cessna.

    --Get all requested skills for all requested pieces of equipment

    INSERT INTO @equipment_skill_temp (model_id, skill_id)

    SELECT DISTINCT model_id, skill_id

    FROM @ipr_equipment E

    JOIN @skl_skill S ON 1 = 1

    WHERE model_id IN ('747', 'Cessna') --model_id parameters

    AND S.skill_id IN ('Servicing') --skill_id parameters

    --Loop through all skills

    WHILE EXISTS (

    SELECT * FROM @equipment_skill_temp

    )

    BEGIN

    --Select first equipment / skill combination

    SELECT TOP 1 @temp_model_id = model_id,

    @temp_skill_id = skill_id

    FROM @equipment_skill_temp

    --

    --Remove it from the list

    DELETE FROM @equipment_skill_temp

    WHERE model_id = @temp_model_id

    AND skill_id = @temp_skill_id

    --

    --Get users who are rated between 5 and 9 for current equipment / skill

    INSERT INTO @equipment_user_temp (user_id)

    SELECT DISTINCT user_id

    FROM @ipr_equipment E

    JOIN @ipr_equipment_to_skill E2S ON E2S.equipment_id = E.equipment_id

    WHERE ISNULL(E.model_id, -1) = @temp_model_id

    AND skill_id = @temp_skill_id

    AND rating BETWEEN 5 AND 9 --range parameters

    --

    --Remove people who didn't meet the current equipment / skill rating requirement

    DELETE FROM @result

    WHERE user_id NOT IN (

    SELECT user_id FROM @equipment_user_temp

    )

    --

    --Reset user list

    DELETE FROM @equipment_user_temp

    END

    --

    --Select remaining users who meet all the criteria

    SELECT user_id FROM @result

    What I've tried that doesn't work. This returns users B and C, because user C has one of the two pieces of equipment rated between 5 and 9 for Servicing. That user shouldn't come up, since you need to have both pieces of equipment.

    --Get all requested skills for all requested pieces of equipment

    INSERT INTO @equipment_skill_temp (model_id, skill_id)

    SELECT DISTINCT E.model_id, S.skill_id

    FROM @ipr_equipment E

    JOIN @skl_skill S ON 1 = 1

    WHERE model_id IN ('747', 'Cessna') --model_id parameters

    AND S.skill_id IN ('Servicing') --skill_id parameters

    --

    --Get users who are rated between 5 and 9 for ANY (the issue) equipment / skill

    INSERT INTO @equipment_user_temp (user_id)

    SELECT DISTINCT user_id

    FROM @ipr_equipment E

    JOIN @ipr_equipment_to_skill E2S ON E2S.equipment_id = E.equipment_id

    JOIN @equipment_skill_temp E2S_temp

    ON E2S_temp.model_id = E.model_id

    AND E2S_temp.skill_id = E2S.skill_id

    WHERE rating BETWEEN 5 AND 9 --range parameters

    --

    --Remove people who didn't meet the current equipment / skill rating requirement

    DELETE FROM @result

    WHERE user_id NOT IN (

    SELECT user_id FROM @equipment_user_temp

    )

    --

    --Select remaining users who meet all the criteria

    SELECT user_id FROM @result

    Hope that all makes some sense.

  • Thanks 🙂

    Loading it up, now.

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • I think your problem is solvable using a HAVING clause which ensures that the number of records for each user_id that is returned is equal to the number of model_id values passed to the query in the @modelList table variable. The DISTINCT keyword in the COUNT aggregate function is intended to ensure that any duplicate @ipr_equipment or @ipr_equipment_to_skill rows don't affect the results.

    DECLARE @modelList TABLE (model_id varchar(20))

    INSERT @modelList VALUES ('Cessna')

    INSERT @modelList VALUES ('747')

    DECLARE @skill_id varchar(20)

    SELECT @skill_id = 'Servicing'

    SELECT E.[user_id]

    FROM @modelList ML

    INNER JOIN @ipr_equipment E ON (ML.model_id = E.model_id)

    INNER JOIN @ipr_equipment_to_skill E2S ON (E.equipment_id = E2S.equipment_id)

    WHERE (E2S.skill_id = @skill_id)

    AND (E2S.rating BETWEEN 5 AND 9)

    GROUP BY E.[user_id]

    HAVING (COUNT(DISTINCT E.model_id) = (SELECT COUNT(DISTINCT model_id) FROM @modelList))

  • That seems to work. I'll have to wait till tomorrow to be able to fully test it.

    A couple things to add into the twist, I only listed model_id to make things more simple, but there's actually a series_id as well. If something is series specific, model_id = -1 and series_id has a value, but if it's only model specific, it's the opposite. To take that into account, would I just need to make it

    HAVING (COUNT(DISTINCT E.model_id) = (SELECT COUNT(DISTINCT model_id) FROM @modelSeriesList))

    AND (COUNT(DISTINCT E.series_id) = (SELECT COUNT(DISTINCT series_id) FROM @modelList))

    Also, there are chapter level skills and subchapter level skills. The subchapter levels are the ratings that are stored and are searched through by the code I've previously provided. If a chapter level skill is sent in, it looks up all the ratings for the subchapters within that chapter and then does a GROUP BY on user_id to get the average for that chapter for each user. I think I should be able to just add model_id and series_id to that GROUP BY and then GROUP BY the user_id with a SELECT around that, right?

  • Sorry to leave this hanging so long. 🙂

    I went the same route as Andrew, adding a @model table variable to go with @skl_skill to control the inputs to the query. The following returns the user_ids who have a rating of at least 5 for both skills for both models. This could be extended to add minimum ratings to the skills, so that you could specify different minimum ratings for different skills.

    select r.user_id

    from @result r

    join @ipr_equipment i1 on i1.user_id = r.user_id

    join @model m on m.model_id = i1.model_id -- consider only models found in @model

    join @ipr_equipment_to_skill i2 on i2.equipment_id = i1.equipment_id

    join @skl_skill s on s.skill_id = i2.skill_id -- consider only skills found in @skl_skill

    where rating >=5

    group by r.user_id

    having count(*) = (select count(*) from @skl_skill CROSS JOIN @model) -- <<< IMPORTANT

    It still seems odd to me that the identifier tying a user to a skill is named [equipment_id]. I would much prefer seeing a table like this, but I feel sure your schema is deeply embedded by now.

    declare @sample table (user_id varchar(50), model_id, varchar(50), skill_id varchar(50), rating int)

    --

    What other functions does [equipment_id] serve?

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • Thanks for your help guys! Using the COUNT(*) seems to work. I went with getting the count from the joining of the model/series list and the skill list. That seemed to be the easiest way to allow for multiples of both and be able to seperate chapter and subchapter skills.

    As for the use of "equipment_id", there were some legacy things, like the "ratingsmap" I mentioned, and there was the option of linking equipment from your profile to other records in the system. Both of those are gone, but there are still a couple reasons I want to keep them as 2 separate tables.

    As you mentioned, it is pretty embedded in the system, so it'd be a pretty big task to change and test (especially since I'm only 1 of 2 developers :-P).

    Secondly, we currently have a third table for "custom" pieces of equipment, and that is still using the "ratingsmap" method of storing ratings. Custom equipment isn't used as much, and there is no searching or real calculations done on it, so it's not as important, but I do want to get that out of there eventually as well. I've also been kicking around the idea of combining the custom equipment with our system equipment, so the schema for the IPR_equipment table would be something like:

    IPR_equipment (equipment_id INT, user_id INT, model_id INT, series_id INT, is_custom BIT, manufacturer_name NVARCHAR(256), model_name NVARCHAR(256), series_name NVARCHAR(256))

    Lastly, even without including the custom equipment in the same table, if I put model_id, series_id, and user_id all in the ratings table, that table would get really large, really fast. Right now, there's just the 2 INTs for equipment_id and skill_id, a TINYINT for rating, and 2 BITs for if the skill has actually been rated and for if the skill is applicable or not. And just with our initial test group of 2000 people using our system right now, there's over 23,000,000 records, totalling 427MB. I'm not too worried about size on the disk itself, but downloading the backups from our remote hosting would SUCK if it was significanly larger.

    I think all of these issues can be addressed in time, but I'd still like to keep it as separate tables for being able to remove a piece of equipment without having to remove every rating for that equipment and being able to link equipment to other records in the future if necessary.

    Oh, and just to give you an idea of what I had to work with when I first tried tackling this searching program a year ago, here's a "small" chuck of the code the last guy wrote that I dug up:

    --Include all series for models selected in search without series

    if exists(select * from @SearchersEQP where SeriesID is null)

    begin

    set @MustMatchNoSeries=1

    -- ANY Series of ALL Models entered without a Series

    select top 1 @sql='select J0.UID from (

    SELECT E.UID, S.ModelID, E.SeriesID, E.EquipmentID

    FROM dbo.IPR_Equipment E (NOLOCK)

    JOIN dbo.EQP_Series S on S.SeriesID=E.SeriesID

    WHERE E.Active = 1

    AND E.UID '+cast(@UID as nvarchar(max)) +'

    AND E.SeriesID is not null

    AND S.ModelID IN(

    SELECT DISTINCTE.ModelID

    FROM dbo.IPR_Equipment E (NOLOCK)

    JOIN dbo.CSRC_CriteriaEQP C'+cast(ID as nvarchar(max)) +' (NOLOCK)

    ON E.EquipmentID = C'+cast(ID as nvarchar(max)) + '.EquipmentID

    AND C'+Cast(ID as nvarchar(max)) +'.EquipmentID='+cast(EquipmentID as nvarchar(max))+'

    AND C'+Cast(ID as nvarchar(max)) +'.SearchID='+Cast(@SID as nvarchar(max))+')

    UNION ALL

    SELECT E.UID, e.ModelID, E.SeriesID, E.EquipmentID

    FROM dbo.IPR_Equipment E (NOLOCK)

    WHERE E.Active = 1

    AND E.UID '+cast(@UID as nvarchar(max)) +'

    AND e.ModelID IN(

    SELECT DISTINCTE.ModelID

    FROM dbo.IPR_Equipment E (NOLOCK)

    JOIN dbo.CSRC_CriteriaEQP C'+cast(ID as nvarchar(max)) +' (NOLOCK)

    ON E.EquipmentID = C'+cast(ID as nvarchar(max)) + '.EquipmentID

    AND C'+Cast(ID as nvarchar(max)) +'.EquipmentID='+cast(EquipmentID as nvarchar(max))+'

    AND C'+Cast(ID as nvarchar(max)) +'.SearchID='+Cast(@SID as nvarchar(max))+')

    ) J0'

    from @SearchersEQP

    where SeriesID is null

    Order by ID

    select @sql=@SQL+' JOIN (

    SELECT E.UID, S.ModelID, E.SeriesID, E.EquipmentID

    FROM dbo.IPR_Equipment E (NOLOCK)

    JOIN dbo.EQP_Series S on S.SeriesID=E.SeriesID

    WHERE E.Active = 1

    AND E.UID '+cast(@UID as nvarchar(max)) +'

    AND E.SeriesID is not null

    AND S.ModelID IN(

    SELECT DISTINCTE.ModelID

    FROM dbo.IPR_Equipment E (NOLOCK)

    JOIN dbo.CSRC_CriteriaEQP C'+cast(ID as nvarchar(max)) +' (NOLOCK)

    ON E.EquipmentID = C'+cast(ID as nvarchar(max)) + '.EquipmentID

    AND C'+Cast(ID as nvarchar(max)) +'.EquipmentID='+cast(EquipmentID as nvarchar(max))+'

    AND C'+Cast(ID as nvarchar(max)) +'.SearchID='+Cast(@SID as nvarchar(max))+')

    UNION ALL

    SELECT E.UID, e.ModelID, E.SeriesID, E.EquipmentID

    FROM dbo.IPR_Equipment E (NOLOCK)

    WHERE E.Active = 1

    AND E.UID '+cast(@UID as nvarchar(max)) +'

    AND e.ModelID IN(

    SELECT DISTINCTE.ModelID

    FROM dbo.IPR_Equipment E (NOLOCK)

    JOIN dbo.CSRC_CriteriaEQP C'+cast(ID as nvarchar(max)) +' (NOLOCK)

    ON E.EquipmentID = C'+cast(ID as nvarchar(max)) + '.EquipmentID

    AND C'+Cast(ID as nvarchar(max)) +'.EquipmentID='+cast(EquipmentID as nvarchar(max))+'

    AND C'+Cast(ID as nvarchar(max)) +'.SearchID='+Cast(@SID as nvarchar(max))+')

    ) J'+Cast(ID as nvarchar(max)) + ' on J0.UID=J'+cast (id as nvarchar(max))+'.UID '

    from @SearchersEQP

    where SeriesID is null

    ORDER By ID

    declare @TUID table (UID INT)

    INSERT @TUID(UID)

    EXECUTE sp_ExecuteSQL @sql

    In his defense, he was struggling mostly because he had to deal with the slowness of extracting the ratings from the "ratingsmap", but still... :hehe:

  • You're welcome, blake. Good luck to you.

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply