Select statement to only return records where joined records all meet condition

  • CREATE TABLE #tblLeaders

    (

    LeaderID int,

    Leader varchar(50)

    );

    INSERT INTO #tblLeaders

    (LeaderID, Leader)

    SELECT 1, 'Pete' UNION ALL

    SELECT 2, 'Jim' UNION ALL

    SELECT 3, 'Arthur'

    CREATE TABLE #tblMembers

    (

    MemberID int,

    LeaderID int,

    Member varchar(50),

    Verified bit

    );

    INSERT INTO #tblMembers

    (MemberID, LeaderID, Member, Verified)

    SELECT 1, 1, 'Frank', 0 UNION ALL

    SELECT 2, 1, 'Martha', 1 UNION ALL

    SELECT 3, 2, 'Betty', 1 UNION ALL

    SELECT 4, 2, 'Mary', 1 UNION ALL

    SELECT 5, 3, 'Jerry', 1 UNION ALL

    SELECT 6, 3, 'Marcia', 0

    SELECT *

    FROM #tblLeaders

    INNER JOIN #tblMembers on #tblLeaders.LeaderID = #tblMembers.LeaderID

    ORDER BY Leader, Member

    DROP TABLE #tblLeaders

    DROP TABLE #tblMembers

    The select statement above returns all leaders and members. But I only want leaders / members where every member is verified.

    So, in the data above, I only want to see the records returned for Jim because he is the only leader who has all his members verified.

    Thanks for any help.

    Edit. I found that this seems to do the job. Is it the best/right way to do it?

    SELECT *

    FROM #tblLeaders

    INNER JOIN #tblMembers on #tblLeaders.LeaderID = #tblMembers.LeaderID

    WHERE NOT EXISTS(SELECT 0 FROM #tblMembers WHERE LeaderID = #tblLeaders.LeaderID AND Verified = 0)

    ORDER BY Leader, Member

  • There isn't always a "best" or "right" way to do a given thing. Take a look at the following:

    DROP TABLE #tblLeaders

    DROP TABLE #tblMembers

    CREATE TABLE #tblLeaders (

    LeaderID int,

    Leader varchar(50)

    );

    INSERT INTO #tblLeaders (LeaderID, Leader)

    SELECT 1, 'Pete' UNION ALL

    SELECT 2, 'Jim' UNION ALL

    SELECT 3, 'Arthur'

    CREATE TABLE #tblMembers (

    MemberID int,

    LeaderID int,

    Member varchar(50),

    Verified bit

    );

    INSERT INTO #tblMembers (MemberID, LeaderID, Member, Verified)

    SELECT 1, 1, 'Frank', 0 UNION ALL

    SELECT 2, 1, 'Martha', 1 UNION ALL

    SELECT 3, 2, 'Betty', 1 UNION ALL

    SELECT 4, 2, 'Mary', 1 UNION ALL

    SELECT 5, 3, 'Jerry', 1 UNION ALL

    SELECT 6, 3, 'Marcia', 0

    ;WITH LEADERSHIP AS (

    SELECT L.LeaderID, L.Leader, MIN(CAST(M.Verified AS tinyint)) AS MIN_VRFD

    FROM #tblLeaders AS L

    INNER JOIN #tblMembers AS M

    on L.LeaderID = M.LeaderID

    GROUP BY L.LeaderID, L.Leader

    )

    SELECT L.LeaderID, L.Leader, M.MemberID, M.Member, M.Verified

    FROM LEADERSHIP AS L

    INNER JOIN #tblMembers AS M

    ON L.LeaderID = M.LeaderID

    WHERE L.MIN_VRFD = 1

    ORDER BY L.Leader, M.Member;

    Then look at the query plans for each and see which one looks like it uses the least resources and go from there. This alternative query at least gives you something else to look at, whether good, bad, or indifferent.

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • That seems as a good approach. I'll leave 2 alternatives which might not be better.

    WITH VLeaders AS(

    SELECT l.LeaderID,

    l.Leader

    FROM #tblLeaders l

    INNER JOIN #tblMembers m on l.LeaderID = m.LeaderID

    GROUP BY l.LeaderID, l.Leader

    HAVING COUNT(*) = SUM(CAST( m.Verified AS Int))

    )

    SELECT l.LeaderID,

    l.Leader,

    m.MemberID,

    m.Member

    FROM VLeaders l

    JOIN #tblMembers m on l.LeaderID = m.LeaderID

    ORDER BY Leader, Member;

    WITH CTE AS(

    SELECT l.LeaderID,

    l.Leader,

    m.MemberID,

    m.Member,

    COUNT(*) OVER(PARTITION BY l.LeaderID) membercount,

    SUM(CAST( m.Verified AS Int)) OVER(PARTITION BY l.LeaderID) verifiedmembers

    FROM #tblLeaders l

    INNER JOIN #tblMembers m on l.LeaderID = m.LeaderID

    )

    SELECT LeaderID,

    Leader,

    MemberID,

    Member

    FROM CTE

    WHERE membercount = verifiedmembers

    ORDER BY Leader, Member;

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Reply to sgmunson - thanks for your reply.

    I have to admit I have never got into query plans before ... I pressed Ctrl M before running your code and mine and for yours it showed.

    Query1: Query cost (relative to the batch) : 14%

    Query2: Query cost (relative to the batch) : 14%

    Query3: Query cost (relative to the batch) : 72%

    For mine it showed:

    Query1: Query cost (relative to the batch) : 17%

    Query2: Query cost (relative to the batch) : 17%

    Query3: Query cost (relative to the batch) : 66%

    So, given that Query1 and Query2 are identical in both bits of code - is it true to say that mine is a little more efficient as the cost of Query3 is a bit lower? Is that how you evaluate this sort of thing.

    How can you determine the speed of execution? Given that they both execute in 00:00:00 - would one have to run it against a bit data set to see if there is a time advantage?

    Any tips for generally determining whether what you are doing is a bad idea - or not?

  • Luis Cazares (5/29/2014)


    That seems as a good approach. I'll leave 2 alternatives which might not be better.

    WITH VLeaders AS(

    SELECT l.LeaderID,

    l.Leader

    FROM #tblLeaders l

    INNER JOIN #tblMembers m on l.LeaderID = m.LeaderID

    GROUP BY l.LeaderID, l.Leader

    HAVING COUNT(*) = SUM(CAST( m.Verified AS Int))

    )

    SELECT l.LeaderID,

    l.Leader,

    m.MemberID,

    m.Member

    FROM VLeaders l

    JOIN #tblMembers m on l.LeaderID = m.LeaderID

    ORDER BY Leader, Member;

    WITH CTE AS(

    SELECT l.LeaderID,

    l.Leader,

    m.MemberID,

    m.Member,

    COUNT(*) OVER(PARTITION BY l.LeaderID) membercount,

    SUM(CAST( m.Verified AS Int)) OVER(PARTITION BY l.LeaderID) verifiedmembers

    FROM #tblLeaders l

    INNER JOIN #tblMembers m on l.LeaderID = m.LeaderID

    )

    SELECT LeaderID,

    Leader,

    MemberID,

    Member

    FROM CTE

    WHERE membercount = verifiedmembers

    ORDER BY Leader, Member;

    Thank you for your reply. Incorporating the two initial select statements - your two solutions run at:

    14%, 14% and 72% for the first one and 15%, 15% and 70% for the second one.

  • sku370870 (5/29/2014)


    Reply to sgmunson - thanks for your reply.

    I have to admit I have never got into query plans before ... I pressed Ctrl M before running your code and mine and for yours it showed.

    Query1: Query cost (relative to the batch) : 14%

    Query2: Query cost (relative to the batch) : 14%

    Query3: Query cost (relative to the batch) : 72%

    For mine it showed:

    Query1: Query cost (relative to the batch) : 17%

    Query2: Query cost (relative to the batch) : 17%

    Query3: Query cost (relative to the batch) : 66%

    So, given that Query1 and Query2 are identical in both bits of code - is it true to say that mine is a little more efficient as the cost of Query3 is a bit lower? Is that how you evaluate this sort of thing.

    How can you determine the speed of execution? Given that they both execute in 00:00:00 - would one have to run it against a bit data set to see if there is a time advantage?

    Any tips for generally determining whether what you are doing is a bad idea - or not?

    Well, if it runs instantaneously in SSMS, there's not much to look at EXCEPT the query plan, unless you have some large test tables to play with and a server that won't cause others grief if you suddenly overload it. Differences of +/- 5% in where the cost is depend more on the nature of the plan for that most costly query. Is it using an entirely different methodology? Do you see NESTED LOOPs in there? Those are usually good to avoid when you can, but it's not always possible. You can always try declaring a DATETIME2 variable and then setting it before execution of the query, and then PRINT DATEDIFF(ms, @DECLARED_VAR, GETDATE()) to see how many milliseconds the query does take to run. Push comes to shove you can go to microseconds (mcs), or nanoseconds (ns).

    There are some things to generally avoid, such as triangular joins, or cartesian products, but the rest of it is often as much art as it is science... Just be sure to let logic and reason be your guide.

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • You shouldn't compare cost percentages on query plans against 2 different queries. They are just estimates and can lead to wrong assumptions.

    Run your queries against a large number of rows, at least one thousand (and that's still too little).

    Try comparing times using SET STATISTICS TIME ON/OFF.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply