Select statement to only return records where joined records all meet condition

Question

Post reply

Select statement to only return records where joined records all meet condition

webskater

SSCarpal Tunnel

Points: 4214
More actions
May 29, 2014 at 12:31 pm

#289316

CREATE TABLE #tblLeaders
(
LeaderID int,
Leader varchar(50)
);
INSERT INTO #tblLeaders
(LeaderID, Leader)
SELECT 1, 'Pete' UNION ALL
SELECT 2, 'Jim' UNION ALL
SELECT 3, 'Arthur'
CREATE TABLE #tblMembers
(
MemberID int,
LeaderID int,
Member varchar(50),
Verified bit
);
INSERT INTO #tblMembers
(MemberID, LeaderID, Member, Verified)
SELECT 1, 1, 'Frank', 0 UNION ALL
SELECT 2, 1, 'Martha', 1 UNION ALL
SELECT 3, 2, 'Betty', 1 UNION ALL
SELECT 4, 2, 'Mary', 1 UNION ALL
SELECT 5, 3, 'Jerry', 1 UNION ALL
SELECT 6, 3, 'Marcia', 0
SELECT *
FROM #tblLeaders
INNER JOIN #tblMembers on #tblLeaders.LeaderID = #tblMembers.LeaderID
ORDER BY Leader, Member
DROP TABLE #tblLeaders
DROP TABLE #tblMembers
The select statement above returns all leaders and members. But I only want leaders / members where every member is verified.
So, in the data above, I only want to see the records returned for Jim because he is the only leader who has all his members verified.
Thanks for any help.
Edit. I found that this seems to do the job. Is it the best/right way to do it?
SELECT *
FROM #tblLeaders
INNER JOIN #tblMembers on #tblLeaders.LeaderID = #tblMembers.LeaderID
WHERE NOT EXISTS(SELECT 0 FROM #tblMembers WHERE LeaderID = #tblLeaders.LeaderID AND Verified = 0)
ORDER BY Leader, Member

Viewing 7 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply

sgmunson SSC Guru Points: 110651 More actions · Answer 1

There isn't always a "best" or "right" way to do a given thing. Take a look at the following:

DROP TABLE #tblLeaders

DROP TABLE #tblMembers

CREATE TABLE #tblLeaders (

LeaderID int,

Leader varchar(50)

);

INSERT INTO #tblLeaders (LeaderID, Leader)

SELECT 1, 'Pete' UNION ALL

SELECT 2, 'Jim' UNION ALL

SELECT 3, 'Arthur'

CREATE TABLE #tblMembers (

MemberID int,

LeaderID int,

Member varchar(50),

Verified bit

);

INSERT INTO #tblMembers (MemberID, LeaderID, Member, Verified)

SELECT 1, 1, 'Frank', 0 UNION ALL

SELECT 2, 1, 'Martha', 1 UNION ALL

SELECT 3, 2, 'Betty', 1 UNION ALL

SELECT 4, 2, 'Mary', 1 UNION ALL

SELECT 5, 3, 'Jerry', 1 UNION ALL

SELECT 6, 3, 'Marcia', 0

;WITH LEADERSHIP AS (

SELECT L.LeaderID, L.Leader, MIN(CAST(M.Verified AS tinyint)) AS MIN_VRFD

FROM #tblLeaders AS L

INNER JOIN #tblMembers AS M

on L.LeaderID = M.LeaderID

GROUP BY L.LeaderID, L.Leader

)

SELECT L.LeaderID, L.Leader, M.MemberID, M.Member, M.Verified

FROM LEADERSHIP AS L

INNER JOIN #tblMembers AS M

ON L.LeaderID = M.LeaderID

WHERE L.MIN_VRFD = 1

ORDER BY L.Leader, M.Member;

Then look at the query plans for each and see which one looks like it uses the least resources and go from there. This alternative query at least gives you something else to look at, whether good, bad, or indifferent.

Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)

Luis Cazares SSC Guru Points: 183706 More actions · Answer 2

That seems as a good approach. I'll leave 2 alternatives which might not be better.

WITH VLeaders AS(

SELECT l.LeaderID,

l.Leader

FROM #tblLeaders l

INNER JOIN #tblMembers m on l.LeaderID = m.LeaderID

GROUP BY l.LeaderID, l.Leader

HAVING COUNT(*) = SUM(CAST( m.Verified AS Int))

)

SELECT l.LeaderID,

l.Leader,

m.MemberID,

m.Member

FROM VLeaders l

JOIN #tblMembers m on l.LeaderID = m.LeaderID

ORDER BY Leader, Member;

WITH CTE AS(

SELECT l.LeaderID,

l.Leader,

m.MemberID,

m.Member,

COUNT(*) OVER(PARTITION BY l.LeaderID) membercount,

SUM(CAST( m.Verified AS Int)) OVER(PARTITION BY l.LeaderID) verifiedmembers

FROM #tblLeaders l

INNER JOIN #tblMembers m on l.LeaderID = m.LeaderID

)

SELECT LeaderID,

Leader,

MemberID,

Member

FROM CTE

WHERE membercount = verifiedmembers

ORDER BY Leader, Member;

Luis C.
General Disclaimer:
Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

How to post data/code on a forum to get the best help: Option 1 / Option 2

webskater SSCarpal Tunnel Points: 4214 More actions · Answer 3

Reply to sgmunson - thanks for your reply.

I have to admit I have never got into query plans before ... I pressed Ctrl M before running your code and mine and for yours it showed.

Query1: Query cost (relative to the batch) : 14%

Query2: Query cost (relative to the batch) : 14%

Query3: Query cost (relative to the batch) : 72%

For mine it showed:

Query1: Query cost (relative to the batch) : 17%

Query2: Query cost (relative to the batch) : 17%

Query3: Query cost (relative to the batch) : 66%

So, given that Query1 and Query2 are identical in both bits of code - is it true to say that mine is a little more efficient as the cost of Query3 is a bit lower? Is that how you evaluate this sort of thing.

How can you determine the speed of execution? Given that they both execute in 00:00:00 - would one have to run it against a bit data set to see if there is a time advantage?

Any tips for generally determining whether what you are doing is a bad idea - or not?

webskater SSCarpal Tunnel Points: 4214 More actions · Answer 4

Luis Cazares (5/29/2014)
That seems as a good approach. I'll leave 2 alternatives which might not be better.
WITH VLeaders AS(
SELECT l.LeaderID,
l.Leader
FROM #tblLeaders l
INNER JOIN #tblMembers m on l.LeaderID = m.LeaderID
GROUP BY l.LeaderID, l.Leader
HAVING COUNT(*) = SUM(CAST( m.Verified AS Int))
)
SELECT l.LeaderID,
l.Leader,
m.MemberID,
m.Member
FROM VLeaders l
JOIN #tblMembers m on l.LeaderID = m.LeaderID
ORDER BY Leader, Member;
WITH CTE AS(
SELECT l.LeaderID,
l.Leader,
m.MemberID,
m.Member,
COUNT(*) OVER(PARTITION BY l.LeaderID) membercount,
SUM(CAST( m.Verified AS Int)) OVER(PARTITION BY l.LeaderID) verifiedmembers
FROM #tblLeaders l
INNER JOIN #tblMembers m on l.LeaderID = m.LeaderID
)
SELECT LeaderID,
Leader,
MemberID,
Member
FROM CTE
WHERE membercount = verifiedmembers
ORDER BY Leader, Member;

Thank you for your reply. Incorporating the two initial select statements - your two solutions run at:

14%, 14% and 72% for the first one and 15%, 15% and 70% for the second one.

sgmunson SSC Guru Points: 110651 More actions · Answer 5

sku370870 (5/29/2014)
Reply to sgmunson - thanks for your reply.
I have to admit I have never got into query plans before ... I pressed Ctrl M before running your code and mine and for yours it showed.
Query1: Query cost (relative to the batch) : 14%
Query2: Query cost (relative to the batch) : 14%
Query3: Query cost (relative to the batch) : 72%
For mine it showed:
Query1: Query cost (relative to the batch) : 17%
Query2: Query cost (relative to the batch) : 17%
Query3: Query cost (relative to the batch) : 66%
So, given that Query1 and Query2 are identical in both bits of code - is it true to say that mine is a little more efficient as the cost of Query3 is a bit lower? Is that how you evaluate this sort of thing.
How can you determine the speed of execution? Given that they both execute in 00:00:00 - would one have to run it against a bit data set to see if there is a time advantage?
Any tips for generally determining whether what you are doing is a bad idea - or not?

Well, if it runs instantaneously in SSMS, there's not much to look at EXCEPT the query plan, unless you have some large test tables to play with and a server that won't cause others grief if you suddenly overload it. Differences of +/- 5% in where the cost is depend more on the nature of the plan for that most costly query. Is it using an entirely different methodology? Do you see NESTED LOOPs in there? Those are usually good to avoid when you can, but it's not always possible. You can always try declaring a DATETIME2 variable and then setting it before execution of the query, and then PRINT DATEDIFF(ms, @DECLARED_VAR, GETDATE()) to see how many milliseconds the query does take to run. Push comes to shove you can go to microseconds (mcs), or nanoseconds (ns).

There are some things to generally avoid, such as triangular joins, or cartesian products, but the rest of it is often as much art as it is science... Just be sure to let logic and reason be your guide.

Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)

Luis Cazares SSC Guru Points: 183706 More actions · Answer 6

You shouldn't compare cost percentages on query plans against 2 different queries. They are just estimates and can lead to wrong assumptions.

Run your queries against a large number of rows, at least one thousand (and that's still too little).

Try comparing times using SET STATISTICS TIME ON/OFF.

Luis C.
General Disclaimer:
Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

How to post data/code on a forum to get the best help: Option 1 / Option 2