September 4, 2020 at 9:57 pm
I'm sure there's a simple way to do this, but I can't seem to figure it out. I have data something like this:
ID, RESULT
1, Y
1, Y
1, Y
2, N
2, N
3, Y
3, N
4, Y
4, N
I only want to pull records where the ID has both a result of 'Y' and a result of 'N' (In this case, that would pull all records with ID 3 and 4)
My instinct is to create a view pulling all the "Y" values and another view pulling all the "N" values, then a third query matching the IDs of the two views, thereby showing me all IDs with both a Y and an N. Is there a way to do this in one query?
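[For anyone who wants to try this at home: the "two views plus a join" idea collapses into a single query with two derived tables. A quick sketch using Python's built-in sqlite3 on the sample rows above — note the table and column names `Response(ID, Result)` are an assumption taken from the replies below; the OP never names the table.]

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Response (ID INTEGER, Result TEXT)")
conn.executemany(
    "INSERT INTO Response VALUES (?, ?)",
    [(1, 'Y'), (1, 'Y'), (1, 'Y'), (2, 'N'), (2, 'N'),
     (3, 'Y'), (3, 'N'), (4, 'Y'), (4, 'N')],
)

# The "two views and a join" idea written inline as derived tables:
# one subquery per value, joined on ID, in a single statement.
rows = conn.execute("""
    SELECT DISTINCT y.ID
    FROM (SELECT ID FROM Response WHERE Result = 'Y') y
    JOIN (SELECT ID FROM Response WHERE Result = 'N') n ON n.ID = y.ID
    ORDER BY y.ID
""").fetchall()
print([r[0] for r in rows])  # [3, 4]
```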
September 4, 2020 at 10:54 pm
something like this?
SELECT DISTINCT ID
FROM Response r
WHERE EXISTS (SELECT 1 FROM Response r2 WHERE r2.ID = r.ID AND r2.Result = 'Y')
AND EXISTS (SELECT 1 FROM Response r2 WHERE r2.ID = r.ID AND r2.Result = 'N');
September 4, 2020 at 11:58 pm
How you go about it may depend upon how much data you have in the table or some other details we don't know about. You'd just have to play around with any suggestions. A CTE may be an option - something along the lines of:
WITH CTE (ID, Result, Previous)
AS
(
SELECT ID, Result, Lag(Result, 1, Result) OVER(PARTITION BY ID ORDER BY ID) AS Previous
FROM YourTable
)
SELECT ID
FROM CTE
WHERE Result <> Previous
Sue
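[A quick check of the LAG sketch above, run through Python's sqlite3 (SQLite 3.25+ supports LAG). Two caveats, both assumptions worth flagging: ORDER BY ID inside a partition on ID leaves the row order within each ID arbitrary, though with only two possible values any adjacent mismatch still flags the ID; and an ID can be emitted more than once if values alternate, so a DISTINCT has been added here that is not in the original sketch.]

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Response (ID INTEGER, Result TEXT)")
conn.executemany(
    "INSERT INTO Response VALUES (?, ?)",
    [(1, 'Y'), (1, 'Y'), (1, 'Y'), (2, 'N'), (2, 'N'),
     (3, 'Y'), (3, 'N'), (4, 'Y'), (4, 'N')],
)

# LAG defaults to the row's own Result on the first row of each partition,
# so single-value IDs never produce a mismatch. DISTINCT added (see note).
rows = conn.execute("""
    WITH CTE (ID, Result, Previous) AS
    (
        SELECT ID, Result,
               LAG(Result, 1, Result) OVER (PARTITION BY ID ORDER BY ID)
        FROM Response
    )
    SELECT DISTINCT ID FROM CTE WHERE Result <> Previous ORDER BY ID
""").fetchall()
print([r[0] for r in rows])  # [3, 4]
```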
September 5, 2020 at 12:22 am
You don't need to scan the table three times.
Two scans are perfectly enough:
SELECT ID
FROM Response r
WHERE r.result = 'Y'
AND EXISTS (SELECT * FROM Response r2 WHERE r2.ID = r.ID AND r2.Result = 'N')
GROUP BY ID;
And please avoid using the keyword DISTINCT. It's a very bad habit.
_____________
Code for TallyGenerator
September 5, 2020 at 12:26 am
Actually, a single scan would be just fine:
SELECT ID
FROM Response r
GROUP BY ID
HAVING MAX(result) <> MIN(result);
_____________
Code for TallyGenerator
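[Same sanity check for the single-scan version, again via Python's sqlite3 on the sample data. One assumption to keep in mind: MAX <> MIN only means "both Y and N present" because the column holds exactly those two values.]

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Response (ID INTEGER, Result TEXT)")
conn.executemany(
    "INSERT INTO Response VALUES (?, ?)",
    [(1, 'Y'), (1, 'Y'), (1, 'Y'), (2, 'N'), (2, 'N'),
     (3, 'Y'), (3, 'N'), (4, 'Y'), (4, 'N')],
)

# One pass over the table: a group with both 'Y' and 'N' has MAX <> MIN.
rows = conn.execute("""
    SELECT ID
    FROM Response
    GROUP BY ID
    HAVING MAX(Result) <> MIN(Result)
    ORDER BY ID
""").fetchall()
print([r[0] for r in rows])  # [3, 4]
```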
September 5, 2020 at 4:14 pm
Actually, a single scan would be just fine:
SELECT ID
FROM Response r
GROUP BY ID
HAVING MAX(result) <> MIN(result);
You've just gotta love the old ways. 😀
--Jeff Moden
Change is inevitable... Change for the better is not.
September 5, 2020 at 4:20 pm
If the only values available in the result column are Y or N, you could do this:
Select r.id
From Response r
Group By
r.id
Having count_big(Distinct r.result) = 2;
Note: using count_big eliminates an implicit conversion, as the result from count needs to be converted to bigint (for some reason). This produces the same plan as Sergiy's solution with HAVING MAX(result) <> MIN(result); the difference is in the sort, where the distinct sort eliminates duplicate rows and the max/min version does not.
Test both versions in your actual code to determine which is better for your implementation.
Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”
― Charles R. Swindoll
How to post questions to get better answers faster
Managing Transaction Logs
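[The count-of-distinct-values variant also checks out on the sample data. SQLite has no COUNT_BIG, so plain COUNT(DISTINCT ...) stands in here; the logic is the same, and like the MAX/MIN version it assumes only Y and N can occur.]

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Response (ID INTEGER, Result TEXT)")
conn.executemany(
    "INSERT INTO Response VALUES (?, ?)",
    [(1, 'Y'), (1, 'Y'), (1, 'Y'), (2, 'N'), (2, 'N'),
     (3, 'Y'), (3, 'N'), (4, 'Y'), (4, 'N')],
)

# An ID qualifies when its group contains two distinct Result values,
# which with only Y/N possible means both are present.
rows = conn.execute("""
    SELECT ID
    FROM Response
    GROUP BY ID
    HAVING COUNT(DISTINCT Result) = 2
    ORDER BY ID
""").fetchall()
print([r[0] for r in rows])  # [3, 4]
```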
September 5, 2020 at 11:10 pm
If the only values available in the result column are Y or N - you could do this:
Select r.id
From Response r
Group By
r.id
Having count_big(Distinct r.result) = 2;
Note: using count_big eliminates an implicit conversion as the result from count needs to be converted to bigint (for some reason). This produces the same plan as Sergiy's solution with HAVING MAX(result) <> MIN(result) - the difference will be in the sort where the distinct sort will eliminate duplicate rows and the max/min does not.
Test both versions in your actual code to determine which is better for your implementation.
A very wise man told me long ago...
"A Developer must not guess... a Developer must KNOW!" -- Sergiy Flocka - Circa 2007. 😀
You just can't determine which piece of code will run faster by looking at the execution plan (I've been seriously bitten in the distant past by doing so). As you said, a test needs to be done, and since we don't have the OP's data, let's make some.
For those wanting to test for themselves, here's some code to create a larger table and the fnTally function can be found at the article from the last link in my signature line below.
--===== Create and populate the test table
DROP TABLE IF EXISTS #TestTable
;
WITH cteID AS
(
SELECT ID=id.N
,RowMultiplier = ABS(CHECKSUM(NEWID())%3)+1
FROM dbo.fnTally(1,1000000) id
)
SELECT id.ID
,Result = IIF(ABS(CHECKSUM(NEWID())%2) = 0, 'N','Y')
INTO #TestTable
FROM cteID id
CROSS APPLY dbo.fnTally(1,RowMultiplier) rows
OPTION (MAXDOP 1)
;
GO
-----------------------------------------------------------------------------------------------------------------------
PRINT REPLICATE('=',119);
RAISERROR('========== Sergiy''s code ==========',0,0) WITH NOWAIT;
;
DECLARE @BitBucket INT;
SET STATISTICS TIME,IO ON;
SELECT @BitBucket = ID
FROM #TestTable r
GROUP BY ID
HAVING MAX(result) <> MIN(result);
SET STATISTICS TIME,IO OFF;
GO
-----------------------------------------------------------------------------------------------------------------------
PRINT REPLICATE('=',119);
RAISERROR('========== Jeff William''s code ==========',0,0) WITH NOWAIT;
;
DECLARE @BitBucket INT;
SET STATISTICS TIME,IO ON;
Select @BitBucket = r.id
From #TestTable r
Group By
r.id
Having count_big(Distinct r.result) = 2;
SET STATISTICS TIME,IO OFF;
GO
Run Results...
(1999756 rows affected)
=======================================================================================================================
========== Sergiy's code ==========
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#TestTable__________________________________________________________________________________________________________00000000094E'.
Scan count 1, logical reads 4454, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 844 ms, elapsed time = 840 ms.
=======================================================================================================================
========== Jeff William's code ==========
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#TestTable__________________________________________________________________________________________________________00000000094E'.
Scan count 1, logical reads 4454, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 1235 ms, elapsed time = 1233 ms.
For this test, Jeff Williams' code runs about 46% slower. Must be the DISTINCT Sergiy was talking about 😉
To Jeff's point, even though we tested with large enough randomized data above, the OP still needs to test in their environment.
--Jeff Moden
Change is inevitable... Change for the better is not.