September 4, 2020 at 9:57 pm
I'm sure there's a simple way to do this, but I can't seem to figure it out. I have data something like this:
ID, RESULT
1, Y
1, Y
1, Y
2, N
2, N
3, Y
3, N
4, Y
4, N
I only want to pull records where the ID has both a result of 'Y' and a result of 'N' (In this case, that would pull all records with ID 3 and 4)
My instinct is to create a view pulling all the "Y" values and another view pulling all the "N" values, then a third query matching the IDs of the two views, thereby showing me all IDs with both a Y and an N. Is there a way to do this in one query?
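[For anyone who wants to try this at home: the "two views plus a join" idea collapses into a single query with two derived tables. A quick sketch using Python's built-in sqlite3 on the sample rows above — note the table and column names `Response(ID, Result)` are an assumption taken from the replies below; the OP never names the table.]

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Response (ID INTEGER, Result TEXT)")
conn.executemany(
    "INSERT INTO Response VALUES (?, ?)",
    [(1, 'Y'), (1, 'Y'), (1, 'Y'), (2, 'N'), (2, 'N'),
     (3, 'Y'), (3, 'N'), (4, 'Y'), (4, 'N')],
)

# The "two views and a join" idea written inline as derived tables:
# one subquery per value, joined on ID, in a single statement.
rows = conn.execute("""
    SELECT DISTINCT y.ID
    FROM (SELECT ID FROM Response WHERE Result = 'Y') y
    JOIN (SELECT ID FROM Response WHERE Result = 'N') n ON n.ID = y.ID
    ORDER BY y.ID
""").fetchall()
print([r[0] for r in rows])  # [3, 4]
```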
September 4, 2020 at 10:54 pm
something like this?
SELECT DISTINCT ID
FROM Response r
WHERE EXISTS (SELECT 1 FROM Response r2 WHERE r2.ID = r.ID AND r2.Result = 'Y')
AND EXISTS (SELECT 1 FROM Response r2 WHERE r2.ID = r.ID AND r2.Result = 'N');
September 4, 2020 at 11:58 pm
How you go about it may depend upon how much data you have in the table or some other details we don't know about. You'd just have to play around with any suggestions. A CTE may be an option - something along the lines of:
WITH CTE (ID, Result, Previous)
AS
(
SELECT ID, Result, Lag(Result, 1, Result) OVER(PARTITION BY ID ORDER BY ID) AS Previous
FROM YourTable
)
SELECT ID
FROM CTE
WHERE Result <> Previous
Sue
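[A quick check of the LAG sketch above, run through Python's sqlite3 (SQLite 3.25+ supports LAG). Two caveats, both assumptions worth flagging: ORDER BY ID inside a partition on ID leaves the row order within each ID arbitrary, though with only two possible values any adjacent mismatch still flags the ID; and an ID can be emitted more than once if values alternate, so a DISTINCT has been added here that is not in the original sketch.]

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Response (ID INTEGER, Result TEXT)")
conn.executemany(
    "INSERT INTO Response VALUES (?, ?)",
    [(1, 'Y'), (1, 'Y'), (1, 'Y'), (2, 'N'), (2, 'N'),
     (3, 'Y'), (3, 'N'), (4, 'Y'), (4, 'N')],
)

# LAG defaults to the row's own Result on the first row of each partition,
# so single-value IDs never produce a mismatch. DISTINCT added (see note).
rows = conn.execute("""
    WITH CTE (ID, Result, Previous) AS
    (
        SELECT ID, Result,
               LAG(Result, 1, Result) OVER (PARTITION BY ID ORDER BY ID)
        FROM Response
    )
    SELECT DISTINCT ID FROM CTE WHERE Result <> Previous ORDER BY ID
""").fetchall()
print([r[0] for r in rows])  # [3, 4]
```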
September 5, 2020 at 12:22 am
You don't need to scan the table three times.
Two scans are perfectly enough:
SELECT ID
FROM Response r
WHERE r.result = 'Y'
AND EXISTS (SELECT * FROM Response r2 WHERE r2.ID = r.ID AND r2.Result = 'N')
GROUP BY ID;
And please avoid using the keyword DISTINCT. It's a very bad habit.
_____________
Code for TallyGenerator
September 5, 2020 at 12:26 am
Actually, a single scan would be just fine:
SELECT ID
FROM Response r
GROUP BY ID
HAVING MAX(result) <> MIN(result);
_____________
Code for TallyGenerator
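[Same sanity check for the single-scan version, again via Python's sqlite3 on the sample data. One assumption to keep in mind: MAX <> MIN only means "both Y and N present" because the column holds exactly those two values.]

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Response (ID INTEGER, Result TEXT)")
conn.executemany(
    "INSERT INTO Response VALUES (?, ?)",
    [(1, 'Y'), (1, 'Y'), (1, 'Y'), (2, 'N'), (2, 'N'),
     (3, 'Y'), (3, 'N'), (4, 'Y'), (4, 'N')],
)

# One pass over the table: a group with both 'Y' and 'N' has MAX <> MIN.
rows = conn.execute("""
    SELECT ID
    FROM Response
    GROUP BY ID
    HAVING MAX(Result) <> MIN(Result)
    ORDER BY ID
""").fetchall()
print([r[0] for r in rows])  # [3, 4]
```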
September 5, 2020 at 4:14 pm
Actually, a single scan would be just fine:
SELECT ID
FROM Response r
GROUP BY ID
HAVING MAX(result) <> MIN(result);
You've just gotta love the old ways. 😀
--Jeff Moden
Change is inevitable... Change for the better is not.
September 5, 2020 at 4:20 pm
If the only values available in the result column are Y or N, you could do this:
Select r.id
From Response r
Group By
r.id
Having count_big(Distinct r.result) = 2;
Note: using count_big eliminates an implicit conversion, as the result from count needs to be converted to bigint (for some reason). This produces the same plan as Sergiy's solution with HAVING MAX(result) <> MIN(result); the difference is in the sort, where the distinct sort eliminates duplicate rows and the max/min version does not.
Test both versions in your actual code to determine which is better for your implementation.
Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”
― Charles R. Swindoll
How to post questions to get better answers faster
Managing Transaction Logs
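[The count-of-distinct-values variant also checks out on the sample data. SQLite has no COUNT_BIG, so plain COUNT(DISTINCT ...) stands in here; the logic is the same, and like the MAX/MIN version it assumes only Y and N can occur.]

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Response (ID INTEGER, Result TEXT)")
conn.executemany(
    "INSERT INTO Response VALUES (?, ?)",
    [(1, 'Y'), (1, 'Y'), (1, 'Y'), (2, 'N'), (2, 'N'),
     (3, 'Y'), (3, 'N'), (4, 'Y'), (4, 'N')],
)

# An ID qualifies when its group contains two distinct Result values,
# which with only Y/N possible means both are present.
rows = conn.execute("""
    SELECT ID
    FROM Response
    GROUP BY ID
    HAVING COUNT(DISTINCT Result) = 2
    ORDER BY ID
""").fetchall()
print([r[0] for r in rows])  # [3, 4]
```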
September 5, 2020 at 11:10 pm
If the only values available in the result column are Y or N - you could do this:
Select r.id
From Response r
Group By
r.id
Having count_big(Distinct r.result) = 2;
Note: using count_big eliminates an implicit conversion as the result from count needs to be converted to bigint (for some reason). This produces the same plan as Sergiy's solution with HAVING MAX(result) <> MIN(result) - the difference will be in the sort where the distinct sort will eliminate duplicate rows and the max/min does not.
Test both versions in your actual code to determine which is better for your implementation.
A very wise man told me long ago...
"A Developer must not guess... a Developer must KNOW!" -- Sergiy Flocka - Circa 2007. 😀
You just can't determine which piece of code will run faster by looking at the execution plan (I've been seriously bitten in the distant past by doing so). As you said, a test needs to be done, and since we don't have the OP's data, let's make some.
For those wanting to test for themselves, here's some code to create a larger table and the fnTally function can be found at the article from the last link in my signature line below.
--===== Create and populate the test table
DROP TABLE IF EXISTS #TestTable
;
WITH cteID AS
(
SELECT ID=id.N
,RowMultiplier = ABS(CHECKSUM(NEWID())%3)+1
FROM dbo.fnTally(1,1000000) id
)
SELECT id.ID
,Result = IIF(ABS(CHECKSUM(NEWID())%2) = 0, 'N','Y')
INTO #TestTable
FROM cteID id
CROSS APPLY dbo.fnTally(1,RowMultiplier) rows
OPTION (MAXDOP 1)
;
GO
-----------------------------------------------------------------------------------------------------------------------
PRINT REPLICATE('=',119);
RAISERROR('========== Sergiy''s code ==========',0,0) WITH NOWAIT;
;
DECLARE @BitBucket INT;
SET STATISTICS TIME,IO ON;
SELECT @BitBucket = ID
FROM #TestTable r
GROUP BY ID
HAVING MAX(result) <> MIN(result);
SET STATISTICS TIME,IO OFF;
GO
-----------------------------------------------------------------------------------------------------------------------
PRINT REPLICATE('=',119);
RAISERROR('========== Jeff William''s code ==========',0,0) WITH NOWAIT;
;
DECLARE @BitBucket INT;
SET STATISTICS TIME,IO ON;
Select @BitBucket = r.id
From #TestTable r
Group By
r.id
Having count_big(Distinct r.result) = 2;
SET STATISTICS TIME,IO OFF;
GO
Run Results...
(1999756 rows affected)
=======================================================================================================================
========== Sergiy's code ==========
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#TestTable__________________________________________________________________________________________________________00000000094E'.
Scan count 1, logical reads 4454, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 844 ms, elapsed time = 840 ms.
=======================================================================================================================
========== Jeff William's code ==========
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#TestTable__________________________________________________________________________________________________________00000000094E'.
Scan count 1, logical reads 4454, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 1235 ms, elapsed time = 1233 ms.
For this test, Jeff Williams' code runs about 46% slower. Must be the DISTINCT Sergiy was talking about 😉
To Jeff's point, even though we tested with large enough randomized data above, the OP still needs to test in their environment.
--Jeff Moden
Change is inevitable... Change for the better is not.