Brain freeze.

Question

Brain freeze.

LittleDuck

Old Hand

Points: 330
More actions
March 26, 2015 at 8:33 am

#295486

Right I'm pulling blank today.
Number | Letter
1 A
1 B
2 A
3 A
3 B
4 A
4 C
5 A
I'm trying to retrieve all of the numbers that have the letter A AND B
For the above I'd be aiming for the result of number 1 and number 3.
Anyone point me in the right direction of achieving this.
Thanks

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply

Ed Wagner SSC Guru Points: 287026 More actions · Answer 1

LittleDuck (3/26/2015)
Right I'm pulling blank today.
Number | Letter
1 A
1 B
2 A
3 A
3 B
4 A
4 C
5 A
I'm trying to retrieve all of the numbers that have the letter A AND B
For the above I'd be aiming for the result of number 1 and number 3.
Anyone point me in the right direction of achieving this.
Thanks

Here's some DDL to create and populate a temp table:

if object_id('tempdb.dbo.#letters', 'u') is not null drop table #letters;

create table #letters (

num integer,

letter char(1));

insert into #letters(num, letter)

values(1, 'A'),

(1, 'B'),

(2, 'A'),

(3, 'A'),

(3, 'B'),

(4, 'A'),

(4, 'C'),

(5, 'A');

To pull the results you want, you want the results of your two queries where they intersect:

select num

from #letters

where letter = 'A'

intersect

select num

from #letters

where letter = 'b';

There are probably several ways to do this. This is just one of them.

Tally Tables - Performance Personified
String Splitting with True Performance
Best practices on how to ask questions

LittleDuck Old Hand Points: 330 More actions · Answer 2

Ah cool. My brain came alive and I came up with a couple of ways to achieve it.

Damn me for only have 4 hours sleep last night.

Cheers for the quick reply.

SELECT DISTINCT Number, letter

FROM Testing

WHERE EXISTS (SELECT 1

FROM Testing t WHERE Letter = 'A'

AND Testing.Number = t.Number) AND EXISTS (SELECT 1 FROM

Testing t WHERE Letter = 'B' AND Testing.Number = t.Number) AND Letter IN ('A', 'B')

And a messier one

SELECT DATA.Number, DATA.Letter FROM (SELECT Number, Letter

FROM [Testing]

WHERE Testing.Letter = 'A') DATA

INNER JOIN (SELECT Number, Letter

FROM [Testing]

WHERE Testing.Letter = 'B') B ON B.Number = DATA.Number

UNION ALL

SELECT B.Number, B.Letter FROM (SELECT Number, Letter

FROM [Testing]

WHERE Testing.Letter = 'A') DATA

INNER JOIN (SELECT Number, Letter

FROM [Testing]

WHERE Testing.Letter = 'B') B ON B.Number = DATA.Number

Ed Wagner SSC Guru Points: 287026 More actions · Answer 3

I know this is just test data, but take a look at the query plans for all of them.

Tally Tables - Performance Personified
String Splitting with True Performance
Best practices on how to ask questions

Luis Cazares SSC Guru Points: 183706 More actions · Answer 4

Just another way to get this result.

SELECT num

FROM #letters

WHERE letter IN('A', 'B')

GROUP BY num

HAVING COUNT( DISTINCT letter) = 2

Luis C.
General Disclaimer:
Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

How to post data/code on a forum to get the best help: Option 1 / Option 2

Cadavre SSC-Forever Points: 41690 More actions · Answer 5

With no index, as in the sample data, here's a couple more options to add to your own, Ed's and Luis' :w00t:

SELECT a.num

FROM #letters a

CROSS APPLY ( SELECT TOP 1

1

FROM #letters b

WHERE a.num = b.num

AND b.letter = 'B'

) ca ( X )

WHERE a.letter = 'A';

SELECT a.num

FROM #letters a

WHERE EXISTS ( SELECT 1

FROM #letters b

WHERE a.num = b.num

AND b.letter = 'B' )

AND a.letter = 'A';

I thought we could do some performance testing, over 1,000,000 rows and with no indexes.

Test Environment Setup: -

IF OBJECT_ID('tempdb..#testEnvironment') IS NOT NULL

BEGIN

DROP TABLE #testEnvironment;

END;

SELECT TOP 1000000

( ABS(CHECKSUM(NEWID())) % 100000 ) + 1 AS num,

CHAR(( ( ABS(CHECKSUM(NEWID())) % 3 ) + 1 ) + 64) AS letter

INTO #testEnvironment

FROM master.dbo.syscolumns sc1

CROSS JOIN master.dbo.syscolumns sc2

CROSS JOIN master.dbo.syscolumns sc3;

So that's #testEnvironment populated with some semi-random data. Now lets' create a harness to run the queries in and record the timings: -

IF OBJECT_ID('tempdb..#performance') IS NOT NULL

BEGIN

DROP TABLE #performance;

END;

CREATE TABLE #performance

(

Query VARCHAR(100),

StartTime DATETIME,

EndTime DATETIME

);

GO

DECLARE @StartTime DATETIME,

@Holder INT;

DBCC FREEPROCCACHE WITH NO_INFOMSGS;

DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS;

SET @StartTime = GETDATE();

SELECT @Holder = num

FROM ( SELECT num

FROM #testEnvironment

WHERE letter = 'A'

INTERSECT

SELECT num

FROM #testEnvironment

WHERE letter = 'B'

) a;

INSERT INTO #performance

( Query, StartTime, EndTime )

VALUES ( 'Intersect', @StartTime, GETDATE() );

GO 50

DECLARE @StartTime DATETIME,

@Holder INT;

DBCC FREEPROCCACHE WITH NO_INFOMSGS;

DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS;

SET @StartTime = GETDATE();

SELECT @Holder = num

FROM #testEnvironment

WHERE letter IN ( 'A', 'B' )

GROUP BY num

HAVING COUNT(DISTINCT letter) = 2;

INSERT INTO #performance

(

Query,

StartTime,

EndTime

)

VALUES (

'Having Count',

@StartTime,

GETDATE()

);

GO 50

DECLARE @StartTime DATETIME,

@Holder INT;

DBCC FREEPROCCACHE WITH NO_INFOMSGS;

DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS;

SET @StartTime = GETDATE();

SELECT @Holder = a.num

FROM #testEnvironment a

CROSS APPLY ( SELECT TOP 1

1

FROM #testEnvironment b

WHERE a.num = b.num

AND b.letter = 'B'

) ca ( X )

WHERE a.letter = 'A';

INSERT INTO #performance

(

Query,

StartTime,

EndTime

)

VALUES (

'Cross Apply',

@StartTime,

GETDATE()

);

GO 50

DECLARE @StartTime DATETIME,

@Holder INT;

DBCC FREEPROCCACHE WITH NO_INFOMSGS;

DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS;

SET @StartTime = GETDATE();

SELECT @Holder = a.num

FROM #testEnvironment a

WHERE EXISTS ( SELECT 1

FROM #testEnvironment b

WHERE a.num = b.num

AND b.letter = 'B' )

AND a.letter = 'A';

INSERT INTO #performance

( Query, StartTime, EndTime )

VALUES ( 'Exists', @StartTime, GETDATE() );

GO 50

DECLARE @StartTime DATETIME,

@Holder INT;

DBCC FREEPROCCACHE WITH NO_INFOMSGS;

DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS;

SET @StartTime = GETDATE();

SELECT @Holder = num

FROM ( SELECT DISTINCT

num,

letter

FROM #testEnvironment

WHERE EXISTS ( SELECT 1

FROM #testEnvironment t

WHERE letter = 'A'

AND #testEnvironment.num = t.num )

AND EXISTS ( SELECT 1

FROM #testEnvironment t

WHERE letter = 'B'

AND #testEnvironment.num = t.num )

AND letter IN ( 'A', 'B' )

) a;

INSERT INTO #performance

(

Query,

StartTime,

EndTime

)

VALUES (

'Distinct Exists',

@StartTime,

GETDATE()

);

GO 50

DECLARE @StartTime DATETIME,

@Holder INT;

DBCC FREEPROCCACHE WITH NO_INFOMSGS;

DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS;

SET @StartTime = GETDATE();

SELECT @Holder = num

FROM ( SELECT DATA.num

FROM ( SELECT num,

letter

FROM #testEnvironment

WHERE #testEnvironment.letter = 'A'

) DATA

INNER JOIN ( SELECT num,

letter

FROM #testEnvironment

WHERE #testEnvironment.letter = 'B'

) B ON B.num = DATA.num

UNION ALL

SELECT B.num

FROM ( SELECT num,

letter

FROM #testEnvironment

WHERE #testEnvironment.letter = 'A'

) DATA

INNER JOIN ( SELECT num,

letter

FROM #testEnvironment

WHERE #testEnvironment.letter = 'B'

) B ON B.num = DATA.num

) a;

INSERT INTO #performance

(

Query,

StartTime,

EndTime

)

VALUES (

'UNION ALL',

@StartTime,

GETDATE()

);

GO 50

So "#performance" will hold the timings of each query after 50 runs. I've used the batch "GO" to repeat each query because in testing, I've found that loops can affect the timing comparisons for some queries, so I now run them inside batch separators like that.

We can run the following to get a bit of a total time elapsed for 50 runs: -

SELECT a.Query,

[Total Time Taken (seconds)] = a.Total / 1000000.0,

[Average Time Taken (seconds)] = ( a.Total / 50.0 ) / 1000000.0

FROM ( SELECT Query,

Total = SUM(DATEDIFF(MICROSECOND, StartTime, EndTime))

FROM #performance

GROUP BY Query

) a;

On my box, the results are as follows: -

Query Total Time Taken (seconds) Average Time Taken (seconds)

------------------ --------------------------------------- ---------------------------------------

Cross Apply 109.038000000 2.180760000000000

Distinct Exists 10.292000000 0.205840000000000

Exists 5.277000000 0.105540000000000

Having Count 5.747000000 0.114940000000000

Intersect 4.209000000 0.084180000000000

UNION ALL 14.643000000 0.292860000000000

Forever trying to learn
My blog - http://www.cadavre.co.uk/
For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better, quicker answers on SQL Server performance related questions, click on the following...http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

Ed Wagner SSC Guru Points: 287026 More actions · Answer 6

Nice job on the testing. Thanks. I half-expected a slew of other approaches to come from the OP. 😛

Tally Tables - Performance Personified
String Splitting with True Performance
Best practices on how to ask questions

Dwain Camps SSC Guru Points: 86978 More actions · Answer 7

I believe that this is relational division with remainder:

High Performance Relational Division in SQL Server [/url]

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St