Brain freeze.

  • Right I'm pulling blank today.

    Number | Letter

    1 A

    1 B

    2 A

    3 A

    3 B

    4 A

    4 C

    5 A

    I'm trying to retrieve all of the numbers that have the letter A AND B

    For the above I'd be aiming for the result of number 1 and number 3.

    Anyone point me in the right direction of achieving this.

    Thanks

  • LittleDuck (3/26/2015)


    Right I'm pulling blank today.

    Number | Letter

    1 A

    1 B

    2 A

    3 A

    3 B

    4 A

    4 C

    5 A

    I'm trying to retrieve all of the numbers that have the letter A AND B

    For the above I'd be aiming for the result of number 1 and number 3.

    Anyone point me in the right direction of achieving this.

    Thanks

    Here's some DDL to create and populate a temp table:

    if object_id('tempdb.dbo.#letters', 'u') is not null drop table #letters;

    create table #letters (

    num integer,

    letter char(1));

    insert into #letters(num, letter)

    values(1, 'A'),

    (1, 'B'),

    (2, 'A'),

    (3, 'A'),

    (3, 'B'),

    (4, 'A'),

    (4, 'C'),

    (5, 'A');

    To pull the results you want, you want the results of your two queries where they intersect:

    select num

    from #letters

    where letter = 'A'

    intersect

    select num

    from #letters

    where letter = 'b';

    There are probably several ways to do this. This is just one of them.

  • Ah cool. My brain came alive and I came up with a couple of ways to achieve it.

    Damn me for only have 4 hours sleep last night.

    Cheers for the quick reply.

    SELECT DISTINCT Number, letter

    FROM Testing

    WHERE EXISTS (SELECT 1

    FROM Testing t WHERE Letter = 'A'

    AND Testing.Number = t.Number) AND EXISTS (SELECT 1 FROM

    Testing t WHERE Letter = 'B' AND Testing.Number = t.Number) AND Letter IN ('A', 'B')

    And a messier one

    SELECT DATA.Number, DATA.Letter FROM (SELECT Number, Letter

    FROM [Testing]

    WHERE Testing.Letter = 'A') DATA

    INNER JOIN (SELECT Number, Letter

    FROM [Testing]

    WHERE Testing.Letter = 'B') B ON B.Number = DATA.Number

    UNION ALL

    SELECT B.Number, B.Letter FROM (SELECT Number, Letter

    FROM [Testing]

    WHERE Testing.Letter = 'A') DATA

    INNER JOIN (SELECT Number, Letter

    FROM [Testing]

    WHERE Testing.Letter = 'B') B ON B.Number = DATA.Number

  • I know this is just test data, but take a look at the query plans for all of them.

  • Just another way to get this result.

    SELECT num

    FROM #letters

    WHERE letter IN('A', 'B')

    GROUP BY num

    HAVING COUNT( DISTINCT letter) = 2

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • With no index, as in the sample data, here's a couple more options to add to your own, Ed's and Luis' :w00t:

    SELECT a.num

    FROM #letters a

    CROSS APPLY ( SELECT TOP 1

    1

    FROM #letters b

    WHERE a.num = b.num

    AND b.letter = 'B'

    ) ca ( X )

    WHERE a.letter = 'A';

    SELECT a.num

    FROM #letters a

    WHERE EXISTS ( SELECT 1

    FROM #letters b

    WHERE a.num = b.num

    AND b.letter = 'B' )

    AND a.letter = 'A';

    I thought we could do some performance testing, over 1,000,000 rows and with no indexes.

    Test Environment Setup: -

    IF OBJECT_ID('tempdb..#testEnvironment') IS NOT NULL

    BEGIN

    DROP TABLE #testEnvironment;

    END;

    SELECT TOP 1000000

    ( ABS(CHECKSUM(NEWID())) % 100000 ) + 1 AS num,

    CHAR(( ( ABS(CHECKSUM(NEWID())) % 3 ) + 1 ) + 64) AS letter

    INTO #testEnvironment

    FROM master.dbo.syscolumns sc1

    CROSS JOIN master.dbo.syscolumns sc2

    CROSS JOIN master.dbo.syscolumns sc3;

    So that's #testEnvironment populated with some semi-random data. Now lets' create a harness to run the queries in and record the timings: -

    IF OBJECT_ID('tempdb..#performance') IS NOT NULL

    BEGIN

    DROP TABLE #performance;

    END;

    CREATE TABLE #performance

    (

    Query VARCHAR(100),

    StartTime DATETIME,

    EndTime DATETIME

    );

    GO

    DECLARE @StartTime DATETIME,

    @Holder INT;

    DBCC FREEPROCCACHE WITH NO_INFOMSGS;

    DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS;

    SET @StartTime = GETDATE();

    SELECT @Holder = num

    FROM ( SELECT num

    FROM #testEnvironment

    WHERE letter = 'A'

    INTERSECT

    SELECT num

    FROM #testEnvironment

    WHERE letter = 'B'

    ) a;

    INSERT INTO #performance

    ( Query, StartTime, EndTime )

    VALUES ( 'Intersect', @StartTime, GETDATE() );

    GO 50

    DECLARE @StartTime DATETIME,

    @Holder INT;

    DBCC FREEPROCCACHE WITH NO_INFOMSGS;

    DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS;

    SET @StartTime = GETDATE();

    SELECT @Holder = num

    FROM #testEnvironment

    WHERE letter IN ( 'A', 'B' )

    GROUP BY num

    HAVING COUNT(DISTINCT letter) = 2;

    INSERT INTO #performance

    (

    Query,

    StartTime,

    EndTime

    )

    VALUES (

    'Having Count',

    @StartTime,

    GETDATE()

    );

    GO 50

    DECLARE @StartTime DATETIME,

    @Holder INT;

    DBCC FREEPROCCACHE WITH NO_INFOMSGS;

    DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS;

    SET @StartTime = GETDATE();

    SELECT @Holder = a.num

    FROM #testEnvironment a

    CROSS APPLY ( SELECT TOP 1

    1

    FROM #testEnvironment b

    WHERE a.num = b.num

    AND b.letter = 'B'

    ) ca ( X )

    WHERE a.letter = 'A';

    INSERT INTO #performance

    (

    Query,

    StartTime,

    EndTime

    )

    VALUES (

    'Cross Apply',

    @StartTime,

    GETDATE()

    );

    GO 50

    DECLARE @StartTime DATETIME,

    @Holder INT;

    DBCC FREEPROCCACHE WITH NO_INFOMSGS;

    DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS;

    SET @StartTime = GETDATE();

    SELECT @Holder = a.num

    FROM #testEnvironment a

    WHERE EXISTS ( SELECT 1

    FROM #testEnvironment b

    WHERE a.num = b.num

    AND b.letter = 'B' )

    AND a.letter = 'A';

    INSERT INTO #performance

    ( Query, StartTime, EndTime )

    VALUES ( 'Exists', @StartTime, GETDATE() );

    GO 50

    DECLARE @StartTime DATETIME,

    @Holder INT;

    DBCC FREEPROCCACHE WITH NO_INFOMSGS;

    DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS;

    SET @StartTime = GETDATE();

    SELECT @Holder = num

    FROM ( SELECT DISTINCT

    num,

    letter

    FROM #testEnvironment

    WHERE EXISTS ( SELECT 1

    FROM #testEnvironment t

    WHERE letter = 'A'

    AND #testEnvironment.num = t.num )

    AND EXISTS ( SELECT 1

    FROM #testEnvironment t

    WHERE letter = 'B'

    AND #testEnvironment.num = t.num )

    AND letter IN ( 'A', 'B' )

    ) a;

    INSERT INTO #performance

    (

    Query,

    StartTime,

    EndTime

    )

    VALUES (

    'Distinct Exists',

    @StartTime,

    GETDATE()

    );

    GO 50

    DECLARE @StartTime DATETIME,

    @Holder INT;

    DBCC FREEPROCCACHE WITH NO_INFOMSGS;

    DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS;

    SET @StartTime = GETDATE();

    SELECT @Holder = num

    FROM ( SELECT DATA.num

    FROM ( SELECT num,

    letter

    FROM #testEnvironment

    WHERE #testEnvironment.letter = 'A'

    ) DATA

    INNER JOIN ( SELECT num,

    letter

    FROM #testEnvironment

    WHERE #testEnvironment.letter = 'B'

    ) B ON B.num = DATA.num

    UNION ALL

    SELECT B.num

    FROM ( SELECT num,

    letter

    FROM #testEnvironment

    WHERE #testEnvironment.letter = 'A'

    ) DATA

    INNER JOIN ( SELECT num,

    letter

    FROM #testEnvironment

    WHERE #testEnvironment.letter = 'B'

    ) B ON B.num = DATA.num

    ) a;

    INSERT INTO #performance

    (

    Query,

    StartTime,

    EndTime

    )

    VALUES (

    'UNION ALL',

    @StartTime,

    GETDATE()

    );

    GO 50

    So "#performance" will hold the timings of each query after 50 runs. I've used the batch "GO" to repeat each query because in testing, I've found that loops can affect the timing comparisons for some queries, so I now run them inside batch separators like that.

    We can run the following to get a bit of a total time elapsed for 50 runs: -

    SELECT a.Query,

    [Total Time Taken (seconds)] = a.Total / 1000000.0,

    [Average Time Taken (seconds)] = ( a.Total / 50.0 ) / 1000000.0

    FROM ( SELECT Query,

    Total = SUM(DATEDIFF(MICROSECOND, StartTime, EndTime))

    FROM #performance

    GROUP BY Query

    ) a;

    On my box, the results are as follows: -

    Query Total Time Taken (seconds) Average Time Taken (seconds)

    ------------------ --------------------------------------- ---------------------------------------

    Cross Apply 109.038000000 2.180760000000000

    Distinct Exists 10.292000000 0.205840000000000

    Exists 5.277000000 0.105540000000000

    Having Count 5.747000000 0.114940000000000

    Intersect 4.209000000 0.084180000000000

    UNION ALL 14.643000000 0.292860000000000


    Forever trying to learn
    My blog - http://www.cadavre.co.uk/
    For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/
    For better, quicker answers on SQL Server performance related questions, click on the following...http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • Nice job on the testing. Thanks. I half-expected a slew of other approaches to come from the OP. 😛

  • I believe that this is relational division with remainder:

    High Performance Relational Division in SQL Server [/url]


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply