Check that a specific Group By condition does not exist

  • Lynn Pettis (9/7/2012)


    Since all you want is the TemplateName where BaseStage is 1 in at least 1 record in a group, the following works:

    That is not quite what I want.

    I want the TemplateName where BaseStage is 1 in at least 1 record in EVERY group (that relates to a template).

    So, I don't want Template 1 because, in the first group of three records (for Template 1), BaseStage is zero in all of them. Notwithstanding that BaseStage is 1 for the second group of three records (for Template 1). This doesn't matter. If any group of records for a template does not have a BaseStage of 1 - the Template is not complete so I don't want it returned.

    Thanks again.

  • Lynn Pettis (9/7/2012)


    So what you are saying is that BaseStage must be 1 in at least one record of BOTH groups, correct?

    Correct.

  • Try this:

    CREATE TABLE #tblTemplates

    (

    TemplateID int,

    TemplateName varchar(50)

    )

    GO

    INSERT INTO #tblTemplates (TemplateID, TemplateName)

    SELECT 1, 'Template 1' UNION ALL

    SELECT 2, 'Template 2'

    GO

    CREATE TABLE #tblBlocks

    (

    TemplateID int,

    BlockID int,

    FieldID int,

    BaseStage bit

    )

    GO

    INSERT INTO #tblBlocks (TemplateID, BlockID, FieldID, BaseStage)

    SELECT 1, 1, 1, 0 UNION ALL

    SELECT 1, 1, 2, 0 UNION ALL

    SELECT 1, 1, 3, 0 UNION ALL

    SELECT 1, 2, 4, 0 UNION ALL

    SELECT 1, 2, 5, 1 UNION ALL

    SELECT 1, 2, 6, 0 UNION ALL

    SELECT 2, 1, 7, 1 UNION ALL

    SELECT 2, 1, 8, 0 UNION ALL

    SELECT 2, 1, 9, 0 UNION ALL

    SELECT 2, 2, 10, 1 UNION ALL

    SELECT 2, 2, 11, 0 UNION ALL

    SELECT 2, 2, 12, 0

    GO

    WITH BaseBlocks AS (

    SELECT

    tb.TemplateID,

    tb.BlockID,

    MAX(CAST(tb.BaseStage AS INT)) BaseStage

    FROM

    #tblBlocks tb

    GROUP BY

    tb.TemplateID,

    tb.BlockID

    )

    SELECT

    *

    FROM

    #tblTemplates t1

    WHERE

    NOT EXISTS (SELECT 1 FROM BaseBlocks tb1 WHERE tb1.TemplateID = t1.TemplateID AND tb1.BaseStage = 0);

    go

    DROP TABLE #tblBlocks

    DROP TABLE #tblTemplates

    GO

  • SELECT TemplateID

    FROM #tblBlocks

    GROUP BY TemplateID

    HAVING COUNT(DISTINCT CASE WHEN #tblBlocks.BaseStage = 1 THEN BlockID END) = COUNT(DISTINCT BlockID)

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • ScottPletcher (9/7/2012)


    SELECT TemplateID

    FROM #tblBlocks

    GROUP BY TemplateID

    HAVING COUNT(DISTINCT CASE WHEN #tblBlocks.BaseStage = 1 THEN BlockID END) = COUNT(DISTINCT BlockID)

    That does the job. Thanks very much.

  • That does the job too. Again, thanks very much.

    Just wish I understood both the solutions that work.

  • You may want to compare the execution plans between the two solutions provided:

    CREATE TABLE #tblTemplates

    (

    TemplateID int,

    TemplateName varchar(50)

    )

    GO

    INSERT INTO #tblTemplates (TemplateID, TemplateName)

    SELECT 1, 'Template 1' UNION ALL

    SELECT 2, 'Template 2'

    GO

    CREATE TABLE #tblBlocks

    (

    TemplateID int,

    BlockID int,

    FieldID int,

    BaseStage bit

    )

    GO

    INSERT INTO #tblBlocks (TemplateID, BlockID, FieldID, BaseStage)

    SELECT 1, 1, 1, 0 UNION ALL

    SELECT 1, 1, 2, 0 UNION ALL

    SELECT 1, 1, 3, 0 UNION ALL

    SELECT 1, 2, 4, 0 UNION ALL

    SELECT 1, 2, 5, 1 UNION ALL

    SELECT 1, 2, 6, 0 UNION ALL

    SELECT 2, 1, 7, 1 UNION ALL

    SELECT 2, 1, 8, 0 UNION ALL

    SELECT 2, 1, 9, 0 UNION ALL

    SELECT 2, 2, 10, 1 UNION ALL

    SELECT 2, 2, 11, 0 UNION ALL

    SELECT 2, 2, 12, 0

    GO

    WITH BaseBlocks AS (

    SELECT

    tb.TemplateID,

    tb.BlockID,

    MAX(CAST(tb.BaseStage AS INT)) BaseStage

    FROM

    #tblBlocks tb

    GROUP BY

    tb.TemplateID,

    tb.BlockID

    )

    SELECT

    *

    FROM

    #tblTemplates t1

    WHERE

    NOT EXISTS (SELECT 1 FROM BaseBlocks tb1 WHERE tb1.TemplateID = t1.TemplateID AND tb1.BaseStage = 0);

    go

    SELECT TemplateID

    FROM #tblBlocks

    GROUP BY TemplateID

    HAVING COUNT(DISTINCT CASE WHEN #tblBlocks.BaseStage = 1 THEN BlockID END) = COUNT(DISTINCT BlockID)

    go

    DROP TABLE #tblBlocks

    DROP TABLE #tblTemplates

    GO

  • Lynn Pettis (9/7/2012)


    You may want to compare the execution plans between the two solutions provided:

    CREATE TABLE #tblTemplates

    (

    TemplateID int,

    TemplateName varchar(50)

    )

    GO

    INSERT INTO #tblTemplates (TemplateID, TemplateName)

    SELECT 1, 'Template 1' UNION ALL

    SELECT 2, 'Template 2'

    GO

    CREATE TABLE #tblBlocks

    (

    TemplateID int,

    BlockID int,

    FieldID int,

    BaseStage bit

    )

    GO

    INSERT INTO #tblBlocks (TemplateID, BlockID, FieldID, BaseStage)

    SELECT 1, 1, 1, 0 UNION ALL

    SELECT 1, 1, 2, 0 UNION ALL

    SELECT 1, 1, 3, 0 UNION ALL

    SELECT 1, 2, 4, 0 UNION ALL

    SELECT 1, 2, 5, 1 UNION ALL

    SELECT 1, 2, 6, 0 UNION ALL

    SELECT 2, 1, 7, 1 UNION ALL

    SELECT 2, 1, 8, 0 UNION ALL

    SELECT 2, 1, 9, 0 UNION ALL

    SELECT 2, 2, 10, 1 UNION ALL

    SELECT 2, 2, 11, 0 UNION ALL

    SELECT 2, 2, 12, 0

    GO

    WITH BaseBlocks AS (

    SELECT

    tb.TemplateID,

    tb.BlockID,

    MAX(CAST(tb.BaseStage AS INT)) BaseStage

    FROM

    #tblBlocks tb

    GROUP BY

    tb.TemplateID,

    tb.BlockID

    )

    SELECT

    *

    FROM

    #tblTemplates t1

    WHERE

    NOT EXISTS (SELECT 1 FROM BaseBlocks tb1 WHERE tb1.TemplateID = t1.TemplateID AND tb1.BaseStage = 0);

    go

    SELECT TemplateID

    FROM #tblBlocks

    GROUP BY TemplateID

    HAVING COUNT(DISTINCT CASE WHEN #tblBlocks.BaseStage = 1 THEN BlockID END) = COUNT(DISTINCT BlockID)

    go

    DROP TABLE #tblBlocks

    DROP TABLE #tblTemplates

    GO

    If you're concerned with performance, then do this:

    SELECT TemplateID

    FROM (

    SELECT TemplateID, BlockID,

    MAX(CASE WHEN BaseStage = 1 THEN 1 ELSE 0 END) AS BackStage_Found

    --MAX(CAST(BaseStage AS tinyint)) AS BackStage_Found --or this, if your col is a bit

    FROM #tblBlocks

    GROUP BY TemplateID, BlockID

    ) AS derived

    GROUP BY TemplateID

    HAVING MIN(BackStage_Found) = 1

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • The real test will be against a 1,000,000 (or so) table. I will try and set up such a test tonight at home. One of the things I noticed between the first two solutions was the number of table scans. Have a large enough table, and I think having 2 scans against the same large table is going to be slower. I may be wrong and will admit it if my testing proves otherwise, but then that is the reason for testing isn't it.

    What may appear logical isn't always that way.

  • I do like the latest version. Looks like it should win the 1,000,000 row test.

  • Lynn Pettis (9/7/2012)


    The real test will be against a 1,000,000 (or so) table. I will try and set up such a test tonight at home. One of the things I noticed between the first two solutions was the number of table scans. Have a large enough table, and I think having 2 scans against the same large table is going to be slower. I may be wrong and will admit it if my testing proves otherwise, but then that is the reason for testing isn't it.

    What may appear logical isn't always that way.

    Yes, it's two table scans but it avoids the loop join. It will be interesting, albeit that I didn't write the first query with a concern for performance but for SQL compactness / understandability / "self-describedness", since I didn't figure there would be enough TemplateIds to cause a performance issue anyway :-).

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • Thanks again guys.

    As a matter of interest - the real world scenario is a maximum of 5 templates.

    Each template has 15 to 20 blocks.

    Each block has anything from 1 to 30 fields.

    A template is not useable unless every block has a StageBase set ... so the real world requirement is to look at 5 templates and only return the ones where every block within the template has a BaseStage set.

    Cheers again.

  • Lynn Pettis (9/7/2012)


    The real test will be against a 1,000,000 (or so) table. I will try and set up such a test tonight at home. One of the things I noticed between the first two solutions was the number of table scans. Have a large enough table, and I think having 2 scans against the same large table is going to be slower. I may be wrong and will admit it if my testing proves otherwise, but then that is the reason for testing isn't it.

    What may appear logical isn't always that way.

    So, the 1M rows testing, how did it go?

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • ScottPletcher (9/10/2012)


    Lynn Pettis (9/7/2012)


    The real test will be against a 1,000,000 (or so) table. I will try and set up such a test tonight at home. One of the things I noticed between the first two solutions was the number of table scans. Have a large enough table, and I think having 2 scans against the same large table is going to be slower. I may be wrong and will admit it if my testing proves otherwise, but then that is the reason for testing isn't it.

    What may appear logical isn't always that way.

    So, the 1M rows testing, how did it go?

    Not enough hours in the day this weekend. I will have to work over the couple of days. Hit me up again on Wednesday for an update.

  • Just a quick test using 6 million rows:

    Lynn's method:

    Elapsed Time (ms) 8276

    ScottPletcher's first method:

    Warning: Null value is eliminated by an aggregate or other SET operation.

    Elapsed Time (ms) 12244

    ScottPletcher's second method:

    Elapsed Time (ms) 7464

    I ran this several times with very similar results. The third method on 6 million rows is about 1 second faster.

Viewing 15 posts - 16 through 30 (of 30 total)

You must be logged in to reply to this topic. Login to reply