Need help re-organizing query - if possible - so it is more efficient

  • PW (9/4/2008)


    >>but then the filter "D2.CORPOID IS NULL AND D.CORPOID IS NOT NULL" negates the join! The query will never return any rows!

    Nope, that's not true in this case.

    The check on D2.CORPOID IS NULL is how the query is implementing "NOT EXISTS" functionality.

    When you want to perform "NOT EXISTS" in SQL there are several options, here are 3 of them:

    1. NOT EXISTS ( subquery )

    2. NOT IN ( subquery )

    3. LEFT JOIN Table2 and check NULL on 1 of the columns in Table2

    The query is written to use method #3. The condition D2.CORPOID IS NULL evaluates to True in cases where the LEFT JOIN does not join to any matching rows.

    I'm not sure I agree.

    I tried this test.

    Created 2 tables test1 and test2.

    CREATE TABLE [dbo].[test1](

    [oid] [int] NULL,

    [corpoid] [int] NULL,

    [data] [varchar](50) NULL

    )

    CREATE TABLE [dbo].[test2](

    [oid] [int] NULL,

    [corpoid] [int] NULL,

    [data] [varchar](50) NULL

    )

    insert into test1 values (1, 10, 'a')

    insert into test1 values (2, 20, 'b')

    insert into test1 values (3, 30, 'c')

    insert into test2 values (1, NULL, 'd')

    insert into test2 values (2, NULL, 'e')

    insert into test2 values (3, 30, 'f')

    This returns no rows:

    select * from test1

    left join test2

    on test1.corpoid = test2.corpoid

    where test1.corpoid is null and test2.corpoid is not null

    This returns the following 3 rows (as expected):

    select * from test1

    left join test2

    on test1.corpoid = test2.corpoid

    oid corpoid data oid corpoid data

    ----------- ----------- -------------------------------------------------- ----------- ----------- --------------------------------------------------

    1 10 a NULL NULL NULL

    2 20 b NULL NULL NULL

    3 30 c 3 30 f

    (3 row(s) affected)

    __________________________________________________________________________________
    SQL Server 2016 Columnstore Index Enhancements - System Views for Disk-Based Tables[/url]
    Persisting SQL Server Index-Usage Statistics with MERGE[/url]
    Turbocharge Your Database Maintenance With Service Broker: Part 2[/url]

  • OK, more correctly re-stating #3:

    3. LEFT JOIN Table2 and check NULL on 1 of the non-nullable columns in Table2

    It wasn't clear that the column being joined on allows Nulls.

  • I'm sorry, you are absolutely right, this does work indeed (returns 2 rows as expected):

    select * from test1

    left join test2

    on test1.corpoid = test2.corpoid

    where test1.corpoid is NOT null and test2.corpoid is null

    My original query in the test had the "NOT" misplaced.

    I'm getting tired! :w00t:

    __________________________________________________________________________________
    SQL Server 2016 Columnstore Index Enhancements - System Views for Disk-Based Tables[/url]
    Persisting SQL Server Index-Usage Statistics with MERGE[/url]
    Turbocharge Your Database Maintenance With Service Broker: Part 2[/url]

  • I hope this settles things!

    CREATE TABLE#Sample

    (

    coID INT,

    cotID INT,

    ccbID INT,

    mcT CHAR(1)

    )

    INSERT#Sample

    SELECT1, 1, 100000, 'R' UNION ALL-- Test case 1 pass

    SELECT2, 1, 100000, 'y' UNION ALL-- Test case 2 fail

    SELECT3, 1, 100000, 'F' UNION ALL-- Test case 3 fail

    SELECT3, 2, 100000, 'y' UNION ALL-- Test case 3 fail

    SELECT4, 9, 100000, 'f' UNION ALL-- Test case 4 pass

    SELECT4, 8, 100000, 'g'-- Test case 4 pass

    -- Original query

    SELECTd.coID,

    MIN(d.cotID)

    FROM#Sample AS d

    LEFT JOIN(

    SELECTcoID

    FROM#Sample

    WHEREccbID = 100000

    AND mcT = 'Y'

    GROUP BYcoID

    ) AS d2 ON d2.coID = d.coID

    WHEREd.ccbID = 100000

    AND d2.coID IS NULL

    AND d.coID IS NOT NULL

    GROUP BYd.coID

    -- Peso

    SELECTcoID,

    MIN(cotID)

    FROM#Sample

    WHEREccbID = 100000

    AND coID IS NOT NULL

    GROUP BYcoID

    HAVINGMAX(CASE WHEN mcT = 'Y' THEN 1 ELSE 0 END) = 0

    DROP TABLE#Sample

    Original query execution plan is

    |--Stream Aggregate(GROUP BY)

    |--Sort(ORDER BY)

    |--Filter

    |--Nested Loops(Left Outer Join)

    |--Table Scan

    |--Compute Scalar

    |--Stream Aggregate

    |--Table Scan

    And the query plan for my suggestion

    |--Filter

    |--Stream Aggregate(GROUP BY)

    |--Sort(ORDER BY)

    |--Compute Scalar

    |--Table Scan

    And if I also create the suggested index, this is the execution plan for the original query

    |--Stream Aggregate(GROUP BY)

    |--Filter

    |--Nested Loops(Left Outer Join)

    |--Index Seek

    |--Compute Scalar

    |--Stream Aggregate

    |--Table Scan

    And this is the execution plan for my suggestion with the suggested index in place

    |--Filter

    |--Stream Aggregate(GROUP BY)

    |--Compute Scalar

    |--Index Seek


    N 56°04'39.16"
    E 12°55'05.25"

  • And for closure, this is a comparison between all queries with and without indexes.

    CREATE TABLE#Sample_noindex

    (

    coID INT,

    cotID INT,

    ccbID INT,

    mcT CHAR(1)

    )

    INSERT#Sample_noindex

    SELECT1, 1, 100000, 'R' UNION ALL-- Test case 1 pass

    SELECT2, 1, 100000, 'y' UNION ALL-- Test case 2 fail

    SELECT3, 1, 100000, 'F' UNION ALL-- Test case 3 fail

    SELECT3, 2, 100000, 'y' UNION ALL-- Test case 3 fail

    SELECT4, 9, 100000, 'f' UNION ALL-- Test case 4 pass

    SELECT4, 8, 100000, 'g'-- Test case 4 pass

    CREATE TABLE#Sample_index

    (

    coID INT,

    cotID INT,

    ccbID INT,

    mcT CHAR(1)

    )

    INSERT#Sample_index

    SELECT1, 1, 100000, 'R' UNION ALL-- Test case 1 pass

    SELECT2, 1, 100000, 'y' UNION ALL-- Test case 2 fail

    SELECT3, 1, 100000, 'F' UNION ALL-- Test case 3 fail

    SELECT3, 2, 100000, 'y' UNION ALL-- Test case 3 fail

    SELECT4, 9, 100000, 'f' UNION ALL-- Test case 4 pass

    SELECT4, 8, 100000, 'g'-- Test case 4 pass

    CREATE NONCLUSTERED INDEX IX_tblName ON #Sample_index (ccbID, coID, cotID, mcT)

    -- Original query without index

    SELECTd.coID,

    MIN(d.cotID)

    FROM#Sample_noindex AS d

    LEFT JOIN(

    SELECTcoID

    FROM#Sample_noindex

    WHEREccbID = 100000

    AND mcT = 'Y'

    GROUP BYcoID

    ) AS d2 ON d2.coID = d.coID

    WHEREd.ccbID = 100000

    AND d2.coID IS NULL

    AND d.coID IS NOT NULL

    GROUP BYd.coID

    -- Peso without index

    SELECTcoID,

    MIN(cotID)

    FROM#Sample_noindex

    WHEREccbID = 100000

    AND coID IS NOT NULL

    GROUP BYcoID

    HAVINGMAX(CASE WHEN mcT = 'Y' THEN 1 ELSE 0 END) = 0

    -- Original query with index

    SELECTd.coID,

    MIN(d.cotID)

    FROM#Sample_index AS d

    LEFT JOIN(

    SELECTcoID

    FROM#Sample_index

    WHEREccbID = 100000

    AND mcT = 'Y'

    GROUP BYcoID

    ) AS d2 ON d2.coID = d.coID

    WHEREd.ccbID = 100000

    AND d2.coID IS NULL

    AND d.coID IS NOT NULL

    GROUP BYd.coID

    -- Peso with index

    SELECTcoID,

    MIN(cotID)

    FROM#Sample_index

    WHEREccbID = 100000

    AND coID IS NOT NULL

    GROUP BYcoID

    HAVINGMAX(CASE WHEN mcT = 'Y' THEN 1 ELSE 0 END) = 0

    DROP TABLE#Sample_noindex,

    #Sample_index


    N 56°04'39.16"
    E 12°55'05.25"

  • And the final test, as realistic as possible without index.

    CREATE TABLE#Original

    (

    RowID INT IDENTITY(1, 1) PRIMARY KEY CLUSTERED,

    coID INT,

    cotID INT,

    ccbID INT,

    mcT CHAR(1)

    )

    INSERT#Original

    (

    coID,

    cotID,

    ccbID,

    mcT

    )

    SELECTTOP 2402039

    ABS(CHECKSUM(NEWID())) % 88589,

    ABS(CHECKSUM(NEWID())) % 93500,

    99932 + ABS(CHECKSUM(NEWID())) % 136,

    CHAR(88 + ABS(CHECKSUM(NEWID())) % 2)

    FROMsysobjects AS so1

    CROSS JOINsysobjects AS so2

    -- Original query

    SELECTd.coID,

    MIN(d.cotID)

    FROM#Original AS d

    LEFT JOIN(

    SELECTcoID

    FROM#Original

    WHEREccbID = 100000

    AND mcT = 'Y'

    GROUP BYcoID

    ) AS d2 ON d2.coID = d.coID

    WHEREd.ccbID = 100000

    AND d2.coID IS NULL

    AND d.coID IS NOT NULL

    GROUP BYd.coID

    -- Peso

    SELECTcoID,

    MIN(cotID)

    FROM#Original

    WHEREccbID = 100000

    AND coID IS NOT NULL

    GROUP BYcoID

    HAVINGMAX(CASE WHEN mcT = 'Y' THEN 1 ELSE 0 END) = 0

    DROP TABLE#Original


    N 56°04'39.16"
    E 12°55'05.25"

  • And as realistic as possible with index

    CREATE TABLE#Original

    (

    RowID INT IDENTITY(1, 1) PRIMARY KEY CLUSTERED,

    coID INT,

    cotID INT,

    ccbID INT,

    mcT CHAR(1)

    )

    INSERT#Original

    (

    coID,

    cotID,

    ccbID,

    mcT

    )

    SELECTTOP 2402039

    ABS(CHECKSUM(NEWID())) % 88589,

    ABS(CHECKSUM(NEWID())) % 93500,

    99932 + ABS(CHECKSUM(NEWID())) % 136,

    CHAR(88 + ABS(CHECKSUM(NEWID())) % 2)

    FROMsysobjects AS so1

    CROSS JOINsysobjects AS so2

    CREATE NONCLUSTERED INDEX IX_tblName ON #Original (ccbID, coID, cotID, mcT)

    -- Original query

    SELECTd.coID,

    MIN(d.cotID)

    FROM#Original AS d

    LEFT JOIN(

    SELECTcoID

    FROM#Original

    WHEREccbID = 100000

    AND mcT = 'Y'

    GROUP BYcoID

    ) AS d2 ON d2.coID = d.coID

    WHEREd.ccbID = 100000

    AND d2.coID IS NULL

    AND d.coID IS NOT NULL

    GROUP BYd.coID

    -- Peso

    SELECTcoID,

    MIN(cotID)

    FROM#Original

    WHEREccbID = 100000

    AND coID IS NOT NULL

    GROUP BYcoID

    HAVINGMAX(CASE WHEN mcT = 'Y' THEN 1 ELSE 0 END) = 0

    DROP TABLE#Original


    N 56°04'39.16"
    E 12°55'05.25"

  • Thank you for all this work!

    I will talk to the developer about changing the query accordingly.

    __________________________________________________________________________________
    SQL Server 2016 Columnstore Index Enhancements - System Views for Disk-Based Tables[/url]
    Persisting SQL Server Index-Usage Statistics with MERGE[/url]
    Turbocharge Your Database Maintenance With Service Broker: Part 2[/url]

  • Thank you for the feedback.

    It's nice to see the query went from about 800 ms / 15500 reads down to 15 ms / 56 reads.


    N 56°04'39.16"
    E 12°55'05.25"

  • Peso (9/5/2008)


    Thank you for the feedback.

    It's nice to see the query went from about 800 ms / 15500 reads down to 15 ms / 56 reads.

    The only drawback I see with your solution - and don't take this the wrong way 😉 - is that it is not as readable/understandable as the original query.

    I don't think I understand that last line fully:

    HAVING MAX(CASE WHEN mcT = 'Y' THEN 1 ELSE 0 END) = 0

    __________________________________________________________________________________
    SQL Server 2016 Columnstore Index Enhancements - System Views for Disk-Based Tables[/url]
    Persisting SQL Server Index-Usage Statistics with MERGE[/url]
    Turbocharge Your Database Maintenance With Service Broker: Part 2[/url]

  • What the query does, is that it select all records where coID is not NULL and where ccbID = 100000.

    For this set of records, the query then group them over coID.

    The HAVING PART then checks if any of the records for every group has a mcT = 'Y'.

    If so, the query translate it to a 1, all other mcT values get a 0.

    To be a valid group, no record must have a mcT value of 'Y', ie 1.

    So max for any group need to be 0, ie no record for any group has a mcT value of 'Y'. Which is exactly what the LEFT JOIN is doing.

    However, if the two only possible values for mCT are 'Y' and 'N', you can simplify the HAVING part as

    HAVING MAX(mCT) = 'N'


    N 56°04'39.16"
    E 12°55'05.25"

  • Peso (9/5/2008)


    What the query does, is that it select all records where coID is not NULL and where ccbID = 100000.

    For this set of records, the query then group them over coID.

    The HAVING PART then checks if any of the records for every group has a mcT = 'Y'.

    If so, the query translate it to a 1, all other mcT values get a 0.

    To be a valid group, no record must have a mcT value of 'Y', ie 1.

    So max for any group need to be 0, ie no record for any group has a mcT value of 'Y'. Which is exactly what the LEFT JOIN is doing.

    However, if the two only possible values for mCT are 'Y' and 'N', you can simplify the HAVING part as

    HAVING MAX(mCT) = 'N'

    I see, it's clearer for me now. Thanks again!

    __________________________________________________________________________________
    SQL Server 2016 Columnstore Index Enhancements - System Views for Disk-Based Tables[/url]
    Persisting SQL Server Index-Usage Statistics with MERGE[/url]
    Turbocharge Your Database Maintenance With Service Broker: Part 2[/url]

  • Heh... if YOU had a question on what it does, others will to. Time to break out the grossly underused "dash dash" function. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • hi

  • Jeff Moden (9/6/2008)


    Heh... if YOU had a question on what it does, others will to. Time to break out the grossly underused "dash dash" function. 😉

    I still don't fully get it, it's a tough one... 😉

    __________________________________________________________________________________
    SQL Server 2016 Columnstore Index Enhancements - System Views for Disk-Based Tables[/url]
    Persisting SQL Server Index-Usage Statistics with MERGE[/url]
    Turbocharge Your Database Maintenance With Service Broker: Part 2[/url]

Viewing 15 posts - 16 through 30 (of 35 total)

You must be logged in to reply to this topic. Login to reply