Need help re-organizing query - if possible - so it is more efficient

Question

Need help re-organizing query - if possible - so it is more efficient

Viewing 15 posts - 16 through 30 (of 35 total)

You must be logged in to reply to this topic. Login to reply

Marios Philippopoulos SSC Guru Points: 57190 More actions · Answer 1

PW (9/4/2008)
>>but then the filter "D2.CORPOID IS NULL AND D.CORPOID IS NOT NULL" negates the join! The query will never return any rows!
Nope, that's not true in this case.
The check on D2.CORPOID IS NULL is how the query is implementing "NOT EXISTS" functionality.
When you want to perform "NOT EXISTS" in SQL there are several options, here are 3 of them:
1. NOT EXISTS ( subquery )
2. NOT IN ( subquery )
3. LEFT JOIN Table2 and check NULL on 1 of the columns in Table2
The query is written to use method #3. The condition D2.CORPOID IS NULL evaluates to True in cases where the LEFT JOIN does not join to any matching rows.

I'm not sure I agree.

I tried this test.

Created 2 tables test1 and test2.

CREATE TABLE [dbo].[test1](
[oid] [int] NULL,
[corpoid] [int] NULL,
[data] [varchar](50) NULL
)
CREATE TABLE [dbo].[test2](
[oid] [int] NULL,
[corpoid] [int] NULL,
[data] [varchar](50) NULL
)
insert into test1 values (1, 10, 'a')
insert into test1 values (2, 20, 'b')
insert into test1 values (3, 30, 'c')
insert into test2 values (1, NULL, 'd')
insert into test2 values (2, NULL, 'e')
insert into test2 values (3, 30, 'f')

This returns no rows:

select * from test1
left join test2
on test1.corpoid = test2.corpoid
where test1.corpoid is null and test2.corpoid is not null

This returns the following 3 rows (as expected):

select * from test1
left join test2
on test1.corpoid = test2.corpoid

oid corpoid data oid corpoid data
----------- ----------- -------------------------------------------------- ----------- ----------- --------------------------------------------------
1 10 a NULL NULL NULL
2 20 b NULL NULL NULL
3 30 c 3 30 f
(3 row(s) affected)

__________________________________________________________________________________
SQL Server 2016 Columnstore Index Enhancements - System Views for Disk-Based Tables[/url]
Persisting SQL Server Index-Usage Statistics with MERGE[/url]
Turbocharge Your Database Maintenance With Service Broker: Part 2[/url]

PW-201837 SSC-Insane Points: 20805 More actions · Answer 2

OK, more correctly re-stating #3:

3. LEFT JOIN Table2 and check NULL on 1 of the non-nullable columns in Table2

It wasn't clear that the column being joined on allows Nulls.

Marios Philippopoulos SSC Guru Points: 57190 More actions · Answer 3

I'm sorry, you are absolutely right, this does work indeed (returns 2 rows as expected):

select * from test1
left join test2
on test1.corpoid = test2.corpoid
where test1.corpoid is NOT null and test2.corpoid is null

My original query in the test had the "NOT" misplaced.

I'm getting tired! :w00t:

__________________________________________________________________________________
SQL Server 2016 Columnstore Index Enhancements - System Views for Disk-Based Tables[/url]
Persisting SQL Server Index-Usage Statistics with MERGE[/url]
Turbocharge Your Database Maintenance With Service Broker: Part 2[/url]

SwePeso SSC-Dedicated Points: 39757 More actions · Answer 4

I hope this settles things!

CREATE TABLE#Sample

(

coID INT,

cotID INT,

ccbID INT,

mcT CHAR(1)

)

INSERT#Sample

SELECT1, 1, 100000, 'R' UNION ALL-- Test case 1 pass

SELECT2, 1, 100000, 'y' UNION ALL-- Test case 2 fail

SELECT3, 1, 100000, 'F' UNION ALL-- Test case 3 fail

SELECT3, 2, 100000, 'y' UNION ALL-- Test case 3 fail

SELECT4, 9, 100000, 'f' UNION ALL-- Test case 4 pass

SELECT4, 8, 100000, 'g'-- Test case 4 pass

-- Original query

SELECTd.coID,

MIN(d.cotID)

FROM#Sample AS d

LEFT JOIN(

SELECTcoID

FROM#Sample

WHEREccbID = 100000

AND mcT = 'Y'

GROUP BYcoID

) AS d2 ON d2.coID = d.coID

WHEREd.ccbID = 100000

AND d2.coID IS NULL

AND d.coID IS NOT NULL

GROUP BYd.coID

-- Peso

SELECTcoID,

MIN(cotID)

FROM#Sample

WHEREccbID = 100000

AND coID IS NOT NULL

GROUP BYcoID

HAVINGMAX(CASE WHEN mcT = 'Y' THEN 1 ELSE 0 END) = 0

DROP TABLE#Sample

Original query execution plan is

|--Stream Aggregate(GROUP BY)

|--Sort(ORDER BY)

|--Filter

|--Nested Loops(Left Outer Join)

|--Table Scan

|--Compute Scalar

|--Stream Aggregate

|--Table Scan

And the query plan for my suggestion

|--Filter

|--Stream Aggregate(GROUP BY)

|--Sort(ORDER BY)

|--Compute Scalar

|--Table Scan

And if I also create the suggested index, this is the execution plan for the original query

|--Stream Aggregate(GROUP BY)

|--Filter

|--Nested Loops(Left Outer Join)

|--Index Seek

|--Compute Scalar

|--Stream Aggregate

|--Table Scan

And this is the execution plan for my suggestion with the suggested index in place

|--Filter

|--Stream Aggregate(GROUP BY)

|--Compute Scalar

|--Index Seek

N 56°04'39.16"
E 12°55'05.25"

SwePeso SSC-Dedicated Points: 39757 More actions · Answer 5

And for closure, this is a comparison between all queries with and without indexes.

CREATE TABLE#Sample_noindex

(

coID INT,

cotID INT,

ccbID INT,

mcT CHAR(1)

)

INSERT#Sample_noindex

SELECT1, 1, 100000, 'R' UNION ALL-- Test case 1 pass

SELECT2, 1, 100000, 'y' UNION ALL-- Test case 2 fail

SELECT3, 1, 100000, 'F' UNION ALL-- Test case 3 fail

SELECT3, 2, 100000, 'y' UNION ALL-- Test case 3 fail

SELECT4, 9, 100000, 'f' UNION ALL-- Test case 4 pass

SELECT4, 8, 100000, 'g'-- Test case 4 pass

CREATE TABLE#Sample_index

(

coID INT,

cotID INT,

ccbID INT,

mcT CHAR(1)

)

INSERT#Sample_index

SELECT1, 1, 100000, 'R' UNION ALL-- Test case 1 pass

SELECT2, 1, 100000, 'y' UNION ALL-- Test case 2 fail

SELECT3, 1, 100000, 'F' UNION ALL-- Test case 3 fail

SELECT3, 2, 100000, 'y' UNION ALL-- Test case 3 fail

SELECT4, 9, 100000, 'f' UNION ALL-- Test case 4 pass

SELECT4, 8, 100000, 'g'-- Test case 4 pass

CREATE NONCLUSTERED INDEX IX_tblName ON #Sample_index (ccbID, coID, cotID, mcT)

-- Original query without index

SELECTd.coID,

MIN(d.cotID)

FROM#Sample_noindex AS d

LEFT JOIN(

SELECTcoID

FROM#Sample_noindex

WHEREccbID = 100000

AND mcT = 'Y'

GROUP BYcoID

) AS d2 ON d2.coID = d.coID

WHEREd.ccbID = 100000

AND d2.coID IS NULL

AND d.coID IS NOT NULL

GROUP BYd.coID

-- Peso without index

SELECTcoID,

MIN(cotID)

FROM#Sample_noindex

WHEREccbID = 100000

AND coID IS NOT NULL

GROUP BYcoID

HAVINGMAX(CASE WHEN mcT = 'Y' THEN 1 ELSE 0 END) = 0

-- Original query with index

SELECTd.coID,

MIN(d.cotID)

FROM#Sample_index AS d

LEFT JOIN(

SELECTcoID

FROM#Sample_index

WHEREccbID = 100000

AND mcT = 'Y'

GROUP BYcoID

) AS d2 ON d2.coID = d.coID

WHEREd.ccbID = 100000

AND d2.coID IS NULL

AND d.coID IS NOT NULL

GROUP BYd.coID

-- Peso with index

SELECTcoID,

MIN(cotID)

FROM#Sample_index

WHEREccbID = 100000

AND coID IS NOT NULL

GROUP BYcoID

HAVINGMAX(CASE WHEN mcT = 'Y' THEN 1 ELSE 0 END) = 0

DROP TABLE#Sample_noindex,

#Sample_index

N 56°04'39.16"
E 12°55'05.25"

SwePeso SSC-Dedicated Points: 39757 More actions · Answer 6

And the final test, as realistic as possible without index.

CREATE TABLE#Original

(

RowID INT IDENTITY(1, 1) PRIMARY KEY CLUSTERED,

coID INT,

cotID INT,

ccbID INT,

mcT CHAR(1)

)

INSERT#Original

(

coID,

cotID,

ccbID,

mcT

)

SELECTTOP 2402039

ABS(CHECKSUM(NEWID())) % 88589,

ABS(CHECKSUM(NEWID())) % 93500,

99932 + ABS(CHECKSUM(NEWID())) % 136,

CHAR(88 + ABS(CHECKSUM(NEWID())) % 2)

FROMsysobjects AS so1

CROSS JOINsysobjects AS so2

-- Original query

SELECTd.coID,

MIN(d.cotID)

FROM#Original AS d

LEFT JOIN(

SELECTcoID

FROM#Original

WHEREccbID = 100000

AND mcT = 'Y'

GROUP BYcoID

) AS d2 ON d2.coID = d.coID

WHEREd.ccbID = 100000

AND d2.coID IS NULL

AND d.coID IS NOT NULL

GROUP BYd.coID

-- Peso

SELECTcoID,

MIN(cotID)

FROM#Original

WHEREccbID = 100000

AND coID IS NOT NULL

GROUP BYcoID

HAVINGMAX(CASE WHEN mcT = 'Y' THEN 1 ELSE 0 END) = 0

DROP TABLE#Original

N 56°04'39.16"
E 12°55'05.25"

SwePeso SSC-Dedicated Points: 39757 More actions · Answer 7

And as realistic as possible with index

CREATE TABLE#Original

(

RowID INT IDENTITY(1, 1) PRIMARY KEY CLUSTERED,

coID INT,

cotID INT,

ccbID INT,

mcT CHAR(1)

)

INSERT#Original

(

coID,

cotID,

ccbID,

mcT

)

SELECTTOP 2402039

ABS(CHECKSUM(NEWID())) % 88589,

ABS(CHECKSUM(NEWID())) % 93500,

99932 + ABS(CHECKSUM(NEWID())) % 136,

CHAR(88 + ABS(CHECKSUM(NEWID())) % 2)

FROMsysobjects AS so1

CROSS JOINsysobjects AS so2

CREATE NONCLUSTERED INDEX IX_tblName ON #Original (ccbID, coID, cotID, mcT)

-- Original query

SELECTd.coID,

MIN(d.cotID)

FROM#Original AS d

LEFT JOIN(

SELECTcoID

FROM#Original

WHEREccbID = 100000

AND mcT = 'Y'

GROUP BYcoID

) AS d2 ON d2.coID = d.coID

WHEREd.ccbID = 100000

AND d2.coID IS NULL

AND d.coID IS NOT NULL

GROUP BYd.coID

-- Peso

SELECTcoID,

MIN(cotID)

FROM#Original

WHEREccbID = 100000

AND coID IS NOT NULL

GROUP BYcoID

HAVINGMAX(CASE WHEN mcT = 'Y' THEN 1 ELSE 0 END) = 0

DROP TABLE#Original

N 56°04'39.16"
E 12°55'05.25"

Marios Philippopoulos SSC Guru Points: 57190 More actions · Answer 8

Thank you for all this work!

I will talk to the developer about changing the query accordingly.

__________________________________________________________________________________
SQL Server 2016 Columnstore Index Enhancements - System Views for Disk-Based Tables[/url]
Persisting SQL Server Index-Usage Statistics with MERGE[/url]
Turbocharge Your Database Maintenance With Service Broker: Part 2[/url]

SwePeso SSC-Dedicated Points: 39757 More actions · Answer 9

Thank you for the feedback.

It's nice to see the query went from about 800 ms / 15500 reads down to 15 ms / 56 reads.

N 56°04'39.16"
E 12°55'05.25"

Marios Philippopoulos SSC Guru Points: 57190 More actions · Answer 10

Peso (9/5/2008)
Thank you for the feedback.
It's nice to see the query went from about 800 ms / 15500 reads down to 15 ms / 56 reads.

The only drawback I see with your solution - and don't take this the wrong way 😉 - is that it is not as readable/understandable as the original query.

I don't think I understand that last line fully:

HAVING MAX(CASE WHEN mcT = 'Y' THEN 1 ELSE 0 END) = 0

__________________________________________________________________________________
SQL Server 2016 Columnstore Index Enhancements - System Views for Disk-Based Tables[/url]
Persisting SQL Server Index-Usage Statistics with MERGE[/url]
Turbocharge Your Database Maintenance With Service Broker: Part 2[/url]

SwePeso SSC-Dedicated Points: 39757 More actions · Answer 11

What the query does, is that it select all records where coID is not NULL and where ccbID = 100000.

For this set of records, the query then group them over coID.

The HAVING PART then checks if any of the records for every group has a mcT = 'Y'.

If so, the query translate it to a 1, all other mcT values get a 0.

To be a valid group, no record must have a mcT value of 'Y', ie 1.

So max for any group need to be 0, ie no record for any group has a mcT value of 'Y'. Which is exactly what the LEFT JOIN is doing.

However, if the two only possible values for mCT are 'Y' and 'N', you can simplify the HAVING part as

HAVING MAX(mCT) = 'N'

N 56°04'39.16"
E 12°55'05.25"

Marios Philippopoulos SSC Guru Points: 57190 More actions · Answer 12

Peso (9/5/2008)
What the query does, is that it select all records where coID is not NULL and where ccbID = 100000.
For this set of records, the query then group them over coID.
The HAVING PART then checks if any of the records for every group has a mcT = 'Y'.
If so, the query translate it to a 1, all other mcT values get a 0.
To be a valid group, no record must have a mcT value of 'Y', ie 1.
So max for any group need to be 0, ie no record for any group has a mcT value of 'Y'. Which is exactly what the LEFT JOIN is doing.
However, if the two only possible values for mCT are 'Y' and 'N', you can simplify the HAVING part as
HAVING MAX(mCT) = 'N'

I see, it's clearer for me now. Thanks again!

__________________________________________________________________________________
SQL Server 2016 Columnstore Index Enhancements - System Views for Disk-Based Tables[/url]
Persisting SQL Server Index-Usage Statistics with MERGE[/url]
Turbocharge Your Database Maintenance With Service Broker: Part 2[/url]

Jeff Moden SSC Guru Points: 1004406 More actions · Answer 13

Heh... if YOU had a question on what it does, others will to. Time to break out the grossly underused "dash dash" function. 😉

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

sai_kumar554 Valued Member Points: 62 More actions · Answer 14

sai_kumar554

Valued Member

Points: 62

September 7, 2008 at 5:01 am

#868213

hi

Marios Philippopoulos SSC Guru Points: 57190 More actions · Answer 15

Jeff Moden (9/6/2008)
Heh... if YOU had a question on what it does, others will to. Time to break out the grossly underused "dash dash" function. 😉

I still don't fully get it, it's a tough one... 😉

__________________________________________________________________________________
SQL Server 2016 Columnstore Index Enhancements - System Views for Disk-Based Tables[/url]
Persisting SQL Server Index-Usage Statistics with MERGE[/url]
Turbocharge Your Database Maintenance With Service Broker: Part 2[/url]