June 6, 2016 at 9:18 am
Dear All,
Our dev team has handed over a deployment, and it contains a query that looks non-sargable to me. Here is a sample query:
select top 10 pm.pdenddate from GlobalPD.PDDataTimeSeriesPE_tmp_01 pt
inner join pddata.Globalpd.PdDataMarketSignals_Recal_tbl pm(nolock)
on pm.pddataid=pt.pddataid
and pm.asofdate=pt.asofdate
and isnull(pm.pdenddate,'01/01/1900')<>isnull(pt.pdenddate,'01/01/1900')
where pm.asofdate>='2000-01-01';
One of the tables in question is fairly big (400 GB) and the other is relatively small, at 500 MB. Earlier both were plain heaps; I created some indexes on them, but I suspect that the ISNULL function used above renders them almost useless. Is there a way I can advise them to rewrite this without breaking the logic at the same time? Attached are the schemas of the tables in question.
Regards
Chandan Jha
June 6, 2016 at 9:43 am
Quick thought: the ISNULL is redundant, as NULL is never equal to NULL. It will, on the other hand, impede performance, as the resulting execution plan will use scans rather than seeks.
😎
June 6, 2016 at 10:33 am
Eirikur Eiriksson (6/6/2016)
Quick thought: the ISNULL is redundant, as NULL is never equal to NULL. It will, on the other hand, impede performance, as the resulting execution plan will use scans rather than seeks. 😎
It's not redundant in this case. If pm.pdenddate is NULL and pt.pdenddate is neither NULL nor '1900-01-01' (or vice versa), this expression evaluates to TRUE, whereas with the ISNULLs removed it evaluates to UNKNOWN.
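A quick self-contained sketch of the difference (hypothetical variables, not from the original query):

DECLARE @a DATE = NULL, @b DATE = NULL;

-- Both sides NULL: both versions treat the pair as equal, so neither qualifies the row
SELECT CASE WHEN ISNULL(@a, '19000101') <> ISNULL(@b, '19000101') THEN 'qualifies' ELSE 'filtered out' END AS with_isnull,
       CASE WHEN @a <> @b THEN 'qualifies' ELSE 'filtered out' END AS without_isnull;

-- One side NULL: the ISNULL version evaluates to TRUE and keeps the row;
-- the bare <> evaluates to UNKNOWN, so the row is filtered out
SET @a = '20160606';
SELECT CASE WHEN ISNULL(@a, '19000101') <> ISNULL(@b, '19000101') THEN 'qualifies' ELSE 'filtered out' END AS with_isnull,
       CASE WHEN @a <> @b THEN 'qualifies' ELSE 'filtered out' END AS without_isnull;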
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
June 6, 2016 at 12:16 pm
So is there no way out of this mess? Would it be possible to create a new table with the same structure, plus a computed column, do a bulk load, and let the computed column derive a value from the incoming column?
I remember reading in one of the forum questions here that NULL has no meaning as such, and that to avoid issues when loading such a table from a source, NULLs should be converted to a date such as 1900-01-01, so that when this value is encountered we know the value was unknown.
Thanks
Chandan
June 6, 2016 at 1:30 pm
chandan_jha18 (6/6/2016)
So is there no way out of this mess? Would it be possible to create a new table with the same structure, plus a computed column, do a bulk load, and let the computed column derive a value from the incoming column? I remember reading in one of the forum questions here that NULL has no meaning as such, and that to avoid issues when loading such a table from a source, NULLs should be converted to a date such as 1900-01-01, so that when this value is encountered we know the value was unknown.
Thanks
Chandan
Unfortunately, that advice is based on a gross oversimplification. It does make sense to assign default values in SOME cases, but not ALL cases. It makes sense to use 1900-01-01 in SOME cases, but not ALL cases. Specifically, it makes sense to use 1900-01-01 for START dates as long as that start date is the earliest possible date in your database. If your data contains records with start dates before 1900-01-01, then you should use some other value as your start date.
On the other hand, you should NEVER use 1900-01-01 for END dates. The default end date should always be the latest possible date in your database. Suggested values include 9999-12-30, 9999-01-01, and 9000-01-01, but any date will work as long as it is the latest possible date.
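As a minimal sketch (hypothetical table and column names, not from the attached schemas), an effective-dated lookup with a sentinel END date might look like:

DECLARE @AsOfDate DATE = '20160606';
SELECT *
FROM dbo.PriceHistory ph  -- hypothetical table; a NULL EndDate means "still current"
WHERE ph.StartDate <= @AsOfDate
  AND ISNULL(ph.EndDate, '99990101') > @AsOfDate;  -- sentinel is the LATEST possible date, never 1900-01-01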
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
June 6, 2016 at 8:37 pm
and NOT (pm.pdenddate=pt.pdenddate
OR (pm.pdenddate is NULL and pt.pdenddate is NULL)
)
_____________
Code for TallyGenerator
June 6, 2016 at 9:07 pm
And notice that Drew did NOT suggest using 9999-12-31. That's because the classic method of isolating date ranges requires an "AND SomeDTColumn < @EndDate + 1" and all similar variations. I personally use a literal of '9999', which translates to 9999-01-01 and also leaves room for adding a month to the end date.
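A minimal demo of why 9999-12-31 leaves no headroom for the "+1" pattern (my own sketch, not from the post):

DECLARE @EndDate DATETIME = '99991231';
-- DATEADD(DAY, 1, @EndDate) would raise an out-of-range error here:
-- there is no representable day after 9999-12-31
SET @EndDate = '9999';             -- interpreted as 9999-01-01
SELECT DATEADD(DAY, 1, @EndDate);  -- 9999-01-02: room to spare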
--Jeff Moden
Change is inevitable... Change for the better is not.
June 7, 2016 at 5:46 am
drew.allen (6/6/2016)
Eirikur Eiriksson (6/6/2016)
Quick thought: the ISNULL is redundant, as NULL is never equal to NULL. It will, on the other hand, impede performance, as the resulting execution plan will use scans rather than seeks. 😎
It's not redundant in this case. If pm.pdenddate is NULL and pt.pdenddate is neither NULL nor '1900-01-01' (or vice versa), this expression evaluates to TRUE, whereas with the ISNULLs removed it evaluates to UNKNOWN.
Drew
You are right, thanks for the correction.
😎
June 7, 2016 at 6:47 am
Post the query plan (Actual) please. We don't know what your data looks like so we don't know what indexes will be used.
Also, I notice there's no ORDER BY on your TOP. Is this deliberate?
June 8, 2016 at 3:12 am
Sergiy (6/6/2016)
and NOT (pm.pdenddate=pt.pdenddate
OR (pm.pdenddate is NULL and pt.pdenddate is NULL)
)
This logic doesn't quite match what's given: the pm.pdenddate=pt.pdenddate part excludes all rows where only one side of the = operator is NULL.
An alternative would be to replace line (in the original query)
and isnull(pm.pdenddate,'01/01/1900')<>isnull(pt.pdenddate,'01/01/1900')
with
AND (
pm.pdenddate <> pt.pdenddate -- This excludes all lines with a NULL entry in either column, as would the "=" operator
OR
(pm.pdenddate IS NULL AND pt.pdenddate IS NOT NULL)
OR
(pm.pdenddate IS NOT NULL AND pt.pdenddate IS NULL)
)
I've assumed that none of the data has a pdenddate of '01/01/1900'.
Forum Etiquette: How to post data/code on a forum to get the best help
Make sure that you include code in the appropriate IFCode tags, e.g. <your code here>. You can find the IFCode tags under the INSERT options when you are writing a post.
June 8, 2016 at 4:22 am
I think both the original code and all the alternatives suggested would prevent the use of an index to resolve the clause using pdenddate. This is due to the <> operator or the use of OR operators.
I suggest you look at an index on both tables that will work with pddataid and asofdate, and have pdenddate as an included column.
Also, as DouglasH said, you have a TOP clause without an ORDER BY, which means you could get any 10 of however many rows that qualify. If you need a specific TOP 10, then add an ORDER BY and consider adding those columns to the key fields in the index.
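A sketch of that index pair, using the table and column names from the posted query (an assumption on my part; verify key order against the real workload):

CREATE NONCLUSTERED INDEX IX_PdDataMarketSignals_Recal_tbl_pddataid_asofdate
ON Globalpd.PdDataMarketSignals_Recal_tbl (pddataid, asofdate)
INCLUDE (pdenddate);

CREATE NONCLUSTERED INDEX IX_PDDataTimeSeriesPE_tmp_01_pddataid_asofdate
ON GlobalPD.PDDataTimeSeriesPE_tmp_01 (pddataid, asofdate)
INCLUDE (pdenddate);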
Original author: https://github.com/SQL-FineBuild/Common/wiki/ 1-click install and best practice configuration of SQL Server 2019, 2017, 2016, 2014, 2012, 2008 R2, 2008 and 2005.
When I give food to the poor they call me a saint. When I ask why they are poor they call me a communist - Archbishop Hélder Câmara
June 8, 2016 at 4:31 am
colin.frame (6/8/2016)
Sergiy (6/6/2016)
and NOT (pm.pdenddate=pt.pdenddate
OR (pm.pdenddate is NULL and pt.pdenddate is NULL)
)
This logic doesn't quite match what's given: the pm.pdenddate=pt.pdenddate part excludes all rows where only one side of the = operator is NULL.
You're missing NOT in front of the opening bracket.
_____________
Code for TallyGenerator
June 8, 2016 at 10:39 pm
a) Both the things DouglasH said.
b) You mentioned they were heaps and now have indexes, but do they now have clustered indexes? If not, they're still heaps, and that may or may not continue to affect performance.
c) I've often found the optimizer to give up and go home after the first date seek on custom-built temporal tables, but you may wish to read up on the various implementations available and try some of them out with your data to see if you can increase read performance. Nothing to lose if you're at the end of your tether; I actually spend most of my development time tweaking the structures, indexes and queries around each other until I get the most acceptable balance between storage and performance (though I may be guilty of overdoing it just to bleed out the last few milliseconds).
d) I'm a recent convert to INTERSECT for such cases, due to its evaluation of NULLs as equal, and have seen improvements on some (not all) plans; give the snippet below a go in case it's an easy win. Change
isnull(pm.pdenddate,'01/01/1900')<>isnull(pt.pdenddate,'01/01/1900')
to
not exists (select pm.pdenddate intersect select pt.pdenddate)
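In the context of the original join, the rewritten predicate would look like this (an untested sketch against the posted schema):

inner join pddata.Globalpd.PdDataMarketSignals_Recal_tbl pm (nolock)
    on pm.pddataid = pt.pddataid
    and pm.asofdate = pt.asofdate
    and not exists (select pm.pdenddate intersect select pt.pdenddate)

Note that this treats NULL as matching only NULL, not 1900-01-01, so it is not logically identical to the ISNULL version if the data can actually contain that date.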
Forgot to mention e): as a best practice, avoid such date formats in code. 'yyyy-MM-dd' and 'yyyyMMdd' are both ANSI-standard (I believe) and should be interpreted correctly and unambiguously by the engine no matter what your regional or language settings.
June 9, 2016 at 8:49 am
Forgot to mention e): as a best practice, avoid such date formats in code. 'yyyy-MM-dd' and 'yyyyMMdd' are both ANSI-standard (I believe) and should be interpreted correctly and unambiguously by the engine no matter what your regional or language settings.
You are correct about the short ISO format (yyyyMMdd). But unfortunately the ODBC canonical format 'yyyy-MM-dd' can still be ambiguous.
The longer ISO 8601 format -- yyyy-mm-ddThh:mi:ss.mmm or yyyy-mm-ddThh:mi:ss.mmmZ -- is interpreted consistently. The following example was derived from an answer on Stack Overflow.
[font="Courier New"]declare @Foo DATETIME
SET DATEFORMAT YMD
-- this will be the 1st of March in MDY
SET @Foo = '2012-03-01'
SELECT 'YMD: not ISO', @Foo
SET DATEFORMAT MDY
-- this will be the 1st of March in MDY
SET @Foo = '2012-03-01'
SELECT 'MDY: not ISO', @Foo
SET DATEFORMAT DMY
-- this will be the 3rd of january in DMY
SET @Foo = '2012-03-01'
SELECT 'DMY: Not ISO', @Foo
SET DATEFORMAT YDM
-- this will be the 3rd of january in YDM
SET @Foo = '2012-03-01'
SELECT 'YDM: not ISO', @Foo
/* returns
YMD: not ISO 2012-03-01 00:00:00.000
MDY: not ISO 2012-03-01 00:00:00.000
DMY: Not ISO 2012-01-03 00:00:00.000
YDM: not ISO 2012-01-03 00:00:00.000
*/
[/font]
June 9, 2016 at 9:09 am
simon.barnes (6/8/2016)
a) Both the things DouglasH said.
b) You mentioned they were heaps and now have indexes, but do they now have clustered indexes? If not, they're still heaps, and that may or may not continue to affect performance.
c) I've often found the optimizer to give up and go home after the first date seek on custom-built temporal tables, but you may wish to read up on the various implementations available and try some of them out with your data to see if you can increase read performance. Nothing to lose if you're at the end of your tether; I actually spend most of my development time tweaking the structures, indexes and queries around each other until I get the most acceptable balance between storage and performance (though I may be guilty of overdoing it just to bleed out the last few milliseconds).
d) I'm a recent convert to INTERSECT for such cases, due to its evaluation of NULLs as equal, and have seen improvements on some (not all) plans; give the snippet below a go in case it's an easy win. Change
isnull(pm.pdenddate,'01/01/1900')<>isnull(pt.pdenddate,'01/01/1900')
to
not exists (select pm.pdenddate intersect select pt.pdenddate)
Forgot to mention e): as a best practice, avoid such date formats in code. 'yyyy-MM-dd' and 'yyyyMMdd' are both ANSI-standard (I believe) and should be interpreted correctly and unambiguously by the engine no matter what your regional or language settings.
This suffers exactly the same problem that I mentioned here: http://www.sqlservercentral.com/Forums/FindPost1792490.aspx The INTERSECT version treats NULL as matching only NULL, whereas the ISNULL version treats NULL as matching 1900-01-01, so the two return different results when one column is NULL and the other actually contains 1900-01-01.
DECLARE @comparisons TABLE (
    pm_end_date DATE,
    pt_end_date DATE
);

-- '1900' converts to 1900-01-01, the same sentinel the ISNULLs use
INSERT @comparisons(pm_end_date, pt_end_date)
VALUES (NULL, NULL),
       (NULL, '1900'),
       ('1900', NULL),
       ('1900', '1900');

-- All four rows
SELECT *
FROM @comparisons c;

-- ISNULL version: returns NO rows, because every NULL collapses to 1900-01-01
SELECT *
FROM @comparisons c
WHERE ISNULL(c.pm_end_date, '1900') <> ISNULL(c.pt_end_date, '1900');

-- INTERSECT version: returns the two rows where exactly one side is NULL
SELECT *
FROM @comparisons c
WHERE NOT EXISTS ( SELECT c.pm_end_date INTERSECT SELECT c.pt_end_date );
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA