June 6, 2016 at 9:18 am
Dear All,
Our dev team has handed over a deployment, and it contains a query that looks non-sargable to me. Here is a sample query:
select top 10 pm.pdenddate from GlobalPD.PDDataTimeSeriesPE_tmp_01 pt
inner join pddata.Globalpd.PdDataMarketSignals_Recal_tbl pm(nolock)
on pm.pddataid=pt.pddataid
and pm.asofdate=pt.asofdate
and isnull(pm.pdenddate,'01/01/1900')<>isnull(pt.pdenddate,'01/01/1900')
where pm.asofdate>='2000-01-01';
One of the tables in question is fairly big (400 GB) and the other is relatively small, at 500 MB. Earlier both were plain heaps; I created some indexes on them, but I suspect that the ISNULL function used above renders them almost useless. Is there a way I can advise them to rewrite this without breaking the logic at the same time? Attached are the schemas of the tables in question.
Regards
Chandan Jha
June 6, 2016 at 9:43 am
Quick thought: the ISNULL is redundant, as NULL is never equal to NULL. It will, on the other hand, impede performance, as the resulting execution plan will use scans rather than seeks.
😎
June 6, 2016 at 10:33 am
Eirikur Eiriksson (6/6/2016)
Quick thought: the ISNULL is redundant, as NULL is never equal to NULL. It will, on the other hand, impede performance, as the resulting execution plan will use scans rather than seeks. 😎
It's not redundant in this case. If pm.pdenddate is NULL and pt.pdenddate is neither NULL nor '1900-01-01' (or vice versa), this expression evaluates to TRUE, whereas with the ISNULLs removed it evaluates to UNKNOWN.
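A quick self-contained sketch of the difference (hypothetical variables, not from the original query):

DECLARE @a DATE = NULL, @b DATE = NULL;

-- Both sides NULL: both versions treat the pair as equal, so neither qualifies the row
SELECT CASE WHEN ISNULL(@a, '19000101') <> ISNULL(@b, '19000101') THEN 'qualifies' ELSE 'filtered out' END AS with_isnull,
       CASE WHEN @a <> @b THEN 'qualifies' ELSE 'filtered out' END AS without_isnull;

-- One side NULL: the ISNULL version evaluates to TRUE and keeps the row;
-- the bare <> evaluates to UNKNOWN, so the row is filtered out
SET @a = '20160606';
SELECT CASE WHEN ISNULL(@a, '19000101') <> ISNULL(@b, '19000101') THEN 'qualifies' ELSE 'filtered out' END AS with_isnull,
       CASE WHEN @a <> @b THEN 'qualifies' ELSE 'filtered out' END AS without_isnull;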
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
June 6, 2016 at 12:16 pm
So is there no way out of this mess? Would it be possible to create a new table with the same structure, plus a computed column, do a bulk load, and let the computed column derive a value from the incoming column?
I remember reading in one of the forum questions here that NULL has no meaning as such, and that to avoid issues when loading such a table from a source, NULLs should be converted to a date such as 1900-01-01, so that when this value is encountered we know the value was unknown.
Thanks
Chandan
June 6, 2016 at 1:30 pm
chandan_jha18 (6/6/2016)
So is there no way out of this mess? Would it be possible to create a new table with the same structure, plus a computed column, do a bulk load, and let the computed column derive a value from the incoming column? I remember reading in one of the forum questions here that NULL has no meaning as such, and that to avoid issues when loading such a table from a source, NULLs should be converted to a date such as 1900-01-01, so that when this value is encountered we know the value was unknown.
Thanks
Chandan
Unfortunately, that advice is based on a gross oversimplification. It does make sense to assign default values in SOME cases, but not ALL cases. It makes sense to use 1900-01-01 in SOME cases, but not ALL cases. Specifically, it makes sense to use 1900-01-01 for START dates as long as that start date is the earliest possible date in your database. If your data contains records with start dates before 1900-01-01, then you should use some other value as your start date.
On the other hand, you should NEVER use 1900-01-01 for END dates. The default end date should always be the latest possible date in your database. Suggested values include 9999-12-30, 9999-01-01, and 9000-01-01, but any date will work as long as it is the latest possible date.
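As a minimal sketch (hypothetical table and column names, not from the attached schemas), an effective-dated lookup with a sentinel END date might look like:

DECLARE @AsOfDate DATE = '20160606';
SELECT *
FROM dbo.PriceHistory ph  -- hypothetical table; a NULL EndDate means "still current"
WHERE ph.StartDate <= @AsOfDate
  AND ISNULL(ph.EndDate, '99990101') > @AsOfDate;  -- sentinel is the LATEST possible date, never 1900-01-01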
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
June 6, 2016 at 8:37 pm
and NOT (pm.pdenddate=pt.pdenddate
OR (pm.pdenddate is NULL and pt.pdenddate is NULL)
)
_____________
Code for TallyGenerator
June 6, 2016 at 9:07 pm
And notice that Drew did NOT suggest using 9999-12-31. That's because the classic method of isolating date ranges requires an "AND SomeDTColumn < @EndDate + 1" and all similar variations. I personally use a literal of '9999', which translates to 9999-01-01 and also leaves room for adding a month to the end date.
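A minimal demo of why 9999-12-31 leaves no headroom for the "+1" pattern (my own sketch, not from the post):

DECLARE @EndDate DATETIME = '99991231';
-- DATEADD(DAY, 1, @EndDate) would raise an out-of-range error here:
-- there is no representable day after 9999-12-31
SET @EndDate = '9999';             -- interpreted as 9999-01-01
SELECT DATEADD(DAY, 1, @EndDate);  -- 9999-01-02: room to spare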
--Jeff Moden
Change is inevitable... Change for the better is not.
June 7, 2016 at 5:46 am
drew.allen (6/6/2016)
Eirikur Eiriksson (6/6/2016)
Quick thought: the ISNULL is redundant, as NULL is never equal to NULL. It will, on the other hand, impede performance, as the resulting execution plan will use scans rather than seeks. 😎
It's not redundant in this case. If pm.pdenddate is NULL and pt.pdenddate is neither NULL nor '1900-01-01' (or vice versa), this expression evaluates to TRUE, whereas with the ISNULLs removed it evaluates to UNKNOWN.
Drew
You are right, thanks for the correction.
😎
June 7, 2016 at 6:47 am
Post the query plan (Actual) please. We don't know what your data looks like so we don't know what indexes will be used.
Also, I notice there's no ORDER BY on your TOP. Is this deliberate?
June 8, 2016 at 3:12 am
Sergiy (6/6/2016)
and NOT (pm.pdenddate=pt.pdenddate
OR (pm.pdenddate is NULL and pt.pdenddate is NULL)
)
This logic doesn't quite match what's given: the pm.pdenddate=pt.pdenddate part excludes all rows where only one side of the = operator is NULL.
An alternative would be to replace line (in the original query)
and isnull(pm.pdenddate,'01/01/1900')<>isnull(pt.pdenddate,'01/01/1900')
with
AND (
pm.pdenddate <> pt.pdenddate -- This excludes all lines with a NULL entry in either column, as would the "=" operator
OR
(pm.pdenddate IS NULL AND pt.pdenddate IS NOT NULL)
OR
(pm.pdenddate IS NOT NULL AND pt.pdenddate IS NULL)
)
I've assumed that none of the data has a pdenddate of '01/01/1900'.
Forum Etiquette: How to post data/code on a forum to get the best help
Make sure that you include code in the appropriate IFCode tags, e.g. <your code here>. You can find the IFCode tags under the INSERT options when you are writing a post.
June 8, 2016 at 4:22 am
I think both the original code and all the alternatives suggested would prevent the use of an index to resolve the clause using pdenddate. This is due to the <> operator or the use of OR operators.
I suggest you look at an index on both tables that will work with pddataid and asofdate, and have pdenddate as an included column.
Also, as DouglasH said, you have a TOP clause without an ORDER BY, which means you could get any 10 of however many rows that qualify. If you need a specific TOP 10, then add an ORDER BY and consider adding those columns to the key fields in the index.
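A sketch of that index pair, using the table and column names from the posted query (an assumption on my part; verify key order against the real workload):

CREATE NONCLUSTERED INDEX IX_PdDataMarketSignals_Recal_tbl_pddataid_asofdate
ON Globalpd.PdDataMarketSignals_Recal_tbl (pddataid, asofdate)
INCLUDE (pdenddate);

CREATE NONCLUSTERED INDEX IX_PDDataTimeSeriesPE_tmp_01_pddataid_asofdate
ON GlobalPD.PDDataTimeSeriesPE_tmp_01 (pddataid, asofdate)
INCLUDE (pdenddate);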
Original author: https://github.com/SQL-FineBuild/Common/wiki/ 1-click install and best practice configuration of SQL Server 2019, 2017, 2016, 2014, 2012, 2008 R2, 2008 and 2005.
When I give food to the poor they call me a saint. When I ask why they are poor they call me a communist - Archbishop Hélder Câmara
June 8, 2016 at 4:31 am
colin.frame (6/8/2016)
Sergiy (6/6/2016)
and NOT (pm.pdenddate=pt.pdenddate
OR (pm.pdenddate is NULL and pt.pdenddate is NULL)
)
This logic doesn't quite match what's given: the pm.pdenddate=pt.pdenddate part excludes all rows where only one side of the = operator is NULL.
You're missing NOT in front of the opening bracket.
_____________
Code for TallyGenerator
June 8, 2016 at 10:39 pm
a) Both the things DouglasH said.
b) You mentioned they were heaps and now have indexes, but do they now have clustered indexes? If not, they're still heaps, and that may or may not continue to affect performance.
c) I've often found the optimizer to give up and go home after the first date seek on custom-built temporal tables, but you may wish to read up on the various implementations available and try some of them out with your data to see if you can increase read performance. Nothing to lose if you're at the end of your tether; I actually spend most of my development time tweaking the structures, indexes and queries around each other until I get the most acceptable balance between storage and performance (though I may be guilty of overdoing it just to bleed out the last few milliseconds).
d) I'm a recent convert to INTERSECT for such cases, due to its evaluation of NULLs as equal, and have seen improvements on some (not all) plans; give the snippet below a go in case it's an easy win. Change
isnull(pm.pdenddate,'01/01/1900')<>isnull(pt.pdenddate,'01/01/1900')
to
not exists (select pm.pdenddate intersect select pt.pdenddate)
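In the context of the original join, the rewritten predicate would look like this (an untested sketch against the posted schema):

inner join pddata.Globalpd.PdDataMarketSignals_Recal_tbl pm (nolock)
    on pm.pddataid = pt.pddataid
    and pm.asofdate = pt.asofdate
    and not exists (select pm.pdenddate intersect select pt.pdenddate)

Note that this treats NULL as matching only NULL, not 1900-01-01, so it is not logically identical to the ISNULL version if the data can actually contain that date.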
Forgot to mention e): as a best practice, avoid such date formats in code. 'yyyy-MM-dd' and 'yyyyMMdd' are both ANSI-standard (I believe) and should be interpreted correctly and unambiguously by the engine no matter what your regional or language settings.
June 9, 2016 at 8:49 am
Forgot to mention e): as a best practice, avoid such date formats in code. 'yyyy-MM-dd' and 'yyyyMMdd' are both ANSI-standard (I believe) and should be interpreted correctly and unambiguously by the engine no matter what your regional or language settings.
You are correct about the short ISO format (yyyyMMdd). But unfortunately the ODBC canonical format 'yyyy-MM-dd' can still be ambiguous.
The longer ISO 8601 format -- yyyy-mm-ddThh:mi:ss.mmm or yyyy-mm-ddThh:mi:ss.mmmZ -- is interpreted consistently. The following example was derived from an answer on Stack Overflow.
[font="Courier New"]declare @Foo DATETIME
SET DATEFORMAT YMD
-- this will be the 1st of March in MDY
SET @Foo = '2012-03-01'
SELECT 'YMD: not ISO', @Foo
SET DATEFORMAT MDY
-- this will be the 1st of March in MDY
SET @Foo = '2012-03-01'
SELECT 'MDY: not ISO', @Foo
SET DATEFORMAT DMY
-- this will be the 3rd of january in DMY
SET @Foo = '2012-03-01'
SELECT 'DMY: Not ISO', @Foo
SET DATEFORMAT YDM
-- this will be the 3rd of january in YDM
SET @Foo = '2012-03-01'
SELECT 'YDM: not ISO', @Foo
/* returns
YMD: not ISO 2012-03-01 00:00:00.000
MDY: not ISO 2012-03-01 00:00:00.000
DMY: Not ISO 2012-01-03 00:00:00.000
YDM: not ISO 2012-01-03 00:00:00.000
*/
[/font]
June 9, 2016 at 9:09 am
simon.barnes (6/8/2016)
a) Both the things DouglasH said.
b) You mentioned they were heaps and now have indexes, but do they now have clustered indexes? If not, they're still heaps, and that may or may not continue to affect performance.
c) I've often found the optimizer to give up and go home after the first date seek on custom-built temporal tables, but you may wish to read up on the various implementations available and try some of them out with your data to see if you can increase read performance. Nothing to lose if you're at the end of your tether; I actually spend most of my development time tweaking the structures, indexes and queries around each other until I get the most acceptable balance between storage and performance (though I may be guilty of overdoing it just to bleed out the last few milliseconds).
d) I'm a recent convert to INTERSECT for such cases, due to its evaluation of NULLs as equal, and have seen improvements on some (not all) plans; give the snippet below a go in case it's an easy win. Change
isnull(pm.pdenddate,'01/01/1900')<>isnull(pt.pdenddate,'01/01/1900')
to
not exists (select pm.pdenddate intersect select pt.pdenddate)
Forgot to mention e): as a best practice, avoid such date formats in code. 'yyyy-MM-dd' and 'yyyyMMdd' are both ANSI-standard (I believe) and should be interpreted correctly and unambiguously by the engine no matter what your regional or language settings.
This suffers exactly the same problem that I mentioned here: http://www.sqlservercentral.com/Forums/FindPost1792490.aspx The INTERSECT version treats NULL as matching only NULL, whereas the ISNULL version treats NULL as matching 1900-01-01, so the two return different results when one column is NULL and the other actually contains 1900-01-01.
DECLARE @comparisons TABLE (
    pm_end_date DATE,
    pt_end_date DATE
);

-- '1900' converts to 1900-01-01, the same sentinel the ISNULLs use
INSERT @comparisons(pm_end_date, pt_end_date)
VALUES (NULL, NULL),
       (NULL, '1900'),
       ('1900', NULL),
       ('1900', '1900');

-- All four rows
SELECT *
FROM @comparisons c;

-- ISNULL version: returns NO rows, because every NULL collapses to 1900-01-01
SELECT *
FROM @comparisons c
WHERE ISNULL(c.pm_end_date, '1900') <> ISNULL(c.pt_end_date, '1900');

-- INTERSECT version: returns the two rows where exactly one side is NULL
SELECT *
FROM @comparisons c
WHERE NOT EXISTS ( SELECT c.pm_end_date INTERSECT SELECT c.pt_end_date );
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA