September 9, 2016 at 1:39 am
Gregory Hart (9/8/2016)
Not all UDF are bad, ... if correctly written, they can be tremendous for establishing and persisting in standards and also in code control when modifications occur to a process that is repeatedly used.
Indeed. The in-line table UDFs work very well. Scalar UDFs however, suffer from being called on every single row, so even ones that have no variables and no data access still have the overhead from being executed once per row for every row. It's no where near as bad as for scalar UDFs that do data access, but it's far from trivial too (and unfortunately natively compiling the function doesn't improve matters)
Why don't I post it on their forums?
A working and in-place confidentially agreement that prohibits me.
Connect isn't a forum. It's Microsoft's 'bug and suggestions' site, where you post bugs you've found or suggestions for new features for their products.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
September 9, 2016 at 1:44 am
The Dixie Flatline (9/8/2016)
And, ... PLEASE, PLEASE post a valid usage if you're aware of one in REAL business data management.
Averages, percentiles, counts of non-responses.
A scientific model that I've been working on for a while where the divisors can sometimes legitimately be NULL and if they are the results should be NULL, but dividing by 0 means that the process should be aborted, as in the physical world X items cannot be divided among 0 locations (vague, but there are heavy NDAs around this whole process)
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
September 9, 2016 at 3:01 am
Gregory Hart (9/8/2016)
OK, ... let this post/thread be a lesson to all you "know-it-all" experts. Lessons learned:Not all UDF are bad, ... if correctly written, they can be tremendous for establishing and persisting in standards and also in code control when modifications occur to a process that is repeatedly used.
One of the very first things we teach our new juniors – the few who don’t know this already – is how appallingly bad UDF’s can be, and the reasons why.
When a software company truly listens to their users, they become successful and thus generate greater profits. Microsoft has persistently NOT listened to its users, hence why they could barely even give away the Windows 10 operating system. In contrast, their approach to SQL 2016 (and all other versions of SQL Enterprise that are Post Version 6.5) listens VERY CAREFULLY to their user base, and thus has attributed to the tremendous success of MS SQL products.
Even though a behavior may be TECHNICALLY correct, that does not outweigh the practical application of the process. It may be technically correct and logically validated that Hitler was one of the greatest leaders in all recorded history, but; such does not validate the horrors of his rule. Similarly, an error output for division by Zero may be "technically" correct, but the derived benefit from such is far outweighed by its inconvenience. Thus, it would seem to be a great "selling point" for Microsoft to adopt this approach instead of fighting it.
Why don't I post it on their forums?
A working and in-place confidentially agreement that prohibits me.
Please don’t take this personally but from the evidence on this thread it appears to me that you may not have sufficient exposure to the product to make a useful change recommendation. A surfeit of hubris won't compensate for a dearth of experience.
Edit: English
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
September 9, 2016 at 3:54 am
Gregory Hart (9/8/2016)
It may be technically correct and logically validated that Hitler was one of the greatest leaders in all recorded history, but; such does not validate the horrors of his rule.
As an aside to this (very interesting and entertaining) debate, I did a search and this is the 36th time Hitler has been mentioned on SSC.
September 9, 2016 at 4:24 am
Scalar UDF vs built-in function on 100 million rows.
Disclaimer, I had to run this on a different machine to the 1 million row test. 1 million was done on a desktop with 32GB memory, quad-core processor and spinning disks for the DB files. 100 million done on a laptop, quad core processor, 16GB memory and SSD for database files.
Edit: Both running SQL Server 2014 Developer Edition, I don't have a SQL 2008 R2 server any longer.
The only difference from the earlier code is the insert.
INSERT INTO dbo.SomeTable (Numerator, Divisor)
SELECT TOP (100000000) RAND(CHECKSUM(NEWID()))*50000, RAND(CHECKSUM(NEWID()))*50000
FROM msdb.sys.columns c CROSS JOIN msdb.sys.columns c2 CROSS JOIN msdb.sys.columns c3
GO
Results from Stats Time
-- Insert into a table variable to remove the overhead of SSMS's row display.
DECLARE @t TABLE (Result REAL)
INSERT INTO @t (Result)
SELECT dbo.Divide(Numerator, Divisor) FROM dbo.SomeTable
GO
SQL Server Execution Times:
CPU time = 699141 ms, elapsed time = 808036 ms.
That's 11 min, 39 seconds of CPU time
DECLARE @t TABLE (Result REAL)
INSERT INTO @t (Result)
SELECT Numerator/NULLIF(Divisor,0) FROM dbo.SomeTable
GO
SQL Server Execution Times:
CPU time = 43656 ms, elapsed time = 44525 ms.
vs 44 seconds of CPU time.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
September 9, 2016 at 6:51 am
I was not expecting Godwin to apply here...
Thanks for running the tests Gail. I ran the iTVF version too for 1m rows & the times are identical to the NULLIF run (both 287ms).
As you'd expect given they're identical statements from SQL's point of view.
CREATE FUNCTION dbo.Divide_TVF(@Numerator Real, @Denominator Real)
RETURNS TABLE AS
RETURN SELECT @Numerator/NULLIF(@Denominator,0) as Result
GO
DECLARE @t TABLE (Result REAL)
INSERT INTO @t (Result)
SELECT d.Result FROM dbo.SomeTable st cross apply dbo.Divide_TVF(Numerator, Divisor) d
GO
SQL 2016 Dev edition
Scalar UDF (and look at those reads!)
Table '#ACD2B8AF'. Scan count 0, logical reads 1001607, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'SomeTable'. Scan count 1, logical reads 3832, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 1, logical reads 2832352, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 8438 ms, elapsed time = 8674 ms.
(1000000 row(s) affected)
Select with NULLIF
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
Table 'SomeTable'. Scan count 1, logical reads 3832, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 281 ms, elapsed time = 287 ms.
(1000000 row(s) affected)
Select with iTVF
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
Table 'SomeTable'. Scan count 1, logical reads 3832, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 281 ms, elapsed time = 287 ms.
(1000000 row(s) affected)
September 9, 2016 at 6:58 am
Totally unsurprising, Gail. Except perhaps to Mr. Hart.
He's obviously a newcomer to SQL Server if he is trying to make a case for scalar functions. What he wants is for SQL Server to act like MySQL in its non-strict mode, which he will probably argue is the DB of choice for true business development professionals.
By the way, just for fun I did a quick search on "Oracle Divide By Zero". Also "IBM DB2 Divide By Zero"
What do they suggest over there?
NULLIF()
I'm going to return to an earlier position. We are dealing with an erudite, but ignorant, troll. He doesn't want to implement fixes to his problem. He doesn't want to actually suggest a correction to Microsoft. He just wants to mouth off in the forum in the misguided belief that he will "triumph" and establish his superiority over the rest of us, including all you MVPs. He is not listening, he is merely arguing. I find this sort of thing sad, as well as annoying.
That said, I'm unsubscribing from this thread because life is too short.
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
September 9, 2016 at 7:44 am
Thanks.
I do prefer the simplicity of simply disallowing divide by zero, instead of treating it as NULL. The usual argument from looking at division as the inverse of multiplication is enough consideration for me, i.e., if you say x/0=y, then whatever you put in for y, then you would need either for y*0=x, or to define another exception that says that in this one case division is not treated as the inverse of multiplication.
This does best address the OP question as far as I can tell. If 1/0 = NULL there are two problems:
1. The implication is that NULL * 0 = 1. Nothing multiplied by zero can equal 1 so right off the bat there's a problem.
2. As defined in SQL NULL cannot "equal" (=) anything. So any equation in SQL that includes "NULL =" (a) has to be disallowed or (b) has to be accepted with the caveat that bad results will likely be returned.
The tone of the OP is another issue. Very combative and trollish as many others have indicated.
See also:
https://www.simple-talk.com/sql/t-sql-programming/how-to-get-nulls-horribly-wrong-in-sql-server/[/url]
http://www.sqlservercentral.com/articles/T-SQL/understandingthedifferencebetweenisnull/871/[/url]
https://en.wikipedia.org/wiki/Null_(SQL) , specifically:
For instance, the Oracle, PostgreSQL, MySQL Server, and Microsoft SQL Server platforms all return a Null result for the following:
NULL / 0
- webrunner
-------------------
A SQL query walks into a bar and sees two tables. He walks up to them and asks, "Can I join you?"
Ref.: http://tkyte.blogspot.com/2009/02/sql-joke.html
September 9, 2016 at 8:54 am
GilaMonster (9/9/2016)
Gregory Hart (9/8/2016)
Such use of a UDF is ideal, and will respond at least as efficiently as a standard SQL function. Check it out, ... do a query execution plan and any other test using the UDF I posted (dbo.Divide) compared to NullIf. You may find yourself VERY surprised (I won't brag and tell you the outcome, ... please do the tests, especially do a test of timing the executions using both methods and against a very large data set).With pleasure.
CREATE TABLE SomeTable (
Numerator REAL, -- I ordinarily wouldn't use FLOAT/REAL in financial systems, but to avoid any suggestion that the function overhead is from data type conversions
Divisor REAL
)
GO
-- insert a million rows
INSERT INTO dbo.SomeTable (Numerator, Divisor)
SELECT TOP (1000000) RAND(CHECKSUM(NEWID()))*50000, RAND(CHECKSUM(NEWID()))*50000
FROM msdb.sys.columns c CROSS JOIN msdb.sys.columns c2
GO
-- and set some to 0
UPDATE dbo.SomeTable
SET Divisor = 0
WHERE Divisor IN (SELECT TOP(5) PERCENT Divisor FROM dbo.SomeTable ORDER BY NEWID())
GO
CREATE FUNCTION dbo.Divide(@Numerator Real, @Denominator Real)
RETURNS Real AS
/*
Purpose: Handle Division by Zero errors
Description: User Defined Scalar Function
Parameter(s): @Numerator and @Denominator
Test it:
SELECT 'Numerator = 0' Division, dbo.fn_CORP_Divide(0,16) Results
UNION ALL
SELECT 'Denominator = 0', dbo.fn_CORP_Divide(16,0)
UNION ALL
SELECT 'Numerator is NULL', dbo.fn_CORP_Divide(NULL,16)
UNION ALL
SELECT 'Denominator is NULL', dbo.fn_CORP_Divide(16,NULL)
UNION ALL
SELECT 'Numerator & Denominator is NULL', dbo.fn_CORP_Divide(NULL,NULL)
UNION ALL
SELECT 'Numerator & Denominator = 0', dbo.fn_CORP_Divide(0,0)
UNION ALL
SELECT '16 / 4', dbo.fn_CORP_Divide(16,4)
UNION ALL
SELECT '16 / 3', dbo.fn_CORP_Divide(16,3)
*/
BEGIN
RETURN
CASE WHEN @Denominator = 0 THEN
NULL
ELSE
@Numerator / @Denominator
END
END
GO
Then with Stats Time ON.
-- Insert into a table variable to remove the overhead of SSMS's row display.
DECLARE @t TABLE (Result REAL)
INSERT INTO @t (Result)
SELECT dbo.Divide(Numerator, Divisor) FROM dbo.SomeTable
GO
SQL Server Execution Times:
CPU time = 3328 ms, elapsed time = 6453 ms.
DECLARE @t TABLE (Result REAL)
INSERT INTO @t (Result)
SELECT Numerator/NULLIF(Divisor,0) FROM dbo.SomeTable
GO
SQL Server Execution Times:
CPU time = 219 ms, elapsed time = 212 ms.
Nope, not particularly surprised (but maybe that's because I did this kind of test for a blog post earlier this year)
But, you did say 'very large data set', and a million rows isn't very large, so let me re-test with 100 million rows (this might take a while), I'll post when it's done.
Just a bit of an add here. Besides the stats, execution plans lie about udfs such as this. Run an estimated plan to get the fuller picture. When you gather exec plan info for udfs, the udf part of the plan will typically show 0 cost which isn't accurate and can also be proven by pulling the cached plan for that udf.
Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw[/url]
Learn Extended Events
September 9, 2016 at 9:24 am
Thank you all.
September 13, 2016 at 12:04 pm
In terms of a real situation, I want a division by zero to return an error so I can trap for that and handle it. There's no reason to have the application break. Instead, we handle the error and return appropriate content.
Returning a NULL could be a way of trapping a situation as well, but NULL is known.
Gregory, you've said a few times that NULL is the lack of a number / value. That is incorrect. NULL is an unknown value. It could be 4, it could be -1000, it could be pi. The value is unknown, not lacking.
September 13, 2016 at 2:05 pm
Nulls are actually both! :w00t: The original definition from Dr. Codd was that they are a marker for a missing value. But that changed later in the second version of the relational model, he created two kinds of nulls. Something is missing because it has value is unknown was one type; something is missing because the attribute itself does not appear in the entity the second type. For example my hair color might be modeled with the null (I assume everybody knows I am bald). But I have the possibility of hair. My feather color might be modeled with the null, but I am never going to have feathers.
SQL was based on the first version of the relational model, so we only have one kind of null.
One of my little safety tips is it when you have a noble column you are obligated to define what the null means relative to that attribute. For example, if you use (start_date, end_date) pairs the idiom has been that a null in the end_date means "eternity"in that context. In other context, it might mean the value is not been provided yet but it will be resolved somehow (actual value, statistical trick, arbitrary decision, etc.). Likewise a different context. The null might mean this attribute does not apply for this particular entity. The created nulls are also problematic; look at outer joins, rollup, cubes, etc.
Books in Celko Series for Morgan-Kaufmann Publishing
Analytics and OLAP in SQL
Data and Databases: Concepts in Practice
Data, Measurements and Standards in SQL
SQL for Smarties
SQL Programming Style
SQL Puzzles and Answers
Thinking in Sets
Trees and Hierarchies in SQL
September 13, 2016 at 4:57 pm
(start_date, end_date) pairs the idiom has been that a null in the end_date means "eternity"in that context
My experience has been that it means one of two possible outcomes:
1) Eternity in that it is the biggest end date
2) Eternity in that , because there is no end date, the start date is null, and hence there is no time span
September 13, 2016 at 8:15 pm
Gregory Hart (9/13/2016)
(start_date, end_date) pairs the idiom has been that a null in the end_date means "eternity"in that context
My experience has been that it means one of two possible outcomes:
1) Eternity in that it is the biggest end date
2) Eternity in that , because there is no end date, the start date is null, and hence there is no time span
Another one to your collection:
3) defined by the default duration or default end_date specified in EntityType table.
_____________
Code for TallyGenerator
September 13, 2016 at 8:30 pm
(start_date, end_date) pairs the idiom has been that a null in the end_date means "eternity"in that context
My experience has been that it means one of two possible outcomes:
1) Eternity in that it is the biggest end date
2) Eternity in that , because there is no end date, the start date is null, and hence there is no time span
Using is not the same as eternity as the greatest in date. That is "9999–12–31" in the ISO standards. There may be local conventions based on who is doing this but that is it. And no on point number two, just because you know when some immortal begin to live, you do not know when he will die (ever get involved with the highlander series of movies?)
The SQL temporal model is incomplete, but we can fake it, we really need a plus or minus (eternity) just like the IEEE has its +inf and +inf symbols
Books in Celko Series for Morgan-Kaufmann Publishing
Analytics and OLAP in SQL
Data and Databases: Concepts in Practice
Data, Measurements and Standards in SQL
SQL for Smarties
SQL Programming Style
SQL Puzzles and Answers
Thinking in Sets
Trees and Hierarchies in SQL
Viewing 15 posts - 46 through 60 (of 64 total)
You must be logged in to reply to this topic. Login to reply