Significant real-life errors, accidents and catastrophes caused by Nulls

  • I am researching for an MSc into significant real-life errors, accidents and catastrophes caused by use of SQL databases where Null is involved. If you know of any such incidents please let me know!!!

    I would appreciate any comments, ideas, suggestion, experiences to areas such as:

    1 The volume of testing and the test coverage of SQL statements - both embedded and stored procedures.

    In my experience of dotNet/SQL n-tier business application the code is usually unit tested and often regeression tested, but the SQL statements are barely tested if at all - unless they cause the code to fail.

    2 The testing of report content which is based on SQL queries.

    Software defects emerege after deployment and are usually identified by some obvious failure of the software. However if a report runs with defects (but does not fail) it may not be obvious that the content is erroneous. How and when do such defects become known?

    3 SQL statements may run and produce correct results until Null(s) appear in the data.

    The statements may not fail but the meaning of their results may not be what the user wanted or expected. How can test cases adequately cover the possibilites of nulls in any (non-null) field? Again how and when would such defects become known?

    4 Defensive programming - to what extent, and how, do developers use language features such as COALESCE or IS NULL to avoid possible defects?

    5 Allowing Nulls - how is the decision made whether a field allows Null or not?

    If default values are used how are the values decided upon and how are they dealt with in queries (for example if FieldX = 'n/a" then this means Not applicable). Default date values are especially interesting

    6 Bad experiences caused by confusion over using e.g SQL SET ANSI NULL or other SQL features that interact with use of Null (especially in distriubted databases where settings may vary between databases)

    7 Bad experiences caused by using SQL aggregate functions - when Nulls occur in the data.

    thanks

    Andrew

  • I have seen your previous posting about this subject so I am wondering why you would write on a subject that is strange ideas from mathematicians. Software is about implementation and we on the implementation side of this issue thinks it is a debate from mathematician who thinks ANSI SQL is flawed and ANSI SQL purists who have not used the vendors products so don't know what they are talking about. Check the thread below for my opinion on the issue.

    http://www.sqlservercentral.com/Forums/Topic660702-149-1.aspx

    Kind regards,
    Gift Peddie

  • I'm with Gift on this.

    Nulls don't cause problems. Designers and developers who think that we shouldn't be allowed to use Nulls do cause problems. I can give you lots of stories about those.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • In many cases, problems start with people creating every column nullable - except for the primary key column(s) - without popping the questing of for that specific column NULL is needed.

    Later on a new programmer is coming in, and off course replaces a value with NULL because the value should be removed. So then they end up with e.g. (var)char columns having en empty string value or spaces and other being NULL.

    And then you get the typical query "bugs" using where colx = '' or where colx <> '' and many others.

    In many classes or summits you will at least once be confronted with "what's the result for this query starting with that data" containing the null pitfall.

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • Gosh Gift Peddie you have a long memeory! My last posting was early 2008 I think.

    I think you have a principled view on Null, which is fine as far as it goes. I was simply asking for pragmatic experience - particularly of post-deployment defects.

    Whether SQL Null is adequate/inadequate is really a logcal question of "fitness for purpose" when representing data in a database. It is specific to the problem domain being addressed

    For an excellent and very readable alternative view on this read this article - Stephen Henley "The problem of missing data in geoscience databases". Computers & Geosciences 32 (2006) 1368–1377 - which is about real-life issues with representing data.

    thanks

    Andrew

  • Well RBarry Young - my sympathies. You seem to have been up against principled views. . Tut. This is science not religion and rational thought is needed!!!

    Null, default values, normalisation are about the only built-in ways of representing Incomplete Information in SQL databases.

    This is fine if the data you deal with is amenable to using Null and/or default values. What do you do if the data or problem domain is NOT amenable to this?

    Anyways I am interested in the LACK of information about post -deployment defects caused by SQL / Null. Of course it may be that all these incompetent developers produce defect-free SQL. The same developers do NOT produce defect-free code...

    Perhaps declarative languages are free of run-time defects? Perhaps the defects are never found and reports on the data are just wrong and no-one know...

  • I think you have a principled view on Null, which is fine as far as it goes. I was simply asking for pragmatic experience - particularly of post-deployment defects.

    All the post deployment defects in RDBMS I know are from unskilled object developers who do not know how to match information during migration. In 2006 the state of Indiana DMV live people are dead and dead people are live because some object developer decided a tool should migrate complex drivers license data.

    For an excellent and very readable alternative view on this read this article - Stephen Henley "The problem of missing data in geoscience databases". Computers & Geosciences 32 (2006) 1368–1377 - which is about real-life issues with representing data.

    Geoscience may just mean academics hired cheap labor or decided they can do it when they are not SQL trained I have taught a PHD mathematician SQL because he was smart enough to know his education does not include SQL. And on the Microsoft platform before 2008 most Geo data was in Access so I could tell you about the Access database that took more than one year to migrate. I was in a project a developer was to implement distance component in Oracle 9i and the developer said in a meeting he is working on it and I said I did not see the math so there is no distance without the math, if I did not join the project that project would have deployed without actual distance code. Software is skills business most companies hired the unskilled and blame platforms. That developer was working on that component for six months before I joined the project and nobody in that team knew he was writing useless code.

    Kind regards,
    Gift Peddie

  • Actually, the Henley ppt seemed pretty good, don't know about the article though.

    Andrew (4/11/2009)


    Well RBarry Young - my sympathies. You seem to have been up against principled views.

    They needed the sympathy, not me.

    This is science not religion and rational thought is needed!!!

    Some of us practice it in both. 🙂

    Anyways I am interested in the LACK of information about post -deployment defects caused by SQL / Null.

    and I am concerned with the automatic elevation of "association with NULLs" to "causation by NULLs", a theme that I have noticed for many years.

    Of course it may be that all these incompetent developers produce defect-free SQL. The same developers do NOT produce defect-free code...

    Perhaps declarative languages are free of run-time defects? Perhaps the defects are never found and reports on the data are just wrong and no-one know...

    I don't think that I said or implied anything like that.

    Besides, I wouldn't call them "incompetent developers". I have hired, mentored and/or trained many out of university and in my experience once they have been adequately trained, most are quite competent. So I would tend towards categorizing them as "incompetently trained".

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Hi Andrew

    1 The volume of testing and the test coverage of SQL statements - both embedded and stored procedures.

    In my experience of dotNet/SQL n-tier business application the code is usually unit tested and often regeression tested, but the SQL statements are barely tested if at all - unless they cause the code to fail.

    I don't think that a bad test is the problem of NULL values.

    2 The testing of report content which is based on SQL queries.

    Software defects emerege after deployment and are usually identified by some obvious failure of the software. However if a report runs with defects (but does not fail) it may not be obvious that the content is erroneous. How and when do such defects become known?

    Same depends on any wrong calculations within the report. Just have a look to Excel which is still one of the most used analysis tools of the world. It always possible that a report does not throw an error but the data are still wrong.

    3 SQL statements may run and produce correct results until Null(s) appear in the data.

    The statements may not fail but the meaning of their results may not be what the user wanted or expected. How can test cases adequately cover the possibilites of nulls in any (non-null) field? Again how and when would such defects become known?

    Previously you spoke about DotNet so just have a look to it. A "string" can also be null/Nothing in .Net and that's good. Yes, value types cannot be null, but this is sometimes a problem and the reason for the nullable value-types (in C# int?). Especially the value zero for an int is often a problem. If you have any count property in any object there might be a huge difference between "0" and "unknown".

    4 Defensive programming - to what extent, and how, do developers use language features such as COALESCE or IS NULL to avoid possible defects?

    I handle the communication between SQL Server and application programming interface by a tier which maps DBNull.Value (NULL in .Net) to null/Nothing and back.

    Greets

    Flo

  • Thanks Florian

    I am primarily interested in defects in software, which leads to an interest in testing.

    There is surprsingly little information about testing SQL statements for correctness and robustness.

    So I am interested in empirical experiences of post-deployment defects with SQL statments.

    Null used incompetently is an obvious source of lack of correctness / robustess. (I am using competence in the professional sense here of expereinece + skill +knowlrdge, not in the usual English rather derogatory sense)

    Having said that....

    I am NOT interested in low-level debates about SQL Null. The flaws in SQL Null are well known. well understood, and well discussed over the last 30 years. A tool should do what it says on the tin. Then it is a good tool in the hands of a competent user.

    The commerical imperatives behind the lack of development for handling Incomplete Information in SQL (unchanged since 1992) are also well understood

    I am interested in high-level debates about how to represent Incomplete Information in relational databases when the problem domain is more complex than "value not known at the time of entry" or "value not applicable". For example most (arguably all) scientific measurements are imprecise, some are unknowable

    I appreciate most business software is essentially transaction processing and models business processes which are (or should be) designed to be simple input-output steps where if data values are not known the transformation will not take place. Banking is an obvious and frequently quoted exmaple.

    There are other problem domains however which are not simple transaction processing. Crime investigation software is a good example where the certainty and completeness of information is as important as the actual values stored.

    thanks

    Andrew

  • I'll post a note that I saw a system that had issues determining which orders had been placed for customers when a delivery date wasn't specified. There was no default, and when the order entry person wasn't sure, they entered the order, but had no delivery date listed. In the database, this was a NULL, and we had a report totaling things on delivery date. Those orders with NULLs in them weren't included and that caused some strange inventory issues.

  • Thanks Steve.

    I guess in some cases these kinds of defect are discovered because the "user" knew there ought to be some orders.

    The "information in the system" depends where you draw the system boundary!

    This seems a more severe case where only some orders were absent, which went unnoticed.

    Surely there are many more incidents like this one - or do they go for ever unnoticed...

    Andrew

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply