For a recent project I spent two days searching for a way to handle null values in XML uniformly and correctly, only to find that most of the solutions on the Internet are complex and error-prone. I don't claim to know everything about XML and SQL; I pieced together a lot of small bits of information and think I have found a simple way to deal with null values correctly. In this article I want to share it with you, and I hope you will all benefit from my putting my findings in writing.
I assume that if you are worrying about getting null values out of XML, you have already worked with XML in SQL Server to some extent. For that reason I won't explain the details of creating XML or reading data from it here. For help with generating XML or retrieving values from it, I suggest you look up the documentation in Books Online (BOL) or one of the many excellent articles on FOR XML PATH, .nodes(), .value(), .query() and the other methods.
NULL values in XML?
XML consists of character data only, so how can you put a null value in an XML document? For this purpose the W3C defined the "http://www.w3.org/2001/XMLSchema-instance" namespace. Among other things, this namespace defines a nil attribute. Setting this attribute to "true" on an element indicates that the element has no value, or in T-SQL terms: is NULL. In untyped XML, like the documents in this article, the attribute can be put on any element; a document validated against an XML Schema only allows it on elements declared as nillable. To put a null value in your XML document, you declare this namespace in the document and then add the nil attribute to the appropriate element(s). The prefix xsi is conventional for this namespace but not required; any other prefix works just as well. For readability I tend to use xsi, and I suggest you do so too.
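To illustrate that only the namespace URI matters and not the prefix, here is a minimal sketch of a hand-written document that uses the arbitrarily chosen prefix i instead of xsi. The .exist() query binds its own prefix, xsi, in an XQuery prolog (declare namespace); since both prefixes map to the same URI, the nil element should be found.

```sql
-- Hand-written document using the prefix "i" (chosen arbitrarily)
-- for the XMLSchema-instance namespace instead of the usual "xsi".
declare @doc xml = N'
<field xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
    <value i:nil="true" />
</field>';

-- The query binds its own prefix, xsi, to the same namespace URI.
-- Prefixes are only local aliases; matching is done on the URI,
-- so this returns 1 (the element with the nil attribute is found).
select @doc.exist('declare namespace xsi="http://www.w3.org/2001/XMLSchema-instance"; /field/value[@xsi:nil = "true"]') as has_nil_value;
```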
How do you make SQL generate XML with nil attributes?
Normally, when you use FOR XML PATH, no element is generated for a column that holds a null value. You can make SQL Server generate elements for these columns by adding the ELEMENTS XSINIL directive to the FOR XML PATH clause. Adding ", elements xsinil" does two things: it adds the http://www.w3.org/2001/XMLSchema-instance namespace declaration to the generated document, and it generates an element for each column that contains null, with the attribute xsi:nil="true" on it. For example, this code:
declare @xml xml;
select @xml = (
    select
        null as [value]
    -- elements xsinil emits <value xsi:nil="true" /> for the NULL column
    for xml path('field'), type, elements xsinil
);
select @xml;
results in this XML document:
<field xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<value xsi:nil="true" />
</field>
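The difference is easier to see with more than one row. The sketch below uses a hypothetical table variable @people with made-up data; ROOT('people') wraps the rows in a single root element. Only the row whose name is NULL is affected by the directive.

```sql
-- Hypothetical sample data: the second row has a NULL name.
declare @people table (id int not null, name varchar(50) null);
insert into @people (id, name) values (1, 'Alice'), (2, null);

-- Without ELEMENTS XSINIL the NULL column is simply left out:
-- row 2 becomes <person><id>2</id></person>.
select id, name
from @people
order by id
for xml path('person'), type, root('people');

-- With ELEMENTS XSINIL row 2 gets <name xsi:nil="true" />,
-- and the xmlns:xsi declaration is added to the generated XML.
select id, name
from @people
order by id
for xml path('person'), type, root('people'), elements xsinil;
```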
How do you get a null value from such an XML document?
If you simply use the .value() method to retrieve a value from an XML document, you get the element's content, converted to whichever type you specified. Even for documents like the one above, with the namespace declared and nil attributes on the elements, you get a non-null value back. Only when the requested element is missing from the document do you get NULL.
So how do we retrieve the null value from an element in this document, for example to insert it into a table? Before I show you the answer, let's have a look at the .value() method. It takes two parameters: an XQuery (XPath) expression, which must select a single node, and the name of a SQL Server type. The method returns the content of the node the path leads to, converted to the specified type much like T-SQL's CONVERT would. An empty element therefore becomes whatever an empty string converts to: an empty string for character types, 0 for int, 1900-01-01 for datetime. For types that reject an empty string, such as decimal, you get a conversion error instead. Only if the path does not lead to an existing node is NULL returned. For example, to read the content of the value element in my example XML document we would normally use:
declare @xml xml;
select @xml = N'
<field xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <value xsi:nil="true" />
</field>';
-- returns 1900-01-01 00:00:00.000, not NULL
select @xml.value('/field[1]/value[1]', 'datetime');
go
This works fine for every XML document that has an actual date in the value element. In our case, however, there is no value in it, so it is supposed to return NULL. The query above does not return NULL, though: it returns 1900-01-01 00:00:00.000 for an empty element. This is all according to the documentation, so Microsoft is not going to change it; as far as they are concerned, this is the expected behavior.
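The sketch below puts the cases side by side for a plain empty element (no nil attribute needed to trigger the effect): the same empty content becomes a different "default" depending on the requested type, while a path that leads to no element at all is the only one that yields NULL. The element name missing is just an illustrative name that does not occur in the document.

```sql
-- An element that exists but is empty, and a path that leads nowhere.
declare @doc xml = N'<field><value /></field>';

select
    -- Empty element: converted to each type's "default" value.
    @doc.value('/field[1]/value[1]', 'varchar(10)') as as_varchar,  -- '' (empty string)
    @doc.value('/field[1]/value[1]', 'int')         as as_int,      -- 0
    @doc.value('/field[1]/value[1]', 'datetime')    as as_datetime, -- 1900-01-01 00:00:00.000
    -- Missing element: the only case where .value() returns NULL.
    @doc.value('/field[1]/missing[1]', 'int')       as missing;     -- NULL
```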
Martin Honnen put me on the path to the solution in his answer to a thread over at the MSDN forums. To get NULL back from the .value() method, I just had to add a predicate to the XPath expression that tests the element's xsi:nil attribute. That way .value() behaves as if the element did not exist, and so it returns NULL. The predicate to add is [not(@xsi:nil = "true")]. For the XQuery processor to know that the xsi: prefix refers to the XMLSchema-instance namespace, we need to declare that namespace before we execute the query. You can do this with the WITH XMLNAMESPACES clause. Here is the same example, adapted with this new knowledge to retrieve a null value from the XML document:
declare @xml xml;
select @xml = N'
<field xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <value xsi:nil="true" />
</field>';
-- the semicolon above is required before WITH XMLNAMESPACES
with xmlnamespaces('http://www.w3.org/2001/XMLSchema-instance' as xsi)
select
    -- returns NULL: the predicate filters out the element marked xsi:nil
    @xml.value('/field[1]/value[1][not(@xsi:nil = "true")]', 'datetime');
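Two variations may be useful, sketched below under the assumption that your documents can come from producers other than FOR XML. First, SQL Server's XQuery also accepts a declare namespace prolog inside the expression itself, which keeps the namespace binding local to one call instead of using WITH XMLNAMESPACES. Second, XML Schema types xsi:nil as xs:boolean, whose lexical forms for true are "true" and "1"; ELEMENTS XSINIL writes "true", but another producer might write "1", which the predicate above would not catch.

```sql
-- A document whose nil attribute uses the other lexical form of true: "1".
declare @doc xml = N'
<field xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <value xsi:nil="1" />
</field>';

-- Namespace declared in the XQuery prolog instead of WITH XMLNAMESPACES;
-- the predicate rejects both "true" and "1", so this returns NULL.
select @doc.value('declare namespace xsi="http://www.w3.org/2001/XMLSchema-instance"; /field[1]/value[1][not(@xsi:nil = "true" or @xsi:nil = "1")]', 'datetime') as [value];
```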
Conclusion
Adding the predicate [not(@xsi:nil = "true")] to the XPath expression stops the .value() method from finding an element that carries the nil attribute, so it returns NULL instead of a default value, just as we want. And the best thing is that this works identically for every type you specify.
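As a round trip, the sketch below generates a document with ELEMENTS XSINIL, shreds it back with .nodes() and inserts the rows into a table, applying the predicate in the .value() call for the nullable column. The table variable @target, its columns and the sample rows are hypothetical and only for illustration.

```sql
-- Hypothetical target table.
declare @target table (id int not null, born datetime null);

-- Generate a document with two rows; the second has a NULL date.
declare @doc xml = (
    select id, born
    from (values (1, cast('20000131' as datetime)),
                 (2, cast(null as datetime))) as src (id, born)
    for xml path('person'), type, root('people'), elements xsinil
);

-- Shred it back; the nil predicate turns <born xsi:nil="true" />
-- into NULL instead of 1900-01-01.
with xmlnamespaces('http://www.w3.org/2001/XMLSchema-instance' as xsi)
insert into @target (id, born)
select
    p.n.value('id[1]', 'int'),
    p.n.value('born[1][not(@xsi:nil = "true")]', 'datetime')
from @doc.nodes('/people/person') as p(n);

select id, born from @target order by id;  -- row 2 has born = NULL
```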