Microsoft SQL Server has become an increasingly friendly data platform for developers. It natively supports XML for a large number of years and now, keeping up with trends, we can use built-in JSON support starting with SQL Server 2016.
Does that mean we should all ditch XML and start using JSON? No. It depends mostly on the target of your data output processing. Do you have external services that interchange data over XML with external parties and a schema is agreed on both parts? You should stick to the XML data type and native functions. Are you targeting micro services architecture or a flexible metadata and data store? You should take advantage of the newly created support for JSON.
The XML support on SQL Server is not on the scope of this article, so let’s jump into some JSON fundamentals.
If you do have a fixed schema for your JSON document, you are better off using relational tables and common data types. When querying these tables, use the “FOR JSON” option on your T-SQL script. This way you’ll have a ready to go formatted output.
Let’s use the SQL Server 2016 Worldwide Importers sample database, available for download on Github. Take a look at the view named Website.customers. Lets query the first record and output it on JSON format:
SELECT [CustomerID]
,[CustomerName]
,[CustomerCategoryName]
,[PrimaryContact]
,[AlternateContact]
,[PhoneNumber]
,[FaxNumber]
,[BuyingGroupName]
,[WebsiteURL]
,[DeliveryMethod]
,[CityName]
,DeliveryLocation.ToString() as DeliveryLocation
,[DeliveryRun]
,[RunPosition]
FROM [WideWorldImporters].[Website].[Customers]
WHERE CustomerID=1
FOR JSON AUTO
Please note that we have a geography data type column, DeliveryLocation, which brings up two important workarounds you should perform:
Frist, the conversion to a string is necessary, otherwise you’ll get an error like:
FOR JSON cannot serialize CLR objects. Cast CLR types explicitly into one of the supported types in FOR JSON queries.
Second, JSON works on key/value pairs syntax so you’ll have to give an alias to the converted value. If you fail to do so, the error presented is:
Column expressions and data sources without names or aliases cannot be formatted as JSON text using FOR JSON clause. Add alias to the unnamed column or table.
Having learned this, the produced formatted output is as follows:
[ { "CustomerID": 1, "CustomerName": "Tailspin Toys (Head Office)", "CustomerCategoryName": "Novelty Shop", "PrimaryContact": "Waldemar Fisar", "AlternateContact": "Laimonis Berzins", "PhoneNumber": "(308) 555-0100", "FaxNumber": "(308) 555-0101", "BuyingGroupName": "Tailspin Toys", "WebsiteURL": "http://www.tailspintoys.com", "DeliveryMethod": "Delivery Van", "CityName": "Lisco", "DeliveryLocation": "POINT (-102.6201979 41.4972022)", "DeliveryRun": "", "RunPosition": "" } ]
You can also use JSON as an input for DML statements of INSERT/UPDATE/DELETE with OPENJSON. So it’s possible to go with JSON on all data operations.
If you do not know the structure of your data or you want it flexible, than you can store it as a JSON formatted string, inside an NVARCHAR data typed column. An example of this use, on the Worldwide Importers database is the CustomFields column on the Application.People table. You can view the column contents in tabular format with these statements:
declare @json nvarchar(max) SELECT @json=[CustomFields] FROM [WideWorldImporters].[Application].[People] where PersonID=8 select * from openjson(@json)
The result is displayed on the results grid:
Another way to query this record, possible when you know the structure and key names inside the JSON data , uses the JSON_VALUE and JSON_QUERY functions:
SELECT JSON_QUERY([CustomFields],'$.OtherLanguages') as OtherLanguages, JSON_VALUE([CustomFields],'$.HireDate') as HireDate, JSON_VALUE([CustomFields],'$.Title') as Title, JSON_VALUE([CustomFields],'$.PrimarySalesTerritory') as PrimarySalesTerritory, JSON_VALUE([CustomFields],'$.CommissionRate') as CommissionRate FROM [WideWorldImporters].[Application].[People] where PersonID=8
The result is displayed in tabular format on the results grid:
The main concern here will be the query criteria and how to index them. Imagine we want to query all the people employed on the company after 2011. You could run the following query:
SELECT personID,fullName,JSON_VALUE(CustomFields,'$.HireDate') as hireDate FROM [WideWorldImporters].[Application].[People] where IsEmployee=1 and year(cast(JSON_VALUE(CustomFields,'$.HireDate') as date))>2011
Remember JSON_VALUE returns a single text value of type nvarchar(4000). We’ll have to cast the return into a date and then extract the year to check the search criteria. The actual execution plan is as follows:
To check out how we index a JSON document we’ll need the creation of a computed column. For the sake of example, because the Application.People table is versioned and adding computed column while system-versioning is ON is not supported, let’s change tables and use Sales.Invoices. The ReturnedDeliveryData is filled JSON data. Let’s get some rows to get a feel on it:
SELECT TOP 100 [InvoiceID] ,[CustomerID] ,JSON_QUERY([ReturnedDeliveryData],'$.Events') FROM [WideWorldImporters].[Sales].[Invoices]
We can see on the results grid that the first event is always “Ready for collection”:
Let’s try to get the invoices for deliveries ready for collection on March, 2016:
SELECT [InvoiceID] ,[CustomerID] ,CONVERT(datetime, CONVERT(varchar,JSON_VALUE([ReturnedDeliveryData],'$.Events[0].EventTime')),126) FROM [WideWorldImporters].[Sales].[Invoices] WHERE CONVERT(datetime, CONVERT(varchar,JSON_VALUE([ReturnedDeliveryData],'$.Events[0].EventTime')),126) BETWEEN '20160301' AND '20160331'
The actual execution plan is as follows:
Now let’s add a computed column, named “ReadyDate”, with our ready for collection expression result:
ALTER TABLE [WideWorldImporters].[Sales].[Invoices] ADD ReadyDate AS CONVERT(datetime, CONVERT(varchar,JSON_VALUE([ReturnedDeliveryData],'$.Events[0].EventTime')),126)
After, repeating tha same query but using the newly created computed column:
SELECT [InvoiceID] ,[CustomerID] ,ReadyDate FROM [WideWorldImporters].[Sales].[Invoices] WHERE ReadyDate BETWEEN '20160301' AND '20160331'
The execution plan is the same, but SSMS advises on a missing index creation:
So we’ll follow the good advice and index the computed column to help out the searches on that column:
/* The Query Processor estimates that implementing the following index could improve the query cost by 99.272%. */CREATE NONCLUSTERED INDEX IX_Invoices_ReadyDate ON [Sales].[Invoices] ([ReadyDate]) INCLUDE ([InvoiceID],[CustomerID]) GO
Let’s repeat the query and check out its execution plan:
With the new index, we’ve managed a great improvement on performance and are able to query the JSON data as fast as if the data was actually on a table column.
The lessons learned here are:
- JSON can be used effectively on SQL Server 2016, although it’s not implemented as a native data type;
- You should always provide alias for expression results on your data output if you’d like to JSON formatting;
- The JSON_VALUE and JSON_QUERY functions parse and get your data in varchar format, so you’ve got to convert it to the data type you want;
- It’s possible to index JSON search fields with the help of computed columns .