JSON Select is a new library that exposes several CLR functions that make it easy to pull values out of JSON strings in SQL Server 2005+. In this article we'll look at how we can use it to easily interrogate and even index values from JSON.
First things first, a small disclaimer, I am the author of JSON Select, and as such I have interest in you using it.
An Example
Now, onto the fun stuff. Imagine you're the DBA in a business that's starting up a new online bookstore. The website developers have made heavy use of JSON in the database to simplify and speed up development. Version 1.0 is released, purchases are starting to come through and, as one might expect, upper-middle-management are asking for a bunch of reports. They want to know all there is to know about customers and orders and stock levels and goodness knows what else on an hourly basis. Of course, this is a piece of cake, a few select statements with aggregate functions and group-bys and you'll be well on your way.
Or, at least, you would be, if all the useful data wasn't hidden away in nvarchar(max) columns of JSON!
At this point you have a few options:
- Dump SQL Server in favour of something like Postgres with native JSON support
- Wait for Microsoft to implement JSON Support and then upgrade. Upper-middle-management will have to be a bit patient.
- Write something for yourself to parse the JSON
- Use JSON Select and save yourself a lot of hassle
Let's go with option 4. Imagine your Order table looks something like this:
create table [Order] ( [Id] int identity(1,1), [Json] nvarchar(max) null, [RowVersion] rowversion not null, constraint [PK_Order] primary key ([Id]) )
And the JSON for an order looks something like this (this bookstore has free shipping and only services Australia for the sake of simplicity):
{
"Name": "P. Sherman",
"Address": {
"Line1": "42 Wallaby Way",
"Line2": null,
"Suburb": "Sydney",
"State": "NSW",
"Postcode": "2000"
},
"OrderLines": [
{
"ProductId": 72,
"Description": "Hairy Maclary from Donaldson's Dairy",
"Quantity": 1,
"UnitPrice": 5.95,
"LineTotal": 5.95
}
],
"OrderTotal": 5.95,
"DateTimeOrdered": "2015-07-02 13:45:71"
}
The sales manager wants to know how sales are doing by state so he can collaborate with the marketing manager on developing an advertising plan with her team. Here's how you might start writing a query to do that with JSON Select, using the JsonNVarChar450(), JsonDecimal() and JsonDateTime() functions.
select dbo.JsonNVarChar450([Json], 'Address.State') [State], count(*) [OrderCount], avg(dbo.JsonDecimal([Json], 'OrderTotal')) [AverageOrderTotal], sum(dbo.JsonDecimal([Json], 'OrderTotal')) [TotalSales], from [Order] where dbo.JsonDateTime([Json], 'DateTimeOrdered', 'yyyy-MM-dd HH:mm:ss') > @someVariable group by dbo.JsonNVarChar450([Json], 'Address.State')
Here you can see the basics of how JSON Select is used. There are a few different functions for returning various data types, and for the most part they take a string of JSON and another string specifying what you're looking for in the JSON, which I like to call a query.
The query language for JSON Select is pretty straightforward - if you've used a programming language like C# or JavaScript before, it's just like accessing the properties of an object. Properties on the base level can be simply accessed by their name, and you use a period (".") to access subproperties. You can also use indexing on arrays with the familiar [zero-based-index]
syntax, e.g. OrderLines[0].Description
.
One important thing to note here is that, depending on the size of the database, things could start to get slow very quickly, as you're parsing a JSON document on each row (sometimes more than once). While there is some internal caching to help speed things up a bit, if you're going to run a query like this frequently it would definitely be worth considering adding an index. Which brings us on to...
Indexing
Indexing is one of those things that hasn't traditionally been very easy to do with JSON, but with JSON Select it's rather simple. If you already know how to index the result of a function call, then this will be nothing new, but if you don't, here's a quick run down. Basically, you need to make a persisted computed column, and then add an index on that column, as follows. Let's speed up our sales-by-state report from earlier, by adding an index to the DateTimeOrdered
property:
alter table [Order] add DateTimeOrdered as dbo.JsonDateTime([Json], 'DateTimeOrdered', 'yyyy-MM-dd HH:mm:ss') persisted go create index IX_Order_DateTimeOrdered on [Order](DateTimeOrdered)
This should speed up our query, as SQL Server can now eliminate rows that are older than what we ask for in our where clause. We don't need to change the query, but because we've added a computed column to the [Order]
table, we could rewrite the where clause to make it a bit more readable, like so:
select dbo.JsonNVarChar450([Json], 'Address.State') [State], count(*) [OrderCount], avg(dbo.JsonDecimal([Json], 'OrderTotal')) [AverageOrderTotal], sum(dbo.JsonDecimal([Json], 'OrderTotal')) [TotalSales], from [Order] where [DateTimeOrdered] > @someVariable group by dbo.JsonNVarChar450([Json], 'Address.State')
That's All, Folks!
Well, that's about it for this brief introduction to JSON Select. Please head over to https://www.jsonselect.com to find out more, and please let me know what you think in the comments 🙂 This is my first article on SQLServerCentral, so I'd definitely appreciate any feedback on the content and style of the article. Thank you for reading!