[read this post on Mr. Fox SQL blog]
Recently I had a requirement to combine data that I already had in SQL Server (2016) with JSON document data already stored in Azure CosmosDB. Both databases were operational and continuously accepting data so I didn’t want to go to the trouble of doing the delta load thing between them, instead I just wanted to be able to query directly on demand.
And so – the purpose of this article is to outline the method to connect direct to Azure CosmosDB from SQL Server using a SQL Linked Server.
Finally … SQL & NoSQL … together at last!
For those interested to learn more about Azure CosmosDB, check out my previous blog post here – https://mrfoxsql.wordpress.com/2016/05/11/azure-documentdb-preparing-loading-querying-data/
Or the official documentation here – https://docs.microsoft.com/en-us/azure/cosmos-db/
And so right up front – this solution only works for SQL Server on VM/IaaS – and is not supported for Azure SQL DB (ASDB) – mainly as ASDB doesn’t support SQL Linked Servers! (Damn, they say!)
What Data is in my CosmosDB Database?
For the sake of this blog post – I’ve created a simple JSON document structure as shown below by combining several tables in SQL Server AdventureWorks database. I then extracted the data into JSON documents and loaded them into my Azure CosmosDB database.
[ { "id": "15447", "Name": { "First": "Regina", "Last": "Madan" }, "Orders": [ { "OrderID": "58744", "OrderDate": "2013-10-27", "OrderNumber": "SO58744", "Amount": 27.614 } ] } ]
If you are interested in how this is done – then my previous blog post here shows you an example of the method I used – https://mrfoxsql.wordpress.com/2016/05/11/azure-documentdb-preparing-loading-querying-data/
See “Creating JSON Documents from SQL Server”
Setting Up the CosmosDB ODBC System DSN
To connect SQL Server to CosmosDB you need an ODBC driver installed onto the VM. Its a little fiddly the first time you set it up, but once done you often dont need to revisit it.
You can get the ODBC driver for free here – https://docs.microsoft.com/en-us/azure/cosmos-db/odbc-driver#install
The one you want is most likely the 64-bit MSI for 64-bit Windows.
Once installed, when you open the ODBC client tool you should see the driver installed something like below. And, yes, yes, I know, its still called DocumentDB here!
Under the System DSN tab, click ADD a new entry for the Driver and you will see something like this below. This one is already setup for my own Azure CosmosDB.
The values you want to enter are;
- Data Source Name: ACDB (or whatever – but keep it short as we’ll use this in SQL)
- Host: https://COSMOSDBNAME.documents.azure.com:443/
- Access Key: COSMOSDBKEY
You can get your COSMOSDBNAME and COSMOSDBKEY from the Azure Portal under the Keys section of your Azure CosmosDB Account.
TEST the connection to ensure its working OK.
Setting Up the CosmosDB ODBC Schema
This next bit is probably the most tricky, so I will try to explain it.
Azure CosmosDB is a NoSQL database and stores data in the form of JSON documents. JSON is by design a fairly loose format. Given we want to query it like a relational database then we need to “flatten” the documents into a format which reads like standard relational tables.
What we're effectively doing is giving the JSON documents a defined structure that makes sense when reading it from SQL Server. This is otherwise known as Schema-On-Read.
To do this we need to use the Schema Editor installed along with the Azure CosmosDB ODBC driver. You can read more about the process to do that here – https://docs.microsoft.com/en-us/azure/cosmos-db/odbc-driver#a-idcollection-mappingastep-3-create-a-schema-definition-using-the-collection-mapping-method
Essentially – what this tool does is help you automatically define the schema you want to use for the JSON documents stored in your Azure CosmosDB. It is this schema we will see when querying CosmosDB from SQL Server.
When I run the Schema Creator Tool (using the Create New + Sample Method) across my own Azure CosmosDB database, it auto-creates a flattened schema that looks as below. At a glance, you can see a close match to the fields in the JSON document sample I showed at the start of this blog post.
Any columns you see named “_<column>” are standard Azure CosmosDB system generated.
The generated schema has 2 tables in a parent/child structure; [Customers]–>[Orders]
Lastly, but absolutely and definitely not least, you MUST ensure that you attach that saved schema format file to your ODBC DSN. You do this via clicking on the Advanced Options button and selecting the schema format file you just created.
Several times I’ve made edits to the schema format file only to have forgotten to select it and attach it to the ODBC DSN – and then later wondered where the hell my changes went.
Here’s what that bit looks like…
Tips for Using the CosmosDB ODBC Schema Editor
Now I’m going to impart a nickles worth of free advice which I hope will help should you run into difficulties (which if you are like me you probably will!)
I’ve been playing with the ODBC Schema Editor across different JSON documents stored in different CosmosDB’s and I’ve noted a few things below…
- The defined schema is based on a representative sample of JSON documents present in your CosmosDB database. If your CosmosDB is empty the tool returns nothing. I have found you generally need at least around 120 or so documents to make it work best (or make it work at all actually).
- The Schema Editor doesn’t work so well when CosmosDB contains JSON documents which comprise of several different/wide formats. Why yes, I know that’s what JSON and NoSQL is all about, but hey unfortunately to read that data in a relational database system we need to consistently and reliably flatten it out. If you have massive variation in your JSON documents you’ll probably run into issues here – and you will need to do further manual format edits. Generally I found the Schema Editor works best if you have a reasonably consistent JSON document structure with only minor variations.
- You can manually edit the schema format file it creates in the Schema Editor tool, but the changes you can do are limited to turning on/off fields, renaming, and changing mapped data types.
- You can save the schema format file from the Schema Editor tool which simply writes out the definitions as a JSON file which is generally under 100KB.
- I strongly recommend to take a backup copy of the schema format file into your code management solution. It needs to be treated as a version controlled object.
- You can manually re-edit the schema format file anytime via loading it back into Schema Editor again and selecting the option to load a local schema format file. You dont need to regenerate it from scratch every time.
- Rather nicely you can also manually edit the schema format file via your favourite JSON editor outside of the ODBC Tool. I’ve done things like manually renaming auto-created field names, data type changes, visibility settings, some basic structural changes, etc. I suggest to take a copy of the file before you start – you can easily + royally screw things up with a wayward search/replace command.
- If you do manually edit the schema format file via an editor then you can load it back into the official ODBC Schema Editor again – and if it loads OK then you are probably good with your manual changes.
- However – so far I have found that if you get any change incorrect (manual or via the tool) you generally won’t know until such time as you query that table/field from the SQL Linked Server via SSMS and your query subsequently explodes with a page full of unhelpful red generic error messages. More on this below.
Setting up the Azure CosmosDB SQL Linked Server
Once the ODBC driver and System DSN is configured, then all we need to do is to create a standard SQL Server Linked Server to reference the ODBC System DSN!
Open SSMS, create a new connection to your SQL Server 2016 instance and run the code.
USE [master] GO EXEC master.dbo.sp_addlinkedserver @server = N'ACDB', @srvproduct=N'ACDB', @provider=N'MSDASQL', @datasrc=N'ACDB' EXEC master.dbo.sp_addlinkedsrvlogin @rmtsrvname=N'ACDB',@useself=N'False',@locallogin=NULL,@rmtuser=NULL,@rmtpassword=NULL EXEC master.dbo.sp_serveroption @server=N'ACDB', @optname=N'collation compatible', @optvalue=N'false' EXEC master.dbo.sp_serveroption @server=N'ACDB', @optname=N'data access', @optvalue=N'true' EXEC master.dbo.sp_serveroption @server=N'ACDB', @optname=N'dist', @optvalue=N'false' EXEC master.dbo.sp_serveroption @server=N'ACDB', @optname=N'pub', @optvalue=N'false' EXEC master.dbo.sp_serveroption @server=N'ACDB', @optname=N'rpc', @optvalue=N'false' EXEC master.dbo.sp_serveroption @server=N'ACDB', @optname=N'rpc out', @optvalue=N'false' EXEC master.dbo.sp_serveroption @server=N'ACDB', @optname=N'sub', @optvalue=N'false' EXEC master.dbo.sp_serveroption @server=N'ACDB', @optname=N'connect timeout', @optvalue=N'0' EXEC master.dbo.sp_serveroption @server=N'ACDB', @optname=N'collation name', @optvalue=null EXEC master.dbo.sp_serveroption @server=N'ACDB', @optname=N'lazy schema validation', @optvalue=N'false' EXEC master.dbo.sp_serveroption @server=N'ACDB', @optname=N'query timeout', @optvalue=N'0' EXEC master.dbo.sp_serveroption @server=N'ACDB', @optname=N'use remote collation', @optvalue=N'false' EXEC master.dbo.sp_serveroption @server=N'ACDB', @optname=N'remote proc transaction promotion', @optvalue=N'false' GO
Once created you will then see the linked server in SSMS as per below.
Finally – Querying CosmosDB from SQL Server
To reference the SQL linked server we need to use OPENQUERY, which executes the specified pass-through query on the linked server. Directly referencing the table via the 4-part naming does not work (ie server.database.owner.table will fail).
Here’s some examples of the types of SQL queries we can run across the ODBC driver.
Customers Rowcount
SELECT * FROM OPENQUERY ( ACDB, ' SELECT COUNT(*) AS ROWCNT FROM PerfTest.CustomerOrders.CustomerOrders ' ) GO
Orders Rowcount
(NOTE how I had to double quote “” around names with embedded brackets [ ] ). Ideally I should have renamed this to something normal in the Schema Editor tool.
SELECT * FROM OPENQUERY ( ACDB, ' SELECT COUNT(*) AS ROWCNT FROM PerfTest.CustomerOrders."CustomerOrders_Orders[]" ' ) GO
Top 10 Customers Where Name = X
SELECT * FROM OPENQUERY ( ACDB, ' SELECT TOP 10 * FROM PerfTest.CustomerOrders.CustomerOrders WHERE Name_First = ''Regina'' ' )
Orders Where Amount > X
SELECT * FROM OPENQUERY ( ACDB, ' SELECT * FROM PerfTest.CustomerOrders."CustomerOrders_Orders[]" WHERE "CustomerOrders_Orders[]_Amount" > 50000 ' )
Customers Join Orders Where ID Between X and Y Order By Last_Name
SELECT * FROM OPENQUERY ( ACDB, ' SELECT c.*, o.id AS Order_Customer_ID, o."CustomerOrders_Orders[]_OrderID", o."CustomerOrders_Orders[]_Amount" FROM PerfTest.CustomerOrders.CustomerOrders AS c INNER JOIN PerfTest.CustomerOrders."CustomerOrders_Orders[]" AS o ON c.id = o.id WHERE c.id BETWEEN 29000 and 29100 ORDER BY c.Name_Last ' )
CosmosDB Customers Join SQL Server AdventureWorksDW Customers
SELECT oq.*, dc.* FROM OPENQUERY ( ACDB, ' SELECT TOP 10 * FROM PerfTest.CustomerOrders.CustomerOrders WHERE Name_First = ''Regina'' ' ) as oq INNER JOIN [AdventureWorksDW2016].[dbo].[DimCustomer] dc on oq.id = dc.CustomerKey GO
Managing ODBC Driver Query Errors
Lastly, as mentioned above when your query crashes out you often receive a generic error like this below.
Msg 7399, Level 16, State 1, Line 55
The OLE DB provider “MSDASQL” for linked server “ACDB” reported an error. The provider did not give any information about the error.
The reasons for this could be, quite frankly, wide and varied!
- Your query syntax isn’t correct
- Your ODBC schema format file isn’t correctly mapped to the underlying Azure CosmosDB database
- You hit a data type issue where the type in the format file doesn’t match the actual underlying type in Azure CosmosDB itself (ah yes, the joy of NoSQL!)
- You issue a query/command that isn’t supported in the ODBC driver (yet). Unfortunately I have yet to see a list of the types of standard SQL commands that are or are not supported.
- …and possibly a few other error reasons I haven’t quite hit yet
Long short, when you get this error you need to systematically troubleshoot the query + schema + data bit-by-bit to pull apart that issue.
So in Summary
So there you have it – a pretty easy method to query CosmosDB data right within SQL Server and begin to join data sets across SQL and NoSQL database platforms!
As per usual (and as I always say) please test this yourself as your milage may vary!
Disclaimer: all content on Mr. Fox SQL blog is subject to the disclaimer found here