Introduction
There is no shortage of a need for connecting to data in various systems. These integrations are often done by pulling data from web APIs, specifically over HTTP. To get this data available in another database, usually a tool such as SSIS or another programming language is involved to utilize their HTTP libraries to connect to the API and load its data into a database.
This article aims to remove the need for other programming languages or tools by utilizing SQL Server's CLR integration to create a function that allows sending HTTP requests and receiving the responses directly in the database. It still utilizes another programming language to do the HTTP request, but it's a single object that is created once and can then be used to query the internet (literally!) from SQL. This has a lot of potential to cut back on development time and also simplify architecture.
If the idea of this being a native function in SQL Server that doesn’t need CLR integration is appealing, feel free to vote for this feature suggestion: https://feedback.azure.com/forums/908035-sql-server/suggestions/34429699-http-request-function
APIs typically return data in either JSON or XML formats and SQL Server already has rich features for working with data in these formats. Sometimes things can get a little more challenging if the response is CSV data, but another CLR function to convert CSV to JSON or XML could alleviate that.
Also, to simplify working with CSV in SQL Server this would be another great feature suggestion to vote for: https://feedback.azure.com/forums/908035-sql-server/suggestions/34429636-opencsv-function
Use Cases/Examples
More detail on the CLR function is found later in this article, but it might be useful to understand usage before getting into how it works. Here is an example to retrieve Stack Overflow questions from their API.
-- Query the Stack Overflow API and get a response DECLARE @response XML = [dbo].[clr_http_request] ( 'GET', 'http://api.stackexchange.com/2.2/questions?site=stackoverflow', NULL, NULL, 300000, 1, 0 ); -- Extract just the body of the response (expecting JSON) DECLARE @response_json NVARCHAR(MAX) = @response.value('Response[1]/Body[1]', 'NVARCHAR(MAX)'); -- Parse the JSON into a tabular format SELECT B.[question_id], B.[title], B.[tags], B.[is_answered], B.[view_count], B.[answer_count], B.[score] FROM OPENJSON(@response_json) WITH ([items] NVARCHAR(MAX) AS JSON) A CROSS APPLY OPENJSON(A.[items]) WITH ( [question_id] INT, [title] NVARCHAR(MAX), [tags] NVARCHAR(MAX) AS JSON, [is_answered] BIT, [view_count] INT, [answer_count] INT, [score] INT ) B;
The results look like this:
question_id | title | tags | is_answered | view_count | answer_count | score |
52088338 | QuickCheck - Haskell (generate a random string-date) | ["haskell","quickcheck"] | 0 | 1 | 0 | 0 |
52086832 | Cross Database Join Collation Issue | ["sql-server","union","collation"] | 0 | 12 | 0 | 0 |
52088335 | Ubuntu / NoMachine | ["ubuntu","rdp","nomachine"] | 0 | 2 | 0 | 0 |
52088005 | Exotic GROUP BY In MySQL | ["mysql","indexing"] | 1 | 20 | 1 | 2 |
52048980 | Apache modrewrite htaccess redirect | [".htaccess","mod-rewrite"] | 0 | 16 | 1 | 0 |
To better understand what is happening, here is a breakdown of each step.
This first part is calling the CLR HTTP Request function, sending a “GET” request to Stack Overflow’s “questions” API endpoint. The function returns XML describing the response. Detail on each of the parameters and the returned XML are described in more detail later in this article.
-- Query the Stack Overflow API and get a response DECLARE @response XML = [dbo].[clr_http_request] ( 'GET', 'http://api.stackexchange.com/2.2/questions?site=stackoverflow', NULL, NULL, 300000, 1, 0 );
The second part is querying the @response XML to retrieve the body of the response. For more information on querying XML data, specifically the “.value()” function, check out https://docs.microsoft.com/en-us/sql/t-sql/xml/value-method-xml-data-type
-- Extract just the body of the response (expecting JSON) DECLARE @response_json NVARCHAR(MAX) = @response.value('Response[1]/Body[1]', 'NVARCHAR(MAX)');
In this example, the response looks something like this:
{ "items":[ { "tags":["haskell", "quickcheck"], "is_answered":false, "view_count":1, "answer_count":0, "score":0, "question_id":52088338, "title":"QuickCheck - Haskell (generate a random string-date)" }, { "tags":["sql-server", "union", "collation"], "is_answered":false, "view_count":12, "answer_count":0, "score":0, "question_id":52086832, "title":"Cross Database Join Collation Issue" } ] }
The last step is to examine the format of the JSON and use SQL Server’s OPENJSON functionality to parse it into a tabular format. For more information on OPENJSON, see https://docs.microsoft.com/en-us/sql/t-sql/functions/openjson-transact-sql?view=sql-server-2017
-- Parse the JSON into a tabular format SELECT B.[question_id], B.[title], B.[tags], B.[is_answered], B.[view_count], B.[answer_count], B.[score] FROM OPENJSON(@response_json) WITH ([items] NVARCHAR(MAX) AS JSON) A CROSS APPLY OPENJSON(A.[items]) WITH ( [question_id] INT, [title] NVARCHAR(MAX), [tags] NVARCHAR(MAX) AS JSON, [is_answered] BIT, [view_count] INT, [answer_count] INT, [score] INT ) B;
Note, this could all be shortened down to a single SELECT statement if desired, which would look like this:
SELECT B.* FROM OPENJSON ( [dbo].[clr_http_request] ( 'GET', 'http://api.stackexchange.com/2.2/questions?site=stackoverflow', NULL, NULL, 300000, 1, 0 ).value('Response[1]/Body[1]', 'NVARCHAR(MAX)') ) WITH ([items] NVARCHAR(MAX) AS JSON) A CROSS APPLY OPENJSON(A.[items]) WITH ( [question_id] INT, [title] NVARCHAR(MAX), [tags] NVARCHAR(MAX) AS JSON, [is_answered] BIT, [view_count] INT, [answer_count] INT, [score] INT ) B;
Okay, Stack Overflow was rather easy since it's a basic REST endpoint with no authentication. What if we're trying to reach an API that requires authentication and most of the documentation describes using an SDK/code library that obviously isn't available here?
That does a decent job describing Google AdWords' API. This example will get an access token and use that to pull a performance report from Google AdWords.
-- Authentication variables DECLARE @refresh_token VARCHAR(500) = '...'; DECLARE @client_id VARCHAR(500) = '...'; DECLARE @client_secret VARCHAR(500) = '...'; DECLARE @client_customer_id VARCHAR(500) = '...'; DECLARE @developer_token VARCHAR(500) = '...'; -- Use the AdWords Query Language to define a query for the Keywords Performance Report and specify desired format DECLARE @awql VARCHAR(MAX) = ' SELECT CampaignId, CampaignName, AdGroupId, AdGroupName, Id, Criteria, Device, Date, Impressions, Clicks, Cost, AveragePosition FROM KEYWORDS_PERFORMANCE_REPORT WHERE Impressions > 0 DURING 20180801,20180805 '; DECLARE @fmt VARCHAR(50) = 'XML'; -- Get access token DECLARE @access_token VARCHAR(500) = JSON_VALUE( [dbo].[clr_http_request] ( 'POST', 'https://www.googleapis.com/oauth2/v4/token', CONCAT('grant_type=refresh_token&refresh_token=', @refresh_token, '&client_id=', @client_id, '&client_secret=', @client_secret), NULL, 300000, 0, 0 ).value('/Response[1]/Body[1]', 'NVARCHAR(MAX)'), '$.access_token' ); -- Get report DECLARE @report_xml XML = CAST(REPLACE( [dbo].[clr_http_request] ( 'POST', 'https://adwords.google.com/api/adwords/reportdownload/v201802', CONCAT('__fmt=', @fmt, '&__rdquery=', @awql), CONCAT(' <Headers> <Header Name="Authorization">Bearer ', @access_token, '</Header> <Header Name="developerToken">', @developer_token, '</Header> <Header Name="clientCustomerId">', @client_customer_id, '</Header> </Headers> '), 300000, 1, 0 ).value('/Response[1]/Body[1]', 'NVARCHAR(MAX)'), '<?xml version=''1.0'' encoding=''UTF-8'' standalone=''yes''?>', '' ) AS XML); -- Parse report XML SELECT A.[row].value('@campaignID', 'BIGINT') [campaign_id], A.[row].value('@campaign', 'VARCHAR(500)') [campaign_name], A.[row].value('@adGroupID', 'BIGINT') [ad_group_id], A.[row].value('@adGroup', 'VARCHAR(500)') [ad_group_name], A.[row].value('@keywordID', 'BIGINT') [keyword_id], A.[row].value('@keyword', 'VARCHAR(500)') [keyword], A.[row].value('@device', 'VARCHAR(50)') [device], A.[row].value('@day', 'DATE') [date], A.[row].value('@impressions', 'INT') [impressions], A.[row].value('@clicks', 'INT') [clicks], A.[row].value('@cost', 'BIGINT') [cost], A.[row].value('@avgPosition', 'FLOAT') [average_position] FROM @report_xml.nodes('/report/table/row') A ([row]);
Results look like this:
campaign_id | campaign_name | ad_group_id | ad_group_name | keyword_id | keyword | device | date | impressions | clicks | cost | average_position |
---|---|---|---|---|---|---|---|---|---|---|---|
1234 | Campaign A | 51423 | Cool Stuff Ad Group | 6211415 | cool stuff | Computers | 8/5/2018 | 1430 | 229 | 3210.631846 | 2.4 |
1234 | Campaign A | 51423 | Cool Stuff Ad Group | 6211415 | cool stuff | Computers | 8/3/2018 | 588 | 162 | 680.4804449 | 1.7 |
1234 | Campaign A | 51423 | Cool Stuff Ad Group | 6211415 | cool stuff | Computers | 8/4/2018 | 368 | 29 | 742.1357105 | 3.3 |
1540 | Campaign B | 62130 | Awesome Stuff Ad Group | 7133217 | awesome stuff | Mobile devices with full browsers | 8/5/2018 | 1563 | 563 | 1094.902162 | 1.8 |
1401 | Campaign C | 61050 | Mediocre Stuff Ad Group | 6321354 | mediocre stuff | Computers | 8/1/2018 | 1236 | 190 | 643.7945118 | 2.2 |
The next steps could be to take this code, wrap up the two calls to [dbo].[clr_http_request]() in a scalar function to retrieve the report XML, and create a table-valued function to do the parsing. Then one could easily query the API with a query like this:
SELECT * FROM [dbo].[ufn_parse_adwords_xml]([dbo].[ufn_get_adwords_report](@awql));
Note: This example is only intended to give an idea of what is possible; explaining in detail would be outside of the scope of this article.
There are many APIs out there to query. Here are several other examples:
- Other Ad performance reports (Bing/Yahoo!/Facebook/etc.)
- Affiliate networks (commissions, etc.)
- Currencies
- Time zones
- User agent parsers
- SaaS applications, like ticketing systems such as Jira
Or maybe one could use this for testing automation or monitoring by querying a website to simply check for availability. Here's a basic example just getting a response from Google’s home page:
SELECT [dbo].[clr_http_request]('GET', 'https://www.google.com/', NULL, NULL, 10000, 0, 0);
It would certainly be interesting to see what other use cases are out there. Please share ideas!
The CLR Function
This is a SQL Server CLR function that calls an assembly written in C#, utilizing its HttpWebRequest class to make an HTTP Request and return the response.
Input Parameters
These are the parameters that can be passed into the function:
requestMethod (string)
Most often "GET" or "POST", but there are several others used for various purposes.
url (string)
The URL attempting to connect to, such as an API endpoint
parameters (string)
If a GET request, these will just get added into the query string. In that case you could just include them in the url parameter and pass NULL for parameters.
Otherwise, these parameters will be converted to a byte array and added to the content of the HTTP request.
Format of this parameter matches that of a URL query string where you have key=value pairs separated by "&":
param1=A¶m2=B
headers (string, in XML format)
This allows you to set headers for the HTTP request. They are passed as XML following this format:
<Headers>
<Header Name="MyHeader">My Header's Value</Header>
<Header Name="…">…</Header>
<Header Name="…">…</Header>
</Headers>
timeout - (integer)
Allows a timeout in milliseconds to be set.
autoDecompress (boolean)
Some APIs may compress their results (Stack Overflow is an example) and passing true here will decompress the response (if it is compressed).
convertResponseToBase64 (boolean)
If the resource delivers binary data, things will break down in saving that data to the XML response for this function. This parameter may be used to resolve that issue by converting the binary data to a Base64 string. Once the Base64 string is in SQL Server it can be decoded back to binary using some magic from the XML datatype. Here is an example:
CAST(@string AS XML).value('.', 'VARBINARY(MAX)')
Results
The result from this function is an XML document generated from the properties available in the HttpWebResponse class. This is the structure of that XML. For more information on these, see the documentation on HttpWebResponse (https://docs.microsoft.com/en-us/dotnet/api/system.net.httpwebresponse)
Response - this is the root element
CharacterSet
ContentEncoding
ContentLength
ContentType
CookiesCount
HeadersCount
IsFromCache
IsMutuallyAuthenticated
LastModified
Method
ProtocolVersionResponseUri
StatusCode
Server
StatusNumber
StatusDescription
SupportsHeaders
Headers
Header - each header will get its own node here
Name
Values - a header can have multiple values in C#'s HttpWebResponse
Value
Body - this will contain the content from the response
Overview of C# code flow
It would be good to look at the source code to get a deeper understanding of what is happening, but here is a general overview of the flow:
Creates an HttpWebRequest object from the url
Parses the provided Headers XML to set the HTTP Request's headers
Sets the request method, timeout, and automatic decompression configurations based on inputs
Adds in parameters (for non-GET requests) by converting the string to a byte array and writing it to the request stream
Makes request and retrieves response
Converts response into format described above and returns it
Compiling the C# is a straight-forward process. With the solution open in Visual Studio (attached in ClrHttpRequest.zip), right-click on the "ClrHttpRequest" project and choose “Build”. This will create the "ClrHttpRequest.dll" file in bin\Debug, which can then be used with clr_http_request.sql.
A couple notes:
Visual Studio 2017 Community Edition with SQL Server Data Tools was used to develop this. You can get Visual Studio at https://visualstudio.microsoft.com/downloads/ and while installing you may choose to install SQL Server Data Tools under “Data storage and processing”. For more information see https://docs.microsoft.com/en-us/sql/ssdt/download-sql-server-data-tools-ssdt
This is currently compiled with .NET 4.5 as the target framework. That will work with SQL Server 2012 and later (but examples in this article using JSON functionality will only work in SQL Server 2016 and later). It should compile to .NET 3.5 just fine if you need to install on an earlier version of SQL Server and changing out some things like XElement with XmlDocument could bring it all the way down to .NET 2.0 if needed in SQL Server 2005.
Creating the function
To create the function, follow these steps using files found in the attached ClrHttpRequest.zip:
Copy ClrHttpRequest.dll to C:\ on the machine running the SQL Server instance
Run code in clr_http_request.sql to create the function (tweaking as necessary to adjust which database should be used).
Note: This script includes code to change configuration of the SQL Server instance by enabling the CLR integration.
That’s it! Should be ready to use now.
Conclusion
I've found this to be an invaluable tool in quickly getting to data and exploring APIs. Depending on needs, something like SSIS will still be a superior tool, but for many needs this may provide a simpler alternative by reducing the tools and skillsets needed to create data integrations.