Azure Cognitive Services API’s with SQL Integration Services Packages (SSIS)

,

[read this post on Mr. Fox SQL blog]

I had a recent requirement to integrate multi-language support into a SQL DW via a SQL SSIS ETL solution.  Specifically the SQL DW platform currently only supported English translation data for all Dimension tables, but the business was expanding internationally so there was a need to include other language translations of the Dimensional attributes.

We wanted to do this without having to manually translate English text attributes that exist already, or new ones that are added or modified over time.  We wanted an automated method that simply “worked“.

Enter Azure Cognitive Services Translator Text API service!

So the purpose of this blog is to outline the code/pattern we used to integrate the Azure Cognitive Services API into SQL SSIS ETL packages.

Primer on Azure Cognitive Services

Azure Cognitive Services has been available in Azure for almost 2 years now.  They are a suite of API’s that expose amazing intelligent AI services which have the ability to do some truly amazing things.  The services cover the 5 core pillars of Vision, Speech, language, knowledge and Search.

As at writing there are almost 30 Azure Cognitive API’s.  You can see the full list of all API’s here – https://azure.microsoft.com/en-us/services/cognitive-services/directory/

I have blogged before on an end-to-end AI scenario which applies several of the Azure Cognitive Services together in a single solution.  You can read about that here – https://mrfoxsql.wordpress.com/2016/09/13/azure-cognitive-services-apis-with-sql-server-2016-clr/

(Interesting aside – when I wrote that original blog post 18 months ago there were only 21 Cognitive API’s, and now there’s 28!  Cannot wait to see where this space goes in the next 18 months!)

 

The Azure Cognitive Services “Translator Text API”

So, going back to my scenario of language translation in SSIS for ETL workloads

My SQL SSIS package leverages the Translator Text API service.  For those who want to learn the secret sauce then I suggest to check here – https://azure.microsoft.com/en-us/services/cognitive-services/translator-text-api/

essentially this API is pretty simple;

  1. It accepts source text, source language and target language.  (The API can translate to/from over 60 different languages.)
  2. You call the API with your request parameters + API Key
  3. The API will respond with the language translation of the source text you sent in
  4. So Simple, so fast, so effective!

To use the Translator Text API service, you need to provision the service in the Azure Portal which is shown here – https://docs.microsoft.com/en-us/azure/cognitive-services/translator/translator-text-how-to-signup

Billing is based on number of characters included in the text to be translated.  There is a FREE tier that allows you to translate up to 2M characters.  (For reference, a typical 30-page document has around 17K characters.).  Overall its very economical for ETL solutions!

What else can this delightful Translator Text API do, I hear you think…

  • Text Transformation (ie normalise loose text like “Dis is 2 strange” into something that, well, non-teenagers can understand!)
  • Custom Transformation (ie translate your specific business domain terms/words that otherwise wouldn’t translate into other languages)
  • Profanity Filtering (ie mark up profanity, or remove it entirely from the string)

The Full API Reference is here – http://docs.microsofttranslator.com/text-translate.html

The Full FAQ is here – https://www.microsoft.com/en-us/translator/faq.aspx

 

SQL SSIS package solution

The SQL SSIS package Data Flow workspace looks like this below.

 

 

 

 

 

 

 

 

The ETL package itself isnt that technologically edgy, in fact its pretty darn basic really.  The interesting bit (for this blog anyway) is the “Script Component” which calls the Azure Cognitive Services API and collects the API response.

Its important to note that this Script Component is defined as a Transformation script and the code has been written in c# (VS2017).

SQL SSIS – Transformation c# “Script Component”

 
#region Namespaces
using System;
using System.IO;
using System.Net;
using System.Runtime.Serialization;
using System.Web;
#endregion
[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{
 public override void Input0_ProcessInputRow(Input0Buffer Row)
 {
 string AuthKey = Variables.TranslateAuthKey;
 string TranslateFrom = Variables.TranslateFrom;
 string TranslateTo = Variables.TranslateTo;
 string BaseURL = "https://api.microsofttranslator.com/V2/Http.svc/Translate";
 string EnglishProductSubcategoryName = Row.EnglishProductSubcategoryName; // The source text to translate
 
 // Make REST call to Azure Cognitive Service Translator Text API
 string FullURL = BaseURL + "?text=" + HttpUtility.UrlEncode(EnglishProductSubcategoryName) + "&from=" + TranslateFrom + "&to=" + TranslateTo;
 HttpWebRequest httpWebRequest = (HttpWebRequest)WebRequest.Create(FullURL);
 httpWebRequest.Headers.Add("Ocp-Apim-Subscription-Key", AuthKey);
 using (WebResponse response = httpWebRequest.GetResponse())
 using (Stream stream = response.GetResponseStream())
 {
 DataContractSerializer dcs = new DataContractSerializer(Type.GetType("System.String"));
 Row.GermanProductSubcategoryName = (string)dcs.ReadObject(stream); // The translated text value
 }
 }
}

SQL SSIS – Adding Script References to the c# code

To make the script work, you need to ensure you add references in the Script component editor for the following libraries…

  • System.Net.Http
  • System.Net.Http.WebRequest
  • System.Runtime.Serialization
  • System.Web

SQL SSIS – package variables and connections

When you open the solution in VS2017, you will need to update the following…

  1. Package VariableTranslateAuthKey” holds the Translator Text API Key.  You need to deploy the Translator Text API service via the Azure Portal into your Subscription (as per the link in the above section), and grab the API Key.
  2. Package VariableTranslateFrom” holds the Source Language.  Currently this is set as “en” for English language transaction.
  3. Package VariableTranslateTo” holds the Target Language.  Currently this is set as “de” for German language transaction.  This could be any up to 60 languages.
  4. The Connection Manager to point the package to your SQL Server where you want to create some sample Stage + DW tables.  They will be created and loaded with sample data by the SQL SSIS package in your tempdb database

Once done you can just run the SSIS package and it will connect to your SQL Server, provision the database tables in the SQL tempdb database, create some sample data and call out to the Azure Cognitive API to translate!

 

The Translator Text API in Action in SQL SSIS!

So lets do some translation testing – all we need to do is run the SQL SSIS translation package!

On the first run the package will create the Stage and DW tables in tempdb, and create 30 new Stage rows to be translated and loaded into the DW table.

 

 

 

 

 

 

 

 

 

The package takes about 5 seconds to run end to end. So it was translating at a rate of about 6 rows/sec (which also includes all of the key lookups and prep time)

Ok, awesome, but what about if we just add one new Stage data row and one updated Stage data row?

 

And the execution run in SQL SSIS

 

 

 

 

 

 

 

 

And the translation result in SSMS

Ok, so we can translate text, and very quicklyMaybe next time we should add in the Azure Cognitive Text to Speech API so automate actually saying it! ??

 

VS2017 Solution Downloads

The Visual Studio 2017 Solution containing the SQL SSIS package I wrote with all of the code and references can be downloaded from my github repo here – https://github.com/rolftesmer/SQLSSIS_AzureCognitiveAPI

The SSIS solution isnt considered “production ready” and is just a quick example of how you can write a SQL SSIS package to call out to external web API’s, and integrate the response into a downstream ETL data flow.

Feel free to download the solution, and modify as needed to suit your scenario.

 

Summary

So there you have it, a pretty simple method to integrate Azure Cognitive API’s right into your SSIS ETL and related data processes.

As at writing, given there are almost 30 Azure Cognitive API’s, then this opens up many other ways you could use the code/pattern in this blog for intelligence processing right within the SQL SSIS ETL services…

  1. Parse a set of picture files in a directory or blob store though the Vision API’s (Computer Vision API, Face API, Content Moderator API, etc) and save the tagged metadata into a database for later processing.
  2. Similar to the above, but this time with sound byte files parsing though the Speech API’s (Translator API, Recognition API, Bing Text to Speech / Speech to Text API’s, etc)
  3. Performing sentiment analysis and key phrase extraction on text you are loading into a database, or reading from a database using the Text Analytics API’s
  4. …and probably a stack more I haven’t thought of so far

Also given that SQL SSIS is such a powerful ETL service, you can of course combine the API response with any other data you may already have in your source or target platforms to create a fully integrated intelligent ETL solution!

 

So as usual, and as I always say, please test this out yourself with your own data as your mileage may vary!


Disclaimer: all content on Mr. Fox SQL blog is subject to the disclaimer found here

Original post (opens in new tab)
View comments in original post (opens in new tab)

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating