Our organization creates applications to harvest data from a large number of web sites to feed a mature ERP system for back-end processing. The web data comes in many formats and structures, like SOAP (XML) or REST (JSON), which need to be massaged to fit the SQL tables supporting the ERP system, like Customer, Orders and Ship To sites. One solution to managing this data would be the creation of specific ‘buffer’ tables, which would harvest the data from Web Site Store_ONE and use code to copy these rows into the actual ERP tables. This solution is quite simple and respects the concept of normalization. It becomes hard to manage as the requirements change quickly, requiring additional columns to be added and maintained across the system.
When we need to harvest from Web Site Store_TWO, we are often faced with additional requirements demanding some variations in the code used for harvesting data from Store_ONE and Store_TWO .The solution we selected was to transform all the web transactions into a common name value pair table which contains the data required to create Orders (or Customers).
In summary, we have a “connector” that parses the Web Sites data and inserts the parsed results into a Name Value Pair table (identical for all Web Sites). Adding new field from a website is simply adding a name value pair in the connector code.
Transforming the Name-Value data into SQL tables
Extracting name value pairs is a laborious endeavor in SQL. A pair at a time. Let's start with a table structure.
CREATE TABLE [dbo].[NameValueTable]( [TransactionGUID] [UNIQUEIDENTIFIER] NOT NULL, [Class] [VARCHAR](50) NOT NULL, [SourceObject] [VARCHAR](50) NOT NULL, [Transaction_ControlRoot] [VARCHAR](100) NOT NULL, [TransactionParameterName] [VARCHAR](100) NOT NULL, [TransactionParameterValue] [VARCHAR](MAX) NOT NULL, [CreatedBy] [VARCHAR](50) NOT NULL, [CreatedON] [DATETIME] NOT NULL, CONSTRAINT [PK_NameValueTable] PRIMARY KEY CLUSTERED ( [TransactionGUID] ASC, [Class] ASC, [SourceObject] ASC, [Transaction_ControlRoot] ASC, [TransactionParameterName] ASC, [CreatedON] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] ) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
The columns TransactionParameterName and TransactionParameterValue are the Name and Value pair colums. The TransactionGUID is used to select each Web transaction and the Class and SourceObject are used to further refine the selection of the pairs.
Creating Sample Data
For our purpose here we need a few name value pairs, and we selected Order Number and Order Amount.
DECLARE @TransactionGUID UNIQUEIDENTIFIER, @Class VARCHAR(50), @SourceObject VARCHAR(50), @Transaction_ControlRoot VARCHAR(100), @TransactionParameterName VARCHAR(100), @TransactionParameterValue VARCHAR(MAX), @CreatedBy VARCHAR(50), @CreatedON DATETIME; SET @TransactionGUID = NEWID(); SET @Class = 'MAIN'; SET @SourceObject = 'GetOrder'; SET @Transaction_ControlRoot = 'OrderTransaction'; SET @TransactionParameterName = 'OrderNumber'; SET @TransactionParameterValue = '1001'; SET @CreatedBy = 'SYSTEM'; SET @CreatedON = SYSDATETIME(); INSERT INTO [dbo].[NameValueTable] ( [TransactionGUID], [Class], [SourceObject], [Transaction_ControlRoot], [TransactionParameterName], [TransactionParameterValue], [CreatedBy], [CreatedON] ) VALUES (@TransactionGUID, @Class, @SourceObject, @Transaction_ControlRoot, @TransactionParameterName, @TransactionParameterValue, @CreatedBy, @CreatedON ); SET @TransactionParameterName = 'OrderAmount'; SET @TransactionParameterValue = '243.90'; INSERT INTO [dbo].[NameValueTable] ( [TransactionGUID], [Class], [SourceObject], [Transaction_ControlRoot], [TransactionParameterName], [TransactionParameterValue], [CreatedBy], [CreatedON] ) VALUES (@TransactionGUID, @Class, @SourceObject, @Transaction_ControlRoot, @TransactionParameterName, @TransactionParameterValue, @CreatedBy, @CreatedON );
In our application, a transaction from a website can have up to 100 pairs. The source of the pairs can be XML or JSON or sometimes a combination thereof. A stored procedure, based on an SQL CLR Assembly, written in .NET code, calls and reads the website API, and insert the results into this table. We use this strategy as it is easier to call an external website using .Net code than using native T-SQL. For this article, the actual .Net code is not required. Suffice to say that the .NET code calls the web store API and inserts new row in the table, NameValueTable, as it parses the API output.
We then need to extract the name value pairs data into tables in the ERP system. We have several options to do it as discussed here. As the data are not in columns, we cannot select col1, col2, col3 from NameValueTable. We must traverse the table one row at a time and explicitly select for the column name we need.
Using the Iterative Solution
For each pair, we select against the pair name and extract the pair value into a variable. The code is straight forward and becomes laborious as we run into 100s of pairs. While each select is not too demanding on resources, imagine a busy web store where we need to process thousands of orders.
DECLARE @TransactionGUID UNIQUEIDENTIFIER, @TransactionParameterName VARCHAR(100), @OrderNumber VARCHAR(10), @OrderAmount DECIMAL(12, 4); SELECT TOP (1) @TransactionGUID = TransactionGUID FROM dbo.NameValueTable; SET @TransactionParameterName = 'OrderNumber'; SELECT @OrderNumber = [TransactionParameterValue] FROM [dbo].[NameValueTable] WHERE [TransactionGUID] = @TransactionGUID AND [TransactionParameterName] = @TransactionParameterName; SET @TransactionParameterName = 'OrderAmount'; SELECT @OrderAmount = [TransactionParameterValue] FROM [dbo].[NameValueTable] WHERE [TransactionGUID] = @TransactionGUID AND [TransactionParameterName] = @TransactionParameterName; SELECT @OrderNumber AS [@OrderNumber], @OrderAmount AS [@OrderAmount];
@OrderNumber | @OrderAmount |
1001 | 243.9000 |
If the number of pairs is large, the impact of this code is brutal as it queries the main table as many times as we have pairs to extract. In some instances, we may have a name value table with thousands of pairs to be extracted. This does not scale well. For example, running this code 100,000 times cost 3,172 milliseconds.
Using a Temp Table
There maybe a better way. Creating a #temp table that only contains the transaction we are interested in, using a Select * Into #temp from Source table where TransactionID =’some value’. Here we gain on working with potentially a much smaller table, and using tempdb, which may prove more efficient than the main database
SELECT * --always use Named columns in a production environment INTO #Pair FROM [dbo].[NameValueTable] WHERE [TransactionGUID] = @TransactionGUID; SET @TransactionParameterName = 'OrderNumber'; SELECT @OrderNumber = [TransactionParameterValue] FROM #Pair WHERE [TransactionParameterName] = @TransactionParameterName; SET @TransactionParameterName = 'OrderAmount'; SELECT @OrderAmount = [TransactionParameterValue] FROM #Pair WHERE [TransactionParameterName] = @TransactionParameterName;
@OrderNumber | @OrderAmount |
1001 | 243.9000 |
But the results are not that obvious. We need 3,719 milliseconds for the same 100,000 calls! No improvements. We may have gained as the code executes in tempdb. The problem still remains the lengthy code and the possible maintenance issue that this creates. Creating an index will cost more.
User-Defined Table Type
We could use a lesser known SQL object, a User-Defined Table, to store our data. From Microsoft: “User-defined tables represent tabular information. They are used as parameters when you pass tabular data into stored procedures or user-defined functions. User-defined tables cannot be used to represent columns in a database table.”
-- ================================ -- Create User-defined Table Type -- ================================ USE <database_name,sysname,AdventureWorks> GO -- Create the data type CREATE TYPE <schema_name,sysname,dbo>.<type_name,sysname,TVP> AS TABLE ( <columns_in_primary_key, , c1> <column1_datatype, , int> <column1_nullability,, NOT NULL>, <column2_name, sysname, c2> <column2_datatype, , char(10)> <column2_nullability,, NULL>, <column3_name, sysname, c3> <column3_datatype, , datetime> <column3_nullability,, NULL>, PRIMARY KEY (<columns_in_primary_key, , c1>) )
A User-defined tables is reusable, it can have a primary key and it enforces column's data types. Such a table can be used as a parameter to other objects.
CREATE TYPE dbo.NameValueType AS TABLE ( [TransactionGUID] [UNIQUEIDENTIFIER] NOT NULL, [Class] [VARCHAR](50) NOT NULL, [SourceObject] [VARCHAR](50) NOT NULL, [Transaction_ControlRoot] [VARCHAR](100) NOT NULL, [TransactionParameterName] [VARCHAR](100) NOT NULL, [TransactionParameterValue] [VARCHAR](MAX) NOT NULL, [CreatedBy] [VARCHAR](50) NOT NULL, [CreatedON] [DATETIME] NOT NULL, PRIMARY KEY ( TransactionGUID, Class, SourceObject, TransactionParameterName ) );
The code inserting into #Pair is now replaced with insert into @NameValue. The execution plan is identical. (see image below the code)
DECLARE @NameValue AS NameValuePairType INSERT INTO @NameValue SELECT * FROM [dbo].[NameValueTable] WHERE [TransactionGUID] = @TransactionGUID; SELECT * FROM @NameValue
Image 1: Execution Plan comparison between Insert into #Pair and Insert into @NameValue User-defined Table object.
And @NameValue is an image of our original NameValueTable selected against TransactionGUID.
TransactionGUID | Class | SourceObject | Transaction_ControlRoot | TransactionParameterName | TransactionParameterValue | CreatedBy | CreatedON |
AB88D42F-B584-4AEF-8959-7C58D55E0891 | MAIN | GetOrder | OdrerTransaction | OrderAmount | 243.90 | SYSTEM | 2017-09-25 |
AB88D42F-B584-4AEF-8959-7C58D55E0891 | MAIN | GetOrder | OdrerTransaction | OrderNumber | 1001 | SYSTEM | 2017-09-25 |
Extracting each value:
SET @TransactionParameterName = 'OrderNumber'; SELECT @OrderNumber = [TransactionParameterValue] FROM @NameValue WHERE [TransactionParameterName] = @TransactionParameterName; SET @TransactionParameterName = 'OrderAmount'; SELECT @OrderAmount = [TransactionParameterValue] FROM @NameValue WHERE [TransactionParameterName] = @TransactionParameterName;
Looks familiar. But we can now use a function using @NameValue as an argument.
CREATE FUNCTION [dbo].[fnGetValueFromName] ( @NameValue AS NameValuePairType READONLY, @Name AS VARCHAR(200) ) RETURNS VARCHAR(MAX) AS BEGIN DECLARE @Result VARCHAR(MAX); SELECT @Result = [TransactionParameterValue] FROM @NameValue WHERE [TransactionParameterName] = @Name; RETURN @Result; END;
And the code looks cleaner:
SELECT @OrderNumber =[dbo].[fnGetValueFromName] (@NameValue,'OrderNumber') SELECT @OrderAmount =[dbo].[fnGetValueFromName] (@NameValue,'OrderAmount')
From the resources standpoint, the function executes a clustered index scan against the @NameValue table. Inserting the data usually takes 98% of the execution plan while getting individual values are negligible. The code is simplified and easier to maintain. Error trapping might be included along with possible validation in this function.
Execution Costs
In SQL, nothing comes FREE. Using a User-Defined table object and a function to select the data cost FOUR times more cycles than the bare code presented at the beginning. Is it worthwhile? Are there alternatives? I personally selected that strategy because it is easy to add a Pair in the source table and extract its Value in code.
It would be interesting to see if a SQL Assembly in CLR could encapsulate the code and produce faster results, but the purpose of this article was to introduce users to the use of User-Defined tables.
Summary
While working with name value pairs is not considered mainstream relational SQ, there are times when it may come handy. We have shown that by using a User-defined Table Type and a simple function, we can simplify the code to retrieve that pair value to two simple steps.
1. Inserting a subset of the complete name value table based on TransactionID in a user-defined table with this code:
DECLARE @NameValue AS NameValuePairType INSERT INTO @NameValue SELECT * --always use Named columns in a production environment FROM [dbo].[NameValueTable] WHERE [TransactionGUID] = @TransactionGUID;
2. Using a function to retrieve the value from the name
SELECT @OrderNumber =[dbo].[fnGetValueFromName] (@NameValue,'OrderNumber') SELECT @OrderAmount =[dbo].[fnGetValueFromName] (@NameValue,'OrderAmount')
We have found this strategy quite flexible when handling a constantly moving target as web data requirements seem to change by the day and by the site. Adding a new name value pair is simple. Extracting its data is a one line code. Of course, the parsing code and the ERP processes are adjusted to reflect such new pair. However, adding such a pair has no side effects on existing (and working) code, whereas changing table structures are always prone to issues in production.
Comments and suggestions are welcome.