There are two kinds of ASCII files commonly used for data transfer:
- Flat - a plain text file where each field value is the same width and padded with spaces.
- Delimited - a plain text file where each field value is separated by specific delimiter characters.
By definition, a flat file can only have one value per field per record. A delimited file, however, can have one or more fields that are further delimited by other delimiter characters. The most common type of delimited file is the CSV (comma-separated values) file, where the values in each record are separated by commas. A CSV file could also have one or more of its fields delimited by another delimiter, such as the pipe (|) character, as in the following example:
ProductId,ProductName,PartsList 1,Gizmo,463|914|771|281|418 2,Doohickey,422|453 3,Gadget,323|724|449|882|591|715
Each record above has a single ProductId value and ProductName value in those fields, but a variable number of PartNumber values in the PartsList field that are separated by the pipe character. The challenge is to map the multiple PartNumber values from a single record in the file into multiple records related by ProductId in a destination table.
In this article we will demonstrate a SQL Server user-defined function that resolves the values in a multi-value field into their individual values, then use that functon to create an SSIS package that reads a CSV file containing a multi-value field into a staging table and parses the multi-value field into a second, related table using the CROSS APPLY statement.
1. Deploy the SQL Server "ParseIndividualValue" user-defined function
USE [TestDB] GO BEGIN TRY DROP FUNCTION [dbo].[ParseIndividualValue] END TRY BEGIN CATCH END CATCH GO CREATE FUNCTION [dbo].[ParseIndividualValue](@String varchar(8000), @Delimiter char(1)) returns @temptable TABLE (items varchar(8000)) as begin declare @idx int declare @slice varchar(8000) select @idx = 1 if len(@String)<1 or @String is null return while @idx!= 0 begin set @idx = charindex(@Delimiter,@String) if @idx!=0 set @slice = left(@String,@idx - 1) else set @slice = @String if(len(@slice)>0) insert into @temptable(Items) values(@slice) set @String = right(@String,len(@String) - @idx) if len(@String) = 0 break end return end GO
2. Test the "ParseIndividualValue" function
USE TestDB GO BEGIN TRY DROP TABLE dbo.StagingTable END TRY BEGIN CATCH END CATCH GO CREATE TABLE [dbo].[StagingTable]( [ProductId] [varchar](50) NOT NULL, [ProductName] [varchar](50) NOT NULL, [PartsList] [varchar](1000) NOT NULL ) GO INSERT INTO dbo.StagingTable (ProductId,ProductName,PartsList) VALUES (1,'Gizmo','463|914|771|281|418'), (2,'Doohickey','422|453'), (3,'Gadget','323|724|449|882|591|715') SELECT * FROM dbo.StagingTable
BEGIN TRY DROP TABLE dbo.Products END TRY BEGIN CATCH END CATCH GO CREATE TABLE [dbo].[Products]( ProductId [int] NOT NULL, [PartNumber] [int] NOT NULL ) GO DECLARE @Table Table (ColumnA VarChar(100), ColumnB VarChar(1000)) INSERT INTO @Table SELECT ProductId,PartsList FROM dbo.StagingTable INSERT INTO dbo.Products (ProductId,PartNumber) SELECT T.ColumnA, a.* FROM @Table T CROSS APPLY [dbo].[ParseIndividualValue](t.ColumnB,'|') a ORDER BY CAST(T.ColumnA AS INT) GO SELECT * FROM dbo.Products ORDER BY ProductId,PartNumber
The screen shot shows that the PartNumer values have been extracted from the multiple-value field into individual records.
3. Create an SSIS package that parses a multi-value field
Paste the following code...
USE TestDB GO TRUNCATE TABLE dbo.StagingTable TRUNCATE TABLE dbo.Products SELECT * FROM dbo.StagingTable SELECT * FROM dbo.Products
...into a query panel and execute it to truncate the StagingTable and Products tables.
Paste the following CSV text....
ProductId,ProductName,PartsList 1,Gizmo,463|914|771|261|418 2,Doohickey,422|453 3,Gadget,323|724|449|882|591|715
...into a text editor and save it as input_file.csv...
...to a local folder.
Open a project in Business Intelligence Development Studio and start the SSIS Import and Export Wizard.
Select the Flat File Source data source and browse to input_file.csv, then click the Next button.
Click the Next button again.
Enter the server and and database names, then click the Next button.
Click the Next button again.
Click the Finish button.
Click the Close button.
Run the new SSIS package by selecting Debug-Start Debugging from the main menu.
Wait for the SSIS package to complete execution.
Paste the following code...
USE TestDB GO SELECT * FROM dbo.StagingTable
...into a query panel and execute it to confirm that the data was successfully inserted into the staging table.
Run the previous code again to truncate the StagingTable and Product tables.
Add an Execute SQLTask component to the SSIS package.
Add a precedent constraint from the Data Flow Task component to the Execute SQL Task component.
Right-click the Execute SQL Task component and select the Edit menu item.
Copy the following code into the Windows clipboard.
DECLARE @Table Table (ColumnA VarChar(100), ColumnB VarChar(1000)) INSERT INTO @Table SELECT ProductId,PartsList FROM dbo.StagingTable INSERT INTO dbo.Products (ProductId,PartNumber) SELECT T.ColumnA, a.* FROM @Table T CROSS APPLY [dbo].[ParseIndividualValue](t.ColumnB,'|') a ORDER BY CAST(T.ColumnA AS INT) GO
Click on the "More Options" elipsis menu item at the end of the "SQL Statement" line.
Paste the contents of the clipboard into the "Enter SQL Query" text box.
Click the "OK" button.
Execute the SSIS package.
Wait for the SSIS package to complete execution.
Paste the following code into a SQL Server Managment Studio quey panel...
USE TestDB GO SELECT * FROM dbo.StagingTable SELECT * FROM dbo.Products ORDER BY ProductId,PartNumber
...and execute it demonstrate that the PartsList field ha been correctly parsed and inserted into the Products table.