This article describes two ways to shred Unicode Japanese characters from the content of the excel files as well as extracting the filename ,which may contain Unicode Japanese characters, like this:
Let us suppose we have the below file and its contents that we need to shred into SQL Server tables using SSIS. BElow you see the folder, the file, and the data contained inside the file.
The SQL Table has the below structure
CREATE TABLE [dbo].[SampleData]( [Sample_Id] [int] NULL, [Start_time] [datetime] NULL, [End_time] [datetime] NULL, [Information] [nvarchar](255) NULL, [Filename] [nvarchar](1000) NULL ) ON [PRIMARY]
With the file and the table structure now in place, let us create the package to extract the required information:-
Shred the Data
The package consists of the below steps to parse the data into the respective columns of the table
- Parse the file
- Extract the data
Each of these steps is described below.
Step 1: Parse the file
The purpose of this step is to loop through each file and extract the information. A For-each loop container is used to parse through the directory with a Script task to extract the necessary information from the file.
The For-Each loop container can be configured as below, with the file path being captured dynamically using a variable.
We next construct the Script task to derive the required filename information using the below C# code. The code splits the filepath by "\\" into an array using the split() and extracts the filename through LINQ last() method. The filename is then assigned to one of the SSIS variable (vFile_MetaData)
public void Main() { String Filepath = Dts.Variables["User::vFilePath"].Value.ToString(); string[] fileid = Filepath.Split('\\'); Dts.Variables["User::vFile_metaData"].Value = fileid.Last().ToString(); MessageBox.Show(Dts.Variables["User::vFile_metaData"].Value.ToString()); Dts.TaskResult = (int)ScriptResults.Success; }
Step 2: Extract the data
The next step is to extract the data content of the parsed file. To do this, we add a Data Flow task to the For-Each loop created in Step 1.
The Data flow container is configured with the below extract, transform and load tasks.
- Excel Source
- Derived Column
- OLE DB Destination
Excel Source
The Excel source container is mapped to the file from where we would need to obtain the data:-
Derived Column
The Derived Column creates a new column from the variable vFile_metaData that contains the metadata information. Since, we are concerned with Unicode characters inside the metadata we use the datatype as DT_WSTR.
OLE DB Destination
The OLE DB destination maps our derived data into the respective columns of the table, with the table structure as shown in this image, with the mapping below.
Verifying the Data
When executing our package, we should see the filename. There is a MessageBox in the C# script, and we notice that the filename is getting extracted correctly into the SSIS variables.
The Data Flow task gets executed successfully:-
But when verifying the data in the table, we find that the filename remains blank although the rest of the data has been successfully inserted into the table.
Circumventing the Failure
With the filename inserted as a blank, the data extraction logically failed. We have two options that can be used to circumvent the failures
Option 1
We can convert the data type of the variable in the Derived Column transformation to DT_STR with Code page being set to 932 for Japanese Characters (JIS).
We then add a Data Conversion task to reconvert the Derived Column into a Unicode format, making it persistent to the data type of the table
Verifying the data
After re-running the package, we can verify the data in the table. We find that now we have the required filename from the file along with the content of the file, both of which are Unicode characters.
Option 2
The other way would be update the filename from the variable in each iteration using a Execute SQL task.
Instead of using a Data Conversion task , we change the data type of the variable to NVARCHAR and pass the variable as a parameter to the UPDATE statement.
Verifying the Data
After querying the data on the table, we find that we have the required filename from the file along with the content of the file both of which are Unicode characters in the table
Conclusion
Although there are numerous ways to extract data using SSIS, the above methods shows some of the options that can be performed successfully to derive the Unicode data from Excel files.