Speaking at SQL Saturdays, I always get excited when I am asked a question that has never come up before. Sometimes I know the answer right away, but there are times when I do not, and I have to go home and do some research in order to answer it. If the question is unique, and if I am able to adapt it to my talk on Dynamic SQL, I will add it to my presentation the next time that I speak.
Recently, I was asked a question that I did not know the answer to: how to load a CSV file from a directory without using xp_CmdShell. Companies will often lock down access to xp_CmdShell, so as a developer, you must find an alternate way to achieve your goal. I took the individual's name and contact information and told him I would research it and get back to him.
After digging through different Microsoft articles, I found a way to load a file from disk using BULK INSERT and a simple XML format file. I wrote some sample code, checked it twice, and then emailed it off to him. A few days later he came back and said that everything was working great. That was a straightforward question with a straightforward answer. But imagine that you didn't know the name of the file you were going to import, or that there were multiple files in the directory. How would you handle loading those files? That is when I decided that I could incorporate the question into my presentation, enhance it a little, and do it all dynamically.
Before we go any further, for this to work you will have to know the columns that are in the CSV file that you will be loading, and the type of data each column contains.
The scenario that we will be using is as follows: As a developer, you are asked to read the CSV files from a directory, and then import those files into the database. You will also need to check to see if the file already has a corresponding table, and if it does, you will need to see if the data that you are importing already exists in that table. If it does, you will not want to duplicate data, so you will simply skip it and move on to the next file.
Being in the world of custom programming, I rely a lot on temp tables throughout the day. This example will utilize various temp tables to help achieve everything covered in the above scenario.
Step One
You know the directory where the files will live. The first thing we need to do is scan that directory for all the files in it. We will use xp_DirTree to accomplish this. xp_DirTree is an undocumented system stored procedure that lists the subdirectories of the directory specified as the input parameter. Another great feature of xp_DirTree is that it can also list all of the files that are in the specified directory.
To use it, all you have to do is call it and pass in the directory whose contents you want listed. For example,
EXEC master.dbo.xp_DirTree 'path', 1, 1
will list the contents of the path specified. xp_DirTree takes three parameters: the first is the directory you want to search, the second is the depth you want to scan, and the third tells xp_DirTree to also list files. The result set has three columns: subdirectory (the file or folder name), depth, and file (1 for a file, 0 for a folder).
We will create two temp tables to capture the contents of the directory that we scanned with xp_DirTree. The first temp table will capture all the files that xp_DirTree saw in the directory. The second will be identical to the first, except that we will use it to hold only the .CSV files. We will use CHARINDEX to strip away the file extension and keep just the names of the files, and we will filter out the non-CSV files in our WHERE clause.
-- VARIABLES USED THROUGHOUT THE SCRIPT --
DECLARE @TableName  NVARCHAR(100),
        @FileName   NVARCHAR(100),
        @DynamicSQL NVARCHAR(MAX),
        @Loop       INT = 0,
        @Count      INT = 1

IF OBJECT_ID('tempdb.dbo.#Files') IS NOT NULL
BEGIN
    DROP TABLE #Files
END

CREATE TABLE #Files
(
    FileID INT IDENTITY(1,1) NOT NULL,
    fName  NVARCHAR(100),
    Depts  INT,
    F1     INT
)

-- NOW WE WILL MAKE A DUPLICATE OF THE ABOVE TABLE IN ORDER TO CLEAN THE FILES AND CREATE THE TABLE NAMES --
IF OBJECT_ID('tempdb.dbo.#CleanFiles') IS NOT NULL
BEGIN
    DROP TABLE #CleanFiles
END

CREATE TABLE #CleanFiles
(
    FileID    INT IDENTITY(1,1) NOT NULL,
    TableName NVARCHAR(100),
    fName     NVARCHAR(100)
)

-- NOW WE WILL LOOK IN THE DIRECTORY AND GRAB ANY FILES THAT ARE IN THE DIRECTORY --
INSERT INTO #Files (fName, Depts, F1)
EXEC master.dbo.xp_DirTree 'H:\DataFilesToUpload', 1, 1

-- NOW THAT WE HAVE THE FILES IN A TEMP TABLE --
-- WE CAN STRIP OUT THE NON-CSV FILES AND PUSH THEM INTO THE CLEANFILES TABLE --
INSERT INTO #CleanFiles (TableName, fName)
SELECT LEFT(fName, CHARINDEX('.', fName) - 1), fName
FROM #Files
WHERE fName LIKE ('%.csv')

SET @Loop = @@ROWCOUNT
Now that we have the file names loaded into a temp table, we can use that temp table to dynamically import the data. Using Dynamic SQL, we will loop through the file names in the temp table and load the data. We will use the TableName field to create a temp table at runtime, then see if a permanent table exists with the same name. If it doesn't, we will create a new table and then push the data into it. This process repeats through the loop while we still have unprocessed files.
Incorporating an XML format file will ensure that the data is transferred into our temp table without any conversion errors. BULK INSERT supports two types of format files: an XML-based format file and a non-XML-based format file. The non-XML format is the original format, supported by earlier versions of SQL Server. For our example we will be creating an XML format file.
There are a few benefits to using an XML format file over a non-XML format file:
- XML format files are self-describing, making them easier to read and create
- XML format files contain the data types of the target columns
- XML format files allow for loading a large object (LOB) data type from the data file
Regardless of which format file you choose, remember that you must allow for Unicode characters. Both XML and non-XML format files use BCP (bulk copy program) syntax. Since we are using an XML-based file, we will go over the different data types that you can use when building your file:
- SQLCHAR OR SQLVARYCHAR: The data is sent in the client code page, or in the code page implied by the collation
- SQLNCHAR OR SQLNVARCHAR: The data is sent as Unicode
- SQLBINARY OR SQLVARYBIN: The data is sent without any conversion
- SQLTINYINT, SQLSMALLINT, SQLINT, OR SQLBIGINT: Fixed-size integer fields
- SQLDECIMAL OR SQLNUMERIC: Fixed-precision decimal number fields
Since I know the columns in my CSV files, my format file defines each column, the type of data it holds, and the length of each field:
<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <RECORD>
    <FIELD ID="1" xsi:type="CharTerm" TERMINATOR="," MAX_LENGTH="255"/>
    <FIELD ID="2" xsi:type="CharTerm" TERMINATOR="," MAX_LENGTH="25" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
    <FIELD ID="3" xsi:type="CharTerm" TERMINATOR="," MAX_LENGTH="10" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
    <FIELD ID="4" xsi:type="CharTerm" TERMINATOR="," MAX_LENGTH="2" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
    <FIELD ID="5" xsi:type="CharTerm" TERMINATOR="\r\n" MAX_LENGTH="15"/>
  </RECORD>
  <ROW>
    <COLUMN SOURCE="1" NAME="StockItemName" xsi:type="SQLVARYCHAR"/>
    <COLUMN SOURCE="2" NAME="UnitPrice" xsi:type="SQLINT"/>
    <COLUMN SOURCE="3" NAME="OnHand" xsi:type="SQLINT"/>
    <COLUMN SOURCE="4" NAME="Amt2Purchase" xsi:type="SQLINT"/>
    <COLUMN SOURCE="5" NAME="File" xsi:type="SQLVARYCHAR"/>
  </ROW>
</BCPFORMAT>
For a more in-depth look at using a format file with BULK INSERT, you can refer to this Microsoft article: https://docs.microsoft.com/en-us/sql/relational-databases/import-export/use-a-format-file-to-bulk-import-data-sql-server?view=sql-server-2017
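Before wiring the format file into a loop, it can be worth testing it with a standalone BULK INSERT against one known file. The sketch below assumes the paths from the examples above; the file name Inventory.csv and the column list are placeholders you would swap for your own.

```sql
-- A QUICK STANDALONE TEST OF THE FORMAT FILE AGAINST A SINGLE KNOWN FILE --
-- (THE PATHS, FILE NAME, AND COLUMNS BELOW ARE EXAMPLES, NOT FIXED VALUES) --
CREATE TABLE #FormatFileTest
(
    StockItemName NVARCHAR(200),
    UnitPrice     MONEY,
    OnHand        INT,
    OrderAmt      INT
)

BULK INSERT #FormatFileTest
FROM 'H:\DataFilesToUpload\Inventory.csv'
WITH (FORMATFILE = 'H:\DataFilesToUpload\InvFormat.xml')

-- IF THE FORMAT FILE MATCHES THE CSV, THIS RETURNS THE NUMBER OF ROWS LOADED --
SELECT COUNT(*) AS RowsLoaded FROM #FormatFileTest
```

If this single-file test fails with a conversion or terminator error, fix the format file here before moving on to the dynamic version.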
Now we will create a simple loop to run through each file in our temp table. The loop will create a temp table based on the file name, then load the data into that table. Using Dynamic SQL allows us to swap out the file and table names on each pass of the loop.
WHILE @Loop > 0 AND @Count <= @Loop
BEGIN
    -- WE WILL SET THE @TableName & @FileName VARIABLES --
    SELECT @TableName = cf.TableName,
           @FileName  = cf.fName
    FROM #CleanFiles cf
    WHERE cf.FileID = @Count
With each pass of the loop, we set the variables @TableName and @FileName from our temp table, and we then use them both to create the tables and to read the files.
Using Dynamic SQL, we create the temp table at runtime. This script assumes that your temp table matches the columns in your CSV file, with the same data types. Once all of the variables are set, we can execute our Dynamic SQL code and load our three sample files.
-- NOW WE WILL START ADDING THE CODE TO DYNAMIC SQL IN ORDER TO PROCESS THE FILES --
SET @DynamicSQL = N'
-- WE WILL CREATE A TEMP TABLE TO HOLD THE DATA IN EACH FILE --
IF OBJECT_ID(''tempdb.dbo.#' + @TableName + ''') IS NOT NULL
BEGIN
    DROP TABLE #' + @TableName + '
END

-- NOW WE WILL CREATE THE TEMP TABLE TO HOLD THE DATA --
-- (YOU SHOULD ALREADY KNOW THE STRUCTURE OF THE DATA SO YOU CAN BUILD THE TABLE TO MATCH THE CSV FILE) --
-- YOU CAN CREATE THIS AS A GLOBAL TEMP TABLE IF YOU WOULD LIKE TO DO THE REST OF THE DATA MANIPULATION IN SQL --
CREATE TABLE #' + @TableName + '
(
    StockItemName NVARCHAR(200),
    UnitPrice     MONEY,
    OnHand        INT,
    OrderAmt      INT
)

-- NOW WE WILL LOAD THE DATA INTO THE TEMP TABLE --
-- USING DYNAMIC SQL WILL ALLOW US TO LOOP FOR EACH FILE IN OUR TABLE --
BULK INSERT #' + @TableName + '
FROM ''H:\DataFilesToUpload\' + @FileName + '''
WITH (FORMATFILE = ''H:\DataFilesToUpload\InvFormat.xml'')

-- NOW THE DATA FROM THE CSV FILE HAS BEEN LOADED INTO A TEMP TABLE --
-- AT THIS POINT, YOU CAN WORK WITH THE DATA TO MANIPULATE IT HOW YOU WOULD LIKE --'

EXEC sp_executesql @DynamicSQL

SET @Count = @Count + 1
END
You can verify that your data is loading and your script is running by inserting a SELECT statement after the EXEC command and before the SET @Count statement.
SELECT * FROM #' + @TableName + '
I never recommend using SELECT * on a table, and I hope that when you write your final code block you will list out the column names in your statement.
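When a loop like this misbehaves, the quickest check is often the statement being built rather than its results. Printing the assembled string before executing it is a common dynamic SQL debugging habit, not something specific to this script; a sketch of swapping it in around the EXEC:

```sql
-- DURING DEVELOPMENT, PRINT THE ASSEMBLED STATEMENT FOR EACH FILE --
-- SO YOU CAN EYEBALL THE TABLE NAMES, PATHS, AND QUOTING --
PRINT @DynamicSQL

-- ONCE THE PRINTED STATEMENT LOOKS CORRECT, EXECUTE IT AS BEFORE --
EXEC sp_executesql @DynamicSQL
```

Copying the printed statement into a new query window and running it by hand is often the fastest way to find a quoting or concatenation bug.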
Now we want to verify that the permanent table does not already exist. We will use NOT EXISTS and look in sys.tables to see if we find a table with the same name. If the table has already been created, we will skip this section so we do not load duplicate data. If the table hasn't been created, we will create a new permanent table and then load the data from the temp table into it.
-- WE WILL NOW CHECK TO SEE IF THERE IS A PERMANENT TABLE WITH THE SAME NAME ALREADY BUILT --
-- IF NOT, WE WILL BUILD IT --
IF NOT EXISTS (SELECT 1 FROM sys.Tables WHERE [Name] = ''' + @TableName + ''')
BEGIN
    CREATE TABLE ' + @TableName + '
    (
        StockItemName NVARCHAR(200),
        UnitPrice     MONEY,
        OnHand        INT,
        OrderAmt      INT,
        DateAdded     SMALLDATETIME
    )

    -- NOW THAT THE PERMANENT TABLE HAS BEEN BUILT - WE WILL INSERT INTO IT --
    INSERT INTO ' + @TableName + ' (StockItemName, UnitPrice, OnHand, OrderAmt, DateAdded)
    SELECT StockItemName, UnitPrice, OnHand, OrderAmt, GETDATE()
    FROM #' + @TableName + '
END
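The scenario also called for checking whether the incoming data already exists when the permanent table is already there. The block above only handles the first load; one way to handle reloads is to compare a key column before inserting. This is only a sketch, and it assumes StockItemName uniquely identifies a row, which may not hold for your data:

```sql
-- IF THE PERMANENT TABLE ALREADY EXISTS, ONLY INSERT ROWS WE HAVE NOT SEEN BEFORE --
-- (THIS FRAGMENT BELONGS INSIDE THE DYNAMIC SQL STRING, SO THE QUOTES ARE DOUBLED UP) --
INSERT INTO ' + @TableName + ' (StockItemName, UnitPrice, OnHand, OrderAmt, DateAdded)
SELECT t.StockItemName, t.UnitPrice, t.OnHand, t.OrderAmt, GETDATE()
FROM #' + @TableName + ' t
WHERE NOT EXISTS (SELECT 1
                  FROM ' + @TableName + ' p
                  WHERE p.StockItemName = t.StockItemName)
```

If your files can legitimately repeat item names with different quantities, you would widen the WHERE clause to compare the other columns as well.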
With the help of Dynamic SQL, xp_DirTree, BULK INSERT, and some out-of-the-box ingenuity, we are now able to load .CSV files from a directory on a local hard drive. By scanning the directory and manipulating the results, we can loop through the file names, create new tables, and load the data from the CSV files into those tables.
Dynamic SQL lets you solve problems that do not have an easy solution. Its power gives you flexibility when troubleshooting problems, helps you find new ways to do old tasks, and helps you grow as a developer.