March 16, 2008 at 1:46 pm
I have a flat file that has dates in the formats of YYYYMMDD. I want to import these into a date column in SQL Server 2005 sp2 and I'm using the dt_dbdate type in SSIS. However, SSIS throws an error saying that the YYYYMMDD values are not dates.
What date formats does SSIS accept as being dates? Can I specify a formatting hint in any way? How else can I get these values into a date column, without using a custom import script.
March 16, 2008 at 3:38 pm
I'm going with a derived column using this formula:
ISNULL(dt) || LEN(TRIM(dt)) != 8 ? NULL(DT_DBDATE) : (DT_DBDATE)(SUBSTRING(dt,5,2) + "/" + SUBSTRING(dt,7,2) + "/" + SUBSTRING(dt,1,4))
I'm wondering whether this is the fastest method and whether I should just do everything in a script component.
March 16, 2008 at 8:59 pm
The fastest method is neither... the fastest method is BULK INSERT. I don't use SSIS or DTS, but rumor has it that SSIS has a "BULK INSERT" object that should allow you to do this without such a formula. Bulk Insert om T-SQL with a format file should still beat if for performance.
I don't know about other folks, but using a BCP format file and Bulk Insert on the T-SQL side, I've been able to load a 5.2 million row 20 field file, with some cleansing, in 60 seconds flat including dates in the ISO format of YYYYMMDD.
--Jeff Moden
Change is inevitable... Change for the better is not.
March 17, 2008 at 3:42 am
We use something like this: CONVERT(datetime, CONVERT(varchar, @IntDate))
March 17, 2008 at 3:02 pm
Jeff is correct about the Bulk Insert Task for SSIS, but the Bulk Insert Task cannot transform data. I suggest you look into using a Data Flow Task instead, depening how much data manipulation you are performing.
SQL Server 2005 Books Online (September 2007)
Bulk Insert Task
http://msdn2.microsoft.com/en-us/library/ms141239.aspx
SQL Server 2005 Books Online (September 2007)
Data Flow Task
http://msdn2.microsoft.com/en-us/library/ms141122.aspx
--
--
Also, I did a little digging and came across some links from an MSDN Forums posting:
Source: http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=2471462&SiteID=1
If you would like to bulk insert data using SSIS, here is what you need to do.
Control Flow:
You need to place a Data Flow Task on the canvas.
Data Flow:
1. First add a Flat File Source and configure it.
2. Connect the output of the Flat File Source to the transformations you need to perform (if any).
3. Connect the output of the last transformation to an OLE DB Destination.
4. Set the Data Access Mode of the OLE DB destination to Fast Load.
5. Configure the Fast Load options as you wish.
If you are new to SSIS and do not know how to perform these steps, I recommend that you visit this page:
http://msdn2.microsoft.com/en-us/library/ms167031.aspx
This link is the startup page of three very usefuly tutorials. You might also need to install the Sample Databases and Samples. So this page tells you exactly what you need to do: http://msdn2.microsoft.com/en-us/library/ms143804.aspx
For more information about how to set the Fast Load options, you might find these URLs useful:
Importing and Exporting Bulk Data: http://msdn2.microsoft.com/en-us/library/ms175937.aspx
BULK INSERT (Transact-SQL): http://msdn2.microsoft.com/en-us/library/ms188365.aspx
Optimizing Bulk Import Performance: http://msdn2.microsoft.com/en-us/library/ms190421.aspx
Controlling the Locking Behavior for Bulk Import: http://msdn2.microsoft.com/en-us/library/ms180876.aspx
Controlling Constraint Checking by Bulk Import Operations: http://msdn2.microsoft.com/en-us/library/ms186247.aspx
Controlling Trigger Execution When Bulk Importing Data: http://msdn2.microsoft.com/en-us/library/ms187640.aspx
Managing Batches for Bulk Import: http://msdn2.microsoft.com/en-us/library/ms188267.aspx
Controlling the Sort Order When Bulk Importing Data: http://msdn2.microsoft.com/en-us/library/ms177468.aspx
Hope This Helps,
"Key"
MCITP: DBA, MCSE, MCTS: SQL 2005, OCP
March 17, 2008 at 4:38 pm
Thanks for the feedback. I forgot about BCP, although I used that back in the SQL 2000 days to populate a datamart, since it was the fastest method.
After importing the data into staging and I realized that between 30-50% of the rows in the source file were irrelevant and needed to be screened out. I could import everything into a staging table, then delete the irrelevant rows with SQL, or just not insert the rows to begin with.
Ultimately I scrapped the derived column step and replaced it with a script transformation step, which I believe is the main way in SSIS to screen out rows. This is a snippet of the code that both screens out rows and turns strings into dates.
'Filter out rows that do not have CUSIPs
If Row.cusip.Trim.Length = 9 Then
'Turn strings into dates
If Row.watchlistdatestring.Trim.Length = 8 Then
Row.watchlistdate = Date.ParseExact(Row.watchlistdatestring, "yyyyMMdd", System.Globalization.CultureInfo.InvariantCulture)
Else
Row.watchlistdate_IsNull = True
End If
Row.DirectRowTotoDatabase()
End If
March 17, 2008 at 4:49 pm
Thanks for the link to the MS SSIS forum. If one of the main people behind SSIS (Kirk H.) says not to use Bulk Insert / BCP anymore, then I'm going to take his word for it.
Another note is from Donald Farmer, who wrote this:
You could also use a Derived Column component, and enable the FastParse property on the output column you are converting to. FastParse can handle this date format, even though Standard Parse can not.
I tried this and it worked, although the DerivedColumn cannot screen out entire rows, so I still went with the script transformation.
March 18, 2008 at 6:29 am
Which one of the many links posted is that? And why does he say not to use Bulk Insert/BCP anymore?
--Jeff Moden
Change is inevitable... Change for the better is not.
March 18, 2008 at 7:34 am
The info about BCP was in this link
http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=2471462&SiteID=1
It just contained an exerpt from Kirk's SSIS book.
I'll quote the quote
Let me quote from Kirk Haselden's book titled Microsoft SQL Server 2005 Integration Services:
In DTS, using the Bulk Insert Task was the preferred way to do large bulk loads because it takes advantage of the performance gains using the T-SQL BULK INSERT command. With Integration Services, however, it is recommended to use the Data Flow Task. The Data Flow Task has numerous advantages over the Bulk Insert Task, which has rigid limitations on how it can consume and insert data. The Data Flow Task can bulk load data with comparable performance, especially if using the SQL Server Destination Adapter, while providing much more flexibility.
The Bulk Insert Task is a wrapper task around the T-SQL Bulk Insert statement. The task generates the Bulk Insert statement from the properties set on the task and then sends it to the server where it will be executed. Therefore, all the limitations that apply to the Bulk Insert T-SQL statement also apply to the Bulk Insert Task.
Because the Data Flow Task replaces the features provided by the Bulk Insert Task, it is recommended to use the Data Flow Task for bulk-loading scenarios. The Data Flow Task is easier to set up, just as performant for straight bulk inserts, and more flexible.
Viewing 9 posts - 1 through 8 (of 8 total)
You must be logged in to reply to this topic. Login to reply