October 8, 2010 at 11:26 am
Hi,
I need some guidance from someone who has some experience with SSIS.
I'm working on a project converting an MS Access DB to SQL 2005. The access table has a Zip code in it, which can hold up to 50(!) characters. I defined the column in SQL as nchar(5). When running the package I get a truncation error on the column because there are some zip codes in the Access DB with > 5 characters.
I don't want to let the bad data go in. Does anyone have any suggestions as to the best approach to handle this?
If it makes any difference, the data definitions in the old Access database are atrociously bad. Things like a column that holds a 3 digit code being defined as 100 characters, a dollar amount that should clearly be numeric being saved as text, dates saved as text, etc. We are talking about at least half of the fields, and I have several dozen tables to convert, so I expect to run into this type of situation over and over.
October 8, 2010 at 11:31 am
You can set up exception handling in SSIS. Set the properties of the data transformation object to include a routing for bad rows. SSIS can dump them into a separate table (what I usually do), and then you can clean them up.
By the way, lots of Zip codes have 10 characters, because they'll be stored as Zip+4. That might not be an error.
For documentation on how to set the exception handling up, I seriously recommend reading the articles on http://www.MSDN.com on the subject.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
October 8, 2010 at 11:41 am
Thanks GSquared,
Could you tell me, how do the rows from the error table get back into the correct table? Do you simply write scripts to run outside SSIS to do this? And would you create an error table for each table you are importing, to store potential errors? And then after all the errors are cleaned up, drop all the error tables from the DB? Am I even close?
October 8, 2010 at 2:30 pm
Are those US ZIP codes your storing in the SQL table? If so, why are you using NCHAR?
Ignore this question if they're not US ZIP codes.
For best practices on asking questions, please read the following article: Forum Etiquette: How to post data/code on a forum to get the best help[/url]
October 11, 2010 at 6:43 am
Alvin,
I started off the project using the import wizard, and nvarchar is what it used. I changed it to nchar, but it looks like I will be changing it again. Thanks for the tip!
October 11, 2010 at 7:39 am
You can use a conditional split to filter out rows with a length > x.
For the "good" records, you can add a data conversion component, to specifically set the lenght of the field to nvarchar(x). This way your destination won't give a warning that it is possible that you will try to store 50-length data in a much smaller field.
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
October 11, 2010 at 7:48 am
cvandevoorde (10/11/2010)
Alvin,I started off the project using the import wizard, and nvarchar is what it used. I changed it to nchar, but it looks like I will be changing it again. Thanks for the tip!
The point I wanted to make is that if you're dealing with US Zip Codes there's no reason to be using a data type that starts with N. Those are intended for storing data that includes unicode characters. These take up twice as many bytes of storage of the "non-N" options. VARCHAR or CHAR are much better options than NVARCHAR or NCHAR when you know you won't be using unicode characters.
One word of caution for everyone, the Import/Export Wizard does not always pick the best option. It usually goes for the safest options. It's always a good idea to review it's result and make changes where appropriate.
Just my 3 cents worth.
For best practices on asking questions, please read the following article: Forum Etiquette: How to post data/code on a forum to get the best help[/url]
October 11, 2010 at 10:28 am
First, if you have exclusive control of these fields in Access, you can just change the Access data type to a more suitable length (what has been allocated in SQL server) and SSIS will no longer complain. This is not a good option if you have data in a given field that is longer than you have allocated for it in SQL Server. If that occurs you may want to consider making your data type larger in SQL Server.
Another way, if you do not have exclusive control of the Access DB now, is to use a Data Conversion object in SSIS, in between your Access connection and your SQL Server connection. It can handle changing the data type/length on multiple fields as it runs.
Proper data profiling should be done first though, to make sure that your field on the SQL side can accommodate the data coming in from the Access DB.
Setting up error handling is is a good practice also, if you want the package to be able to continue running after it encounters an error, such as truncation.
Chris
Learning something new on every visit to SSC. Hoping to pass it on to someone else.
October 11, 2010 at 10:33 am
I vote for the data conversion route, regardless of whether the user has the proper rights to the Access database. This way, there's no danger of "breaking "anything.
For best practices on asking questions, please read the following article: Forum Etiquette: How to post data/code on a forum to get the best help[/url]
October 11, 2010 at 10:37 am
Stamey (10/11/2010)
Setting up error handling is is a good practice also, if you want the package to be able to continue running after it encounters an error, such as truncation.
Chris
This is what I am looking into right now, I am just not sure how to handle the rows in error, as far as fixing them later and getting them back into the table where they belong. If anybody has experience with this and can give just a quick answer as to how this might work, I'd like to hear about it.
October 12, 2010 at 4:54 am
Unless you KNOW you will never use anything but US zip codes, consider making the field length 7 to handle Canadian zip codes. Recently went through a somewhat painful change when we had to alter the db to handle that. From now on any db I create will have a zip code length of 7.
October 13, 2010 at 11:30 am
RonKyle (10/12/2010)
Unless you KNOW you will never use anything but US zip codes, consider making the field length 7 to handle Canadian zip codes. Recently went through a somewhat painful change when we had to alter the db to handle that. From now on any db I create will have a zip code length of 7.
Unless you have a requirement for zip+4, then you need 9 or 10 characters, depending on how you chose to store it.
23146-2322
or
231462322
Chris
Learning something new on every visit to SSC. Hoping to pass it on to someone else.
October 13, 2010 at 11:36 am
cvandevoorde (10/11/2010)
Stamey (10/11/2010)
Setting up error handling is is a good practice also, if you want the package to be able to continue running after it encounters an error, such as truncation.
Chris
This is what I am looking into right now, I am just not sure how to handle the rows in error, as far as fixing them later and getting them back into the table where they belong. If anybody has experience with this and can give just a quick answer as to how this might work, I'd like to hear about it.
Error handling consists of sending the errors to another table, or technically, another data connection. This alternate data connection would be a destination that does not have hard data types on the fields, and possibly larger fields or a "catch-all" field that would not cause an error when trying to put data into it so that it can be reviewed later.
While you might be able to automate getting data from the error location back into the production location, such as with a stored procedure, this is not easy to do, because if you know enough about the problem to automate moving it from the error location to the production location you can account for the error in the initial load and you would not have an error to begin with. What I am saying is that in most cases if the package has been written correctly then errors need to be manually reviewed because they contain a variance that could not be accounted for when the package was written.
Chris
Learning something new on every visit to SSC. Hoping to pass it on to someone else.
Viewing 13 posts - 1 through 12 (of 12 total)
You must be logged in to reply to this topic. Login to reply