March 18, 2008 at 1:23 pm
Is it possible to easily import data from flat files with vertical columns?
Should I be using the flat file task at all?
In order words, instead of the data looking like this:
First Name, Middle Name, Last Name
John , A , Doe
The data would look like this:
First Name=John
Middle Name=A
Last Name=Doe
or
First Name, John
Middle Name, A
Last Name, Doe
I am extremely new to SSIS so excuse me if this seems elementary.
March 18, 2008 at 5:11 pm
Tony,
Welcome to SSIS. You might be able to use the flat file source, but you're going to need to stage the data in a staging table to flatten it out (using a sequential row identifier such as an IDENTITY column to maintain the same order as in your source file). A script component would also be helpful here.
The example you showed looked to be fairly simple - if you have a clear delineation (such as FirstName always represents a new record), you should be ok.
Feel free to post back if you need more information.
hth,
Tim
Tim Mitchell
TimMitchell.net | @tmitch.net | Tyleris.com
ETL Best Practices
March 18, 2008 at 6:21 pm
Tim Mitchell (3/18/2008)
Tony,Welcome to SSIS. You might be able to use the flat file source, but you're going to need to stage the data in a staging table to flatten it out (using a sequential row identifier such as an IDENTITY column to maintain the same order as in your source file). A script component would also be helpful here.
The example you showed looked to be fairly simple - if you have a clear delineation (such as FirstName always represents a new record), you should be ok.
Feel free to post back if you need more information.
hth,
Tim
Hello Tim,
In my case, there will be unique identifier which will be an application number. The problem is the application number comes like this:
<123456>
<>
First Name, John, 0
Middle Name, A, 0
Last Name, Doe, 0
March 18, 2008 at 8:04 pm
How many of those "unique identifiers" and sections of names do you have in a file and are they all the same (not the names, the over all structure you posted)? This could be super easy with a BCP format file, if they are...
And, you're sure you're not talking about an XML file, right? Sorry about the stupid question, but I gotta make sure...
--Jeff Moden
Change is inevitable... Change for the better is not.
March 18, 2008 at 8:22 pm
Jeff Moden (3/18/2008)
How many of those "unique identifiers" and sections of names do you have in a file and are they all the same (not the names, the over all structure you posted)? This could be super easy with a BCP format file, if they are...And, you're sure you're not talking about an XML file, right? Sorry about the stupid question, but I gotta make sure...
Hello Jeff,
Each file will contain one application, so basically there will only be one application number that looks like this:
<123456>
The files are *.txt files. I know the application number looks like it might be XML, but it is a plain text file. That application number is defined by a third party vendor. They define their comments that way.
March 18, 2008 at 8:45 pm
Hi Tony... still not 100% clear on what the data file actually looks like... can it have more than on person in it and, if it does, will it look like this? Any chance of you posting one of the files? Please, make sure there's nothing personal in the file if you do! No SSN's, account numbers, etc. Or, just make a small example like I did below...
<>
First Name, John, 0
Middle Name, A, 0
Last Name, Doe, 0
<>
First Name, Sally, 0
Middle Name, B, 0
Last Name, Good, 0
<>
First Name, Little, 0
Middle Name, Red, 0
Last Name, Devil, 0
<>
First Name, Jolly, 0
Middle Name, Green, 0
Last Name, Giant, 0
--Jeff Moden
Change is inevitable... Change for the better is not.
March 18, 2008 at 9:56 pm
Here's an example:
<al99999>
<>
First Name, Mark, 0
Middle Name, Stan, 0
Last Name, Richards, 0
Address, 1003 Test Drive, 0
City, Atlanta, 0
PostalCode, 30071, 0
This is still a test and it represents an application. <al99999> represents the application number. This will give you a little more insight. There will only be one application within a *.txt file.
BTW, anything enclosed between <> represents a comment. The 0 on the end represent something too, but I forgot. I do remember it not being relevant for my needs.
March 18, 2008 at 10:36 pm
Ok... this is even easier than BCP...
Initial setup...
--===== Create a linked server to the drive and path you desire.
EXEC dbo.sp_AddLinkedServer TxtSvr,
'Jet 4.0',
'Microsoft.Jet.OLEDB.4.0',
'C:\Temp',
NULL,
'Text'
GO
--===== Set up login mappings.
EXEC dbo.sp_AddLinkedSrvLogin TxtSvr, FALSE, NULL, Admin, NULL
GO
--===== List the tables in the linked server which is really a list of
-- file names in the directory. Note that the "#" sign in the
-- Table_Name is where the period in the filename actually goes.
-- You could drop this into a temp table and loop through the
-- files that fit a "pattern"
EXEC dbo.sp_Tables_Ex TxtSvr
GO
From there, it's pretty easy...
--===== Create a temp table to do a little work in because we can't
-- predict what the first column will be named
CREATE TABLE #RawData
(F1 VARCHAR(30) PRIMARY KEY CLUSTERED,
F2 VARCHAR(30),
F3 VARCHAR(30))
--===== Query one of the files by using a four-part name.
INSERT INTO #RawData
(F1,F2,F3)
SELECT *
FROM TxtSvr...[TestApp#txt]
WHERE F2 IS NOT NULL
--===== Demo what the data in #RawData is (Not a part of the solution)
SELECT * FROM #RawData
--===== Flatten the data
SELECT MAX(CASE WHEN F1 = 'First Name' THEN F2 ELSE '' END) AS FirstName,
MAX(CASE WHEN F1 = 'Middle Name' THEN F2 ELSE '' END) AS MiddleName,
MAX(CASE WHEN F1 = 'Last Name' THEN F2 ELSE '' END) AS LastName,
MAX(CASE WHEN F1 = 'Address' THEN F2 ELSE '' END) AS Address,
MAX(CASE WHEN F1 = 'City' THEN F2 ELSE '' END) AS City,
MAX(CASE WHEN F1 = 'PostalCode' THEN F2 ELSE '' END) AS PostalCode
FROM #RawData
Note that I had created a file called "TestApp.txt" in my C:\Temp directory with the data you provided... here's what I get for output...
(6 row(s) affected)
F1 F2 F3
------------------------------ ------------------------------ ------------------------------
Address 1003 Test Drive 0
City Atlanta 0
First Name Mark 0
Last Name Richards 0
Middle Name Stan 0
PostalCode 30071 0
(6 row(s) affected)
FirstName MiddleName LastName Address City PostalCode
------------------------------ ------------------------------ ------------------------------ ------------------------------ ------------------------------ ------------------------------
Mark Stan Richards 1003 Test Drive Atlanta 30071
(1 row(s) affected)
--Jeff Moden
Change is inevitable... Change for the better is not.
March 19, 2008 at 7:41 am
This is an insurance application.
There are cases where we will need to derive columns information based on a combination of data we received in the application.
For example, we have to determine whether or not the give a senior citizens discount. To do that, we have to check their age and use another field (or column) indicator to see if that person has taken a senior prevention driver's course. In an application, this may look like the following:
DateOfBirth, 04/12/1953, 0
Driver1_Info, STE | TUK | SEN, 0 (where SEN represents a senior prevention driver's course)
In addition, we have to import the data in our already made database structure since we receive a variety of different file formats (flat files, csv files, XML, etc)
In the application, we will have information such as policy, vehicles, coverages, etc. We have to import the data into several different tables. For example, any data related to general policy information would go to a Policy table. Any data related to vehicles would go to a Vehicles table. Any data related to coverages would got to a Coverages table.
We are in the process of looking at BizTalk. But until that becomes feasible, I felt that SSIS would probably help in this situation.
March 19, 2008 at 8:25 am
Tony, let's not build this one little requirement at a time... how about posting a whole user (like I asked before, no private info, please) so we can help you out? None of this needs SSIS if you don't want it... but, even if you do, no one can help if you keep nickle-diming us with additional scope.
--Jeff Moden
Change is inevitable... Change for the better is not.
March 19, 2008 at 8:57 am
Hello Jim,
When I originally read your solution, it seem like you got the impression that I was looking to take the data from application and create a database table that mimics the structure of the application. Am I correct in that the sole purpose of your solution would be to flatten the data?
I include more information so you could understand to full context of what I need to do. I am not asking you to do my work for me. I am asking for guidance on what is the best solution to my problem.
Here is an example application file.
<AL7999,2>
<>
Agency_Addr1,114 LAKEWOOD DRIVE,0
Agency_City,DANVILLE,0
Agency_Contact,MARK,0
Agency_County,SAINT CHARLES,0
Agency_EMail,info@comparativeia.com,0
Agency_Fax,(985) 785-9200,0
Agency_Name,Comparative Insurance Agency, Inc.,0
Agency_Phone,(985) 785-8000,0
Agency_State,LA,0
Agency_Zip,70070,0
Carr_Addr1,Dairyland,0
Carr_Addr2,P.O. Box 35748,0
Carr_BrokerNumber,54321,0
Carr_City,Richmond,0
Carr_Claims,(800) 379-9306,0
Carr_Fax,(804) 794-3558,0
Carr_GeneralAgent,DAIRYLAND INSURANCE CO.,0
Carr_LegalName,DAIRYLAND INSURANCE COMPANY,0
Carr_NAIC,21164,0
Carr_Name,DAIRYLAND INSURANCE CO.,0
Carr_Phone,() -,0
Carr_State,VA,0
Carr_Zip,23235,0
Carr_Zip4,0728,0
Fin_Addr1,Dairyland Insurance Company,0
Fin_Addr2,PO Bix 35748,0
Fin_City,Richmond,0
Fin_LegalName,DAIRYLAND INSURANCE COMPANY,0
Fin_Name,DAIRYLAND DIRECT BILL,0
Fin_State,VA,0
Fin_Zip,23235,0
Fin_Zip4,5748,0
Op1_Addr1,114 WOODLAWN DR.,0
Op1_City,DOUBLE SPRINGS,0
Op1_County,WINSTON,0
Op1_DLInStateSince,01/27/1986,0
Op1_DLSince,01/27/1986,0
Op1_DLState,AL,0
Op1_DOB,03/29/1969,0
Op1_Fax,(225) 859-6541,0
Op1_First,HOMER,0
Op1_HomePhone,(225) 785-9845,0
Op1_Last,SIMPSON,0
Op1_MS,MARRIED,0
Op1_Occupation,EMPLOYED,0
Op1_PrevLBI1,10000,0
Op1_PrevLBI2,20000,0
Op1_PrevPolExpDate,02/16/2005,0
Op1_Relationship,SELF,0
Op1_Sex,MALE,0
Op1_SSN,- -,0
Op1_State,AL,0
Op1_Vios,NO,0
Op1_WorkPhone,(225) 698-1234,0
Op1_Zip,35553,0
Op2_AskStudentAway,NO,0
Op2_City,DOUBLE SPRINGS,0
Op2_County,WINSTON,0
Op2_DLInStateSince,01/27/1983,0
Op2_DLSince,01/27/1983,0
Op2_DLState,AL,0
Op2_DOB,06/04/1966,0
Op2_First,MARGE,0
Op2_Last,SIMPSON,0
Op2_MS,MARRIED,0
Op2_Occupation,EMPLOYED,0
Op2_Relationship,WIFE,0
Op2_Sex,FEMALE,0
Op2_State,AL,0
Op2_Vios,NO,0
Op2_Zip,35553,0
Pol_AddOns,0.00,0
Pol_AddOnTotal,0,0
Pol_AgFinFee,0,0
Pol_AgMVRFee, 0,0
Pol_AgPolFee, 0,0
Pol_AgSR22Fee,0.00,0
Pol_ALMendotaScore,C,0
Pol_ALProgPrevTier,C,0
Pol_ALVictoriaPayPlan1,12 MO/20.00%/10 PAY,0
Pol_AskAllVehOwned,YES,0
Pol_AskEFTPayment,NO,0
Pol_AskVehUseBDC,YES,0
Pol_AssignedRisk,NO,0
Pol_AUFinData,00,0
Pol_BalanceOwe,254,0
Pol_CarrCode,05,0
Pol_CertReq,.......I........,0
Pol_ClientNo,0,0
Pol_Commission,190.50,0
Pol_ConvictedOfViolation,YES,0
Pol_Coverages,|LUM |OC |OC | | |,0
Pol_DiscApplied,..M....H........,0
Pol_EffDate,02/16/2005 12:00:00 AM,0
Pol_EntryDate,01/27/2005,0
Pol_ExpDate,08/16/2005 23:59:59,0
Pol_Exported,06/15/2005,0
Pol_FinAmtFin,1016,0
Pol_FinCode,19,0
Pol_FinDocStamps,0.00,0
Pol_FinDown,254.00,0
Pol_FinFee,20.00,0
Pol_FinPayment,259.00,0
Pol_FinPercentDown,20.00,0
Pol_FinTerm, 4,0
Pol_FinTotalPayment,1036,0
Pol_FinTotalSalePrice,1290,0
Pol_HOCode,YES - NAME UNKNOWN / ,0
Pol_HOOwn,YES,0
Pol_Info,C,0
Pol_ManualDate, 4/ 24/ 2003,0
Pol_MVRFee,0.00,0
Pol_OtherOpsUnder25,YES,0
Pol_OtherVehsInHouse,YES,0
Pol_PolFee,0.00,0
Pol_PolicyID,4,0
Pol_PolicyTotal,1270,0
Pol_PolNo,
,0
Pol_PolType,AUTOMOBILE,0
Pol_PrevCarrier,07,0
Pol_PrevCarrierName,AIG AUTO 2.1 / ,0
Pol_PrevCoType,NON STANDARD,0
Pol_ProgressiveAQDisc,YES,0
Pol_PurePrem,1270.00,0
Pol_RanMVR,YES,0
Pol_Rated,RATED,0
Pol_RevisionDate, 4/ 24/ 2003,0
Pol_SR22Fee,0.00,0
Pol_Status,Application,0
Pol_SUFITXQ1,YES,0
Pol_Tax, 0,0
Pol_Territory,29,0
Pol_Uploaded,06/15/2005,0
Pol_VehOwnByOther,YES,0
Prod_Name,MARK,0
Veh1_AskFWD,NO,0
Veh1_AskPersonal,YES,0
Veh1_AskTTop,NO,0
Veh1_AskTurbo,NO,0
Veh1_Body,SEDAN,0
Veh1_ClsPts,/ 0 PT,0
Veh1_CovAddlEquip,NO COV,0
Veh1_CovAddlEquipDed,NO COV,0
Veh1_CovCOL,250,0
Veh1_CovCOM,500,0
Veh1_CovLBI1,20000,0
Veh1_CovLBI2,40000,0
Veh1_CovLPD,10000,0
Veh1_CovMED,1000,0
Veh1_CovPIP,NO COV,0
Veh1_CovPIPDed,NO COV,0
Veh1_CovRR,NO COV,0
Veh1_CovTL,NO COV,0
Veh1_CovUBI1,20000,0
Veh1_CovUBI2,40000,0
Veh1_CovUPD,NO COV,0
Veh1_CovUPDDed,NO COV,0
Veh1_Cyl,4,0
Veh1_Discounts,AHI,0
Veh1_Disp,2.0,0
Veh1_Doors,4,0
Veh1_GAJeffersonNYSafeVehDisc,YES,0
Veh1_Info,PWX2,0
Veh1_ISO,11,0
Veh1_ISOPerfSym, -11,0
Veh1_Make,FORD,0
Veh1_Model,ESCORT,0
Veh1_OpAssigned, 1,0
Veh1_Operator,1,0
Veh1_PremCOL,350,0
Veh1_PremCOM,86,0
Veh1_PremLBI,86,0
Veh1_PremLPD,86,0
Veh1_PremMED,19,0
Veh1_PremUBI,43,0
Veh1_Rest,AIR/AIR,0
Veh1_Symbol5,ŸŸŸ,0
Veh1_Tran,AUTO,0
Veh1_Type,PASS,0
Veh1_Use,PERSONAL,0
Veh1_VIN,1FAFP13P?Y,0
Veh1_Year,2000,0
Veh2_AskFWD,NO,0
Veh2_AskPersonal,YES,0
Veh2_AskTTop,NO,0
Veh2_AskTurbo,NO,0
Veh2_Body,UTILITY,0
Veh2_ClsPts,/ 0 PT,0
Veh2_CovAddlEquip,NO COV,0
Veh2_CovAddlEquipDed,NO COV,0
Veh2_CovCOL,250,0
Veh2_CovCOM,500,0
Veh2_CovLBI1,20000,0
Veh2_CovLBI2,40000,0
Veh2_CovLPD,10000,0
Veh2_CovMED,1000,0
Veh2_CovPIP,NO COV,0
Veh2_CovPIPDed,NO COV,0
Veh2_CovRR,NO COV,0
Veh2_CovTL,NO COV,0
Veh2_CovUBI1,20000,0
Veh2_CovUBI2,40000,0
Veh2_CovUPD,NO COV,0
Veh2_CovUPDDed,NO COV,0
Veh2_Cyl,6,0
Veh2_Discounts,A,0
Veh2_Disp,4.0,0
Veh2_Doors,4,0
Veh2_Info,QTWY2,0
Veh2_ISO,08,0
Veh2_ISOPerfSym, -08,0
Veh2_Make,FORD,0
Veh2_Model,EXPLORER XL,0
Veh2_OpAssigned, 2,0
Veh2_Operator,2,0
Veh2_PremCOL,312,0
Veh2_PremCOM,72,0
Veh2_PremLBI,77,0
Veh2_PremLPD,77,0
Veh2_PremMED,19,0
Veh2_PremUBI,43,0
Veh2_Rest,MANUAL/MANUAL,0
Veh2_Symbol5,ŸŸŸ,0
Veh2_Tran,AUTO,0
Veh2_Type,MPV,0
Veh2_Use,PERSONAL,0
Veh2_VIN,1FM?U61E?,0
Veh2_Year,2000,0
Agency_SN,AL7999,0
March 19, 2008 at 9:28 am
Hello Jim,
Heh... I'm just going to assume that you meant "Jeff"... 😛
When I originally read your solution, it seem like you got the impression that I was looking to take the data from application and create a database table that mimics the structure of the application. Am I correct in that the sole purpose of your solution would be to flatten the data?
Yes... once flattened, you can do just about anything you need to with it. Didn't know (probably not enough coffee for me) that you wanted to insert each "application file" into many tables until one of your more recent posts which is when I asked for a "full file" to see what's going on.
I am not asking you to do my work for me. I am asking for guidance on what is the best solution to my problem.
Sorry... didn't mean to suggest otherwise, Tony... thanks for posting the data...
The code I posted is example code... some of the "gory" details for turning this into a multi file import are in the first piece of code where I mention the word "Loop" in the comments. There is a piece of code in that same first piece that will list all the file names in a directory... you just need to modify that by inserting that data into a temp table with INSERT/EXEC and then looping through the (filtered with a WHERE clause on the names) names of the files that you actually want to process.
If you expand the code to flatten the file(s) the I previously posted to include all the column names, it won't matter if each file has the columns in the same order or not... it'll just work because it uses your file as what it is... a Name/Value table.
Once flattened into the temp table, you can do INSERT/SELECTs from the temp table into the various tables you have.
The key to the example code I made is... it's very flexible in what it "sees" and what you can do with it. Modify the code as suggested above and give it a whirl... If you run into problems that you can't figure out, post back and let's see what we can do.
Now, you do, as you mention, have a couple of values that are actually "multi values". Those are actually pretty easy to split out, but what do you want to do once they are split out?
And, finally, the reason why I'm recommending this method is because that's how I frequently handle files that have "comment headers" like these in them... that and the fact that I just learned how to spell SSIS ...
I'm sure that an SSIS "Ninja" could also pop this out, but I don't know any 😀
--Jeff Moden
Change is inevitable... Change for the better is not.
March 19, 2008 at 9:30 am
Does each file only have one of these policies in it, or does it have a repeating pattern of these?
I take it you have zero control over how this comes out, correct? For example - the policy number hanging out up there in a pattern different than the other lines willl make the process somewhat more complicated that say, if it were on each of the lines relevant as another data element.
By chance - the policy number doesn't also happen to be the filename too?
It's essentially a somewhat hierarchical data set, which besides importing will also need re-normalizing. Where does this need to end up? What do the tables look like?
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
March 19, 2008 at 10:17 am
Heh... I'm just going to assume that you meant "Jeff"... 😛
I am sorry. I meant Jeff. 😛
Yes... once flattened, you can do just about anything you need to with it. Didn't know (probably not enough coffee for me) that you wanted to insert each "application file" into many tables until one of your more recent posts which is when I asked for a "full file" to see what's going on.
Well, it is not exactly inserting each application file into many tables. It is more like taking information out of the application file and sending it to a variety of different tables.
The code I posted is example code... some of the "gory" details for turning this into a multi file import are in the first piece of code where I mention the word "Loop" in the comments. There is a piece of code in that same first piece that will list all the file names in a directory... you just need to modify that by inserting that data into a temp table with INSERT/EXEC and then looping through the (filtered with a WHERE clause on the names) names of the files that you actually want to process.
If you expand the code to flatten the file(s) the I previously posted to include all the column names, it won't matter if each file has the columns in the same order or not... it'll just work because it uses your file as what it is... a Name/Value table.
Once flattened into the temp table, you can do INSERT/SELECTs from the temp table into the various tables you have.
The key to the example code I made is... it's very flexible in what it "sees" and what you can do with it. Modify the code as suggested above and give it a whirl... If you run into problems that you can't figure out, post back and let's see what we can do.
Now, you do, as you mention, have a couple of values that are actually "multi values". Those are actually pretty easy to split out, but what do you want to do once they are split out?
And, finally, the reason why I'm recommending this method is because that's how I frequently handle files that have "comment headers" like these in them... that and the fact that I just learned how to spell SSIS ...
I'm sure that an SSIS "Ninja" could also pop this out, but I don't know any 😀
Ok, I will focus on getting the data flatten first.
March 19, 2008 at 10:37 am
Matt Miller (3/19/2008)
Does each file only have one of these policies in it, or does it have a repeating pattern of these?
There will only be one policy in each file. There can be more than one type of vehicle or coverage in a file though. For example,
Veh1_Make, Chevy, 0
Veh2_Make, Nissan, 0
Veh1_Year, 2004, 0
Veh2_Year, 2003, 0
etc
I take it you have zero control over how this comes out, correct?
For example - the policy number hanging out up there in a pattern different than the other lines willl make the process somewhat more complicated that say, if it were on each of the lines relevant as another data element.
By chance - the policy number doesn't also happen to be the filename too?
Unfortunately I don't. This is the format given to me by the third-party vendor. We don't go by policy numbers at this stage, but we do use application numbers. It is still too early to tell if the filename will be the same as the application number. My past experience tells me the application number is not the same as the filename.
It's essentially a somewhat hierarchical data set, which besides importing will also need re-normalizing. Where does this need to end up? What do the tables look like?
The tables are not fully normalized, but it is what it is. I can't post the full structure because the tables are humongous, but here is a glimpse of what they look like:
Policies
_______
PolicyId <---- unique identifier
FirstName
LastName
Address1
Address2
City
State
Zip
County
Phone1
Phone2
StartDate
EndDate
Term
Status
AppNo <----- application number
Vehicles
________
VehicleId
PolicyId
Year
Make
Model
Vin
Coverages
_________
CoverageId
PolicyId
VehicleId
Limit1
Limit2
Deductible
Viewing 15 posts - 1 through 15 (of 20 total)
You must be logged in to reply to this topic. Login to reply