Importing a Flat File Containing Header and Detail Rows

Question

Importing a Flat File Containing Header and Detail Rows

Polymorphist

SSCommitted

Points: 1909
More actions
June 1, 2018 at 7:27 pm

#353728

Hello All,
I'm not quite sure how to explain my situation without confusing the living hell out of everyone but i'll do my best here.
I have a flat file which contains one single header row and multiple detail rows. The detail rows are for customer data, the header row contains the contract number and the start year and end year which covers the data.
FILE:
Here is what the file looks like when I open it in Notepad++:
1H582620172018 -- this is the header which always starts with "1H". It contains the company contract number (5826)
2H1234 JONES MARY --this is the customer contract number and customer last/first name for Mary Jones. Always starts with "2H"
3H19501115 --this is the customer birthdate for Mary Jones. Always starts with "3H"
2H12346 SMITH BILL --new record for with contract number, last/first name for Bill Smith
3H19600901 --Bill Smith's birthdate
After I import the file into a landing table, each part of the customer data is in it's own record just like in the file
RESULT I NEED:
I need to create one record for each customer in the file:
COMPANY_CONTRACT_ID CUSTOMER_CONTRACT_ID FIRST_NAME LAST_NAME BIRTHDATE
5826 1234 Mary Jones 1950-11-15
5826 1236 Bill Smith 1960-09-01
I'm having a hell of a time trying to get the data into one single record. Any ideas as to how I might do this?
Hopefully I didn't confuse anyone. I can honestly say i've never worked with a file which had header/details rows like this before.

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply

frederico_fonseca SSCoach Points: 16155 More actions · Answer 1

The following is one way of doing it.

Requires that you load the files onto a staging table with a identity column and 2 other columns, reckey and data as per below example - load must not be done in parallel.

Note that the output below does not match your desired output as I do believe you added a extra number on the second customer id.
This is just one way - not necessarily the best one.
If your fields are fixed size then the charindex below isn't required as you can just split directly based on the position of each field.
also assumes that there will always be a record type of 2H to be able to identify which Customer the date of birth relates to.
similar for the 1H record.

And obviously that the input file will always be on the correct order - although this is pretty standard on these type of files.

if object_id('tempdb..#input') is not null drop table #input create table #input ( inputrecord varchar(2000) ) insert into #input select '1H582620172018' insert into #input select '2H1234 JONES MARY' insert into #input select '3H19501115' insert into #input select '2H12346 SMITH BILL' insert into #input select '3H19600901'

if object_id('tempdb..#staging') is not null drop table #staging

create table #staging ( rowid int identity (1, 1) , reckey char(2) , data varchar(2000) )

insert into #staging select substring(i.inputrecord, 1, 2) as reckey , substring(i.inputrecord, 3, 2000) as data from #input i

create clustered index #staging_ix1 on #staging (rowid )

create nonclustered index #staging_ix2 on #staging (reckey , rowid ) include (data)

select max(t.company_contract_id) as company_contract_id , t.customer_contract_id , max(substring(t.names, charindex(' ', t.names) + 1, 100)) as first_name , max(substring(t.names, 1, charindex(' ', t.names) - 1)) as last_name , max(t.birthdate) as birthdate from (select substring(sub2h.data, 1, charindex(' ', sub2h.data) - 1) as customer_contract_id , substring(submain.data, 1, 4) as company_contract_id , substring(sub2h.data, charindex(' ', sub2h.data) + 1, 200) as names , case when main.reckey = '3H' then convert(date, substring(main.data, 1, 8), 112) else null end as birthdate

from #staging main outer apply (select top 1 * from #staging st2 where st2.rowid <= main.rowid and st2.reckey = '1H' order by st2.rowid desc ) submain outer apply (select top 1 * from #staging st2 where st2.rowid < main.rowid and st2.reckey = '2H' order by st2.rowid desc ) sub2h where sub2h.reckey is not null ) t group by t.customer_contract_id

Jeff Moden SSC Guru Points: 1004356 More actions · Answer 2

Polymorphist - Friday, June 1, 2018 7:27 PM
Hello All,
I'm not quite sure how to explain my situation without confusing the living hell out of everyone but i'll do my best here.
I have a flat file which contains one single header row and multiple detail rows. The detail rows are for customer data, the header row contains the contract number and the start year and end year which covers the data.
FILE:
Here is what the file looks like when I open it in Notepad++:
1H582620172018 -- this is the header which always starts with "1H". It contains the company contract number (5826)
2H1234 JONES MARY --this is the customer contract number and customer last/first name for Mary Jones. Always starts with "2H"
3H19501115 --this is the customer birthdate for Mary Jones. Always starts with "3H"
2H12346 SMITH BILL --new record for with contract number, last/first name for Bill Smith
3H19600901 --Bill Smith's birthdate
After I import the file into a landing table, each part of the customer data is in it's own record just like in the file
RESULT I NEED:
I need to create one record for each customer in the file:
COMPANY_CONTRACT_ID CUSTOMER_CONTRACT_ID FIRST_NAME LAST_NAME BIRTHDATE
5826 1234 Mary Jones 1950-11-15
5826 1236 Bill Smith 1960-09-01
I'm having a hell of a time trying to get the data into one single record. Any ideas as to how I might do this?
Hopefully I didn't confuse anyone. I can honestly say i've never worked with a file which had header/details rows like this before.

This can all be done during the actual import. What and where are the delimiters in each of the lines in the file and what is the end of line marker?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

David Burrows SSC Guru Points: 65142 More actions · Answer 3

Jeff Moden - Sunday, June 3, 2018 7:31 AM
Polymorphist - Friday, June 1, 2018 7:27 PM
Hello All,
I'm not quite sure how to explain my situation without confusing the living hell out of everyone but i'll do my best here.
I have a flat file which contains one single header row and multiple detail rows. The detail rows are for customer data, the header row contains the contract number and the start year and end year which covers the data.
FILE:
Here is what the file looks like when I open it in Notepad++:
1H582620172018 -- this is the header which always starts with "1H". It contains the company contract number (5826)
2H1234 JONES MARY --this is the customer contract number and customer last/first name for Mary Jones. Always starts with "2H"
3H19501115 --this is the customer birthdate for Mary Jones. Always starts with "3H"
2H12346 SMITH BILL --new record for with contract number, last/first name for Bill Smith
3H19600901 --Bill Smith's birthdate
After I import the file into a landing table, each part of the customer data is in it's own record just like in the file
RESULT I NEED:
I need to create one record for each customer in the file:
COMPANY_CONTRACT_ID CUSTOMER_CONTRACT_ID FIRST_NAME LAST_NAME BIRTHDATE
5826 1234 Mary Jones 1950-11-15
5826 1236 Bill Smith 1960-09-01
I'm having a hell of a time trying to get the data into one single record. Any ideas as to how I might do this?
Hopefully I didn't confuse anyone. I can honestly say i've never worked with a file which had header/details rows like this before.
This can all be done during the actual import. What and where are the delimiters in each of the lines in the file and what is the end of line marker?

My suspicion is that 1H and 3H are not delimited and 2H tab delimited (hopefully not space delimited :pinch: )
Will have to wait for confirmation though 🙂

Far away is close at hand in the images of elsewhere.
Anon.

MMartin1 One Orange Chip Points: 27795 More actions · Answer 4

Seems to me that it should be that each row is the column delimiter and the row delimiter is the start of the token '1H ...'. However the data presented does not quite fit this. Still some clarity needed.

----------------------------------------------------

Polymorphist SSCommitted Points: 1909 More actions · Answer 5

Thanks for all of the replies.

The file is a fixed-width with the row delimiter being {cr}{lf}.

frederico_fonseca SSCoach Points: 16155 More actions · Answer 6

The sample provided follows standard definition for usage from mainframe style extract files - or PC Cobol files that follow old conventions.
But not just those follow these - I remember when doing Basel II (or was it Solvency?) many years ago that the file definitions were like this - 1 major header, 1 major trailer, then multiple blocks of detail records - each block with 1 sub-header and multiple detail records.

records would be LF or CR+LF delimited
record identifier will be a predetermined code for each type - on the above example 1H is Company ID

Based on original example
1H582620172018
2H1234 JONES MARY
3H19501115
2H12346 SMITH BILL
3H19600901

Record type
1H - Company code
2H - Customer detail
3H - Customer DOB

if file is multiple company we could have

1H582620172018 -- this is companyid + a sequence that looks like from and to year
2H1234 JONES MARY -- Customer code + customer names (last and first)
3H19501115 -- customer DOB
2H12346 SMITH BILL -- New customer block
3H19600901
1H988120172018 -- new company block
2H1234 BURKE LOIS -- new customer block
3H19701115
2H8311 SMITH BILL -- new customer block
3H19801001

Now tricky sometimes is that headers that do not contain data may be excluded. On the above 3H could be excluded if no DOB exists for the client - or it could be supplied with a "null" date - 00000000 or 99999999 for example

OP needs to contact whoever creates the file and ask for full record definition and rules for when they are ommited in case it happens.

Polymorphist SSCommitted Points: 1909 More actions · Answer 7

Hello All -
Thanks again for all of your input, really helped. I modified the sample query provided by Frederico_Fonseca and it works perfectly. I now have a fully-functioning job which imports multiple files.
Really saved my bacon, thanks again.

Regarding the file format, I've never worked with this type of file before and it occurred to me that it might be an older format style. For what it's worth, i'm currently at a health insurance company and there seem to be a lot of old processes which haven't been touched for decades. I do know that there are some old COBOL processes running somewhere. Not long ago I converted a bunch of old KornShell scripts, most of which had not been touched since the late 1980s 🙂

Always an adventure.