June 1, 2018 at 7:27 pm
Hello All,
I'm not quite sure how to explain my situation without confusing the living hell out of everyone but i'll do my best here.
I have a flat file which contains one single header row and multiple detail rows. The detail rows are for customer data, the header row contains the contract number and the start year and end year which covers the data.
FILE:
Here is what the file looks like when I open it in Notepad++:
1H582620172018 -- this is the header which always starts with "1H". It contains the company contract number (5826)
2H1234 JONES MARY --this is the customer contract number and customer last/first name for Mary Jones. Always starts with "2H"
3H19501115 --this is the customer birthdate for Mary Jones. Always starts with "3H"
2H12346 SMITH BILL --new record for with contract number, last/first name for Bill Smith
3H19600901 --Bill Smith's birthdate
After I import the file into a landing table, each part of the customer data is in it's own record just like in the file
RESULT I NEED:
I need to create one record for each customer in the file:
COMPANY_CONTRACT_ID CUSTOMER_CONTRACT_ID FIRST_NAME LAST_NAME BIRTHDATE
5826 1234 Mary Jones 1950-11-15
5826 1236 Bill Smith 1960-09-01
I'm having a hell of a time trying to get the data into one single record. Any ideas as to how I might do this?
Hopefully I didn't confuse anyone. I can honestly say i've never worked with a file which had header/details rows like this before.
June 2, 2018 at 11:27 am
The following is one way of doing it.
Requires that you load the files onto a staging table with a identity column and 2 other columns, reckey and data as per below example - load must not be done in parallel.
Note that the output below does not match your desired output as I do believe you added a extra number on the second customer id.
This is just one way - not necessarily the best one.
If your fields are fixed size then the charindex below isn't required as you can just split directly based on the position of each field.
also assumes that there will always be a record type of 2H to be able to identify which Customer the date of birth relates to.
similar for the 1H record.
And obviously that the input file will always be on the correct order - although this is pretty standard on these type of files.
if object_id('tempdb..#input') is not null
drop table #input
create table #input
( inputrecord varchar(2000)
)
insert into #input
select '1H582620172018'
insert into #input
select '2H1234 JONES MARY'
insert into #input
select '3H19501115'
insert into #input
select '2H12346 SMITH BILL'
insert into #input
select '3H19600901'
if object_id('tempdb..#staging') is not null
drop table #staging
create table #staging
( rowid int identity (1, 1)
, reckey char(2)
, data varchar(2000)
)
insert into #staging
select substring(i.inputrecord, 1, 2) as reckey
, substring(i.inputrecord, 3, 2000) as data
from #input i
create clustered index #staging_ix1 on #staging
(rowid
)
create nonclustered index #staging_ix2 on #staging
(reckey
, rowid
)
include (data)
select max(t.company_contract_id) as company_contract_id
, t.customer_contract_id
, max(substring(t.names, charindex(' ', t.names) + 1, 100)) as first_name
, max(substring(t.names, 1, charindex(' ', t.names) - 1)) as last_name
, max(t.birthdate) as birthdate
from (select substring(sub2h.data, 1, charindex(' ', sub2h.data) - 1) as customer_contract_id
, substring(submain.data, 1, 4) as company_contract_id
, substring(sub2h.data, charindex(' ', sub2h.data) + 1, 200) as names
, case
when main.reckey = '3H'
then convert(date, substring(main.data, 1, 8), 112)
else null
end as birthdate
from #staging main
outer apply (select top 1 *
from #staging st2
where st2.rowid <= main.rowid
and st2.reckey = '1H'
order by st2.rowid desc
) submain
outer apply (select top 1 *
from #staging st2
where st2.rowid < main.rowid
and st2.reckey = '2H'
order by st2.rowid desc
) sub2h
where sub2h.reckey is not null
) t
group by t.customer_contract_id
June 3, 2018 at 7:31 am
Polymorphist - Friday, June 1, 2018 7:27 PMHello All,
I'm not quite sure how to explain my situation without confusing the living hell out of everyone but i'll do my best here.
I have a flat file which contains one single header row and multiple detail rows. The detail rows are for customer data, the header row contains the contract number and the start year and end year which covers the data.FILE:
Here is what the file looks like when I open it in Notepad++:1H582620172018 -- this is the header which always starts with "1H". It contains the company contract number (5826)
2H1234 JONES MARY --this is the customer contract number and customer last/first name for Mary Jones. Always starts with "2H"
3H19501115 --this is the customer birthdate for Mary Jones. Always starts with "3H"
2H12346 SMITH BILL --new record for with contract number, last/first name for Bill Smith
3H19600901 --Bill Smith's birthdateAfter I import the file into a landing table, each part of the customer data is in it's own record just like in the file
RESULT I NEED:
I need to create one record for each customer in the file:COMPANY_CONTRACT_ID CUSTOMER_CONTRACT_ID FIRST_NAME LAST_NAME BIRTHDATE
5826 1234 Mary Jones 1950-11-15
5826 1236 Bill Smith 1960-09-01I'm having a hell of a time trying to get the data into one single record. Any ideas as to how I might do this?
Hopefully I didn't confuse anyone. I can honestly say i've never worked with a file which had header/details rows like this before.
This can all be done during the actual import. What and where are the delimiters in each of the lines in the file and what is the end of line marker?
--Jeff Moden
Change is inevitable... Change for the better is not.
June 4, 2018 at 2:13 am
Jeff Moden - Sunday, June 3, 2018 7:31 AMPolymorphist - Friday, June 1, 2018 7:27 PMHello All,
I'm not quite sure how to explain my situation without confusing the living hell out of everyone but i'll do my best here.
I have a flat file which contains one single header row and multiple detail rows. The detail rows are for customer data, the header row contains the contract number and the start year and end year which covers the data.FILE:
Here is what the file looks like when I open it in Notepad++:1H582620172018 -- this is the header which always starts with "1H". It contains the company contract number (5826)
2H1234 JONES MARY --this is the customer contract number and customer last/first name for Mary Jones. Always starts with "2H"
3H19501115 --this is the customer birthdate for Mary Jones. Always starts with "3H"
2H12346 SMITH BILL --new record for with contract number, last/first name for Bill Smith
3H19600901 --Bill Smith's birthdateAfter I import the file into a landing table, each part of the customer data is in it's own record just like in the file
RESULT I NEED:
I need to create one record for each customer in the file:COMPANY_CONTRACT_ID CUSTOMER_CONTRACT_ID FIRST_NAME LAST_NAME BIRTHDATE
5826 1234 Mary Jones 1950-11-15
5826 1236 Bill Smith 1960-09-01I'm having a hell of a time trying to get the data into one single record. Any ideas as to how I might do this?
Hopefully I didn't confuse anyone. I can honestly say i've never worked with a file which had header/details rows like this before.
This can all be done during the actual import. What and where are the delimiters in each of the lines in the file and what is the end of line marker?
My suspicion is that 1H and 3H are not delimited and 2H tab delimited (hopefully not space delimited :pinch: )
Will have to wait for confirmation though 🙂
Far away is close at hand in the images of elsewhere.
Anon.
June 4, 2018 at 11:17 am
Seems to me that it should be that each row is the column delimiter and the row delimiter is the start of the token '1H ...'. However the data presented does not quite fit this. Still some clarity needed.
----------------------------------------------------
June 4, 2018 at 11:56 am
Thanks for all of the replies.
The file is a fixed-width with the row delimiter being {cr}{lf}.
June 4, 2018 at 11:59 am
The sample provided follows standard definition for usage from mainframe style extract files - or PC Cobol files that follow old conventions.
But not just those follow these - I remember when doing Basel II (or was it Solvency?) many years ago that the file definitions were like this - 1 major header, 1 major trailer, then multiple blocks of detail records - each block with 1 sub-header and multiple detail records.
records would be LF or CR+LF delimited
record identifier will be a predetermined code for each type - on the above example 1H is Company ID
Based on original example
1H582620172018
2H1234 JONES MARY
3H19501115
2H12346 SMITH BILL
3H19600901
Record type
1H - Company code
2H - Customer detail
3H - Customer DOB
if file is multiple company we could have
1H582620172018 -- this is companyid + a sequence that looks like from and to year
2H1234 JONES MARY -- Customer code + customer names (last and first)
3H19501115 -- customer DOB
2H12346 SMITH BILL -- New customer block
3H19600901
1H988120172018 -- new company block
2H1234 BURKE LOIS -- new customer block
3H19701115
2H8311 SMITH BILL -- new customer block
3H19801001
Now tricky sometimes is that headers that do not contain data may be excluded. On the above 3H could be excluded if no DOB exists for the client - or it could be supplied with a "null" date - 00000000 or 99999999 for example
OP needs to contact whoever creates the file and ask for full record definition and rules for when they are ommited in case it happens.
June 5, 2018 at 1:55 am
Hello All -
Thanks again for all of your input, really helped. I modified the sample query provided by Frederico_Fonseca and it works perfectly. I now have a fully-functioning job which imports multiple files.
Really saved my bacon, thanks again.
Regarding the file format, I've never worked with this type of file before and it occurred to me that it might be an older format style. For what it's worth, i'm currently at a health insurance company and there seem to be a lot of old processes which haven't been touched for decades. I do know that there are some old COBOL processes running somewhere. Not long ago I converted a bunch of old KornShell scripts, most of which had not been touched since the late 1980s 🙂
Always an adventure.
Viewing 8 posts - 1 through 7 (of 7 total)
You must be logged in to reply to this topic. Login to reply