May 5, 2017 at 5:04 am
Hi,
I am trying to copy data from a flat file (CSV) which has multiple headers between the row values to a SQL table. The SSIS package was initially built to only copy data across from flat file with single header but now the functionality has changed.
The data needs to be written to a table in SQL and it also includes transformation component. Could someone please give any suggestions in this regard. Please find the SSIS and other files attached.
Thanks.
May 5, 2017 at 5:15 am
pwalter83 - Friday, May 5, 2017 5:04 AMHi,I am trying to copy data from a flat file (CSV) which has multiple headers between the row values to a SQL table. The SSIS package was initially built to only copy data across from flat file with single header but now the functionality has changed.
The data needs to be written to a table in SQL and it also includes transformation component. Could someone please give any suggestions in this regard. Please find the SSIS and other files attached.
Thanks.
My normal approach is to use ELT rather than ETL, load everything into a staging table as it appears in the file and then do grouping and transformations.
😎
BTW, cannot see any attachments in your post.
May 5, 2017 at 6:53 am
pwalter83 - Friday, May 5, 2017 5:04 AMHi,I am trying to copy data from a flat file (CSV) which has multiple headers between the row values to a SQL table. The SSIS package was initially built to only copy data across from flat file with single header but now the functionality has changed.
The data needs to be written to a table in SQL and it also includes transformation component. Could someone please give any suggestions in this regard. Please find the SSIS and other files attached.
Thanks.
Could just be me but I see no attachments.
--Jeff Moden
Change is inevitable... Change for the better is not.
May 5, 2017 at 6:54 am
Eirikur Eiriksson - Friday, May 5, 2017 5:15 AMpwalter83 - Friday, May 5, 2017 5:04 AMHi,I am trying to copy data from a flat file (CSV) which has multiple headers between the row values to a SQL table. The SSIS package was initially built to only copy data across from flat file with single header but now the functionality has changed.
The data needs to be written to a table in SQL and it also includes transformation component. Could someone please give any suggestions in this regard. Please find the SSIS and other files attached.
Thanks.
My normal approach is to use ELT rather than ETL, load everything into a staging table as it appears in the file and then do grouping and transformations.
😎
BTW, cannot see any attachments in your post.
Heh... I prefer "LET". 😉
--Jeff Moden
Change is inevitable... Change for the better is not.
May 7, 2017 at 3:20 am
Jeff Moden - Friday, May 5, 2017 6:54 AMEirikur Eiriksson - Friday, May 5, 2017 5:15 AMpwalter83 - Friday, May 5, 2017 5:04 AMHi,I am trying to copy data from a flat file (CSV) which has multiple headers between the row values to a SQL table. The SSIS package was initially built to only copy data across from flat file with single header but now the functionality has changed.
The data needs to be written to a table in SQL and it also includes transformation component. Could someone please give any suggestions in this regard. Please find the SSIS and other files attached.
Thanks.
My normal approach is to use ELT rather than ETL, load everything into a staging table as it appears in the file and then do grouping and transformations.
😎
BTW, cannot see any attachments in your post.Heh... I prefer "LET". 😉
Kind of "LET IT BI"
😎
May 7, 2017 at 7:14 am
If the header and files relate to each other, then it may be difficult attaching the second and more headers to their respective headers. I've used Python in the past to split the flat file, header and the header records into separate files. As the header rows become row delimiters, it's easy to read the file in chunks from header-to-header, where you can partition the records with their respective headers into separate flat files. That way they are easily reusable for reimportation later without having to stage more bytes than what you need versus just loading it into a staging table leaving the flat file intact in the same multi-header file.
Something as simple as (psudo code here):
import pandas as df
# Load your CSV file
file = df.read_csv('location\yourfile.csv')
# Make a new list to hold your rows that you want to separate
newRecords= []
# Define your header rows for exporting to csv.
label1 = ['Date', 'Blah', 'Blah2']
# Iterate over each row in the data frame.
# Once you get to the second header in the file, break, else append record to a list
for line in file:
if line['FieldName'] == "Date":
break
else:
newRecords.append([line])
# Now that you have a list/array of records, dump them into a dataframe
newFile = df.DataFrame(newRecords, header = label1)
# Then export the data frame as a csv file to import with SSIS
df.to_csv(YourPartitonedFile.csv)
That being said, recommended Python 2.7 (google Windows Anaconda Python) with Pandas. Both to_csv and read_csv have options to ignore headers, set headers, custom delimiters, skip rows, read in chunks and all that good stuff. Pandas does not work that well on larger files with straight reads, but it's really easy to read, transform and export flat files with Python + Pandas. I take this approach to distribute the workload as opposed to SSIS doing all the work along with best practices on doing as much on disk before landing data into your database (i.e.: helping offload the work from SQL Server).
This is just one option and not your only option! :hehe:
May 8, 2017 at 3:37 am
Hi,
I think I did not explain my problem clearly. I need to remove the repeating header rows from the flat file (as attached File1) and then copy the records from the amended file, which only has a single header at the top (File2) to a SQL table. Please also find the SSIS package screenshot attached.
Thanks in advance for your help.
May 8, 2017 at 4:14 am
pwalter83 - Monday, May 8, 2017 3:37 AMHi,
I think I did not explain my problem clearly. I need to remove the repeating header rows from the flat file (as attached File1) and then copy the records from the amended file, which only has a single header at the top (File2) to a SQL table. Please also find the SSIS package screenshot attached.
Thanks in advance for your help.
This is easy to handle by loading the full file content to a table and then filter out the repeated header rows.
😎
May 8, 2017 at 4:51 am
Eirikur Eiriksson - Monday, May 8, 2017 4:14 AMpwalter83 - Monday, May 8, 2017 3:37 AMHi,
I think I did not explain my problem clearly. I need to remove the repeating header rows from the flat file (as attached File1) and then copy the records from the amended file, which only has a single header at the top (File2) to a SQL table. Please also find the SSIS package screenshot attached.
Thanks in advance for your help.This is easy to handle by loading the full file content to a table and then filter out the repeated header rows.
😎
I have tried different ways to achieve it but get errors every time.
Could you please suggest how to go about this ? Thanks.
May 8, 2017 at 5:13 am
pwalter83 - Monday, May 8, 2017 4:51 AMEirikur Eiriksson - Monday, May 8, 2017 4:14 AMpwalter83 - Monday, May 8, 2017 3:37 AMHi,
I think I did not explain my problem clearly. I need to remove the repeating header rows from the flat file (as attached File1) and then copy the records from the amended file, which only has a single header at the top (File2) to a SQL table. Please also find the SSIS package screenshot attached.
Thanks in advance for your help.This is easy to handle by loading the full file content to a table and then filter out the repeated header rows.
😎I have tried different ways to achieve it but get errors every time.
Could you please suggest how to go about this ? Thanks.
What kind of errors are you getting?
😎
Import everything as (n)varchar, use a query for the type casting needed and add a row_number for the header rows filtering on the first instance.
May 8, 2017 at 9:21 am
This is a good example of why you should never load a file directly into it's final resting place.
The file is correctly TAB delimited in all of the right places. As Eirikur implies, load the file into a staging table where all of the columns are character-based data-types. Then, simply insert from that staging table into the final table using a WHERE clause of WHERE MODEL_DSC <> 'MODEL_DSC' and call it an easy day.
Then, hunt down the idiots that are sending you this rubbish and explain the facts of life to them.
--Jeff Moden
Change is inevitable... Change for the better is not.
May 9, 2017 at 2:56 am
Eirikur Eiriksson - Monday, May 8, 2017 5:13 AMpwalter83 - Monday, May 8, 2017 4:51 AMEirikur Eiriksson - Monday, May 8, 2017 4:14 AMpwalter83 - Monday, May 8, 2017 3:37 AMHi,
I think I did not explain my problem clearly. I need to remove the repeating header rows from the flat file (as attached File1) and then copy the records from the amended file, which only has a single header at the top (File2) to a SQL table. Please also find the SSIS package screenshot attached.
Thanks in advance for your help.This is easy to handle by loading the full file content to a table and then filter out the repeated header rows.
😎I have tried different ways to achieve it but get errors every time.
Could you please suggest how to go about this ? Thanks.What kind of errors are you getting?
😎
Import everything as (n)varchar, use a query for the type casting needed and add a row_number for the header rows filtering on the first instance.
Hi,
Thanks for your suggestion. Could you please explain your solution in a bit more detail ? Thanks.
May 9, 2017 at 5:47 am
Jeff Moden - Monday, May 8, 2017 9:21 AMThis is a good example of why you should never load a file directly into it's final resting place.The file is correctly TAB delimited in all of the right places. As Eirikur implies, load the file into a staging table where all of the columns are character-based data-types. Then, simply insert from that staging table into the final table using a WHERE clause of WHERE MODEL_DSC <> 'MODEL_DSC' and call it an easy day.
Then, hunt down the idiots that are sending you this rubbish and explain the facts of life to them.
Hi,
Thanks for the solution but I am still confused as to what component to use to insert from the staging table. In the SSIS package I have used a For each loop container and within that is the Data Flow Task and within the Data Flow task, the process flow starts from Flat File source and then there is a data conversion which finally copies the data to the destination SQL table (explain with attached screenshots).
Do you know which component I need to use and where does it need to be inserted within the Data Flow task ? Many thanks again.
May 10, 2017 at 3:00 am
pwalter83 - Tuesday, May 9, 2017 5:47 AMJeff Moden - Monday, May 8, 2017 9:21 AMThis is a good example of why you should never load a file directly into it's final resting place.The file is correctly TAB delimited in all of the right places. As Eirikur implies, load the file into a staging table where all of the columns are character-based data-types. Then, simply insert from that staging table into the final table using a WHERE clause of WHERE MODEL_DSC <> 'MODEL_DSC' and call it an easy day.
Then, hunt down the idiots that are sending you this rubbish and explain the facts of life to them.
Hi,
Thanks for the solution but I am still confused as to what component to use to insert from the staging table. In the SSIS package I have used a For each loop container and within that is the Data Flow Task and within the Data Flow task, the process flow starts from Flat File source and then there is a data conversion which finally copies the data to the destination SQL table (explain with attached screenshots).Do you know which component I need to use and where does it need to be inserted within the Data Flow task ? Many thanks again.
Hi,
Can anyone please help on this this, its very urgent. Thanks.
May 10, 2017 at 3:59 am
It's so simple that you don't even need to use SSIS. You can use bcp or BULK INSERT (or SSIS if you insist, or if it's part of a larger set of tasks) to copy everything from the file into a table with the same number of columns as the file has. This assumes that the column headers match the columns themselves one-to-one, right? Then delete all header rows (something like WHERE Col1 = 'Header1' AND Col2 = 'Header2'...). Finally, transfer to the destination table, performing any necessary conversions along the way (date formats, decimals etc).
John
Viewing 15 posts - 1 through 15 (of 29 total)
You must be logged in to reply to this topic. Login to reply