Loading x number of CSV files into x number of unique tables of a database

  • My apologies.  I was on vacation last week (but got emergency calls for 17.5 hours of work) and have been slammed for the day.  I've got commitments from now until the weekend.  I can write some code to auto-magically create a shedload of test files and then write the code to show you how to do all the things we've been talking about but I can't get to it until this coming weekend.

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • No problem at all. I totally understand! Appreciate all the help 🙂

  • I finally had a decent bit of time to start working on this and went back through all of the comments to refresh my memory.  I first came across this and remembered it correctly....

    masterelaichi wrote:

    The filenames are named with the groups there are in - Filename1_Group1.csv, Filename2_Group1.CSV, Filename3_Group1.csv; Filename1_Group2.csv, Filename2_Group2.CSV, Filename3_Group2.csv, and so on

    When I create the dataflow task, I add the name of the file as the name of the OLEDB destination component. So when I go to e OLEDB destination->table load or fast view-> NEW, the SQL code is CREATE TABLE Filename1_Group1. I edit this step by adding an identifier column using IDENTITY (1,1)

    Once I do this, the resulting tables in SQL are named the same as the raw file mentioned above. To answer your question as to how I identify, it is manual as the names of the file is the name of the sql table that is meant to be created. I can do these in batches, i.e, load all of Group1, followed by Group2, and so on

    THEN I came across this and this wasn't something I remembered correctly.

    masterelaichi wrote:

    The CSVs were created from Excel. I just save as CSV. I am guessing they are tabbed (they do appear in a table format)

    The only thing that I remembered was that the files were built from Excel files.  What I didn't remember was that you were NOT getting the data as CSV files.  Instead, YOU are getting the files as Excel files and then YOU are the one creating CSVs from those Excel files.

    That might make things even easier but, if you would, could you build a couple of EXCEL files like what you get but without all the PII in them along with naming the files like they are named when you receive them and attach them, please?

    Sorry for the previous misunderstanding on my part but we should be able to do the whole shebang reading directly from the EXCEL files in T-SQL... no CSV files required.  You won't even have to open an Excel file.

     

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks, Jeff. Truly appreciate your help with this!

    The only thing that I remembered was that the files were built from Excel files.  What I didn't remember was that you were NOT getting the data as CSV files.  Instead, YOU are getting the files as Excel files and then YOU are the one creating CSVs from those Excel files. 

     

    That's right. The reason I convert the excel files into CSVs are for two reasons mainly 1) the excel files are password protected and; 2)depending on the file "group", some files unwanted header rows which are not required for the sql tables, with the relevant data contained in a named range a couple of rows further down

    Converting these excel files to csv files isn't much too much of an issue as I recently came across a powershell script which helps me to automate this piece of work ( this was tedious in itself! converting 50+ excel files into csvs by removing the password and saving it using a specific file name). Also, I tend to use these csv files to do some preliminary checks using Power BI

    So based on this, would it help you if I attach samples of the CSV files with the names and with some sample columns ? Although, the actual files we get have PII in the filenames itself, so I would have to mask that

    I am just curious as to what procedure you will be using. I was digging around a bit and I came across something called script task but that requires a bit of coding. Will you be using something like that or is it possible to do this using dataflow tasks, foreach container and other controls straight of the the GUI in SSIS ?

     

    Thanks again for all the help!

     

     

     

     

    • This reply was modified 2 years, 11 months ago by  masterelaichi.
  • I was having a look at some of the raw files to upload to the tables and realised that I missed pointing out that some of the source files have completely empty columns in between

    For example, in the employee_table group of files, the format is as such

    SampleImage

     

    Would this affect the modelling much ? The format of the groups are similar, i.e each group has the gaps in the same columns

    Previously, the dataflow task used to automatically detect the column gaps as Column1,Column2, and so on. Hope it doesn't complicate it further

     

     

     

  • masterelaichi wrote:

    Converting these excel files to csv files isn't much too much of an issue as I recently came across a powershell script which helps me to automate this piece of work

    careful with this - if the powershell is using the interops com which requires Excel to be installed on the machine where the script runs then you would not be able to run this on the server without breaking MS licensing terms - and it would be unsupported.

  • masterelaichi wrote:

    Thanks, Jeff. Truly appreciate your help with this!

    --snip

    I am just curious as to what procedure you will be using. I was digging around a bit and I came across something called script task but that requires a bit of coding. Will you be using something like that or is it possible to do this using dataflow tasks, foreach container and other controls straight of the the GUI in SSIS ?

    I'm going to take a wild guess here and assure you that Jeff will not be using any part of SSIS in his solution ... he can't abide it! Jeff can do almost anything using T-SQL and extended T-SQL, including making the tea, and that's what his solution will comprise.

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • masterelaichi wrote:

    Thanks, Jeff. Truly appreciate your help with this!

    The only thing that I remembered was that the files were built from Excel files.  What I didn't remember was that you were NOT getting the data as CSV files.  Instead, YOU are getting the files as Excel files and then YOU are the one creating CSVs from those Excel files. 

    That's right. The reason I convert the excel files into CSVs are for two reasons mainly 1) the excel files are password protected and; 2)depending on the file "group", some files unwanted header rows which are not required for the sql tables, with the relevant data contained in a named range a couple of rows further down

    Converting these excel files to csv files isn't much too much of an issue as I recently came across a powershell script which helps me to automate this piece of work ( this was tedious in itself! converting 50+ excel files into csvs by removing the password and saving it using a specific file name). Also, I tend to use these csv files to do some preliminary checks using Power BI

    So based on this, would it help you if I attach samples of the CSV files with the names and with some sample columns ? Although, the actual files we get have PII in the filenames itself, so I would have to mask that

    I am just curious as to what procedure you will be using. I was digging around a bit and I came across something called script task but that requires a bit of coding. Will you be using something like that or is it possible to do this using dataflow tasks, foreach container and other controls straight of the the GUI in SSIS ?

    Thanks again for all the help!

    I have code that automatically adapts to a whole lot of things including ever-changing headers.  I guess we'll find out with some of the samples you  can send me.

    As to password protected spreadsheets, I've not had to work on such things before.  I'll have to do a test to find out if the "ACE Drivers" have a provision for those.

    But, all of that may be a moot point if you have no issues with opening each one separately and saving it as a CSV.  It IS amazing to me that you're allowed to take password protected files and just save them as a CSV that anyone could open an use.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • masterelaichi wrote:

    I was having a look at some of the raw files to upload to the tables and realised that I missed pointing out that some of the source files have completely empty columns in between

    For example, in the employee_table group of files, the format is as such

    SampleImage

    Would this affect the modelling much ? The format of the groups are similar, i.e each group has the gaps in the same columns

    Previously, the dataflow task used to automatically detect the column gaps as Column1,Column2, and so on. Hope it doesn't complicate it further

    Due to the "auto-magic" naming the ACE drivers use, skipped columns shouldn't be a huge issue.  However, I'm thinking it's time to have a real live "come to Jesus" meeting with the users that are creating the spreadsheets.  This is why most DBA's hate spreadsheets.  It's the "no form, fit, or function" factor that users seem to randomly use.

    Still, we can overcome a whole lot of sins in the spreadsheet world.  Include one of those in your sample spreadsheets and I'll show you what I mean.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden wrote:

    As to password protected spreadsheets, I've not had to work on such things before.  I'll have to do a test to find out if the "ACE Drivers" have a provision for those.

    as far as I am aware ACE driver will not open a password protected Excel file.

  • @masterelaichi,

    I've not seen any recent replies on this thread lately.  I think I'm waiting on some small sample files from you but we've been back and forth on that so I'm not sure.

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi Jeff,

    My apologies, I have been unwell and accumulated a mountain of work these past couple of days. I am hoping to upload some sample datasets in the next couple of days. I will try and include whatever columns names possible so that it maintains confidentiality

     

  • No problem and totally understood.  I wanted to make sure that you weren't waiting on me.

    You're health and then you work come first.  Take your time.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi Jeff,

    I have attached some sample files which I created using this website called mockaroo. The actual files are more varied than the ones I have attached as there are more "batches". Hopefully, this small sample set should give you an idea what I am working with

    thanks again for your help!

    • This reply was modified 2 years, 11 months ago by  masterelaichi.
    Attachments:
    You must be logged in to view attached files.
  • I sent you a PM here on the SSC site.  YOU NEED TO READ IT NOW!

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 31 through 45 (of 52 total)

You must be logged in to reply to this topic. Login to reply