Import last CSV file to SQL Server by date (date inside the file)

  • I have a folder with some files whose contents I need to import to a table in SQL Server. Among those files I need to choose the lastest generated so I can import it to the database.

     

    My problem is that the date and hour I need to consider to select the lastest file is stored, respectively, in the first and second rows of the file.

     

    Below I have a small sample of the structure of the files (from the third line on is the data I need to import - "code," "name of the place," "number of stages," "percentage completed," "status"):

     

    01/12/2020;;;;
    10:09:19;;;;
    51211299;PLACE 1;50;89,2;Legalization
    31601338;PLACE 2;456;100;Finished
    32867934;PLACE 3;102;100;Finished
    32005071;PLACE 4;500;100;Finished
    37365156;PLACE 5;222;86,15;Stopped

     

    Any insight on how to do that will be very welcomed. Thank you all in advance!

  • I think you can grab the creation date of the file if you use a folder object and iterate over the files in PowerQuery. Then you could just remove the first two rows and import the rest, and it would be super simple. But if the date/time inside the file is different from the time stamp on the file, that would be a problem. =(

    • This reply was modified 3 years, 10 months ago by  pietlinden.
  • Using a script task - you can loop over the files, read the first 2 rows and build an array - sort the array as needed and return only the file that you want to process.  I would move the files that should not be processed to another folder - and have a post script task that archives the file processed (and possible rename to show it was processed).

    The other option is to just load all the files into staging tables - using a conditional split you can redirect the first 2 rows to a file header table with a file identifier (name of file) - and the other rows to a details table with the same file identifier.  Then - just process the 'latest' file for that batch into the destination/final table.

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • I followed the ideia given by Mr. Jeffrey Williams and now I have a collection of files in a dictionary ordered by value (date):

     

    using System.Linq;
    using System.IO;
    using System.Collections.Generic;


    public void Main()
    {
    string[] arquivos = Directory.GetFiles(Dts.Variables["User::PastaOrigem"].Value.ToString(), "*.csv");
    var listaArquivos = new Dictionary<string, DateTime>();

    foreach (string arquivo in arquivos)
    {
    DateTime dataHora = DateTime.Parse(File.ReadLines(@arquivo).First().Substring(0, 10) + " " + File.ReadLines(@arquivo).Skip(1).FirstOrDefault().Substring(0, 8));

    listaArquivos.Add(arquivo, dataHora);

    }

    // Ordena o dicionário com as informações dos arquivos pela data
    Dts.Variables["User::ColecaoArquivos"].Value = from a in listaArquivos orderby a.Value ascending select a;

    Dts.TaskResult = (int)ScriptResults.Success;
    }

     

    I created two user defined variables, one for the path to the folder (string) and other for the dictionary (object).

     

    This code is not the best but it gets the job done. I used Linq to read the first line and the second line, I then parse the values to a DateTime object.

  • Interesting approach - it doesn't have to be the best as long as it gets the job done.

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • So, I had to redo my code because a dictionary wouldn't do it. I needed in fact a DataTable object.

     

    I added two variables to my script task (path to the folder and an object):

    using System.IO;
    using System.Linq;
    // I don't need System.Collections.Generic anymore


    public void Main()
    {
    string[] arquivos = Directory.GetFiles(Dts.Variables["User::CaminhoArquivo"].Value.ToString(), "*.csv");

    var dtArquivos = new DataTable();
    dtArquivos.Columns.Add("NomeArquivo"); // File name
    dtArquivos.Columns.Add("CriacaoArquivo"); // File's created date

    foreach (string arquivo in arquivos) // file in files
    {
    // The date format is a string like dd/mm/yyyy in the first line of the file
    // The hour format is a string like hh:mm:ss in the second line of the file
    DateTime dataHora = DateTime.Parse(File.ReadLines(@arquivo).First().Substring(0, 10) + " " + File.ReadLines(@arquivo).Skip(1).FirstOrDefault().Substring(0, 8));

    // Add a new row and content to the DataTable
    DataRow dr = dtArquivos.NewRow();
    dr["NomeArquivo"] = arquivo; // File name
    dr["CriacaoArquivo"] = dataHora; // File's created date

    dtArquivos.Rows.Add(dr);

    }

    // Sort by creation date
    dtArquivos.DefaultView.Sort = "CriacaoArquivo asc";

    // Pass the sorted DataTable to the user defined variable of type Object
    Dts.Variables["User::ColecaoArquivos"].Value = dtArquivos;

    Dts.TaskResult = (int)ScriptResults.Success;
    }
  • That is very similar to how I process files - but I am getting the date from the file name.  I also did it the long way - using an unsorted data table, creating a sorted data view - and then putting everything into a sorted data set to be returned.

    If I recall correctly - sorting would not work correctly on the data table, probably just the way I was trying to put it together.

    Just one note, I would recommend defining the data type for each column in the data table:

    var dtArquivos = new DataTable();

    dtArquivos.Columns.Add("NomeArquivo", typeof(string)); // File name

    dtArquivos.Columns.Add("CriacaoArquivo", typeof(Datetime)); // File's created date

    Not sure it really makes a difference...

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply