January 11, 2021 at 9:04 pm
I have a folder with some files whose contents I need to import to a table in SQL Server. Among those files I need to choose the lastest generated so I can import it to the database.
My problem is that the date and hour I need to consider to select the lastest file is stored, respectively, in the first and second rows of the file.
Below I have a small sample of the structure of the files (from the third line on is the data I need to import - "code," "name of the place," "number of stages," "percentage completed," "status"):
01/12/2020;;;;
10:09:19;;;;
51211299;PLACE 1;50;89,2;Legalization
31601338;PLACE 2;456;100;Finished
32867934;PLACE 3;102;100;Finished
32005071;PLACE 4;500;100;Finished
37365156;PLACE 5;222;86,15;Stopped
Any insight on how to do that will be very welcomed. Thank you all in advance!
January 11, 2021 at 10:09 pm
I think you can grab the creation date of the file if you use a folder object and iterate over the files in PowerQuery. Then you could just remove the first two rows and import the rest, and it would be super simple. But if the date/time inside the file is different from the time stamp on the file, that would be a problem. =(
January 11, 2021 at 10:56 pm
Using a script task - you can loop over the files, read the first 2 rows and build an array - sort the array as needed and return only the file that you want to process. I would move the files that should not be processed to another folder - and have a post script task that archives the file processed (and possible rename to show it was processed).
The other option is to just load all the files into staging tables - using a conditional split you can redirect the first 2 rows to a file header table with a file identifier (name of file) - and the other rows to a details table with the same file identifier. Then - just process the 'latest' file for that batch into the destination/final table.
Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”
― Charles R. Swindoll
How to post questions to get better answers faster
Managing Transaction Logs
January 12, 2021 at 6:50 pm
I followed the ideia given by Mr. Jeffrey Williams and now I have a collection of files in a dictionary ordered by value (date):
using System.Linq;
using System.IO;
using System.Collections.Generic;
public void Main()
{
string[] arquivos = Directory.GetFiles(Dts.Variables["User::PastaOrigem"].Value.ToString(), "*.csv");
var listaArquivos = new Dictionary<string, DateTime>();
foreach (string arquivo in arquivos)
{
DateTime dataHora = DateTime.Parse(File.ReadLines(@arquivo).First().Substring(0, 10) + " " + File.ReadLines(@arquivo).Skip(1).FirstOrDefault().Substring(0, 8));
listaArquivos.Add(arquivo, dataHora);
}
// Ordena o dicionário com as informações dos arquivos pela data
Dts.Variables["User::ColecaoArquivos"].Value = from a in listaArquivos orderby a.Value ascending select a;
Dts.TaskResult = (int)ScriptResults.Success;
}
I created two user defined variables, one for the path to the folder (string) and other for the dictionary (object).
This code is not the best but it gets the job done. I used Linq to read the first line and the second line, I then parse the values to a DateTime object.
January 12, 2021 at 8:11 pm
Interesting approach - it doesn't have to be the best as long as it gets the job done.
Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”
― Charles R. Swindoll
How to post questions to get better answers faster
Managing Transaction Logs
January 12, 2021 at 9:03 pm
So, I had to redo my code because a dictionary wouldn't do it. I needed in fact a DataTable object.
I added two variables to my script task (path to the folder and an object):
using System.IO;
using System.Linq;
// I don't need System.Collections.Generic anymore
public void Main()
{
string[] arquivos = Directory.GetFiles(Dts.Variables["User::CaminhoArquivo"].Value.ToString(), "*.csv");
var dtArquivos = new DataTable();
dtArquivos.Columns.Add("NomeArquivo"); // File name
dtArquivos.Columns.Add("CriacaoArquivo"); // File's created date
foreach (string arquivo in arquivos) // file in files
{
// The date format is a string like dd/mm/yyyy in the first line of the file
// The hour format is a string like hh:mm:ss in the second line of the file
DateTime dataHora = DateTime.Parse(File.ReadLines(@arquivo).First().Substring(0, 10) + " " + File.ReadLines(@arquivo).Skip(1).FirstOrDefault().Substring(0, 8));
// Add a new row and content to the DataTable
DataRow dr = dtArquivos.NewRow();
dr["NomeArquivo"] = arquivo; // File name
dr["CriacaoArquivo"] = dataHora; // File's created date
dtArquivos.Rows.Add(dr);
}
// Sort by creation date
dtArquivos.DefaultView.Sort = "CriacaoArquivo asc";
// Pass the sorted DataTable to the user defined variable of type Object
Dts.Variables["User::ColecaoArquivos"].Value = dtArquivos;
Dts.TaskResult = (int)ScriptResults.Success;
}
January 12, 2021 at 9:54 pm
That is very similar to how I process files - but I am getting the date from the file name. I also did it the long way - using an unsorted data table, creating a sorted data view - and then putting everything into a sorted data set to be returned.
If I recall correctly - sorting would not work correctly on the data table, probably just the way I was trying to put it together.
Just one note, I would recommend defining the data type for each column in the data table:
var dtArquivos = new DataTable();
dtArquivos.Columns.Add("NomeArquivo", typeof(string)); // File name
dtArquivos.Columns.Add("CriacaoArquivo", typeof(Datetime)); // File's created date
Not sure it really makes a difference...
Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”
― Charles R. Swindoll
How to post questions to get better answers faster
Managing Transaction Logs
Viewing 7 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply