July 13, 2014 at 11:01 pm
I have to import zip files every day from a vendors ftp site who leaves the last 7 days on the ftp site at all times. I run the etl every day and just need the latest zip file.
For eg.
Jun1
tbl1_060114.txt, tbl2_060114.txt, tbl3_060114.txt
Jun2
tbl1_060214.txt, tbl2_060214.txt, tbl3_060214.txt
Jun3
tbl1_060314.txt, tbl2_060314.txt, tbl3_060314.txt
and so on.
What task+approach do I need to take to get just the latest version of tbl1,tbl2,tbl3 from the ftp site? Other consideration: I must not accidentally reprocess a differential that I've already imported, and need to be able to handle for a situation where a database backup is lost and I have to reprocess 2 or more days worth of differentials in the right order.
Does this make sense?
July 13, 2014 at 11:27 pm
If you move a file to another folder after processing it, will the vendor put it back if it is within the seven-day period?
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
July 14, 2014 at 12:32 am
The vendor makes a running 'last 7 days worth' of zipped differential files available on their ftp site, 3 per day (one per table). I just have to go get them before any one of them gets more than 7 days old.
Those are the only rules. I plan to get whatever they've just posted, on the day they've posted. This means that there will always be 6 other files, per table, that are still sitting there. I guess it doesn't matter to anybody right now whether I move or copy the files from the ftp source, so maybe I should just move files I'm processing. In that case I will just need the differential having a time stamp 1 day greater than the last differential I processed the day before...per table.
so simple yet so complex.
July 14, 2014 at 2:49 am
As per my understanding, below approach should be able to handle this:
1. Create a Table 'Time' which should have all the calendar dates and a Column 'IsProcessed' with flag 'Y' or 'N'
2.Create a User varible, to hold the value from below query:
Select TimeColumn from dbo.Time where IsProcessed Flag = 'N' and timecoumn < = Getdate()
This should give you all the dates for which the files are yet to be picked up. For e.g., 060114, 060214 etc.
3.Use a Foreach loop container using above mentioned variableto loop thorugh for all the required dates.
4. Create another Foreach loop container within the above mentioned for each loop container, to Pick up all the files for a given date. You can create variable(s) to dynamically generate the file names to pickup the required file from FTP server for each table.
5. Move the file to Archive folder just so that there is a copy available with you even after 7 days period.
6. After all the files are picked up for a particular date, the corresponding IsProcessed flag in Time table should be set to 'Y'.
In case, you have to reprocess previous day's data, then you can move the file to FTP server and change the Isprocessed Flag in Time table.
Just a high level idea what I could think of 🙂
July 14, 2014 at 4:18 am
I agree that keeping a record of the files that you've processed is best way to accommodate your requirements, but it doesn't have to as complex as the previous suggestion. You could simply pick up each file, check the file name against your ProcessedFiles (for example) table; if there is no matching entry, continue to process the file; if there is a matching entry, do nothing further and move on to the next file.
You can also record other potentially useful information in your ProcessedFiles table such as row counts and timing information. It is also useful as a basic audit table.
Regards
Lempster
July 14, 2014 at 8:29 am
Hi Marinder and Lempster, Thank you both. I think you're both in agreement actually, just that Marinder gave me concrete SSIS steps. Thanks for the details Marinder. I will ponder and get to work on POC!!! thanks so much..
July 14, 2014 at 4:17 pm
how would you control the order of processing. Diff on jun 1 has to be processed before diff on jun 2.
would you put a query in the second foreach loop to do something like:
select top 1
from TimeTbl
order by datecolumn asc
??
July 15, 2014 at 7:12 am
Sure, ORDER BY would be able to handle that scenario. and that Query would be required for First Foreach loop container.
Second foreach loop contaiiner should be used for looping through files for a particular date.
July 15, 2014 at 8:17 am
is it correct that in the second foreach loop container, the filename will be built up using an expression built using two variables: filepart and datepart?
July 15, 2014 at 12:18 pm
M, you think you could look at what I did on this post
http://www.sqlservercentral.com/Forums/Topic1592460-364-1.aspx?Update=1
July 16, 2014 at 1:59 am
That's correct. The complete filename (FileName+Datepart) should be built using expression builder.
I have also replied to other post you've referred to above.
Viewing 11 posts - 1 through 10 (of 10 total)
You must be logged in to reply to this topic. Login to reply