July 14, 2017 at 9:41 am
Hello,
I have flat file of 45GB which is been split into 173 files(each as 255mb approx). It has 750 columns and the first file contains headers, the rest doesn't have headers.
Can anyone help me out to create a package to load 173 flat files into sql server table without any duplicates.
Thanks
July 14, 2017 at 10:15 am
darsha.shashank - Friday, July 14, 2017 9:41 AMHello,I have flat file of 45GB which is been split into 173 files(each as 255mb approx). It has 750 columns and the first file contains headers, the rest doesn't have headers.
Can anyone help me out to create a package to load 173 flat files into sql server table without any duplicates.
Thanks
Take the header out of file 1. Now you have 173 identically structured files.
Use a FOREACH container to import them to a staging table.
Run some SQL on your staging table to handle dupes.
Then load from staging table to target table.
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel
July 14, 2017 at 11:43 am
darsha.shashank - Friday, July 14, 2017 9:41 AMHello,I have flat file of 45GB which is been split into 173 files(each as 255mb approx). It has 750 columns and the first file contains headers, the rest doesn't have headers.
Can anyone help me out to create a package to load 173 flat files into sql server table without any duplicates.
Thanks
Why did you split the one large 45GB file into 173 files? I'm thinking that it there is absolutely no use to doing that. Further, it also makes checking for dupes a whole lot more difficult.
--Jeff Moden
Change is inevitable... Change for the better is not.
August 22, 2017 at 8:48 pm
So how'd things work out for you?
--Jeff Moden
Change is inevitable... Change for the better is not.
August 22, 2017 at 11:59 pm
I agree with Jeff, are you splitting the file or is it arriving to you in this way? I see no need to split it.
A simple PowerShell script can de-dupe any text file and retain the order of the lines before you even bother the database engine:
From http://www.secretgeek.net/ps_duplicates:
# define a new empty hash table
$hash = @{}
# read through input file, rawlist.txt, and write each unique line to a new file, newlist.txt
gc c:\rawlist.txt |
>> %{if($hash.$_ -eq $null) { $_ }; $hash.$_ = 1} >
>> c:\newlist.txt
Just update the bolded items in the code snippet to try it out.
If you stick with one big file a simple SSIS Package with a single Data Flow is your friend. If you have to contend with multiple files, the PowerShell will need to loop over your files and Phil outlined a good approach for handling multiple files in SSIS using a ForEach Container.
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
Viewing 5 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply