July 14, 2017 at 9:41 am
Hello,
I have a flat file of 45GB which has been split into 173 files (each approximately 255MB). It has 750 columns; the first file contains headers, the rest don't.
Can anyone help me create a package to load the 173 flat files into a SQL Server table without any duplicates?
Thanks
July 14, 2017 at 10:15 am
darsha.shashank - Friday, July 14, 2017 9:41 AM
Take the header out of file 1. Now you have 173 identically structured files.
Use a FOREACH container to import them to a staging table.
Run some SQL on your staging table to handle dupes.
Then load from staging table to target table.
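As a sketch of the staging-table de-dupe step (table and column names here are hypothetical, and this assumes some subset of the 750 columns, Col1 and Col2 below, identifies a duplicate):

```sql
-- Sketch only: dbo.Staging, dbo.Target and the column names are assumptions.
-- Keep one arbitrary row per duplicate key, delete the rest, then load the survivors.
;WITH numbered AS
(
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY Col1, Col2 ORDER BY (SELECT NULL)) AS rn
    FROM dbo.Staging
)
DELETE FROM numbered
WHERE rn > 1;

INSERT INTO dbo.Target (Col1, Col2 /* , ...remaining columns */)
SELECT Col1, Col2 /* , ... */
FROM dbo.Staging;
```

If "duplicate" means the entire 750-column row must match, PARTITION BY every column (or a hash of the row) instead of just the two shown here.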
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
July 14, 2017 at 11:43 am
darsha.shashank - Friday, July 14, 2017 9:41 AM
Why did you split the one large 45GB file into 173 files? I'm thinking there is absolutely no benefit to doing that. Further, it also makes checking for dupes a whole lot more difficult.
--Jeff Moden
Change is inevitable... Change for the better is not.
August 22, 2017 at 8:48 pm
So how'd things work out for you?
--Jeff Moden
Change is inevitable... Change for the better is not.
August 22, 2017 at 11:59 pm
I agree with Jeff: are you splitting the file yourself, or is it arriving to you that way? I see no need to split it.
A simple PowerShell script can de-dupe any text file and retain the order of the lines before you even bother the database engine:
From http://www.secretgeek.net/ps_duplicates:
# define a new empty hash table
$hash = @{}
# read through input file, rawlist.txt, and write each unique line to a new file, newlist.txt
gc c:\rawlist.txt |
    % { if ($hash.$_ -eq $null) { $_ }; $hash.$_ = 1 } |
    out-file c:\newlist.txt
Just update the file paths (rawlist.txt and newlist.txt) in the code snippet to try it out.
If you stick with one big file a simple SSIS Package with a single Data Flow is your friend. If you have to contend with multiple files, the PowerShell will need to loop over your files and Phil outlined a good approach for handling multiple files in SSIS using a ForEach Container.
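If you do go the multi-file route, the hash-table trick above extends naturally: loop over all the files while keeping one shared hash table, so duplicates are removed across file boundaries as well (the folder path and filename pattern below are assumptions):

```powershell
# Sketch only: the folder path and *.txt pattern are hypothetical.
# One shared hash table de-dupes across all 173 files;
# remember the header row in the first file still needs handling.
$hash = @{}
Get-ChildItem 'c:\import\*.txt' |
    Sort-Object Name |
    Get-Content |
    % { if ($hash.$_ -eq $null) { $_ }; $hash.$_ = 1 } |
    Out-File 'c:\newlist.txt'
```

Sort-Object Name keeps the files in a predictable order, so the first occurrence of each line is retained deterministically.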
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato