Flat File with out headers

  • Hello,

    I have flat file of 45GB which is been split into 173 files(each as 255mb approx). It has 750 columns and the first file contains headers, the rest doesn't have headers.

    Can anyone help me out to create a package to load 173 flat files into sql server table without any duplicates.

    Thanks

  • darsha.shashank - Friday, July 14, 2017 9:41 AM

    Hello,

    I have flat file of 45GB which is been split into 173 files(each as 255mb approx). It has 750 columns and the first file contains headers, the rest doesn't have headers.

    Can anyone help me out to create a package to load 173 flat files into sql server table without any duplicates.

    Thanks

    Take the header out of file 1. Now you have 173 identically structured files.

    Use a FOREACH container to import them to a staging table.

    Run some SQL on your staging table to handle dupes.

    Then load from staging table to target table.

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • darsha.shashank - Friday, July 14, 2017 9:41 AM

    Hello,

    I have flat file of 45GB which is been split into 173 files(each as 255mb approx). It has 750 columns and the first file contains headers, the rest doesn't have headers.

    Can anyone help me out to create a package to load 173 flat files into sql server table without any duplicates.

    Thanks

    Why did you split the one large 45GB file into 173 files?  I'm thinking that it there is absolutely no use to doing that.  Further, it also makes checking for dupes a whole lot more difficult.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • So how'd things work out for you?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I agree with Jeff, are you splitting the file or is it arriving to you in this way? I see no need to split it.

    A simple PowerShell script can de-dupe any text file and retain the order of the lines before you even bother the database engine:
    From http://www.secretgeek.net/ps_duplicates:

    
    
    
    
    # define a new empty hash table

    $hash = @{}
    # read through input file, rawlist.txt, and write each unique line to a new file, newlist.txt

    gc c:\rawlist.txt | 
    >> %{if($hash.$_ -eq $null) { $_ }; $hash.$_ = 1} >
    >>
    c:\newlist.txt


    Just update the bolded items in the code snippet to try it out.

    If you stick with one big file a simple SSIS Package with a single Data Flow is your friend. If you have to contend with multiple files, the PowerShell will need to loop over your files and Phil outlined a good approach for handling multiple files in SSIS using a ForEach Container.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply