Pre-validate CSV to SQL load

  • Before loading a CSV into SQL Server, is there a way to validate it against the table schema to catch potential data issues and the row numbers where they occur?

    I'd like to create a good file (records that pass) and a bad file (for review).

    Thanks.

  • You could set up something in a data flow in SSIS to redirect the 'bad' rows.  Myself, I like to load the CSV data into a 'work' table where all fields are defined as VARCHAR(500), or larger if needed.  Then I'll use SQL to validate/redirect as needed.
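
    A rough sketch of that pattern (table and column names here are hypothetical, and TRY_CONVERT needs SQL Server 2012 or later):

    -- Hypothetical 'work' table: every field lands as VARCHAR first
    CREATE TABLE dbo.Customer_Work (
        CustomerId VARCHAR(500),
        SignupDate VARCHAR(500),
        Balance    VARCHAR(500)
    );

    -- After the raw load, redirect rows that won't convert cleanly into a
    -- 'bad' table (assumed to have the same layout) for review.
    -- Note: rows with NULL values get flagged too; adjust if NULLs are valid.
    INSERT INTO dbo.Customer_Bad (CustomerId, SignupDate, Balance)
    SELECT CustomerId, SignupDate, Balance
    FROM dbo.Customer_Work
    WHERE TRY_CONVERT(INT, CustomerId) IS NULL
       OR TRY_CONVERT(DATE, SignupDate) IS NULL
       OR TRY_CONVERT(DECIMAL(18,2), Balance) IS NULL;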

    -------------------------------------------------------------
    we travel not to escape life but for life not to escape us
    Don't fear failure, fear regret.

  • I did go the route of a staging table, defining all the fields as VARCHAR(255). Now how can I take that schema and data and validate them against the "Live" table field defs?

    If I do an INSERT INTO ?? from my staging table and cast/convert the different fields, and something fails, how can I detect which row and field caused it without using a cursor? Some of the loads could be big.

    Can PowerShell come into play?

    Thanks.

  • The validation you are referring to has to be done manually, i.e., create some queries which identify the issues you are concerned about and run them against the staged data.
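
    For example, something like this (staging table and column types are hypothetical) finds the offending rows and names the first field that fails, all set-based with no cursor:

    SELECT s.*,
           CASE
               WHEN TRY_CONVERT(INT, s.CustomerId) IS NULL THEN 'CustomerId'
               WHEN TRY_CONVERT(DATE, s.SignupDate) IS NULL THEN 'SignupDate'
               WHEN TRY_CONVERT(DECIMAL(18,2), s.Balance) IS NULL THEN 'Balance'
           END AS FirstBadColumn
    FROM dbo.Staging s
    WHERE TRY_CONVERT(INT, s.CustomerId) IS NULL
       OR TRY_CONVERT(DATE, s.SignupDate) IS NULL
       OR TRY_CONVERT(DECIMAL(18,2), s.Balance) IS NULL;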

    If you perform the load using SSIS, you can set up the components to redirect problematic rows to a different table.

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • If you are using straight SQL, then when you do an INSERT INTO, the statement will fail when it finds a field that doesn't match the data type or the constraints you have on your target table.

    If you use SSIS you can redirect the 'bad' rows when they don't match certain lookups.  You may even be able to redirect the rows when it goes to put the data in the target table.

    You could even set up a script task to validate the data.

    Never done anything with PowerShell, so I can't help you there.

    -------------------------------------------------------------
    we travel not to escape life but for life not to escape us
    Don't fear failure, fear regret.

  • below86 wrote:

    You may even be able to redirect the rows when it goes to put the data in the target table.

    You mean if the INSERT fails? Yes, you can redirect these sorts of failures too.

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • Bruin wrote:

    Before loading a CSV into SQL Server, is there a way to validate it against the table schema to catch potential data issues and the row numbers where they occur?

    I'd like to create a good file (records that pass) and a bad file (for review).

    Thanks.

    I wouldn't do a separate pre-validation because it's a waste of time.  No matter what, you have to scan the CSV file.  That scan might as well also double as a load.  If the load has no errors, then you're all done.  If the load has errors, then you've still only done a single scan.

    There are various thoughts on how to do this.  Personally, I create a table with exactly the correct datatypes, set the error tolerance to 2 billion rows, use a BULK INSERT command (in T-SQL) to sequester the bad rows and errors, and do the load into that table.
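
    In T-SQL that approach looks roughly like this (file paths and table name are hypothetical):

    BULK INSERT dbo.TargetTable
    FROM 'C:\Import\input.csv'
    WITH (
        FIRSTROW        = 2,             -- skip the header row
        FIELDTERMINATOR = ',',
        ROWTERMINATOR   = '\n',
        MAXERRORS       = 2000000000,    -- error tolerance: ~2 billion bad rows
        ERRORFILE       = 'C:\Import\input_errors.txt',  -- bad rows are sequestered here;
                                         -- a companion .Error.Txt file describes each failure
        TABLOCK
    );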

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
