Pre-validate CSV to SQL load

  • Before loading a CSV into SQL Server, is there a way to validate it against the table schema to catch potential data issues and the row numbers where they occur?

    I'd like to create a good file (records that pass) and a bad file (for review).

    Thanks.

  • You could set up something in a data flow in SSIS to redirect the 'bad' rows.  Myself, I like to load the CSV data into a 'work' table where all fields are defined as VARCHAR(500), or larger if needed.  Then I'll use SQL to validate/redirect as needed.
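
    A rough sketch of that pattern (table and column names here are hypothetical, and TRY_CONVERT needs SQL Server 2012 or later):

    -- Hypothetical 'work' table: every field lands as VARCHAR first
    CREATE TABLE dbo.Customer_Work (
        CustomerId VARCHAR(500),
        SignupDate VARCHAR(500),
        Balance    VARCHAR(500)
    );

    -- After the raw load, redirect rows that won't convert cleanly into a
    -- 'bad' table (assumed to have the same layout) for review.
    -- Note: rows with NULL values get flagged too; adjust if NULLs are valid.
    INSERT INTO dbo.Customer_Bad (CustomerId, SignupDate, Balance)
    SELECT CustomerId, SignupDate, Balance
    FROM dbo.Customer_Work
    WHERE TRY_CONVERT(INT, CustomerId) IS NULL
       OR TRY_CONVERT(DATE, SignupDate) IS NULL
       OR TRY_CONVERT(DECIMAL(18,2), Balance) IS NULL;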

    -------------------------------------------------------------
    we travel not to escape life but for life not to escape us
    Don't fear failure, fear regret.

  • I did go the route of a staging table, defining all the fields as VARCHAR(255). Now how can I take that schema and data and validate them against the "Live" table field defs?

    If I do an INSERT INTO ?? from my staging table and cast/convert the different fields, and something fails, how can I detect which row and field caused it without using a cursor? Some of the loads could be big.

    Can PowerShell come into play?

    Thanks.

  • The validation you are referring to has to be done manually, i.e., create some queries which identify the issues you are concerned about and run them against the staged data.
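
    For example, something like this (staging table and column types are hypothetical) finds the offending rows and names the first field that fails, all set-based with no cursor:

    SELECT s.*,
           CASE
               WHEN TRY_CONVERT(INT, s.CustomerId) IS NULL THEN 'CustomerId'
               WHEN TRY_CONVERT(DATE, s.SignupDate) IS NULL THEN 'SignupDate'
               WHEN TRY_CONVERT(DECIMAL(18,2), s.Balance) IS NULL THEN 'Balance'
           END AS FirstBadColumn
    FROM dbo.Staging s
    WHERE TRY_CONVERT(INT, s.CustomerId) IS NULL
       OR TRY_CONVERT(DATE, s.SignupDate) IS NULL
       OR TRY_CONVERT(DECIMAL(18,2), s.Balance) IS NULL;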

    If you perform the load using SSIS, you can set up the components to redirect problematic rows to a different table.

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • If you are using straight SQL, then when you do an INSERT INTO, the statement will fail when it finds a field that doesn't match the data type or the constraints you have on your target table.

    If you use SSIS you can redirect the 'bad' rows when they don't match certain lookups.  You may even be able to redirect the rows when it goes to put the data in the target table.

    You could even set up a script task to validate the data.

    Never done anything with PowerShell, so I can't help you there.

    -------------------------------------------------------------
    we travel not to escape life but for life not to escape us
    Don't fear failure, fear regret.

  • below86 wrote:

    You may even be able to redirect the rows when it goes to put the data in the target table.

    You mean if the INSERT fails? Yes, you can redirect these sorts of failures too.

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • Bruin wrote:

    Before loading a CSV into SQL Server, is there a way to validate it against the table schema to catch potential data issues and the row numbers where they occur?

    I'd like to create a good file (records that pass) and a bad file (for review).

    Thanks.

    I wouldn't do a separate pre-validation because it's a waste of time.  No matter what, you have to scan the CSV file.  That scan might as well also double as a load.  If the load has no errors, then you're all done.  If the load has errors, then you've still only done a single scan.

    There are various thoughts on how to do this.  Personally, I create a table with exactly the correct datatypes, set the error tolerance to 2 billion rows, use a BULK INSERT command (in T-SQL) to sequester the bad rows and errors, and do the load into that table.
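
    In T-SQL that approach looks roughly like this (file paths and table name are hypothetical):

    BULK INSERT dbo.TargetTable
    FROM 'C:\Import\input.csv'
    WITH (
        FIRSTROW        = 2,             -- skip the header row
        FIELDTERMINATOR = ',',
        ROWTERMINATOR   = '\n',
        MAXERRORS       = 2000000000,    -- error tolerance: ~2 billion bad rows
        ERRORFILE       = 'C:\Import\input_errors.txt',  -- bad rows are sequestered here;
                                         -- a companion .Error.Txt file describes each failure
        TABLOCK
    );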

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
