August 23, 2019 at 3:04 pm
Before loading a CSV into SQL, is there a way to validate it against the table schema to catch potential data issues and the row number, and to
create a good file (records that pass) and a bad file (for review)?
Thanks.
August 23, 2019 at 4:25 pm
You could set up something in a data flow in SSIS to redirect the 'bad' rows. Personally, I like to load the CSV data into a 'work' table where all fields are defined as VARCHAR(500), or more if needed. Then I'll use SQL to validate/redirect as needed.
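Something like this is a rough sketch of that work-table approach; the table names, column names, file path, and target datatypes here are made up for illustration, and TRY_CONVERT needs SQL Server 2012 or later:

-- Everything lands as VARCHAR so the load itself never fails on datatypes.
CREATE TABLE dbo.CustomerImport_Work
(
    CustomerID  VARCHAR(500),
    OrderDate   VARCHAR(500),
    OrderAmount VARCHAR(500)
);

-- Load the raw CSV into the work table.
BULK INSERT dbo.CustomerImport_Work
FROM 'C:\Import\customers.csv'
WITH (FIRSTROW = 2, FIELDTERMINATOR = ',', ROWTERMINATOR = '\n');

-- Then validate with SQL before touching the live table, e.g. flag rows
-- whose values won't convert to the live table's datatypes.
SELECT *
FROM dbo.CustomerImport_Work
WHERE TRY_CONVERT(INT, CustomerID) IS NULL
   OR TRY_CONVERT(DATE, OrderDate) IS NULL
   OR TRY_CONVERT(DECIMAL(10,2), OrderAmount) IS NULL;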
-------------------------------------------------------------
we travel not to escape life but for life not to escape us
Don't fear failure, fear regret.
August 23, 2019 at 5:29 pm
I did go the route of a staging table, defining all the columns as VARCHAR(255). Now how can I take that schema and data
and validate them against the "live" table's field definitions?
If I do an INSERT INTO ?? from my staging table and cast/convert the different fields, and something fails, how can I detect which row and field caused it without using a cursor, since some of the loads could be big?
Can PowerShell come into play?
Thanks.
August 23, 2019 at 6:59 pm
The validation you are referring to has to be done manually, i.e., create some queries which identify the issues you are concerned about and run them against the staged data.
If you perform the load using SSIS, you can set up the components to redirect problematic rows to a different table.
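As a sketch of one such manual validation query, this reports the row and the offending column(s) in a single set-based pass, with no cursor. The staging table name, a RowID identity column, and the target datatypes are assumptions; match them to your live table's definitions (TRY_CONVERT requires SQL Server 2012+):

SELECT  s.RowID,   -- an IDENTITY column added to the staging table for traceability
        CASE WHEN TRY_CONVERT(INT, s.CustomerID)            IS NULL THEN 'CustomerID '  ELSE '' END
      + CASE WHEN TRY_CONVERT(DATE, s.OrderDate)            IS NULL THEN 'OrderDate '   ELSE '' END
      + CASE WHEN TRY_CONVERT(DECIMAL(10,2), s.OrderAmount) IS NULL THEN 'OrderAmount'  ELSE '' END
        AS FailedColumns
FROM    dbo.Staging AS s
WHERE   TRY_CONVERT(INT, s.CustomerID)            IS NULL
   OR   TRY_CONVERT(DATE, s.OrderDate)            IS NULL
   OR   TRY_CONVERT(DECIMAL(10,2), s.OrderAmount) IS NULL;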
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel
August 23, 2019 at 7:02 pm
If you are using SQL, then when you do an INSERT INTO, the statement will fail when it finds a field that doesn't match the data type or the constraints on your target table.
If you use SSIS, you can redirect the 'bad' rows when they don't match certain lookups. You may even be able to redirect rows at the point where the data is inserted into the target table.
You could also set up a script task to validate the data.
I've never done anything with PowerShell, so I can't help you there.
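If you stay in T-SQL, one way to avoid that hard INSERT failure is to split good and bad rows yourself in two set-based statements. This is only a sketch under assumed names (dbo.Staging, dbo.LiveTable, dbo.Staging_Rejects) and datatypes:

-- Rows that won't convert go to a review table, untouched.
INSERT INTO dbo.Staging_Rejects (CustomerID, OrderDate, OrderAmount)
SELECT  CustomerID, OrderDate, OrderAmount
FROM    dbo.Staging
WHERE   TRY_CONVERT(INT, CustomerID)            IS NULL
   OR   TRY_CONVERT(DATE, OrderDate)            IS NULL
   OR   TRY_CONVERT(DECIMAL(10,2), OrderAmount) IS NULL;

-- Rows that do convert go to the live table with the real datatypes.
INSERT INTO dbo.LiveTable (CustomerID, OrderDate, OrderAmount)
SELECT  CONVERT(INT, CustomerID),
        CONVERT(DATE, OrderDate),
        CONVERT(DECIMAL(10,2), OrderAmount)
FROM    dbo.Staging
WHERE   TRY_CONVERT(INT, CustomerID)            IS NOT NULL
  AND   TRY_CONVERT(DATE, OrderDate)            IS NOT NULL
  AND   TRY_CONVERT(DECIMAL(10,2), OrderAmount) IS NOT NULL;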
-------------------------------------------------------------
we travel not to escape life but for life not to escape us
Don't fear failure, fear regret.
August 23, 2019 at 7:06 pm
You may even be able to redirect rows at the point where the data is inserted into the target table.
You mean if the INSERT fails? Yes, you can redirect these sorts of failures too.
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel
August 24, 2019 at 4:50 am
Before loading a CSV into SQL, is there a way to validate it against the table schema to catch potential data issues and the row number, and to
create a good file (records that pass) and a bad file (for review)?
Thanks.
I wouldn't do a separate pre-validation because it's a waste of time. No matter what, you have to scan the CSV file. That scan might as well also double as a load. If the load has no errors, then you're all done. If the load has errors, then you've still only done a single scan.
There are various thoughts on how to do this. Personally, I create a table with exactly the correct datatypes, set the error tolerance to 2 billion rows, set up a BULK INSERT command (in T-SQL) to sequester the bad rows and errors, and do the load into that table.
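A minimal sketch of that kind of BULK INSERT; the file paths and table name are placeholders, and the CSV options would need to match your actual file:

BULK INSERT dbo.LiveTable
FROM 'C:\Import\customers.csv'
WITH (
    FIRSTROW        = 2,
    FIELDTERMINATOR = ',',
    ROWTERMINATOR   = '\n',
    MAXERRORS       = 2000000000,                    -- the "2 billion rows" error tolerance
    ERRORFILE       = 'C:\Import\customers_bad.csv', -- bad rows land here; a companion .Error.Txt describes the errors
    TABLOCK
);

Good rows go straight into the correctly typed table, and the error file gives you the "bad file for review" asked about in the original post, all in a single scan of the CSV.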
--Jeff Moden
Change is inevitable... Change for the better is not.