Please help with logic

  • Hi,

    I need help with data-level error handling.

    I have a CSV file with data like the following.

    The first row is the column header.

    --------------- CSV file Start

    Name,Number,Cratdate

    Ram,100,02/09/2012

    Smith,101,01/12/2012

    Abdul,102,A

    Adam,103,05/30/2011

    --------------CSV file end

    Here we can observe that there is a data-level issue in the row for Abdul: the Cratdate value is not convertible to a date.

    As per my requirement, I need to skip the row that has the issue and import the remaining data into the main table. The problem row should also be taken care of, e.g., imported into another table or something similar.

    I need help with this, whatever we use, whether a query or SSIS.

    The only thing I have in mind is using a loop and ignoring that row.

    But that doesn't work if the data is huge.

    Please Help

    Ram..

    🙂

  • Personally I'd set up an SSIS package for this.

    Import the CSV file into a staging table that has the relevant datatypes and non-NULLable columns, and do any transformations and data cleansing in this table. Use the destination's error output path to redirect any errored rows (e.g. a Cratdate that is not in a date format) to an error table that is structured to accept bad data (or a flat file destination may be easier).

    To achieve the above you will not be able to use the Fast Load (bulk insert) option into the staging table; it will have to be done row by row.

    Then copy the cleansed data into your main table.

    I wouldn't choose to just ignore the rows that contain bad data; that seems a little risky to me.

    ==========================================================================================================================
    A computer lets you make more mistakes faster than any invention in human history - with the possible exceptions of handguns and tequila. Mitch Ratcliffe

  • A few more options:

    BULK INSERT (Transact-SQL)

    http://msdn.microsoft.com/en-us/library/ms188365.aspx

    bcp Utility

    http://msdn.microsoft.com/en-us/library/ms162802.aspx
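    For example, BULK INSERT can redirect rows that fail conversion into an error file instead of aborting the whole load (the target table needs typed columns for a conversion to fail at this stage). A rough sketch, with hypothetical file paths and a hypothetical target table:

    BULK INSERT dbo.ImportTarget                      -- hypothetical table with typed columns
    FROM 'C:\External\Mytext.csv'                     -- hypothetical path to the CSV
    WITH (
        FIRSTROW        = 2,                          -- skip the header row
        FIELDTERMINATOR = ',',
        ROWTERMINATOR   = '\n',
        ERRORFILE       = 'C:\External\Mytext_errors.log',  -- rejected rows land here
        MAXERRORS       = 10                          -- keep loading past up to 10 bad rows
    );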

  • I am trying to work it out using SSIS. Let me get back to you if it works for me.

    🙂

  • Import to SQL.

    Like this:

    select * from OPENROWSET('MSDASQL',
        'Driver={Microsoft Text Driver (*.txt; *.csv)};DefaultDir=C:\External;',
        'select top 5 * from Mytext.txt')

    Or try this:

    select * from OPENROWSET('Microsoft.Jet.OLEDB.4.0',
        'Text;Database=C:\External',
        'select * from TRANS.csv')

    select top 50 * from OPENROWSET('Microsoft.ACE.OLEDB.12.0',
        'Text;Database=c:\temp\captell;HDR=Yes;FORMAT=Delimited(;)',
        'SELECT * FROM [file.TXT]')

    ____________________________________________________________________________
    Rafo*

  • That wouldn't handle the bad data, though.

    ==========================================================================================================================
    A computer lets you make more mistakes faster than any invention in human history - with the possible exceptions of handguns and tequila. Mitch Ratcliffe

  • But he can insert the CSV file into a table variable and then manipulate the data there.
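    Something like this rough sketch, assuming the Jet text driver setup from the post above (ad hoc distributed queries must be enabled) and the column names from the original CSV:

    DECLARE @csv TABLE (Name VARCHAR(255), Number VARCHAR(255), Cratdate VARCHAR(255));

    INSERT @csv (Name, Number, Cratdate)
    SELECT Name, Number, Cratdate
    FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0',
        'Text;Database=C:\External',   -- hypothetical folder containing the file
        'select * from TRANS.csv');

    -- the bad rows can then be filtered out in T-SQL
    SELECT * FROM @csv WHERE ISDATE(Cratdate) = 1;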

    ____________________________________________________________________________
    Rafo*

  • If you want to do it in hand-crafted T-SQL, you can do the following:

    Import all data into a staging table with a reasonably large, unrestricted varchar datatype for every column. Then run validation queries and import only the valid rows into the final destination table. A small setup as an example:

    CREATE TABLE StagingData
    ( ID INT IDENTITY(1,1)
     ,Name VARCHAR(255)
     ,Number VARCHAR(255)
     ,Cratdate VARCHAR(255)
    )

    CREATE TABLE StagingDataError (ID INT, Error VARCHAR(100))

    -- Import data in bulk (e.g. SSIS)
    INSERT StagingData SELECT 'Ram','100','02/09/2012'
    INSERT StagingData SELECT 'Smith','101','01/12/2012'
    INSERT StagingData SELECT 'Abdul','102','A'
    INSERT StagingData SELECT 'Adam','103','05/30/2011'
    INSERT StagingData SELECT 'Too-Long-Name','103','05/30/2011'
    INSERT StagingData SELECT 'Bob','A','05/30/2011'

    -- Example of validation:

    -- validate name length
    INSERT StagingDataError
    SELECT ID, 'Invalid Name Length'
    FROM StagingData
    WHERE LEN(Name) > 10

    -- validate number
    INSERT StagingDataError
    SELECT ID, 'Invalid Number'
    FROM StagingData
    WHERE ISNUMERIC(Number) = 0 -- please note: this is not a complete validation for a number! just an example

    -- validate date
    INSERT StagingDataError
    SELECT ID, 'Invalid Cratdate'
    FROM StagingData
    WHERE ISDATE(Cratdate) = 0 -- please note: this is not a complete validation for a date! just an example

    -- Load valid records into the final destination
    --INSERT INTO FINALDESTINATIONTABLE
    SELECT sd.*
    FROM StagingData AS sd
    LEFT JOIN StagingDataError AS e
        ON e.ID = sd.ID
    WHERE e.ID IS NULL

    -- show errors
    SELECT * FROM StagingDataError

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help

  • Eugene Elutin,

    Your solution seems to work for me.

    It still needs testing on huge data; I'll get back with the status.

    🙂

  • @rafo That's true, I hadn't thought of throwing it into a table variable. I suppose how efficient that solution would be depends on the volume of data, but it could work 🙂

    ==========================================================================================================================
    A computer lets you make more mistakes faster than any invention in human history - with the possible exceptions of handguns and tequila. Mitch Ratcliffe

  • You shouldn’t use table variables for huge data... temporary tables or actual tables are better.

  • Hmm, can you prove it? I mean, do you have an example that proves what you say?

    ____________________________________________________________________________
    Rafo*

  • You can optimise the solution even further. A few tips:

    1. Create the error table on the fly using SELECT INTO and UNION ALL (see the sketch after this list)

    2. You may find that before loading data into the final table, you need to pre-load the valid data into a temp (#) table (again using SELECT INTO)

    3. Use CLR functions for validation, if needed, instead of user-defined ones

    4. Yes, use #-tables, not table variables! You may need to create additional indexes on them...
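    A minimal sketch of tips 1 and 4, reusing the StagingData table and the validation rules from my earlier post:

    -- tip 1: build the error table on the fly with SELECT INTO over UNION ALL'd validations
    SELECT v.ID, v.Error
    INTO #StagingDataError
    FROM (
        SELECT ID, 'Invalid Name Length' AS Error
        FROM StagingData WHERE LEN(Name) > 10
        UNION ALL
        SELECT ID, 'Invalid Number'
        FROM StagingData WHERE ISNUMERIC(Number) = 0
        UNION ALL
        SELECT ID, 'Invalid Cratdate'
        FROM StagingData WHERE ISDATE(Cratdate) = 0
    ) AS v

    -- tip 4: it's a #-table, so an index can be added to speed up the anti-join
    CREATE INDEX IX_StagingDataError_ID ON #StagingDataError (ID)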

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help

  • First example from Google Search...

    Temporary Tables vs. Table Variables and Their Effect on SQL Server Performance

    http://www.codeproject.com/Articles/18972/Temporary-Tables-vs-Table-Variables-and-Their-Effe

  • xRafo (2/9/2012)


    Hmm, can you prove it? I mean, do you have an example that proves what you say?

    Table variables have no statistics. That alone should be sufficient to make the point.
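    A rough sketch to see it for yourself (compare the actual execution plans: the predicate on the table variable is estimated at 1 row, while the temp table estimate comes from auto-created statistics):

    DECLARE @tv TABLE (n INT)

    INSERT @tv (n)
    SELECT TOP (10000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
    FROM sys.all_objects a CROSS JOIN sys.all_objects b

    SELECT COUNT(*) FROM @tv WHERE n > 100   -- estimated rows: 1 (no statistics)

    SELECT n INTO #tt FROM @tv

    SELECT COUNT(*) FROM #tt WHERE n > 100   -- estimate driven by auto-created statistics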

    -- Gianluca Sartori
