loading data from staging table to final with validations to data specific

  • Good Morning,

    Have a Question, we have a FACETS feed it has about 50 columns in it. it is daily feed it get loads with all fields (Dates, Money, Strings) as varchar(500) to the table called STAGING_FACETS in sql server.

    so here I need to cleanse the data and loads to final table (PR_FACETS) . in this final table  data type specific columns, so  using a procedure (or any other way also fine),.

    here my question is

    before Insert/Update for date columns I am using ISDATE() check to make sure the source /stage has valid dates in it

    is there any alternative way to di this?

    also if there is any non date values in the column that need to e exclude from updating / inserting to target table.

    also here the main question is if there is any non dates values (except nulls, valid date values ) that need to be exclude and process the valid ones with out stopping / break the procedure. no hard break....

    any advise please?

    or any posts/ resources that I can refer?

    Thank you

    Areshmi

  • It may be that this could mostly be done during the original insert into the staging table.  In order to know that, we'd need to know what the "Facets Feed" looks like.  Can you post what the "record format" is and a couple of rows of the original data from the feed?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • ISDATE() isn't perfect (https://stackoverflow.com/questions/32755924/isdate-in-not-working-as-expected-in-sql-server)

    You may need some other guard clauses or checks to be sure the value is a date. Unfortunately, dates are just not stored in a consistent format across systems.

  • You could use TRY_CAST or TRY_CONVERT to attempt to convert the value to an appropriate date data type (date, datetime).  You would have to build the code to attempt to convert the non-null values - and then determine what to do with those rows that cannot be converted.

    How is this data loaded to the staging table?  Is it through BCP or SSIS or something else?  It may be possible to modify that process to perform your data type validation - redirecting 'bad' rows to a separate table/file for further analysis and correction, but it really depends on how you want to process the data.

    Further to that - if you have control over how the FACETS file/feed is built then you can and should validate the data type and format on output.  With dates you should always output the data in a non-ambiguous format - the best option for SQL Server would be 'YYYYMMDDTHH:MM:SS.nnn' but you can also leave out the T and just use 'YYYYMMDD HH:MM:SS.nnn'.

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Good Morning,

    Apologize for delay in reponse please see attached sample file ( i mocked little bit) so all these fileds need to be loaded to final table(same structure as stg except in final table they are relevant data formats where as in STG they all just strings )

     

    can you please advise the better way to process each column in terms of data types, for example the date fields how to validate them before loading to final table like wise how to check amounts fields to include them with pennies if any?

     

    Thank yoy in advance

    asiti

    Attachments:
    You must be logged in to view attached files.
  • added Jeff please help, sorry for delay as it is bit delay to get the file to me

    Thank you

  • We are planning to process with SSIS, but we are not married to any particular solution. can you please guide me for some sample / post that i can do this below exacty said by you. Thank you in advance. Asiti

     

    How is this data loaded to the staging table?  Is it through BCP or SSIS or something else?  It may be possible to modify that process to perform your data type validation - redirecting 'bad' rows to a separate table/file for further analysis and correction, but it really depends on how you want to process the data.

  • Having a quick look at what you attached, here's what I would do...

    I'd lay out a staging table with the correct columns with the correct datatypes.

    I'd then setup to do the imports using BULK INSERT along with using the error checking feature, which can be setup to sequester bad rows in a separate file for later repair without stopping the main load.  It can also tell you what's  wrong with the failed rows but that can require a little bit of a trick if you want it to be humanly readable.

    And, no... I wouldn't (and haven't yet) use SSIS for any of this.  You can get some pretty exquisite control over the process using a stored procedure and bulk insert.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • What action do you wish to take if validation fails (eg, invalid date) for a particular row?

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • Hello PHil,

     

    Would like to insert these records into separate table (so team can look into it later to check) we dont want to process that record at all. please let me know if you wish me to explain more.

     

    Thank you

    asiti

  • Thanks Jeff, sure I will work for SQL side only. no ssis

     

    in this case everyday the staging table ret refreshed (truncate and load from file as all columns nvarchar(500))

     

    so from here we need to check with final table and then load to final table (can be update or insert)

     

    Can you guide me any of resources to use bulk insert and sql procedure that you mentioned in your earlier solution

     

    Thank you

    Asiti

  • As you are familiar with SSIS, you will probably know that you can redirect errors to an error table.

    If your target staging table is set up with all of the expected data types, you should therefore be able to redirect all the INSERT failures without too much trouble.

    Of course, you will also want to know which column caused the error and what the error was. I can help you with that part, if this is how you would like to approach the problem.

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • Thank you Phil.

     

    Would it be advisable to do it in SQL server itself, as we have some senior management they opted to use SQL server only (son interms of procedure / manual code etc)

     

    any advise? please

    Thanks

    Asiti

  • asita wrote:

    Thank you Phil.

    Would it be advisable to do it in SQL server itself, as we have some senior management they opted to use SQL server only (son interms of procedure / manual code etc)

    any advise? please

    Thanks

    Asiti

    To be sure, Phil is correct.  The same thing that I propose without SSIS could be done in SSIS.  I tend to be a bit anti-SSIS simply because of the way I've seen people abuse it but, done correctly, SSIS is an equally good option for this particular task.

    If you have senior management that would prefer not using SSIS, that's up to you and them.

    The key here is that you don't need a separate test for validating things like dates, etc.  It CAN be done by using the correct datatype in the table being loaded and any rows with such detectable errors can be sequestered by either method without it terminating the whole run.

    Pick one of the two methods and run with it.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • removed this entry as it is posted to separate new post. apologize for multi post.

    Thanks

    Asiti

    • This reply was modified 3 years, 6 months ago by  asita.
    • This reply was modified 3 years, 6 months ago by  asita.

Viewing 15 posts - 1 through 15 (of 21 total)

You must be logged in to reply to this topic. Login to reply