Date Type ETL Question

  • I'm not sure that I'm posting this in the right section, so please mods please move if that is the case.

    Anyway, what are the best practices for ETL jobs that move Date fields? Let's say I'm going from Oracle to SQL Server. Is it best to land the data in SQL server in some sort of DateTime Data type field, such as Datetime2? I've recently run into some infrequent job failures due to data type mismatch.

    I was wondering what the pros and cons are just landing the dateTime data in varchar fields in staging tables. This would be a workaround to the Datetime conversion issues from Oracle to SQL Server.

  • (see latest post)

  • I'm my experience performance and validation was always better and more reliable when we converted dates (and numbers as well) to a string format (dates to YYYY-MM-DD HH:MM:SS.FFF and numbers to an standard format) IN ORACLE before transferring and load to stating/temp tables before moving to final destination.

    this is particularly true if you are using linked servers or SSIS (standard dataflow) as implicit conversion from Oracle types to SQL/SSIS types is slower than the combination of (convert in oracle + transfer size + SQL/SSIS convert back to correct type)

  • My personal preference is to land Dates (or Date/Time) fields as text (varchar) in the staging area if I don't trust the source...or if I anticipate possible conversion issues.

    The pro is that your first (staging) step won't fail, and the con is that you'd have to deal with the conversions in your transformation step. I prefer that, because it's more time consuming (and expensive) to go back to the source than it is to account for the conversions in your transformation step.

  • When dealing with untrusted data, you'll need to use things like TRY_CONVERT().  If it fails, it will return a NULL.  Then you can go back to the original data and see what caused the failure.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Like above, I almost always land in string formats because then a) I have the data, and b) I can convert to the proper data type and c), I can delete/handle error rows in the destination without messing with my ETL job.

     

  • Hey All,

    Thank you for the answers!

  • I think that Oracle displays dates in a UK format with days before months, at least that's how it seems when extracting dates without formatting. We always use to_char with a format string to get dates out of Oracle. I assume you can add milliseconds, but I've never needed to.

    cast(to_char(ULTIMEDATE, 'YYYY-MM-DD HH24:MI:SS') AS varchar(25)) ULDateTime

    An example of why it is a good idea to import dates as strings is that different database platforms have different data type ranges. Before datetime2 existed I believe 1753-01-01 was the earliest datetime SQL Server could handle. We had an app with an Oracle back end that occasionally created dates over 1000 years old. This was bad data caused by the application wrapping future dates, however, Oracle knew it was a date, but SQL Server did not. Inserting into a datetime failed whereas inserting into a varchar allowed us to see the bad data and decide what to do.

  • For Oracle to SQL Server ETL we use

    CAST(DOB AS DATE DEFAULT NULL ON CONVERSION ERROR) as "DOB"

    in the select query and load to date field if no time required. This sorts out a lot of dud data in advance.

    For fields with Time, datetime2 destination is required as Oracle date has a wider range than SQL server datetime.

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply