Date Type ETL Question

Question

Post reply

Date Type ETL Question

Coffee_&_SQL

SSC Veteran

Points: 226
More actions
October 22, 2024 at 3:25 pm

#4471670

I'm not sure that I'm posting this in the right section, so please mods please move if that is the case.
Anyway, what are the best practices for ETL jobs that move Date fields? Let's say I'm going from Oracle to SQL Server. Is it best to land the data in SQL server in some sort of DateTime Data type field, such as Datetime2? I've recently run into some infrequent job failures due to data type mismatch.
I was wondering what the pros and cons are just landing the dateTime data in varchar fields in staging tables. This would be a workaround to the Datetime conversion issues from Oracle to SQL Server.

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply

Martin Schoombee SSCoach Points: 19127 More actions · Answer 1

(see latest post)

This reply was modified 1 month ago by Martin Schoombee.

frederico_fonseca SSCoach Points: 16154 More actions · Answer 2

I'm my experience performance and validation was always better and more reliable when we converted dates (and numbers as well) to a string format (dates to YYYY-MM-DD HH:MM:SS.FFF and numbers to an standard format) IN ORACLE before transferring and load to stating/temp tables before moving to final destination.

this is particularly true if you are using linked servers or SSIS (standard dataflow) as implicit conversion from Oracle types to SQL/SSIS types is slower than the combination of (convert in oracle + transfer size + SQL/SSIS convert back to correct type)

This reply was modified 1 month ago by frederico_fonseca.

Martin Schoombee SSCoach Points: 19127 More actions · Answer 3

My personal preference is to land Dates (or Date/Time) fields as text (varchar) in the staging area if I don't trust the source...or if I anticipate possible conversion issues.

The pro is that your first (staging) step won't fail, and the con is that you'd have to deal with the conversions in your transformation step. I prefer that, because it's more time consuming (and expensive) to go back to the source than it is to account for the conversions in your transformation step.

Jeff Moden SSC Guru Points: 1004324 More actions · Answer 4

When dealing with untrusted data, you'll need to use things like TRY_CONVERT(). If it fails, it will return a NULL. Then you can go back to the original data and see what caused the failure.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Steve Jones - SSC Editor SSC Guru Points: 736173 More actions · Answer 5

Like above, I almost always land in string formats because then a) I have the data, and b) I can convert to the proper data type and c), I can delete/handle error rows in the destination without messing with my ETL job.

Coffee_&_SQL SSC Veteran Points: 226 More actions · Answer 6

Coffee_&_SQL

SSC Veteran

Points: 226

October 22, 2024 at 5:45 pm

#4472239

Hey All,

Thank you for the answers!

Ed B Hall of Fame Points: 3666 More actions · Answer 7

I think that Oracle displays dates in a UK format with days before months, at least that's how it seems when extracting dates without formatting. We always use to_char with a format string to get dates out of Oracle. I assume you can add milliseconds, but I've never needed to.

cast(to_char(ULTIMEDATE, 'YYYY-MM-DD HH24:MI:SS') AS varchar(25)) ULDateTime

An example of why it is a good idea to import dates as strings is that different database platforms have different data type ranges. Before datetime2 existed I believe 1753-01-01 was the earliest datetime SQL Server could handle. We had an app with an Oracle back end that occasionally created dates over 1000 years old. This was bad data caused by the application wrapping future dates, however, Oracle knew it was a date, but SQL Server did not. Inserting into a datetime failed whereas inserting into a varchar allowed us to see the bad data and decide what to do.

P Jones SSChampion Points: 12361 More actions · Answer 8

For Oracle to SQL Server ETL we use

CAST(DOB AS DATE DEFAULT NULL ON CONVERSION ERROR) as "DOB"

in the select query and load to date field if no time required. This sorts out a lot of dud data in advance.

For fields with Time, datetime2 destination is required as Oracle date has a wider range than SQL server datetime.