October 22, 2024 at 3:25 pm
I'm not sure that I'm posting this in the right section, so please mods please move if that is the case.
Anyway, what are the best practices for ETL jobs that move Date fields? Let's say I'm going from Oracle to SQL Server. Is it best to land the data in SQL server in some sort of DateTime Data type field, such as Datetime2? I've recently run into some infrequent job failures due to data type mismatch.
I was wondering what the pros and cons are just landing the dateTime data in varchar fields in staging tables. This would be a workaround to the Datetime conversion issues from Oracle to SQL Server.
October 22, 2024 at 3:32 pm
(see latest post)
October 22, 2024 at 3:33 pm
I'm my experience performance and validation was always better and more reliable when we converted dates (and numbers as well) to a string format (dates to YYYY-MM-DD HH:MM:SS.FFF and numbers to an standard format) IN ORACLE before transferring and load to stating/temp tables before moving to final destination.
this is particularly true if you are using linked servers or SSIS (standard dataflow) as implicit conversion from Oracle types to SQL/SSIS types is slower than the combination of (convert in oracle + transfer size + SQL/SSIS convert back to correct type)
October 22, 2024 at 3:35 pm
My personal preference is to land Dates (or Date/Time) fields as text (varchar) in the staging area if I don't trust the source...or if I anticipate possible conversion issues.
The pro is that your first (staging) step won't fail, and the con is that you'd have to deal with the conversions in your transformation step. I prefer that, because it's more time consuming (and expensive) to go back to the source than it is to account for the conversions in your transformation step.
October 22, 2024 at 4:48 pm
When dealing with untrusted data, you'll need to use things like TRY_CONVERT(). If it fails, it will return a NULL. Then you can go back to the original data and see what caused the failure.
--Jeff Moden
Change is inevitable... Change for the better is not.
October 22, 2024 at 4:59 pm
Like above, I almost always land in string formats because then a) I have the data, and b) I can convert to the proper data type and c), I can delete/handle error rows in the destination without messing with my ETL job.
October 22, 2024 at 5:45 pm
Hey All,
Thank you for the answers!
October 22, 2024 at 6:20 pm
I think that Oracle displays dates in a UK format with days before months, at least that's how it seems when extracting dates without formatting. We always use to_char with a format string to get dates out of Oracle. I assume you can add milliseconds, but I've never needed to.
cast(to_char(ULTIMEDATE, 'YYYY-MM-DD HH24:MI:SS') AS varchar(25)) ULDateTime
An example of why it is a good idea to import dates as strings is that different database platforms have different data type ranges. Before datetime2 existed I believe 1753-01-01 was the earliest datetime SQL Server could handle. We had an app with an Oracle back end that occasionally created dates over 1000 years old. This was bad data caused by the application wrapping future dates, however, Oracle knew it was a date, but SQL Server did not. Inserting into a datetime failed whereas inserting into a varchar allowed us to see the bad data and decide what to do.
October 29, 2024 at 9:25 am
For Oracle to SQL Server ETL we use
CAST(DOB AS DATE DEFAULT NULL ON CONVERSION ERROR) as "DOB"
in the select query and load to date field if no time required. This sorts out a lot of dud data in advance.
For fields with Time, datetime2 destination is required as Oracle date has a wider range than SQL server datetime.
Viewing 9 posts - 1 through 8 (of 8 total)
You must be logged in to reply to this topic. Login to reply