Blog Post

Cleaning up bad dates–#SQLNewBlogger

,

Another post for me that is simple and hopefully serves as an example for people trying to get blogging as #SQLNewBloggers.

I got some data recently from an online service, MapMyRun.com, where I track my workout data. I’ve been doing this for years, but with some of the instability and security issues with services, I decided I need to periodically grab a copy of my data and load it.

This post describes an issue with cleaning up the date data.

When I tried the load from SSMS (Tasks->Load Flat File), I got errors if I attempted to insert into a date field. My goal is to use the pattern of loading to a staging table and then merging data into my main table, so I decided to just load into a staging table. When I did this, here is what I saw.

2020-08-27 12_30_53-~vs7264.sql - ARISTOTLE_SQL2017.way0utwest (ARISTOTLE_Steve (58))_ - Microsoft S

For a human, the dates make perfect sense. For a computer, however, translating this to a date via an implicit conversion doesn’t work well. No problem, I can fix this. I’ll replace the date:

SELECT top 10
CAST( REPLACE(swh.Date_Submitted, 'Aug.', 'Aug') AS DATE) AS SubmitDate
, *
FROM dbo.staging_workout_history AS swh

This works great.

2020-08-27 12_33_28-~vs7264.sql - ARISTOTLE_SQL2017.way0utwest (ARISTOTLE_Steve (58))_ - Microsoft S

However, if I scroll through the entire file, I find issues with other dates. Once I remote the top, I get this:

Msg 241, Level 16, State 1, Line 1
Conversion failed when converting date and/or time from character string.

Hmmm, there are other issues. I could remove my CAST and scroll through, but there’s an easier way. I’ll use TRY_CAST() instead. If I restructure my query, I can run it to completion.

2020-08-27 12_40_27-~vs7264.sql - ARISTOTLE_SQL2017.way0utwest (ARISTOTLE_Steve (58))_ - Microsoft S

Now I can scroll through. I could also add a test for getting NULL from TRY_CAST to find the problematic dates. Here’s where I see more problems.

2020-08-27 12_41_08-~vs7264.sql - ARISTOTLE_SQL2017.way0utwest (ARISTOTLE_Steve (58))_ - Microsoft S

It seems that whoever exported data, from whatever system, decided periods in months make sense. I can amend the REPLACE in this way, which should fix things.

SELECT
TRY_CAST( REPLACE(swh.Date_Submitted, '.', '') AS DATE) AS IsItADate
, *
FROM dbo.staging_workout_history AS swh
 

Now I see this seems to work

2020-08-27 12_46_22-~vs7264.sql - ARISTOTLE_SQL2017.way0utwest (ARISTOTLE_Steve (58))_ - Microsoft S

One More Problem

I’m glad I kept scrolling through a full year. That’s because I saw this:

2020-08-27 12_47_33-~vs7264.sql - ARISTOTLE_SQL2017.way0utwest (ARISTOTLE_Steve (58))_ - Microsoft S

Who thinks Sept is the abbreviation for September? Technically it is, and I see April, June, and July spelled out, so the inconsistency in this extract is bizarre to me. Perhaps there is a database that formats English month names like this and deals with conversions, but this seems like sloppy programming to include “April”, “Jan.”, and then “Sept.” in your data set.

In any case, I can add another REPLACE() to my code.

SELECT
TRY_CAST( REPLACE(
                   REPLACE(swh.Date_Submitted, '.', '')
                   , 'Sept', 'Sep') AS DATE) AS IsItADate
, *
FROM dbo.staging_workout_history AS swh
WHERE  TRY_CAST( REPLACE(
                   REPLACE(swh.Date_Submitted, '.', '')
                   , 'Sept', 'Sep') AS DATE)  IS null

This converts all the dates in my column to the expected value. Now I need to do the same thing for other date columns, change TRY_CAST() to CAST() and I can import data.

Original post (opens in new tab)
View comments in original post (opens in new tab)

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating