Guaranteed Varchar to Date conversion

  • I regularly load in multiple files using SSIS into a staging table and then onto another table (I call them trans as they are never fixed).   The data goes into stage as it comes out of the file or other source. This way I can always prove I took in what I was supplied.

    Stage_policy may contain many date fields and the source file is forever changing format of the dates (dont ask why - they just do and will no doubt always carry on doing this). When I move the date to the trans_policy table I want to guarantee that I get the change from Varchar to date correct. In the past we have had files with what we though was UK layout ie dd/mm/yyyy and found out they had changed to MM/DD/YYYY.

    Is there a published stored proc anywhere that runs through these columns and checks where the days, months, and years are in the string.

    I am about to start writing but as always if there are good pointers out there that would help. I have had a good look but could not find anything.

    Oh and SQL Server 2014.

    Cheers

    E

  • Not sure exactly what you're after here - please supply some sample data and expected results.  You can use TRY_CONVERT to verify whether a string can be converted to a valid date.  Whether that date is the correct one is a different matter - the only way to distinguish between '06/07/2019' and '07/06/2019', I suppose, is to trust the supplier of the data to use a consistent format.

    John

  • If you load them into a staging table with the date stored as varchar, you can go through the date column looking at the maximum value of substring(date,1,2) and substring(date,4,2). Hopefully, you will have at least one date where the day-part is greater than 12 so you will be able to deduce the format of the date for conversion.

  • Take a look at this post where a present a solution for a similar issue.

    You can take this a step further, and create a iTVF for future use

  • So, I spent some time creating a function that will successfully convert *MOST* string formats to DATETIME.  This has only been tested in a us-en environment.

    CREATE FUNCTION dbo.iTVF_StringToDate (
    @strDate nvarchar(100)
    )
    RETURNS TABLE WITH SCHEMABINDING
    AS
    RETURN
    SELECT DateResult = CASE
    WHEN @strDate LIKE '%[a-z][a-z][a-z]%' THEN TRY_CONVERT(datetime, @strDate) -- 02 Oct 2019 20:26:45:260

    WHEN @strDate LIKE '[0-9][0-9][0-9][0-9]-[0][1-9]-[0-3][0-9]T%' THEN TRY_CONVERT(datetime, @strDate, 127) -- yyyy-mm-ddThh:mi:ss
    WHEN @strDate LIKE '[0-9][0-9][0-9][0-9]-[1][0-2]-[0-3][0-9]T%' THEN TRY_CONVERT(datetime, @strDate, 127) -- yyyy-mm-ddThh:mi:ss

    WHEN @strDate LIKE '[0-9][0-9][0-9][0-9]-[0][1-9]-[0-3][0-9]%' THEN TRY_CONVERT(datetime, @strDate, 121) -- yyyy-mm-dd hh:mi:ss
    WHEN @strDate LIKE '[0-9][0-9][0-9][0-9]-[1][0-2]-[0-3][0-9]%' THEN TRY_CONVERT(datetime, @strDate, 121) -- yyyy-mm-dd hh:mi:ss

    WHEN @strDate LIKE '[0-9][0-9][0-9][0-9]/[0][1-9]/[0-3][0-9]%' THEN TRY_CONVERT(datetime, @strDate, 111) -- yyyy/mm/dd
    WHEN @strDate LIKE '[0-9][0-9][0-9][0-9]/[1][0-2]/[0-3][0-9]%' THEN TRY_CONVERT(datetime, @strDate, 111) -- yyyy/mm/dd

    WHEN @strDate LIKE '[0-9][0-9][0-9][0-9].[0][1-9].[0-3][0-9]%' THEN TRY_CONVERT(datetime, @strDate, 102) -- yyyy.mm.dd
    WHEN @strDate LIKE '[0-9][0-9][0-9][0-9].[1][0-2].[0-3][0-9]%' THEN TRY_CONVERT(datetime, @strDate, 102) -- yyyy.mm.dd

    WHEN @strDate LIKE '[0-9][0-9][0-9][0-9][0][1-9][0-3][0-9]%' THEN TRY_CONVERT(datetime, @strDate, 112) -- yyyymmdd
    WHEN @strDate LIKE '[0-9][0-9][0-9][0-9][1][0-2][0-3][0-9]%' THEN TRY_CONVERT(datetime, @strDate, 112) -- yyyymmdd

    WHEN @strDate LIKE '[0-9][0-9]-[1][3-9]-[0-9][0-9][0-9][0-9]%' THEN TRY_CONVERT(datetime, @strDate, 110) -- mm-dd-yyyy (dd = 13-19)
    WHEN @strDate LIKE '[0-9][0-9]-[2][0-9]-[0-9][0-9][0-9][0-9]%' THEN TRY_CONVERT(datetime, @strDate, 110) -- mm-dd-yyyy (dd = 20-29)
    WHEN @strDate LIKE '[0-9][0-9]-[3][0-1]-[0-9][0-9][0-9][0-9]%' THEN TRY_CONVERT(datetime, @strDate, 110) -- mm-dd-yyyy (dd = 30-31)

    WHEN @strDate LIKE '[0-9][0-9]/[1][3-9]/[0-9][0-9][0-9][0-9]%' THEN TRY_CONVERT(datetime, @strDate, 101) -- mm/dd/yyyy (dd = 13/19)
    WHEN @strDate LIKE '[0-9][0-9]/[2][0-9]/[0-9][0-9][0-9][0-9]%' THEN TRY_CONVERT(datetime, @strDate, 101) -- mm/dd/yyyy (dd = 20/29)
    WHEN @strDate LIKE '[0-9][0-9]/[3][0-1]/[0-9][0-9][0-9][0-9]%' THEN TRY_CONVERT(datetime, @strDate, 101) -- mm/dd/yyyy (dd = 30/31)


    WHEN @strDate LIKE '[1][3-9]-[0-9][0-9]-[0-9][0-9][0-9][0-9]%' THEN TRY_CONVERT(datetime, @strDate, 105) -- dd-mm-yyyy (dd = 13-19)
    WHEN @strDate LIKE '[2][0-9]-[0-9][0-9]-[0-9][0-9][0-9][0-9]%' THEN TRY_CONVERT(datetime, @strDate, 105) -- dd-mm-yyyy (dd = 20-29)
    WHEN @strDate LIKE '[3][0-1]-[0-9][0-9]-[0-9][0-9][0-9][0-9]%' THEN TRY_CONVERT(datetime, @strDate, 105) -- dd-mm-yyyy (dd = 30-31)

    WHEN @strDate LIKE '[1][3-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9]%' THEN TRY_CONVERT(datetime, @strDate, 103) -- dd/mm/yyyy (dd = 13-19)
    WHEN @strDate LIKE '[2][0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9]%' THEN TRY_CONVERT(datetime, @strDate, 103) -- dd/mm/yyyy (dd = 20-29)
    WHEN @strDate LIKE '[3][0-1]/[0-9][0-9]/[0-9][0-9][0-9][0-9]%' THEN TRY_CONVERT(datetime, @strDate, 103) -- dd/mm/yyyy (dd = 30-31)

    WHEN @strDate LIKE '[0-3][0-9].[0][1-9].[0-9][0-9][0-9][0-9]%' THEN TRY_CONVERT(datetime, @strDate, 104) -- dd.mm.yyyy
    WHEN @strDate LIKE '[0-3][0-9].[1][0-2].[0-9][0-9][0-9][0-9]%' THEN TRY_CONVERT(datetime, @strDate, 104) -- dd.mm.yyyy

    --==========================================================================================================================
    -- This is where there is a big possibilty of getting the wrong date, as we do not know whether the dates
    -- are Month-Day or Day-Month. So, rather return NULL, and let the user do the conversion manually.
    --==========================================================================================================================
    --WHEN @strDate LIKE '[0-9][0-9]-[0-9][0-9]-[0-9][0-9][0-9][0-9]%' THEN TRY_CONVERT(datetime, @strDate, 110) -- mm-dd-yyyy
    --WHEN @strDate LIKE '[0-9][0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9]%' THEN TRY_CONVERT(datetime, @strDate, 103) -- dd/mm/yyyy
    --==========================================================================================================================
    ELSE NULL
    END;
    GO

     

    Now create a table with various string date formats, and cross apply the function to do the convertsion

    IF OBJECT_ID(N'tempdb..#DateFormats', N'U') IS NOT NULL
    BEGIN
    DROP TABLE #DateFormats;
    END;

    CREATE TABLE #DateFormats (
    FmtID int NOT NULL
    , FmtString varchar(40) NOT NULL
    , strDate AS CONVERT(nvarchar(100), GETDATE(), FmtID)
    );

    INSERT INTO #DateFormats ( FmtID, FmtString )
    VALUES
    ( 106, 'dd mon yyyy' )
    , ( 113, 'dd mon yyyy hh:mi:ss:mmm (24h)' )
    , ( 100, 'mon dd yyyy hh:miAM' )
    , ( 107, 'Mon dd, yyyy' )
    , ( 109, 'mon dd yyyy hh:mi:ss:mmmAM' )
    , ( 126, 'yyyy-mm-ddThh:mi:ss.mmm' )
    , ( 127, 'yyyy-mm-ddThh:mi:ss.mmmZ' )
    , ( 23, 'yyyy-mm-dd' )
    , ( 120, 'yyyy-mm-dd hh:mi:ss (24h)' )
    , ( 121, 'yyyy-mm-dd hh:mi:ss.mmm (24h)' )
    , ( 111, 'yyyy/mm/dd' )
    , ( 102, 'yyyy.mm.dd' )
    , ( 112, 'yyyymmdd' )
    , ( 104, 'dd.mm.yyyy' )

    -- These 4 formats have the potential to return incorrect values (4 June vs 6 May)
    -- So the function will return NULL. You could easily modify the function to assume
    -- the format that is most common in your environment.
    , ( 101, 'mm/dd/yyyy' )
    , ( 103, 'dd/mm/yyyy' )
    , ( 105, 'dd-mm-yyyy' )
    , ( 110, 'mm-dd-yyyy' );

    SELECT *
    FROM #DateFormats AS src
    CROSS APPLY dbo.iTVF_StringToDate(src.strDate) AS conv;

     

  • Another version, trying to do the least prep work possible (although for determining day or month first, the entire staging table may need to be scanned):

    --Assumes that all date strings for a given column in that specific 
    -- staging table iteration are consistent.
    --Recognized formats, where [<delim>] is an optional delim char:
    --YYYY[<delim>]MM[<delim>]DD --for dates from 2013 on.
    --MM[<delim>]DD[<delim>]YYYY
    --DD[<delim>]MM[<delim>]YYYY
    --[<assumedOK>]mmm[<assumedOK>]
    --Unrecognized format(s):
    --YYYY[<delim>]DD[<delim>]MM --if this format is possible, let me know.

    DECLARE @byte_to_check int
    DECLARE @date_column varchar(30)
    DECLARE @date_conversion_code int
    DECLARE @date_delim_found bit
    DECLARE @date_format varchar(10)
    DECLARE @day_first_in_date bit

    SELECT TOP (1) @date_column = date_column
    FROM dbo.stage_policy

    IF @date_column LIKE '20[1][3456789]%'
    OR @date_column LIKE '20[23456789]%'
    BEGIN
    SET @date_format = 'YYYY'
    /*assumed format is MM next, never DD next*/
    /*
    IF SUBSTRING(@date_format, 5, 1) LIKE '[0-9]'
    BEGIN
    SET @date_delim_found = 0
    SET @byte_to_check = 5
    END /*IF*/
    ELSE
    BEGIN
    SET @date_delim_found = 1
    SET @byte_to_check = 6
    END /*ELSE*/
    */
    END /*IF*/
    ELSE
    IF @date_column LIKE '%[a-z]%'
    BEGIN
    SET @date_format = 'mmm'
    SET @byte_to_check = NULL
    END /*IF*/
    ELSE
    BEGIN
    IF SUBSTRING(@date_format, 3, 1) LIKE '[0-9]'
    BEGIN
    SET @date_delim_found = 0
    SET @byte_to_check = 3
    END /*IF*/
    ELSE
    BEGIN
    SET @date_delim_found = 1
    SET @byte_to_check = 4
    END /*ELSE*/
    END /*ELSE*/

    IF @byte_to_check > 0
    BEGIN
    SET @date_column = NULL
    SELECT TOP (1) @date_column = date_column
    FROM dbo.stage_policy
    WHERE SUBSTRING(date_column, @byte_to_check, 2) LIKE '1[3456789]%' OR
    SUBSTRING(date_column, @byte_to_check, 2) LIKE '[23]%'
    IF @date_column IS NULL
    SET @day_first_in_date = 0
    ELSE
    SET @day_first_in_date = 1
    END /*IF*/

    SET @date_conversion_code = CASE
    WHEN @date_format = 'mmm' THEN NULL
    WHEN @date_format = 'YYYY' THEN
    CASE WHEN @date_delim_found = 1 THEN 102 ELSE 112 END
    WHEN @day_first_in_date = 0 AND @date_delim_found = 1 THEN 101
    WHEN @day_first_in_date = 1 AND @date_delim_found = 1 THEN 103
    WHEN @day_first_in_date = 0 THEN 1001 /*DDMMYYYY*/
    ELSE 1002 END /*MMDDYYYY*/

    INSERT INTO dbo.policy ( date_column, ... )
    SELECT
    CASE WHEN @date_conversion_code IS NULL
    THEN CAST(date_column AS date)
    WHEN @date_conversion_code = 1001
    THEN SUBSTRING(date_column, 5, 4) + SUBSTRING(date_column, 3, 2) + LEFT(date_column, 2)
    WHEN @date_conversion_code = 1002
    THEN SUBSTRING(date_column, 5, 4) + LEFT(date_column, 2) + SUBSTRING(date_column, 3, 2)
    ELSE CONVERT(date, date_column, @date_conversion_code) END AS date_column,
    ....
    FROM dbo.stage_policy

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • Any format containing yyyy-mm-dd can also throw you for a loop if the data had French origins because SQL Server uses YYYY-DD-MM with the French Language.

    There is NO WAY to ascertain the correct position of mm and dd from unknown sources even if you use the main ISO format of YYYYMMDD (with no delimiters) because you simply can't trust the source.  The only way to pull this off is to negotiate with the source just exactly what the format will be and then they need to stick with it.  Anything else is a pure crap shoot.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden wrote:

    Any format containing yyyy-mm-dd can also throw you for a loop if the data had French origins because SQL Server uses YYYY-DD-MM with the French Language.

    There is NO WAY to ascertain the correct position of mm and dd from unknown sources even if you use the main ISO format of YYYYMMDD (with no delimiters) because you simply can't trust the source.  The only way to pull this off is to negotiate with the source just exactly what the format will be and then they need to stick with it.  Anything else is a pure crap shoot.

    I was going to reply with the same idea (but mine would not be so perspicuous ) 🙂

    Far away is close at hand in the images of elsewhere.
    Anon.

  • when the source data incorporates date ranges that span a few month they it is sometimes possible to analyse the data as part of the load process and see which conversion formula works for the majority of the dates in a single column - and then use that one for all dates on that column,

    Still not a full guarantee that the format is indeed the one used at source.

    If dates are normally on a short interval then it is impossible to guarantee which was the format used as mentioned before

     

  • Thanks all,

    that's going to take me some time to analyse. The source data is just terrible. They have been swapping the format of the dates on almost a daily basis. Currently they are supplying them in two different formats in the same file.

    Generally I do not get into this position but thought it would be handy to have a trick handy for the next time this happens.

    Lot of interesting reading.

    Many Many Thanks

    E

  • Ells wrote:

    Thanks all,

    that's going to take me some time to analyse. The source data is just terrible. They have been swapping the format of the dates on almost a daily basis. Currently they are supplying them in two different formats in the same file.

    Generally I do not get into this position but thought it would be handy to have a trick handy for the next time this happens.

    Lot of interesting reading.

    Many Many Thanks

    E

    I would suggest

    • Add a new DATETIME field to your staging table
    • Using the function that I supplied, update the new field.
    • Manually work through the records where the new field is still NULL

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply