Validate date stored as a string as a complete, valid date

  • I have an upstream table where birth_date is stored as a string, YYYYMMMDD.  The data is very dirty. 

    I need to convert it to a date value, only if it is a complete date, otherwise NULL.  No imputed dates, i.e. assuming MMDD = 01-01, etc.  No load errors due to the dirty data.

    I wish there was an option to ISDATE that would validate for a complete date w/o imputation.

    Is the below the best approach?  This feel kludgy to me but I can't think of a better way.

    DECLARE @t TABLE (s VARCHAR(8))

    INSERT INTO @t
    VALUES
    ('foo')
    ,('2018')
    ,('201802')
    ,('20180230')
    ,('  2018')
    ,('2018----')
    ,('----01--')
    ,('20181225')

    SELECT s
      ,ISDATE(s) AS IsDate
      ,CASE
        WHEN(LEN(LTRIM(s)) != 8) THEN NULL
        WHEN(ISDATE(s)=0) THEN NULL
        ELSE CAST(s AS DATE)  -- or CONVERT(DATE,s,112) ???
       END AS MyDate
    FROM @t

  • Here is another way of writing the logic
    😎


    USE TEEST;
    GO
    SET NOCOUNT ON;

    DECLARE @t TABLE (s VARCHAR(8))

    INSERT INTO @t
    VALUES
    ('foo')
    ,('2018')
    ,('201802')
    ,('20180230')
    ,('20180228')
    ,(' 2018')
    ,('2018----')
    ,('----01--')
    ,('20181225')

    SELECT X.s
    ,ISDATE(s) AS IsDate
    ,DATEADD(DAY,1 - NULLIF(1 + SIGN(DATALENGTH(TRIM(X.s)) - 8),0),TRY_CONVERT(DATE,X.s,112)) AS MyDate
    FROM @t X;

  • If you're on SQL 2016 (or even SQL 2012), just use TRY_CAST:


    SELECT s
    ,TRY_CAST(s AS date) AS MyDate
    FROM @t

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • I'm not sure how you're populating your upstream table but, if you're using BCP or BULK INSERT, learn how to use the error parameters to simply sequester the rows with the bad dates into a rework file.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Eirikur Eiriksson - Tuesday, December 18, 2018 12:55 AM

    Here is another way of writing the logic
    😎


    USE TEEST;
    GO
    SET NOCOUNT ON;

    DECLARE @t TABLE (s VARCHAR(8))

    INSERT INTO @t
    VALUES
    ('foo')
    ,('2018')
    ,('201802')
    ,('20180230')
    ,('20180228')
    ,(' 2018')
    ,('2018----')
    ,('----01--')
    ,('20181225')

    SELECT X.s
    ,ISDATE(s) AS IsDate
    ,DATEADD(DAY,1 - NULLIF(1 + SIGN(DATALENGTH(TRIM(X.s)) - 8),0),TRY_CONVERT(DATE,X.s,112)) AS MyDate
    FROM @t X;

    Hi @Eirikur,

    An appropriate emoji 😉

    I'm reminded of a previous life when I used to view obfuscated Perl.  Hmmm, I'll have to think about this, and the poor developer who may have to support this code down the road.

    But it looks to me like it's using the same approach as mine, i.e. using LEN or DATALENGTH as a proxy to mean "a complete date" (assuming it then converts as a date).  As before, I wish there was a way to coax ISDATE to only return 1 only if the input is a complete date w/o further imputation.

    I certainly need to hit the doc for SIGN, DATALENGTH, and TRY_CONVERT, so thanks for that.

    Scott

  • ScottPletcher - Tuesday, December 18, 2018 11:12 AM

    If you're on SQL 2016 (or even SQL 2012), just use TRY_CAST:


    SELECT s
    ,TRY_CAST(s AS date) AS MyDate
    FROM @t

    Hi Scott,

    Your parents named you well 😀

    I'll hit the doc on TRY_CAST, but that in and of itself does not give me the desired results:

    DECLARE @t TABLE (s VARCHAR(8))

    INSERT INTO @t
    VALUES
    ('foo')
    ,('2018')
    ,('201802')
    ,('20180230')
    ,(' 2018')
    ,('2018----')
    ,('----01--')
    ,('20181225')

    SELECT s
    ,ISDATE(s) AS IsDate
    ,CASE
      WHEN(LEN(LTRIM(s)) != 8) THEN NULL
      WHEN(ISDATE(s)=0) THEN NULL
      ELSE CAST(s AS DATE) -- or CONVERT(DATE,s,112) ???
     END AS MyDate
    ,TRY_CAST(s AS DATE) AS ScottPDate
    FROM @t

    See rows 3 & 5.

    Scott

  • Jeff Moden - Tuesday, December 18, 2018 12:26 PM

    I'm not sure how you're populating your upstream table but, if you're using BCP or BULK INSERT, learn how to use the error parameters to simply sequester the rows with the bad dates into a rework file.

    Hi Jeff,

    I'm not populating the upstream table, another team is.

    The data is health data, and I suspect (this is just conjecture) that the hospitals send flat files to my government department, and the proverbial "they" are using BCP, BULK INSERT, or some similar means to import the data as-is.

    We seem to have a culture here that we should retain the garbage so we "know what was sent", rather than cleanse the data.  Or perhaps "they" are just lazy.  Anyway, this leaves the downstream developer or analyst to deal with the garbage.

    This is the same data for which I posted https://www.sqlservercentral.com/Forums/2011818/Request-code-review-for-UDF-ContainsControlChar-and-CleanString, on which you've commented.

    Scott

  • Scott In Sydney - Thursday, December 20, 2018 11:30 PM

    Hi Scott,
    Your parents named you well 😀

    I'll hit the doc on TRY_CAST, but that in and of itself does not give me the desired results:

    DECLARE @t TABLE (s VARCHAR(8))

    INSERT INTO @t
    VALUES
    ('foo')
    ,('2018')
    ,('201802')
    ,('20180230')
    ,(' 2018')
    ,('2018----')
    ,('----01--')
    ,('20181225')

    SELECT s
    ,ISDATE(s) AS IsDate
    ,CASE
      WHEN(LEN(LTRIM(s)) != 8) THEN NULL
      WHEN(ISDATE(s)=0) THEN NULL
      ELSE CAST(s AS DATE) -- or CONVERT(DATE,s,112) ???
     END AS MyDate
    ,TRY_CAST(s AS DATE) AS ScottPDate
    FROM @t

    See rows 3 & 5.

    Scott

    Try combining the two:

    SELECT s
    ,ISDATE(s) AS IsDate
    ,CASE
        WHEN LTRIM(s) NOT LIKE '[1-2][0-9][0-9][0-9][1-2][0-9][1-3][0-9]' THEN CAST(NULL AS date)
        ELSE TRY_CAST(s AS date)
    END AS MyDate
    FROM @t

    John

  • Scott In Sydney - Thursday, December 20, 2018 11:24 PM

    Eirikur Eiriksson - Tuesday, December 18, 2018 12:55 AM

    Here is another way of writing the logic
    😎


    USE TEEST;
    GO
    SET NOCOUNT ON;

    DECLARE @t TABLE (s VARCHAR(8))

    INSERT INTO @t
    VALUES
    ('foo')
    ,('2018')
    ,('201802')
    ,('20180230')
    ,('20180228')
    ,(' 2018')
    ,('2018----')
    ,('----01--')
    ,('20181225')

    SELECT X.s
    ,ISDATE(s) AS IsDate
    ,DATEADD(DAY,1 - NULLIF(1 + SIGN(DATALENGTH(TRIM(X.s)) - 8),0),TRY_CONVERT(DATE,X.s,112)) AS MyDate
    FROM @t X;

    Hi @Eirikur,

    An appropriate emoji 😉

    I'm reminded of a previous life when I used to view obfuscated Perl.  Hmmm, I'll have to think about this, and the poor developer who may have to support this code down the road.

    But it looks to me like it's using the same approach as mine, i.e. using LEN or DATALENGTH as a proxy to mean "a complete date" (assuming it then converts as a date).  As before, I wish there was a way to coax ISDATE to only return 1 only if the input is a complete date w/o further imputation.

    I certainly need to hit the doc for SIGN, DATALENGTH, and TRY_CONVERT, so thanks for that.

    Scott

    Added some comments to explain the code
    😎


    USE TEEST;
    GO
    SET NOCOUNT ON;

    DECLARE @t TABLE (s VARCHAR(8))

    INSERT INTO @t
    VALUES
    ('foo')
    ,('2018')
    ,('201802')
    ,('20180230')
    ,('20180228')
    ,(' 2018')
    ,('2018----')
    ,('----01--')
    ,('20181225')
    ;
    SELECT
     X.s
     -- ISDATE returns 1 for both full dates and year only
    ,ISDATE(s) AS IsDate
    -- TRY_CONVERT follows the same logic as ISDATE, if the input is year then
    -- it "produces" 1st Jan of the year.
    ,TRY_CONVERT(DATE,X.s,112) AS TC_OUT
    -- Get the length of the column value
    ,DATALENGTH(TRIM(X.s)) AS DLEN
    -- Return 0 if 8 characters, otherwise -1
    ,SIGN(DATALENGTH(TRIM(X.s)) - 8) AS Is8Char
    -- Shift the sign by + 1
    ,1 + SIGN(DATALENGTH(TRIM(X.s)) - 8) AS Is8CharShift
    -- Produce a NULL output if the length does not match
    ,NULLIF(1 + SIGN(DATALENGTH(TRIM(X.s)) - 8),0) AS IsNullIfShort
    -- Pulling it all together, the DATEADD will either add 0 days or return NULL
    -- if the input is too short
    ,DATEADD(DAY,1 - NULLIF(1 + SIGN(DATALENGTH(TRIM(X.s)) - 8),0),TRY_CONVERT(DATE,X.s,112)) AS MyDate
    FROM @t X;


  • SELECT s
        s2
        ,TRY_CAST(s2 AS DATE) AS ScottPDate
    FROM @t
    CROSS APPLY (
        SELECT LTRIM(RTRIM(s)) AS s1
    ) AS ca1
    CROSS APPLY (
        SELECT CASE WHEN LEN(s1) = 4 THEN s1 + '0101'
            WHEN LEN(s1) = 6 THEN s1 + '01'
            ELSE s1 END AS s2
    ) AS ca2

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • Scott In Sydney - Thursday, December 20, 2018 11:30 PM

    ScottPletcher - Tuesday, December 18, 2018 11:12 AM

    If you're on SQL 2016 (or even SQL 2012), just use TRY_CAST:


    SELECT s
    ,TRY_CAST(s AS date) AS MyDate
    FROM @t

    Hi Scott,

    Your parents named you well 😀

    I'll hit the doc on TRY_CAST, but that in and of itself does not give me the desired results:

    DECLARE @t TABLE (s VARCHAR(8))

    INSERT INTO @t
    VALUES
    ('foo')
    ,('2018')
    ,('201802')
    ,('20180230')
    ,(' 2018')
    ,('2018----')
    ,('----01--')
    ,('20181225')

    SELECT s
    ,ISDATE(s) AS IsDate
    ,CASE
      WHEN(LEN(LTRIM(s)) != 8) THEN NULL
      WHEN(ISDATE(s)=0) THEN NULL
      ELSE CAST(s AS DATE) -- or CONVERT(DATE,s,112) ???
     END AS MyDate
    ,TRY_CAST(s AS DATE) AS ScottPDate
    FROM @t

    See rows 3 & 5.

    Scott

    I realize that Scott Pletcher already provided a solution but there may be a simplification.  I'm just confused by your statement of "See rows 3 & 5" but you didn't say a thing about row 1.
    

    It looks like you're saying that row 3, which resolves to the YYYYMM format, should resolve to a date for the first of the given month.
    It also looks like you're saying that row 5, which resolves to YYYY, should not be considered as a valid date for the 1st of January for the given year.
    Part of the confusion is that row 2 also resolves to just YYYY and yet you brought no attention to that.

    So, let me ask... what are the explicit rules?  Are they...
    1.  If the characters resolve to YYYYMMDD and it's a valid date, return the converted date.
    2.  If the characters resolve to YYYYMM      and it's a valid year and month, return the converted date as a "first of he month" date.
    3.  If the characters resolve to YYYY            return a NULL regardless if it's a valid year or not.
    4.  Otherwise, return NULL

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden - Monday, December 24, 2018 10:41 AM

    Scott In Sydney - Thursday, December 20, 2018 11:30 PM

    ScottPletcher - Tuesday, December 18, 2018 11:12 AM

    If you're on SQL 2016 (or even SQL 2012), just use TRY_CAST:


    SELECT s
    ,TRY_CAST(s AS date) AS MyDate
    FROM @t

    Hi Scott,

    Your parents named you well 😀

    I'll hit the doc on TRY_CAST, but that in and of itself does not give me the desired results:

    DECLARE @t TABLE (s VARCHAR(8))

    INSERT INTO @t
    VALUES
    ('foo')
    ,('2018')
    ,('201802')
    ,('20180230')
    ,(' 2018')
    ,('2018----')
    ,('----01--')
    ,('20181225')

    SELECT s
    ,ISDATE(s) AS IsDate
    ,CASE
      WHEN(LEN(LTRIM(s)) != 8) THEN NULL
      WHEN(ISDATE(s)=0) THEN NULL
      ELSE CAST(s AS DATE) -- or CONVERT(DATE,s,112) ???
     END AS MyDate
    ,TRY_CAST(s AS DATE) AS ScottPDate
    FROM @t

    See rows 3 & 5.

    Scott

    I realize that Scott Pletcher already provided a solution but there may be a simplification.  I'm just confused by your statement of "See rows 3 & 5" but you didn't say a thing about row 1.
    

    It looks like you're saying that row 3, which resolves to the YYYYMM format, should resolve to a date for the first of the given month.
    It also looks like you're saying that row 5, which resolves to YYYY, should not be considered as a valid date for the 1st of January for the given year.
    Part of the confusion is that row 2 also resolves to just YYYY and yet you brought no attention to that.

    So, let me ask... what are the explicit rules?  Are they...
    1.  If the characters resolve to YYYYMMDD and it's a valid date, return the converted date.
    2.  If the characters resolve to YYYYMM      and it's a valid year and month, return the converted date as a "first of he month" date.
    3.  If the characters resolve to YYYY            return a NULL regardless if it's a valid year or not.
    4.  Otherwise, return NULL

    FYI, here's my last post, with the changes for yyyy and yyyymm incorporated into the code:


    SELECT s
        s2
        ,TRY_CAST(s2 AS DATE) AS ScottPDate
    FROM @t
    CROSS APPLY (
        SELECT LTRIM(RTRIM(s)) AS s1
    ) AS ca1
    CROSS APPLY (
        SELECT CASE WHEN LEN(s1) = 4 THEN s1 + '0101'
            WHEN LEN(s1) = 6 THEN s1 + '01'
            ELSE s1 END AS s2
    ) AS ca2

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • Now all I have to do is get the OP to confirm and/or clarify the requirements according to the questions I asked.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply