Find Valid Date from String

  • Hello All,

    I have a field that unfortunately I have no control over. The field is a Approximate date field. This is part of a medical chart. In the application the provider can input a Approximate date of diagnosis. The problem is it is a free text field.

    I am writing a report against this field and I need to capture as many valid dates as I can without having a conversion error. I have some code that I am using below that works most of the time but still encounters issues when a string is validated as a date but can't be converted. The values commented out are examples of instances I found that will cause a conversion error.

    I'm looking for a better method of doing this. Thanks for any help or insight in advance.

    CREATE TABLE #DatesInStrings (ID INT IDENTITY (1,1), ApproximateDate VARCHAR(30))

    INSERT INTO #DatesInStrings VALUES ('01/02/14'), ('01-02-14'),('2014'),('01//02/14'),('01.02.2014'),('01--2000') --,('02-01/02')--,('01-02-014')

    SELECT CASE WHEN ISDATE(LEFT(SUBSTRING(dis.ApproximateDate, PATINDEX('%[0-9]%', dis.ApproximateDate), 200),

    PATINDEX('%[^0-9/-]%', SUBSTRING(dis.ApproximateDate, PATINDEX('%[0-9]%',dis.ApproximateDate), 200) + 'X') -1))=1 -----Find and Isolate potential DATE characters from string and validate as able to CONVERT to DATE

    THEN CONVERT(DATE ,LEFT(SUBSTRING(dis.ApproximateDate, PATINDEX('%[0-9]%', dis.ApproximateDate), 200),

    PATINDEX('%[^0-9/-]%', SUBSTRING(dis.ApproximateDate, PATINDEX('%[0-9]%', dis.ApproximateDate), 200) + 'X') -1))END

    FROM #DatesInStrings dis

    DROP TABLE #DatesInStrings

    ***SQL born on date Spring 2013:-)

  • Hmm...the problem with those last two values is that they can successfully be cast as DATETIME, which is why ISDATE is returning 1 for them. I don't know of any ways to deal with that off-hand if you actually want to ignore those values.

    If you don't care if those values are allowed, then you could cast as datetime and then as dates.

    I'll give it some more thought to see if I can come up with alternative approaches.

    Cheers!

  • Are you 2008? There is a try_convert, but it's only in 2012+ (https://msdn.microsoft.com/en-us/library/hh230993.aspx)

    I think you'd need more checking of the substring returned to see if things were in bound. Even then, you're limiting to a specific region or date format. However you could check for month/day inside of a range, and perhaps that gets you closer.

    This is a complex string manipulation. I think this is the type of thing crying for regex inside T-SQL to better find/match patterns.

  • Hi Steve,

    Yes sir,

    it is 2008 r2. I so wish for the day I can start coding using 2012. Unfortunately many of our customers are still lagging behind with 2008 r2.

    I just wish applications would use correct data types. I know wishful thinking. I knew this was complex just trying to see if anyone had some cool tricks. I will also look into the Regex as well.

    Thanks!

    ***SQL born on date Spring 2013:-)

  • This might be overkill, but storing a table with valid date formats could work. You can simplify the list or increase it as needed. You could even store lengths and format codes to use them in the convert function.

    CREATE TABLE #DatesInStrings (ID INT IDENTITY (1,1), ApproximateDate VARCHAR(30))

    INSERT INTO #DatesInStrings VALUES ('01/02/14'), ('01-02-14'),('2014'),('01//02/14'),('01.02.2014'),('01--2000'),('02-01/02'),('01-02-014')

    CREATE TABLE #ValidDateFormats(

    date_format varchar(100)

    )

    INSERT INTO #ValidDateFormats

    VALUES('%[^0-9/.-][1-2][0-9][0-9][0-9][^0-9/.-]%'),

    --YYYY-MM-DD

    ('%[^0-9/.-][1-2][0-9][0-9][0-9]-0[0-9]-[0-2][0-9][^0-9/.-]%'),

    ('%[^0-9/.-][1-2][0-9][0-9][0-9]-0[0-9]-3[0-1][^0-9/.-]%'),

    ('%[^0-9/.-][1-2][0-9][0-9][0-9]-1[0-2]-[0-2][0-9][^0-9/.-]%'),

    ('%[^0-9/.-][1-2][0-9][0-9][0-9]-1[0-2]-3[0-1][^0-9/.-]%'),

    --YYYYMMDD

    ('%[^0-9/.-][1-2][0-9][0-9][0-9]0[0-9][0-2][0-9][^0-9/.-]%'),

    ('%[^0-9/.-][1-2][0-9][0-9][0-9]0[0-9]3[0-1][^0-9/.-]%'),

    ('%[^0-9/.-][1-2][0-9][0-9][0-9]1[0-2][0-2][0-9][^0-9/.-]%'),

    ('%[^0-9/.-][1-2][0-9][0-9][0-9]1[0-2]3[0-1][^0-9/.-]%'),

    --MM/DD/YYYY

    ('%[^0-9/.-]0[0-9]/[0-2][0-9]/[1-2][0-9][0-9][0-9][^0-9/.-]%'),

    ('%[^0-9/.-]0[0-9]/3[0-1]/[1-2][0-9][0-9][0-9][^0-9/.-]%'),

    ('%[^0-9/.-]1[0-2]/[0-2][0-9]/[1-2][0-9][0-9][0-9][^0-9/.-]%'),

    ('%[^0-9/.-]1[0-2]/3[0-1]/[1-2][0-9][0-9][0-9][^0-9/.-]%'),

    --MM/DD/YY

    ('%[^0-9/.-]0[0-9]/[0-2][0-9]/[0-9][0-9][^0-9/.-]%'),

    ('%[^0-9/.-]0[0-9]/3[0-1]/[0-9][0-9][^0-9/.-]%'),

    ('%[^0-9/.-]1[0-2]/[0-2][0-9]/[0-9][0-9][^0-9/.-]%'),

    ('%[^0-9/.-]1[0-2]/3[0-1]/[0-9][0-9][^0-9/.-]%'),

    --MM-DD-YYYY

    ('%[^0-9/.-]0[0-9]-[0-2][0-9]-[1-2][0-9][0-9][0-9][^0-9/.-]%'),

    ('%[^0-9/.-]0[0-9]-3[0-1]-[1-2][0-9][0-9][0-9][^0-9/.-]%'),

    ('%[^0-9/.-]1[0-2]-[0-2][0-9]-[1-2][0-9][0-9][0-9][^0-9/.-]%'),

    ('%[^0-9/.-]1[0-2]-3[0-1]-[1-2][0-9][0-9][0-9][^0-9/.-]%'),

    --MM-DD-YY

    ('%[^0-9/.-]0[0-9]-[0-2][0-9]-[0-9][0-9][^0-9/.-]%'),

    ('%[^0-9/.-]0[0-9]-3[0-1]-[0-9][0-9][^0-9/.-]%'),

    ('%[^0-9/.-]1[0-2]-[0-2][0-9]-[0-9][0-9][^0-9/.-]%'),

    ('%[^0-9/.-]1[0-2]-3[0-1]-[0-9][0-9][^0-9/.-]%'),

    --MM.DD.YYYY

    ('%[^0-9/.-]0[0-9].[0-2][0-9].[1-2][0-9][0-9][0-9][^0-9/.-]%'),

    ('%[^0-9/.-]0[0-9].3[0-1].[1-2][0-9][0-9][0-9][^0-9/.-]%'),

    ('%[^0-9/.-]1[0-2].[0-2][0-9].[1-2][0-9][0-9][0-9][^0-9/.-]%'),

    ('%[^0-9/.-]1[0-2].3[0-1].[1-2][0-9][0-9][0-9][^0-9/.-]%'),

    --MM.DD.YY

    ('%[^0-9/.-]0[0-9].[0-2][0-9].[0-9][0-9][^0-9/.-]%'),

    ('%[^0-9/.-]0[0-9].3[0-1].[0-9][0-9][^0-9/.-]%'),

    ('%[^0-9/.-]1[0-2].[0-2][0-9].[0-9][0-9][^0-9/.-]%'),

    ('%[^0-9/.-]1[0-2].3[0-1].[0-9][0-9][^0-9/.-]%')

    SELECT ApproximateDate, vd.date_format,

    CASE WHEN ISDATE(SUBSTRING(dis.ApproximateDate, PATINDEX(vd.date_format, dis.ApproximateDate), LEN(vd.date_format)))=1 -----Find and Isolate potential DATE characters from string and validate as able to CONVERT to DATE

    THEN CONVERT(DATE ,SUBSTRING(dis.ApproximateDate, PATINDEX(vd.date_format, dis.ApproximateDate), LEN(vd.date_format)))END

    FROM #DatesInStrings dis

    LEFT JOIN #ValidDateFormats vd ON '#' + dis.ApproximateDate + '#' LIKE VD.date_format

    GO

    DROP TABLE #DatesInStrings

    DROP TABLE #ValidDateFormats

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • You might look at SQL# for regex functionality

    http://www.sqlsharp.com/

  • Thank you all! Great suggestions. I will start researching each to find the best fit. 😀

    ***SQL born on date Spring 2013:-)

  • Steve Jones - SSC Editor (1/28/2016)


    You might look at SQL# for regex functionality

    http://www.sqlsharp.com/

    If it can be done in T-SQL, it'll usually be faster than RegEx. We've proven that on several threads including one of Solomon's.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • thomashohner (1/28/2016)


    I'm looking for a better method of doing this. Thanks for any help or insight in advance.

    As Steve mentioned, on SQL 2012 we would favor the TRY_* functions built into T-SQL.

    However since you are on a lower version, I implemented a set of SQLCLR functions when I was running on SQL 2008 R2 that reliably check data-type cast-ability that I can share if you are interested. The functions I implemented are specific for SQL Server meaning that while a C# DateTime object's min value is 1/1/0000 a SQL DATETIME bottoms out at 1/1/1900 and I am taking take that into account.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Orlando Colamatteo (1/28/2016)


    thomashohner (1/28/2016)


    I'm looking for a better method of doing this. Thanks for any help or insight in advance.

    As Steve mentioned, on SQL 2012 we would favor the TRY_* functions built into T-SQL.

    However since you are on a lower version, I implemented a set of SQLCLR functions when I was running on SQL 2008 R2 that reliably check data-type cast-ability that I can share if you are interested. The functions I implemented are specific for SQL Server meaning that while a C# DateTime object's min value is 1/1/0000 a SQL DATETIME bottoms out at 1/1/1900 and I am taking take that into account.

    I can't speak to C#'s date range, but the minimum value of a SQL Server Datetime is 1/1/1753.

  • Ugh, thanks for the correction Ed. I am confident my C# code is correct but I was going from memory on the post as to the boundaries (was juat trying to explain there are diffs in the boundaries and my code accounts for them). Now I have figure out where I got 1/1/1900 and why it is a significant date.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Orlando Colamatteo (1/28/2016)


    Ugh, thanks for the correction Ed. I am confident my C# code is correct but I was going from memory on the post as to the boundaries (was juat trying to explain there are diffs in the boundaries and my code accounts for them). Now I have figure out where I got 1/1/1900 and why it is a significant date.

    1/1/1900 is date zero. Casting a zero to datetime will return that date. Most people won't need anything previous to that.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Luis Cazares (1/28/2016)


    Orlando Colamatteo (1/28/2016)


    Ugh, thanks for the correction Ed. I am confident my C# code is correct but I was going from memory on the post as to the boundaries (was juat trying to explain there are diffs in the boundaries and my code accounts for them). Now I have figure out where I got 1/1/1900 and why it is a significant date.

    1/1/1900 is date zero. Casting a zero to datetime will return that date. Most people won't need anything previous to that.

    Thanks Luis. Pretty sure you're right, that that was what I was thinking but I'm mobile now and didn't want to double down on my mistake until I could try it out.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Orlando Colamatteo (1/28/2016)


    Luis Cazares (1/28/2016)


    Orlando Colamatteo (1/28/2016)


    Ugh, thanks for the correction Ed. I am confident my C# code is correct but I was going from memory on the post as to the boundaries (was juat trying to explain there are diffs in the boundaries and my code accounts for them). Now I have figure out where I got 1/1/1900 and why it is a significant date.

    1/1/1900 is date zero. Casting a zero to datetime will return that date. Most people won't need anything previous to that.

    Thanks Luis. Pretty sure you're right, that that was what I was thinking but I'm mobile now and didn't want to double down on my mistake until I could try it out.

    No problem, Orlando. And yes, Luis is absolutely right.

  • A few thoughts:

    1) As Steve mentioned, TRY_CONVERT is your ideal option, but since it was introduced in 2012 and you are on 2008 R2, that isn't an option (sadly).

    2) Some of the data is "fixable", such as replacing "//" with just "/". This is very easy to do with a CTE to massage the data before you even test / convert it.

    3) While you ultimately want the data in the DATE datatype, you should test with first converting to DATETIME. The conversion to DATETIME is less strict and will convert some values that converting to DATE will reject.

    4) While you can accomplish some of this with simplistic T-SQL pattern matching via LIKE and PATINDEX, the variability of the data really does tend to favor Regular Expressions. Even the fine example from Luis only captures a subset of the patterns. And each date format requires 4 entries. Considering that there are 16 formats, that's 64 patterns right there. Why 16 formats? Because each of the date parts can accept fewer digits: the day can be 1 or 2 digits, the month 1 or 2, and the year 1 through 4. So it's not just MM/DD/YYYY, it's also: M/DD/YYYY, M/D/YYYY, MM/D/YYYY, M/D/Y, M/D/YY, and so on. The following RegEx expression captures all of those variations:

    ^\d{4}$|^(?:[1-9]|0[1-9]|1[012])[-./](?:[1-9]|0[1-9]|[12]\d|3[01])[-./]\d{1,4}$

    5) Even with the incredible power and flexibility of RegEx, it still suffers from the same failing that the pure T-SQL solution does in that it cannot determine which months have which maximum days, especially in terms of leap years.

    6) Rather than mess with error-prone patterns (that could be accomplish via RegEx, but it would be a much more complex pattern that would likely require the very cool if/then/else operators, but still not authoritative), you could get a highly reliable answer by simply asking SQL Server to attempt the conversion. Sounds like TRY_CONVERT, right? It kinda is. Imagine you could do a TRY / CATCH block in a scalar function (and ignore, for the moment, the performance implications of T-SQL scalar UDFs), you could wrap a CONVERT in SQL Server 2008 R2 in TRY / CATCH and if it fails just return NULL. That's all TRY_CONVERT does. But for some odd reason, we aren't allowed to use TRY / CATCH in T-SQL functions. BUT, you can use try / catch in SQLCLR functions :-). And in fact, I implemented this exact functions a few years ago and included it in the SQL# library, and even better, in the Free version. It's called: Util_IsValidConvert.

    I worked up a demo that includes TRY_CONVERT, but if you are testing on a version prior to SQL Server 2012, just comment those 3 lines out. I also included the RegEx example to show that it gets most, but not all of the entries correct. But Util_IsValidConvert, when using the DATETIME datatype, matches the output of TRY_CONVERT. But it only returns a BIT, so you just need to use it in a CASE statement (i.e. CASE Util_IsValidConvert() = 1 THEN CONVERT(DATETIME,...) ELSE NULL END). I started with the example code in the original post, but added quite a few test cases to show the wide range of variations. Enjoy 🙂

    IF (OBJECT_ID(N'tempdb..#DatesInStrings') IS NULL)

    BEGIN

    CREATE TABLE #DatesInStrings (ID INT IDENTITY (1,1), ApproximateDate VARCHAR(30));

    INSERT INTO #DatesInStrings VALUES ('01/02/14'), ('01-02-14'),('2014'),('01//02/14'),('01.02.2014'),

    ('01--2000'),('02-01/02'),('01-02-014');

    INSERT INTO #DatesInStrings VALUES ('01/02/4'), ('01-02-4'),('05/2014'),('01.02/14'),('01.02-2014'),

    ('1/2/4'),('1-2-4'),('1-2.4'),('1/2-4');

    INSERT INTO #DatesInStrings VALUES ('4'), ('14'),('014'),('214'),('04');

    INSERT INTO #DatesInStrings VALUES ('2/30/14'),('2/29/4'),('2/29/2000'),('2/29/1900'),('33/2/14'),

    ('2/33/14'),('2/10/999');

    END;

    ;WITH cte(ID, ApproximateDate, DateImproved) AS

    (

    SELECT ID, ApproximateDate, REPLACE(REPLACE(REPLACE(ApproximateDate, '//', '/'), '--', '-'), '..', '.')

    FROM #DatesInStrings

    )

    SELECT ID, ApproximateDate, DateImproved,

    TRY_CONVERT(DATE, DateImproved) AS [TRY_CONVERT(DATE)],

    TRY_CONVERT(DATETIME, ApproximateDate) AS [TRY_CONVERT(DATETIME, Approximate)],

    TRY_CONVERT(DATETIME, DateImproved) AS [TRY_CONVERT(DATETIME, Improved)],

    --TRY_PARSE(DateImproved AS DATE), TRY_PARSE(DateImproved AS DATETIME) -- these get a .NET error

    SQL#.Util_IsValidConvert(DateImproved, N'DATE', 0, 0) AS [SQL#.IsValidConvert(DATE)],

    SQL#.Util_IsValidConvert(DateImproved, N'DATETIME', 0, 0) AS [SQL#.IsValidConvert(DATETIME)],

    SQL#.RegEx_IsMatch4k(DateImproved,

    N'^\d{4}$|^(?:[1-9]|0[1-9]|1[012])[-./](?:[1-9]|0[1-9]|[12]\d|3[01])[-./]\d{1,4}$', 1, NULL)

    AS [SQL#.RegEx_IsMatch4k()]

    FROM cte;

    Take care,

    Solomon..

    SQL#https://SQLsharp.com/ ( SQLCLR library ofover 340 Functions and Procedures)
    Sql Quantum Lifthttps://SqlQuantumLift.com/ ( company )
    Sql Quantum Leaphttps://SqlQuantumLeap.com/ ( blog )
    Info sitesCollations     •     Module Signing     •     SQLCLR

Viewing 15 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply