Extract date from a text field

  • I'm trying to extract date from a text field.  I'm using PADINDEX to do this. My results are very inconsistent.  Is there anyway I can do this through SQL.  I don't have an application to do this for me.  I am trying to get this for a report that is needed.

    The text field has dates that are not always listed as MM/DD/YYYY   sometimes its listed as M/DD/YYYY.  Also,

    Here is the query I am using:

    select

    substring(ar.finding_text,patindex('%[0-9]%/[0-9][0-9]/[0-9][0-9][0-9][0-9]%',ar.finding_text),10)

    FROM [ARKPPDB].[PowerPath].[dbo].[accession_2] a

    LEFT OUTER JOIN acc_results ar on a.id = ar.acc_id

  • ashatimjohn wrote:

    I'm trying to extract date from a text field.  I'm using PADINDEX to do this. My results are very inconsistent.  Is there anyway I can do this through SQL.  I don't have an application to do this for me.  I am trying to get this for a report that is needed.

    The text field has dates that are not always listed as MM/DD/YYYY   sometimes its listed as M/DD/YYYY.  Also,

    Here is the query I am using:

    select substring(ar.finding_text,patindex('%[0-9]%/[0-9][0-9]/[0-9][0-9][0-9][0-9]%',ar.finding_text),10) FROM [ARKPPDB].[PowerPath].[dbo].[accession_2] a LEFT OUTER JOIN acc_results ar on a.id = ar.acc_id

    The obvious thing missing here are several samples of the strings you're trying to do the date extractions from.  I also recommend that your read and heed the article available at the first link in my signature line below for this and future posts.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • If the day is < 10, is it always padded?  That is, '3/05/2021' vs '3/5/2021'?  Based on your code, it looks like you allow the month to be a single digit if < 10.  If so, that means you may need to adjust the length from 10 to 9 for those cases.

    Also, you can refine the PATINDEX a bit to be more specific to dates:

    patindex('%[0-9]%/[0123][0-9]/[2][012][0-9][0-9]%' /*presumably by 2300+, this code won't be used any more*/

     

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • I got really whacky results using this:  patindex('%[0-9]%/[0123][0-9]/[2][012][0-9][0-9]%'

  • You've provided us NO sample data, as Jeff mentioned.  We can't know what your data is unless you tell us.  Nor does "whacky results" really tell us anything.

    Good luck with this, though, I hope you figure it out.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • Since there's no sample data, if the date is the only thing in the string, find the first occurrence of a digit and the last occurrence of a digit and pull what's in between.  SQL server automatically handles a remarkable number of different formats, many of which are not in the CONVERT() listing.

    If you'd like a coded example, then post some readily consumable data for examples (see the article I previously referred you to for one of many ways of how to do that properly) and people will jump at the chance to help you.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • If no matching PATINDEX string was found in the data, then it would default to the start of the string, which might seem to be weird results if you didn't expect it, but is consistent with how PATINDEX works.

    To be safe from that issue, you should use NULLIF, like so:

    substring(ar.finding_text, NULLIF(patindex('%[0-9]%/[0123][0-9]/[2][012][0-9][0-9]%', 0), 10)

     

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • I had to do something similar, importing a .CSV ,  but at least the date field was in it's own column

    Month and day did not have a leading 0, so I had 1/3/2022 , 10/3/2021 , 4/17/2021  and 12/23/2021

    I built a CASE statement, looking at where the '/' are, to build a YYYYMMDD field with leading 0.

    • This reply was modified 2 years, 10 months ago by  homebrew01.
    • This reply was modified 2 years, 10 months ago by  homebrew01.
  • Even after participating in forums for more than 2 decades I guess I'll never understand why people post and then never answer a simple question asking a a small example of data either here or on the other forum I've found this same OP on.  I've also seen some answers that might work but, lordy, they might not and, even if they do, they're going to be slow.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I'm sorry, I had to step away and was not able to respond to my post.  I apologize for that.

    The date could be 2/1/2022 or 12/1/2022

    The time could be 3:27 pm or 12:27 pm

    The problem I have is that I can't get "at" out of the middle and some times do not convert to datetime.

    Thank you all for trying to help me and I apologize for upsetting anyone.

    Here is sample data:

    Physician/Physician’s office called on 2/1/2022 at 3:27 PM Central.

    Physician/Physician’s office called on 2/1/2022 at 3:34 PM Central.

    Physician/Physician’s office called on 2/1/2022 at 2:47 PM Central.

    Physician/Physician’s office called on 2/1/2022 at 4:17 PM Central.

    Physician/Physician’s office called on 2/1/2022 at 2:52 PM Central.

    Physician/Physician’s office called on 2/1/2022 at 2:51 PM Central.

    Physician/Physician’s office called on 2/1/2022 at 4:17 PM Central.

    Physician/Physician’s office called on 2/1/2022 at 4:34 PM Central.

    Physician/Physician’s office called on 2/1/2022 at 11:49 PM Central.

  • Just to be sure, you didn't upset anyone.  I'm just amazed that people don't do more on their posts to help others actually know enough to help them.  For future posts both here and on StackOverflow, you might want to read the article at the first link in my signature line below.  I've also provided a slight different/modernized version of how top provide "Readily Consumable" data in the code below, which will help your questions get answered more quickly without having to answer a thousand questions.

    So here's the code I came up with once I knew what your data looked like.  I took the liberty of splitting out two other columns that you might end up needed, as well.  If it's not fast enough and the case of the data is consistent, we could add a binary collation to the CHARINDEX, the PATINDEX, and the REPLACE to make it faster.  I didn't do so because I don't assume the quality of the data would support it.  If you don't need those two other  columns, they're pretty easy to just delete or comment out.

       DROP TABLE IF EXISTS #TestTable;
    GO
    --===== Create and populate the test table from the data provided.
    -- This is NOT a part of the solution. We''re just creating test data here.
    SELECT *
    INTO #TestTable
    FROM (VALUES
    ('Physician/Physician’s office called on 2/1/2022 at 3:27 PM Central.')
    ,('Physician/Physician’s office called on 2/1/2022 at 3:34 PM Central.')
    ,('Physician/Physician’s office called on 2/1/2022 at 2:47 PM Central.')
    ,('Physician/Physician’s office called on 2/1/2022 at 4:17 PM Central.')
    ,('Physician/Physician’s office called on 2/1/2022 at 2:52 PM Central.')
    ,('Physician/Physician’s office called on 2/1/2022 at 2:51 PM Central.')
    ,('Physician/Physician’s office called on 2/1/2022 at 4:17 PM Central.')
    ,('Physician/Physician’s office called on 2/1/2022 at 4:34 PM Central.')
    ,('Physician/Physician’s office called on 2/1/2022 at 11:49 PM Eastern.')
    ,('Physician/Physician’s office called on 2/1/2022 at 11:27 AM Eastern.')
    ,('Physician/Physician’s office called on 2/1/2022 at 11:34 AM Eastern.')
    ,('Physician/Physician’s office called on 2/1/2022 at 10:47 AM Central.')
    ,('Physician/Physician’s office called on 2/1/2022 at 9:17 AM Mountain.')
    ,('Physician/Physician’s office called on 2/1/2022 at 10:52 AM Central.')
    ,('Physician/Physician’s office called on 2/1/2022 at 10:51 AM Central.')
    ,('Physician/Physician’s office called on 2/1/2022 at 9:17 AM Pacific.')
    ,('Physician/Physician’s office called on 2/1/2022 at 9:34 AM Pacific.')
    ,('Physician/Physician’s office called on 2/1/2022 at 11:49 AM Somewhere over the rainbow.')
    )v(SomeString)
    ;
    --===== Let''s see what we''ve got
    SELECT * FROM #TestTable
    ;
    --===== Solve the given problem.
    -- So long as the following pattern is true in the source data, this should always work.
    -- WhoCalled by the string ' called on ' to determine the caller followed by...
    -- by a date string followed by the string ' at ' followed by a meridian time (Has AM/PM) for the CallDT followed by...
    -- a time zone name string for the TimeZone.
    -- The result is available as a DATETIME2(0) with no decimal seconds.
    SELECT SomeString
    ,WhoCalled = LEFT(SomeString,ca1.CalledOn-1)
    ,CallDT = CONVERT(DATETIME2(0),REPLACE(SUBSTRING(SomeString,ca1.CalledOn+10,ca2.Meridian-CalledOn-9),'at',''))
    ,TimeZone = TRIM(' .' FROM SUBSTRING(SomeString,ca2.Meridian+2,500))
    FROM #TestTable
    CROSS APPLY (VALUES(CHARINDEX(' called on ',SomeString)))ca1(CalledOn)
    CROSS APPLY (VALUES(PATINDEX('% [AP]M %',SomeString)+2))ca2(Meridian)
    ;
    GO

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thank you for your feedback, and I'm so happy there are people like you out helping those of us that just don't get it...  I tried your code.  I posted on stackover, but it's hard to read their comments (in my opinion).

    I couldn't get your code to work.  I am able to pull time, but the problem is it won't convert.  This is a vendor software and a huge text field that i've already  crawled thru just to pull out what I was able to get.  I'm still trying just to get to date/time for reporting.

    Is there a way to skip conversion errors in a select statement?

     

    Thank you again!

  • It would be helpful if you posted the CREATE TABLE statement for the table that contains the data and any rows where the code you've used isn't working.  And, to answer your question, yes... there are methods to skip conversion errors but let's first make sure that the error is in the data and not in the code we're using.

    And can you confirm that you're actually using SQL Server 2019?  Thanks.

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • 0

    I got this working. Thank you all for your assistance.

    select

    *,

    LTRIM(RTRIM(REPLACE(REPLACE(IIF(LTRIM(RTRIM(RIGHT(SUBSTRING([Time Client

    Called], CHARINDEX('on',[Time Client Called]), 15),13))) is null, null,

    LTRIM(RTRIM(RIGHT(SUBSTRING([Time Client Called], CHARINDEX('on',[Time

    Client Called]), 15),13)))), 'a', ''),'t',''))) as "Date Client Called",

    IIF(LTRIM(RTRIM(RIGHT(SUBSTRING([Time Client Called], CHARINDEX('at',[Time

    Client Called]), 11),8))) is null, null, LTRIM(RTRIM(RIGHT(SUBSTRING([Time

    Client Called], CHARINDEX('at',[Time Client Called]), 11),8)))) as "Time

    Called"

    into #tmpCalls

    FROM #tmpPattern

  • Awesome!  Thank you for the feedback.  No need to reply to my question about code on StackOverFlow.  I just found out they deleted a super important post that leads to the total destruction of the Myths surrounding Random GUID Fragmentation and the Myths of supposed "Best Practice" Index Maintenance.  I'm probably never going to post there ever again.  It's just not worth satisfying people's power hungry ego while they wear such impenetrable blinders.  Even worse, I received no notification that it was deleted.

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 1 through 15 (of 18 total)

You must be logged in to reply to this topic. Login to reply