Converting file name MMDDYYYY.txt to a Date Field

Question

Converting file name MMDDYYYY.txt to a Date Field

rjjh78

SSCrazy

Points: 2787
More actions
September 14, 2023 at 2:46 pm

#4286462
Hello,
Converting file name MMDDYYYY.txt to a Date Field but CONVERT or CAST do not seem to work perhaps because I also have a REPLACE? Can someone help me understand how to do this if it is possible? See sample code below and many thanks in advance!
```
CREATE TABLE #t (getFileName varchar(100))
INSERT INTO #T (getFileName) VALUES ('09142023.txt')

SELECT REPLACE(getFileName, '.txt','') FROM #t
```

Viewing 15 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply

Phil Parkin SSC Guru Points: 246988 More actions · Answer 1

Put the date into YYYYMMDD format before casting and it should work.

DECLARE @x VARCHAR(100) = '09142023';

SELECT @x
      ,SomeDate = CAST (CONCAT (RIGHT(@x, 4), LEFT(@x, 2), SUBSTRING (@x, 3, 2)) AS DATE);

The absence of evidence is not evidence of absence.
Martin Rees

You can lead a horse to water, but a pencil must be lead.
Stan Laurel

ScottPletcher SSC Guru Points: 101126 More actions · Answer 2

SELECT CAST(SUBSTRING((getFileName, 5, 4) + 
    LEFT(getFileName, 4) AS date) AS FileNameDate 
FROM #t

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

kaj SSCommitted Points: 1607 More actions · Answer 3

Make the filename use the international ISO format YYYYMMDD instead of that quaint american date format and you'll have no trouble at all. 🙂

Phil Parkin SSC Guru Points: 246988 More actions · Answer 4

kaj wrote:

Make the filename use the international ISO format YYYYMMDD instead of that quaint american date format and you'll have no trouble at all. 🙂

+1, this has the added benefit of being sortable in Windows File Explorer.

The absence of evidence is not evidence of absence.
Martin Rees

You can lead a horse to water, but a pencil must be lead.
Stan Laurel

kaj SSCommitted Points: 1607 More actions · Answer 5

... but if you can't change the date format in the file name...

This would also be a possible solution:

DECLARE @x VARCHAR(100) = '09142023';

SELECT @x as [@x]
  , CONVERT(date,STUFF(STUFF(@x,3,0,'/'),6,0,'/'),101) as SomeDate

I use STUFF to insert a slash in the right places so that the string now conforms to the 101 style used by CONVERT.

Jeffrey Williams SSC Guru Points: 90305 More actions · Answer 6

Using the OP's original example:

SELECT *
     , file_date = datefromparts(f.string_date % 10000, f.string_date / 1000000, f.string_date / 10000 % 100)
  FROM #t
 CROSS APPLY (VALUES (substring(getFileName, 1, charindex('.', getFileName, 1) - 1))) AS f(string_date);

Assumptions: the filename is formatted as MMDDYYYY.txt. Using CROSS APPLY to remove the extension it is then just math to parse out month, day and year.

Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

― Charles R. Swindoll

How to post questions to get better answers faster
Managing Transaction Logs

Jeff Moden SSC Guru Points: 1004414 More actions · Answer 7

Here's a quick question for you because we have to enforce such things where I work... do you want the conversion to fail if the date contains less than 8 characters?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004414 More actions · Answer 8

Deleted. Made a mistake and didn't check before I posted. 🙁

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004414 More actions · Answer 9

IF the MMDDYYYY formatted string date is guaranteed to always be 8 characters, the following works (most of the posted code doesn't consider the inclusion of the ".txt") according to the 'MMDDYYYY.txt' string format and, because it does not use CHARINDEX, it's more than twice as fast as all the other code that works against the OPs data so far (with the exception of Scott Pletcher's code, which I missed on the first pass). It will produce an error if it's shorter and probably an error if it's longer but no guarantee there. If you want errors to convert to NULLs, use TRY_CONVERT or TRY_CAST on systems that support it (2016+).

DECLARE @BitBucket DATE;
    SET STATISTICS TIME ON;
 SELECT @BitBucket = CONVERT(DATE,SUBSTRING(getFileName,5,4)+SUBSTRING(getFileName,1,4)) --SWAPS MMDD and YYYY to make ISO date.
   FROM #T
    SET STATISTICS TIME OFF;
GO 5

Now, full disclosure says the time difference is only about 4.5 seconds faster on 10 million rows so ... who cares?

Consider this... if you wrote everything on your server to run a little more than twice as fast, your server workload would run twice as fast. The trouble is that few look at that big picture when they're writing their code and so their server is dying from the "SQL Death by a Million Cuts".

If you'd like to test, here's some test code to build the 10 Million rows according to the spec the OP wrote.

--=====================================================================================================================
--      Presets (This creates 10 million rows in about 5 seconds on my laptop)
--=====================================================================================================================
--===== If the test table already exists, drop it to make reruns in SSMS easier.
     -- Using "old" method for pre-2012 folks.
     IF OBJECT_ID('TempDB.dbo.#T') IS NOT NULL
   DROP TABLE #T
;
GO
--===== Parameters to control the desired range of dates and and number of rows.
DECLARE  @LoDateLimit   DATETIME    = '2020'   --Inclusive, same as 2020-01-01
        ,@HiDateLimit   DATETIME    = '2030'   --Exclusive, same as 2030-01-01
        ,@RowCount      INT         = 10000000 --Currently, 10 million rows
;
--===== Local constants to simplify formulas and increase performance.
DECLARE  @DayCount      INT         = DATEDIFF(dd,@LoDateLimit,@HiDateLimit)
;
--=====================================================================================================================
--      Create and populate the table (HEAP) on the fly using given parameters.
--=====================================================================================================================
 SELECT TOP (@RowCount) --Needs to be hardocded for 2005 or less.
        getFileName = SUBSTRING(ca.ISODateTxt,5,8)+SUBSTRING(ca.ISODateTxt,1,4)+'.txt' --SWAPS YYYY and MMDD to make MMDDYYYY date.
   INTO #T
   FROM      sys.all_columns ac1
  CROSS JOIN sys.all_columns ac2
  CROSS APPLY (VALUES (CONVERT(CHAR(8),DATEADD(dd,ABS(CHECKSUM(NEWID())%@DayCount),@LoDateLimit),112)))ca(ISODateTxt)
;
--===== Display a sample of the contents of the Temp Table for verification.
SELECT TOP (100) * FROM #T

After finding an old post of mine from 2018, if you want to scrape off slightly more than an extra 100 ms (a suggestion on my part but written by Eirikur Eiriksson), the following will to the trick. It's similar to a lot of the other code but uses DATEFROMPARTS with explicit conversions to INT. Here's that bit of code...

DECLARE @BitBucket DATE;
    SET STATISTICS TIME ON;
 SELECT @BitBucket = DATEFROMPARTS(
                         CONVERT(INT,SUBSTRING(getFileName,5,4),0)
                        ,CONVERT(INT,SUBSTRING(getFileName,1,2),0)
                        ,CONVERT(INT,SUBSTRING(getFileName,3,2),0)
                     )
FROM #T;
    SET STATISTICS TIME OFF;
GO 5

And, finally, you'd think that Integer Math would be the fastest but the LEFT(8) or SUBSTRING(1,8) in the CROSS APPLY slows it down by about 600 ms.

DECLARE @BitBucket DATE;
    SET STATISTICS TIME ON;
 SELECT @BitBucket = DATEFROMPARTS(ca.IntDate%10000, ca.IntDate/1000000, ca.IntDate/10000%100)
   FROM dbo.TestIntDateTime  t
  CROSS APPLY (VALUES (CONVERT(INT,SUBSTRING(t.getFileName,1,8))))ca(IntDate)
    SET STATISTICS TIME OFF;
GO 5

This reply was modified 1 year, 3 months ago by Jeff Moden. Reason: Missed Scott Pletcher's code and added credit to him there

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jason A. Long SSC-Insane Points: 23786 More actions · Answer 10

For the love of whatever you hold dear, stop using “MMDDYYYY” formatting!

It doesn't sort chronology as text. You should be using “YYYYMMDD”.

ScottPletcher SSC Guru Points: 101126 More actions · Answer 11

Jeff Moden wrote:

IF the MMDDYYYY formatted string date is guaranteed to always be 8 characters, the following works (most of the posted code doesn't consider the inclusion of the ".txt") according to the 'MMDDYYYY.txt' string format and, because it does not use CHARINDEX, it's more than twice as fast as all the other code that works against the OPs data so far. It will produce an error if it's shorter and probably an error if it's longer but no guarantee there. If you want errors to convert to NULLs, use TRY_CONVERT or TRY_CAST on systems that support it (2016+).
DECLARE @BitBucket DATE;
    SET STATISTICS TIME ON;
 SELECT @BitBucket = CONVERT(DATE,SUBSTRING(getFileName,5,4)+SUBSTRING(getFileName,1,4)) --SWAPS MMDD and YYYY to make ISO date.
   FROM #T
    SET STATISTICS TIME OFF;
GO 5
Now, full disclosure says the time difference is only about 4.5 seconds faster on 10 million rows so ... who cares?
Consider this... if you wrote everything on your server to run a little more than twice as fast, your server workload would run twice as fast. The trouble is that few look at that big picture when they're writing their code and so their server is dying from the "SQL Death by a Million Cuts".
If you'd like to test, here's some test code to build the 10 Million rows according to the spec the OP wrote.
--=====================================================================================================================
--      Presets (This creates 10 million rows in about 5 seconds on my laptop)
--=====================================================================================================================
--===== If the test table already exists, drop it to make reruns in SSMS easier.
     -- Using "old" method for pre-2012 folks.
     IF OBJECT_ID('TempDB.dbo.#T') IS NOT NULL
   DROP TABLE #T
;
GO
--===== Parameters to control the desired range of dates and and number of rows.
DECLARE  @LoDateLimit   DATETIME    = '2020'   --Inclusive, same as 2020-01-01
        ,@HiDateLimit   DATETIME    = '2030'   --Exclusive, same as 2030-01-01
        ,@RowCount      INT         = 10000000 --Currently, 10 million rows
;
--===== Local constants to simplify formulas and increase performance.
DECLARE  @DayCount      INT         = DATEDIFF(dd,@LoDateLimit,@HiDateLimit)
;
--=====================================================================================================================
--      Create and populate the table (HEAP) on the fly using given parameters.
--=====================================================================================================================
 SELECT TOP (@RowCount) --Needs to be hardocded for 2005 or less.
        getFileName = SUBSTRING(ca.ISODateTxt,5,8)+SUBSTRING(ca.ISODateTxt,1,4)+'.txt' --SWAPS YYYY and MMDD to make MMDDYYYY date.
   INTO #T
   FROM      sys.all_columns ac1
  CROSS JOIN sys.all_columns ac2
  CROSS APPLY (VALUES (CONVERT(CHAR(8),DATEADD(dd,ABS(CHECKSUM(NEWID())%@DayCount),@LoDateLimit),112)))ca(ISODateTxt)
;
--===== Display a sample of the contents of the Temp Table for verification.
SELECT TOP (100) * FROM #T
After finding an old post of mine from 2018, if you want to scrape off slightly more than an extra 100 ms (a suggestion on my part but written by Eirikur Eiriksson), the following will to the trick. It's similar to a lot of the other code but uses DATEFROMPARTS with explicit conversions to INT. Here's that bit of code...
DECLARE @BitBucket DATE;
    SET STATISTICS TIME ON;
 SELECT @BitBucket = DATEFROMPARTS(
                         CONVERT(INT,SUBSTRING(getFileName,5,4),0)
                        ,CONVERT(INT,SUBSTRING(getFileName,1,2),0)
                        ,CONVERT(INT,SUBSTRING(getFileName,3,2),0)
                     )
FROM #T;
    SET STATISTICS TIME OFF;
GO 5
And, finally, you'd think that Integer Math would be the fastest but the LEFT(8) or SUBSTRING(1,8) in the CROSS APPLY slows it down by about 600 ms.
DECLARE @BitBucket DATE;
    SET STATISTICS TIME ON;
 SELECT @BitBucket = DATEFROMPARTS(ca.IntDate%10000, ca.IntDate/1000000, ca.IntDate/10000%100)
   FROM dbo.TestIntDateTime  t
  CROSS APPLY (VALUES (CONVERT(INT,SUBSTRING(t.getFileName,1,8))))ca(IntDate)
    SET STATISTICS TIME OFF;
GO 5

The only difference I see between that code and mine is CONVERT rather than CAST ... I didn't realize CAST had that "big overhead". (Presumably using LEFT rather than SUBSTRING couldn't have that much overhead.)

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

Jeff Moden SSC Guru Points: 1004414 More actions · Answer 12

ScottPletcher wrote:

The only difference I see between that code and mine is CONVERT rather than CAST ... I didn't realize CAST had that "big overhead". (Presumably using LEFT rather than SUBSTRING couldn't have that much overhead.)

Crud. My sincere apologies, Scott. I missed your code. You're code is faster that the other code, as well. I've updated my response to point that out.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

rjjh78 SSCrazy Points: 2787 More actions · Answer 13

rjjh78

SSCrazy

Points: 2787

September 15, 2023 at 11:07 pm

#4287414

Many thanks all for all the replies!

Jeff Moden SSC Guru Points: 1004414 More actions · Answer 14

rjjh78 wrote:

Many thanks all for all the replies!

Thanks for the feedback.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)