October 2, 2019 at 12:08 pm
I regularly load in multiple files using SSIS into a staging table and then onto another table (I call them trans as they are never fixed). The data goes into stage as it comes out of the file or other source. This way I can always prove I took in what I was supplied.
Stage_policy may contain many date fields and the source file is forever changing format of the dates (dont ask why - they just do and will no doubt always carry on doing this). When I move the date to the trans_policy table I want to guarantee that I get the change from Varchar to date correct. In the past we have had files with what we though was UK layout ie dd/mm/yyyy and found out they had changed to MM/DD/YYYY.
Is there a published stored proc anywhere that runs through these columns and checks where the days, months, and years are in the string.
I am about to start writing but as always if there are good pointers out there that would help. I have had a good look but could not find anything.
Oh and SQL Server 2014.
Cheers
E
October 2, 2019 at 1:38 pm
Not sure exactly what you're after here - please supply some sample data and expected results. You can use TRY_CONVERT to verify whether a string can be converted to a valid date. Whether that date is the correct one is a different matter - the only way to distinguish between '06/07/2019' and '07/06/2019', I suppose, is to trust the supplier of the data to use a consistent format.
John
October 2, 2019 at 2:13 pm
If you load them into a staging table with the date stored as varchar, you can go through the date column looking at the maximum value of substring(date,1,2) and substring(date,4,2). Hopefully, you will have at least one date where the day-part is greater than 12 so you will be able to deduce the format of the date for conversion.
October 2, 2019 at 3:32 pm
Take a look at this post where a present a solution for a similar issue.
You can take this a step further, and create a iTVF for future use
October 2, 2019 at 6:49 pm
So, I spent some time creating a function that will successfully convert *MOST* string formats to DATETIME. This has only been tested in a us-en environment.
CREATE FUNCTION dbo.iTVF_StringToDate (
@strDate nvarchar(100)
)
RETURNS TABLE WITH SCHEMABINDING
AS
RETURN
SELECT DateResult = CASE
WHEN @strDate LIKE '%[a-z][a-z][a-z]%' THEN TRY_CONVERT(datetime, @strDate) -- 02 Oct 2019 20:26:45:260
WHEN @strDate LIKE '[0-9][0-9][0-9][0-9]-[0][1-9]-[0-3][0-9]T%' THEN TRY_CONVERT(datetime, @strDate, 127) -- yyyy-mm-ddThh:mi:ss
WHEN @strDate LIKE '[0-9][0-9][0-9][0-9]-[1][0-2]-[0-3][0-9]T%' THEN TRY_CONVERT(datetime, @strDate, 127) -- yyyy-mm-ddThh:mi:ss
WHEN @strDate LIKE '[0-9][0-9][0-9][0-9]-[0][1-9]-[0-3][0-9]%' THEN TRY_CONVERT(datetime, @strDate, 121) -- yyyy-mm-dd hh:mi:ss
WHEN @strDate LIKE '[0-9][0-9][0-9][0-9]-[1][0-2]-[0-3][0-9]%' THEN TRY_CONVERT(datetime, @strDate, 121) -- yyyy-mm-dd hh:mi:ss
WHEN @strDate LIKE '[0-9][0-9][0-9][0-9]/[0][1-9]/[0-3][0-9]%' THEN TRY_CONVERT(datetime, @strDate, 111) -- yyyy/mm/dd
WHEN @strDate LIKE '[0-9][0-9][0-9][0-9]/[1][0-2]/[0-3][0-9]%' THEN TRY_CONVERT(datetime, @strDate, 111) -- yyyy/mm/dd
WHEN @strDate LIKE '[0-9][0-9][0-9][0-9].[0][1-9].[0-3][0-9]%' THEN TRY_CONVERT(datetime, @strDate, 102) -- yyyy.mm.dd
WHEN @strDate LIKE '[0-9][0-9][0-9][0-9].[1][0-2].[0-3][0-9]%' THEN TRY_CONVERT(datetime, @strDate, 102) -- yyyy.mm.dd
WHEN @strDate LIKE '[0-9][0-9][0-9][0-9][0][1-9][0-3][0-9]%' THEN TRY_CONVERT(datetime, @strDate, 112) -- yyyymmdd
WHEN @strDate LIKE '[0-9][0-9][0-9][0-9][1][0-2][0-3][0-9]%' THEN TRY_CONVERT(datetime, @strDate, 112) -- yyyymmdd
WHEN @strDate LIKE '[0-9][0-9]-[1][3-9]-[0-9][0-9][0-9][0-9]%' THEN TRY_CONVERT(datetime, @strDate, 110) -- mm-dd-yyyy (dd = 13-19)
WHEN @strDate LIKE '[0-9][0-9]-[2][0-9]-[0-9][0-9][0-9][0-9]%' THEN TRY_CONVERT(datetime, @strDate, 110) -- mm-dd-yyyy (dd = 20-29)
WHEN @strDate LIKE '[0-9][0-9]-[3][0-1]-[0-9][0-9][0-9][0-9]%' THEN TRY_CONVERT(datetime, @strDate, 110) -- mm-dd-yyyy (dd = 30-31)
WHEN @strDate LIKE '[0-9][0-9]/[1][3-9]/[0-9][0-9][0-9][0-9]%' THEN TRY_CONVERT(datetime, @strDate, 101) -- mm/dd/yyyy (dd = 13/19)
WHEN @strDate LIKE '[0-9][0-9]/[2][0-9]/[0-9][0-9][0-9][0-9]%' THEN TRY_CONVERT(datetime, @strDate, 101) -- mm/dd/yyyy (dd = 20/29)
WHEN @strDate LIKE '[0-9][0-9]/[3][0-1]/[0-9][0-9][0-9][0-9]%' THEN TRY_CONVERT(datetime, @strDate, 101) -- mm/dd/yyyy (dd = 30/31)
WHEN @strDate LIKE '[1][3-9]-[0-9][0-9]-[0-9][0-9][0-9][0-9]%' THEN TRY_CONVERT(datetime, @strDate, 105) -- dd-mm-yyyy (dd = 13-19)
WHEN @strDate LIKE '[2][0-9]-[0-9][0-9]-[0-9][0-9][0-9][0-9]%' THEN TRY_CONVERT(datetime, @strDate, 105) -- dd-mm-yyyy (dd = 20-29)
WHEN @strDate LIKE '[3][0-1]-[0-9][0-9]-[0-9][0-9][0-9][0-9]%' THEN TRY_CONVERT(datetime, @strDate, 105) -- dd-mm-yyyy (dd = 30-31)
WHEN @strDate LIKE '[1][3-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9]%' THEN TRY_CONVERT(datetime, @strDate, 103) -- dd/mm/yyyy (dd = 13-19)
WHEN @strDate LIKE '[2][0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9]%' THEN TRY_CONVERT(datetime, @strDate, 103) -- dd/mm/yyyy (dd = 20-29)
WHEN @strDate LIKE '[3][0-1]/[0-9][0-9]/[0-9][0-9][0-9][0-9]%' THEN TRY_CONVERT(datetime, @strDate, 103) -- dd/mm/yyyy (dd = 30-31)
WHEN @strDate LIKE '[0-3][0-9].[0][1-9].[0-9][0-9][0-9][0-9]%' THEN TRY_CONVERT(datetime, @strDate, 104) -- dd.mm.yyyy
WHEN @strDate LIKE '[0-3][0-9].[1][0-2].[0-9][0-9][0-9][0-9]%' THEN TRY_CONVERT(datetime, @strDate, 104) -- dd.mm.yyyy
--==========================================================================================================================
-- This is where there is a big possibilty of getting the wrong date, as we do not know whether the dates
-- are Month-Day or Day-Month. So, rather return NULL, and let the user do the conversion manually.
--==========================================================================================================================
--WHEN @strDate LIKE '[0-9][0-9]-[0-9][0-9]-[0-9][0-9][0-9][0-9]%' THEN TRY_CONVERT(datetime, @strDate, 110) -- mm-dd-yyyy
--WHEN @strDate LIKE '[0-9][0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9]%' THEN TRY_CONVERT(datetime, @strDate, 103) -- dd/mm/yyyy
--==========================================================================================================================
ELSE NULL
END;
GO
Now create a table with various string date formats, and cross apply the function to do the convertsion
IF OBJECT_ID(N'tempdb..#DateFormats', N'U') IS NOT NULL
BEGIN
DROP TABLE #DateFormats;
END;
CREATE TABLE #DateFormats (
FmtID int NOT NULL
, FmtString varchar(40) NOT NULL
, strDate AS CONVERT(nvarchar(100), GETDATE(), FmtID)
);
INSERT INTO #DateFormats ( FmtID, FmtString )
VALUES
( 106, 'dd mon yyyy' )
, ( 113, 'dd mon yyyy hh:mi:ss:mmm (24h)' )
, ( 100, 'mon dd yyyy hh:miAM' )
, ( 107, 'Mon dd, yyyy' )
, ( 109, 'mon dd yyyy hh:mi:ss:mmmAM' )
, ( 126, 'yyyy-mm-ddThh:mi:ss.mmm' )
, ( 127, 'yyyy-mm-ddThh:mi:ss.mmmZ' )
, ( 23, 'yyyy-mm-dd' )
, ( 120, 'yyyy-mm-dd hh:mi:ss (24h)' )
, ( 121, 'yyyy-mm-dd hh:mi:ss.mmm (24h)' )
, ( 111, 'yyyy/mm/dd' )
, ( 102, 'yyyy.mm.dd' )
, ( 112, 'yyyymmdd' )
, ( 104, 'dd.mm.yyyy' )
-- These 4 formats have the potential to return incorrect values (4 June vs 6 May)
-- So the function will return NULL. You could easily modify the function to assume
-- the format that is most common in your environment.
, ( 101, 'mm/dd/yyyy' )
, ( 103, 'dd/mm/yyyy' )
, ( 105, 'dd-mm-yyyy' )
, ( 110, 'mm-dd-yyyy' );
SELECT *
FROM #DateFormats AS src
CROSS APPLY dbo.iTVF_StringToDate(src.strDate) AS conv;
October 2, 2019 at 6:50 pm
Another version, trying to do the least prep work possible (although for determining day or month first, the entire staging table may need to be scanned):
--Assumes that all date strings for a given column in that specific
-- staging table iteration are consistent.
--Recognized formats, where [<delim>] is an optional delim char:
--YYYY[<delim>]MM[<delim>]DD --for dates from 2013 on.
--MM[<delim>]DD[<delim>]YYYY
--DD[<delim>]MM[<delim>]YYYY
--[<assumedOK>]mmm[<assumedOK>]
--Unrecognized format(s):
--YYYY[<delim>]DD[<delim>]MM --if this format is possible, let me know.
DECLARE @byte_to_check int
DECLARE @date_column varchar(30)
DECLARE @date_conversion_code int
DECLARE @date_delim_found bit
DECLARE @date_format varchar(10)
DECLARE @day_first_in_date bit
SELECT TOP (1) @date_column = date_column
FROM dbo.stage_policy
IF @date_column LIKE '20[1][3456789]%'
OR @date_column LIKE '20[23456789]%'
BEGIN
SET @date_format = 'YYYY'
/*assumed format is MM next, never DD next*/
/*
IF SUBSTRING(@date_format, 5, 1) LIKE '[0-9]'
BEGIN
SET @date_delim_found = 0
SET @byte_to_check = 5
END /*IF*/
ELSE
BEGIN
SET @date_delim_found = 1
SET @byte_to_check = 6
END /*ELSE*/
*/
END /*IF*/
ELSE
IF @date_column LIKE '%[a-z]%'
BEGIN
SET @date_format = 'mmm'
SET @byte_to_check = NULL
END /*IF*/
ELSE
BEGIN
IF SUBSTRING(@date_format, 3, 1) LIKE '[0-9]'
BEGIN
SET @date_delim_found = 0
SET @byte_to_check = 3
END /*IF*/
ELSE
BEGIN
SET @date_delim_found = 1
SET @byte_to_check = 4
END /*ELSE*/
END /*ELSE*/
IF @byte_to_check > 0
BEGIN
SET @date_column = NULL
SELECT TOP (1) @date_column = date_column
FROM dbo.stage_policy
WHERE SUBSTRING(date_column, @byte_to_check, 2) LIKE '1[3456789]%' OR
SUBSTRING(date_column, @byte_to_check, 2) LIKE '[23]%'
IF @date_column IS NULL
SET @day_first_in_date = 0
ELSE
SET @day_first_in_date = 1
END /*IF*/
SET @date_conversion_code = CASE
WHEN @date_format = 'mmm' THEN NULL
WHEN @date_format = 'YYYY' THEN
CASE WHEN @date_delim_found = 1 THEN 102 ELSE 112 END
WHEN @day_first_in_date = 0 AND @date_delim_found = 1 THEN 101
WHEN @day_first_in_date = 1 AND @date_delim_found = 1 THEN 103
WHEN @day_first_in_date = 0 THEN 1001 /*DDMMYYYY*/
ELSE 1002 END /*MMDDYYYY*/
INSERT INTO dbo.policy ( date_column, ... )
SELECT
CASE WHEN @date_conversion_code IS NULL
THEN CAST(date_column AS date)
WHEN @date_conversion_code = 1001
THEN SUBSTRING(date_column, 5, 4) + SUBSTRING(date_column, 3, 2) + LEFT(date_column, 2)
WHEN @date_conversion_code = 1002
THEN SUBSTRING(date_column, 5, 4) + LEFT(date_column, 2) + SUBSTRING(date_column, 3, 2)
ELSE CONVERT(date, date_column, @date_conversion_code) END AS date_column,
....
FROM dbo.stage_policy
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
October 2, 2019 at 9:53 pm
Any format containing yyyy-mm-dd can also throw you for a loop if the data had French origins because SQL Server uses YYYY-DD-MM with the French Language.
There is NO WAY to ascertain the correct position of mm and dd from unknown sources even if you use the main ISO format of YYYYMMDD (with no delimiters) because you simply can't trust the source. The only way to pull this off is to negotiate with the source just exactly what the format will be and then they need to stick with it. Anything else is a pure crap shoot.
--Jeff Moden
Change is inevitable... Change for the better is not.
October 3, 2019 at 7:44 am
Any format containing yyyy-mm-dd can also throw you for a loop if the data had French origins because SQL Server uses YYYY-DD-MM with the French Language.
There is NO WAY to ascertain the correct position of mm and dd from unknown sources even if you use the main ISO format of YYYYMMDD (with no delimiters) because you simply can't trust the source. The only way to pull this off is to negotiate with the source just exactly what the format will be and then they need to stick with it. Anything else is a pure crap shoot.
I was going to reply with the same idea (but mine would not be so perspicuous ) 🙂
Far away is close at hand in the images of elsewhere.
Anon.
October 3, 2019 at 8:01 am
when the source data incorporates date ranges that span a few month they it is sometimes possible to analyse the data as part of the load process and see which conversion formula works for the majority of the dates in a single column - and then use that one for all dates on that column,
Still not a full guarantee that the format is indeed the one used at source.
If dates are normally on a short interval then it is impossible to guarantee which was the format used as mentioned before
October 3, 2019 at 3:33 pm
Thanks all,
that's going to take me some time to analyse. The source data is just terrible. They have been swapping the format of the dates on almost a daily basis. Currently they are supplying them in two different formats in the same file.
Generally I do not get into this position but thought it would be handy to have a trick handy for the next time this happens.
Lot of interesting reading.
Many Many Thanks
E
October 3, 2019 at 3:38 pm
Thanks all,
that's going to take me some time to analyse. The source data is just terrible. They have been swapping the format of the dates on almost a daily basis. Currently they are supplying them in two different formats in the same file.
Generally I do not get into this position but thought it would be handy to have a trick handy for the next time this happens.
Lot of interesting reading.
Many Many Thanks
E
I would suggest
Viewing 11 posts - 1 through 10 (of 10 total)
You must be logged in to reply to this topic. Login to reply