May 21, 2013 at 12:21 pm
USE [collect2000]
GO
/****** Object: UserDefinedFunction [dbo].[tvf_ScrubData] Script Date: 05/21/2013 12:54:28 ******/
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[tvf_ScrubData]') AND type in (N'FN', N'IF', N'TF', N'FS', N'FT'))
DROP FUNCTION [dbo].[tvf_ScrubData]
GO
USE [collect2000]
GO
/****** Object: UserDefinedFunction [dbo].[tvf_ScrubData] Script Date: 05/21/2013 12:54:28 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE FUNCTION [dbo].[tvf_ScrubData]
(
-- Add the parameters for the function here
@data varchar(8000),
@type varchar(20)
)
RETURNS TABLE
AS
RETURN
(
select result from
(select LTRIM(rtrim(@data)) as data) a
outer apply
(
SELECT case
when @type = 'posint' then
case when data like '%[^0123456789]%' then null else data end
when @type = 'int' then
case when substring(data,1,1) like '%[^-0123456789]%' then null else case when data like '%[^0123456789]%' then null else data end end
when @type = 'decimal' then
case when substring(data,1,1)like '%[^-0123456789.]%' then
null
else
case when data = '' then null
when data is null then null
when data like '%[^-0123456789.]%' then null
else
case when charindex('.',data) > 0 and charindex('.',data,charindex('.',data)) > 0 then null
else data
end
end
end
when @type = 'posdecimal' then
case when data = '' then null
when data is null then null
when data like '%[^-0123456789.]%' then null
else
case when charindex('.',data) > 0 and charindex('.',data,charindex('.',data)) > 0 then null
else data
end
end
when @type = 'date' then
case when data = '' then null
when data is null then null
when len(data)<> 10 then null
when data like '[0123456789][0123456789][/.-][0123456789][0123456789][/.-][12][0123456789][0123456789][0123456789]' then data
else null
end
when @type = 'time' then
case when data = '' then null
when data is null then null
when data like '[0123456789][0123456789][:.][0123456789][0123456789]' or data like '[0123456789][0123456789][:.][0123456789][0123456789][:.][0123456789][0123456789]' or data like '[0123456789][0123456789][:.][0123456789][0123456789][:.][0123456789][0123456789][.][0123456789][0123456789][0123456789][0123456789]' then data
else null
end
when @type = 'datetime' then
case when data = '' then null
when data is null then null
when len(data)<> 10 then null
when data like '[0123456789][0123456789][/.-][0123456789][0123456789][/.-][12][0123456789][0123456789][0123456789] [0123456789][0123456789][:.][0123456789][0123456789][:.][0123456789][0123456789][.][0123456789][0123456789][0123456789][0123456789]' then data
else null
end
end as result ) b
)
GO
May 21, 2013 at 12:44 pm
What is this supposed to do?
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
May 21, 2013 at 12:56 pm
It's an inline tvf that scrubs data by type using LIKE.
int, decimal (signed and unsigned), date, time, date time.
For those situations where data just gets stuck in a varchar field and then you have to deal with it later.
It was fun to make it be an inline TVF instead of a function.
I was bored.
Plus, if you pass in the type, then all your scrubamajigs are in the same place.
May 21, 2013 at 1:03 pm
Can you post some examples of using it? I may be having a dense moment but I am not getting it. If I saw it in use it would help me. :hehe:
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
May 21, 2013 at 3:25 pm
Here's some sample data to throw at it.
select * from
(
select '01/01/2013' as data
union all select '01-02-2012'
union all select '01-03-2010'
union all select '123'
union all select '-123'
union all select '10:30'
union all select '01:28:21'
union all select '02:12:14.345'
union all select '1.123'
union all select '-1.23'
union all select '1-.23'
union all select '1.23.1' ) a outer apply tvf_ScrubData(a.data,'int')
May 21, 2013 at 5:45 pm
I can't believe I'm still doing this:
USE [collect2000]
GO
/****** Object: UserDefinedFunction [dbo].[tvf_ScrubData] Script Date: 05/21/2013 17:29:25 ******/
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[tvf_ScrubData]') AND type in (N'FN', N'IF', N'TF', N'FS', N'FT'))
DROP FUNCTION [dbo].[tvf_ScrubData]
GO
USE [collect2000]
GO
/****** Object: UserDefinedFunction [dbo].[tvf_ScrubData] Script Date: 05/21/2013 17:29:25 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
-- =============================================
-- Author:Name
-- Create date:
-- Description:
-- =============================================
CREATE FUNCTION [dbo].[tvf_ScrubData]
(
-- Add the parameters for the function here
@data varchar(8000),
@type varchar(20)
)
RETURNS TABLE
AS
RETURN
(
select result from
(select LTRIM(rtrim(@data)) as data ,
'[0123456789][012][/.-][0123][0123456789][/.-][12][0123456789][0123456789][0123456789]' dateFmt,
'%[^0123456789]%' posIntFmt,
'[^-0123456789]' negIntFmt,
'%[^0123456789]%' intFmt,
'[^-0123456789.]' negDecFmt,
'%[^0123456789.]%' decFmt,
'[0123456789][0123456789][:.][0123456789][0123456789]' timeFmt1,
'[0123456789][0123456789][:.][0123456789][0123456789][:.][0123456789][0123456789]' timeFmt2,
'[0123456789][0123456789][:.][0123456789][0123456789][:.][0123456789][0123456789][.][0123456789][0123456789][0123456789]' timeFmt3
) a
outer apply
( select
rtrim(substring(data,1,CHARINDEX(' ',data))) dataDate,
ltrim(substring(data,CHARINDEX(' ',data),99)) dataTime,
SUBSTRING(data,1,1) firstChar,
SUBSTRING(data,2,LEN(data)-1) rest ,
Case when CHARINDEX('.',data) = 0 or (charindex('.',data) > 0 and charindex('.',data,charindex('.',data)+1) = 0) then 1 else 0 end oneDecimal
) c
cross apply
(
SELECT case
when @type = 'posint' then
case when data not like posIntFmt then data end
when @type = 'int' then
case when firstChar not like negIntFmt and substring(data,2,99) not like intFmt then data end
when @type = 'decimal' then
case when firstChar not like negDecFmt and substring(data,2,99) not like decFmt and oneDecimal = 1 then data end
when @type = 'posdecimal' then
case when IsNull(data,'')<> '' and data not like decFmt and oneDecimal = 1 then data end
when @type = 'date' then
case when IsNull(data,'')<> '' and len(data)< 11 then
case
when len(data)=9 and CHARINDEX('/',data) =2 and '0'+data like dateFmt then '0'+data
when len(data)=9 and CHARINDEX('/',data) =3 and substring(data,1,3)+'0'+SUBSTRING(data,4,6) like dateFmt then substring(data,1,3)+'0'+SUBSTRING(data,4,6)
when len(data)=8 and '0'+ substring(data,1,2)+'0'+SUBSTRING(data,3,6) like dateFmt then '0'+ substring(data,1,2)+'0'+SUBSTRING(data,3,6)
when data like dateFmt then data
end
end
when @type = 'time' then
case when IsNull(data,'')<> '' and (data like timeFmt1 or data like timeFmt2 or data like timeFmt3) then data end
when @type = 'datetime' then
case when IsNull(dataDate,'')<> '' and len(dataDate)< 11 then
case
when len(dataDate)=9 and CHARINDEX('/',dataDate) =2 and '0'+dataDate like dateFmt then '0'+dataDate
when len(dataDate)=9 and CHARINDEX('/',dataDate) =3 and substring(dataDate,1,3)+'0'+SUBSTRING(dataDate,4,6) like dateFmt then substring(dataDate,1,3)+'0'+SUBSTRING(dataDate,4,6)
when len(dataDate)=8 and '0'+ substring(dataDate,1,2)+'0'+SUBSTRING(dataDate,3,6) like dateFmt then '0'+ substring(dataDate,1,2)+'0'+SUBSTRING(dataDate,3,6)
when dataDate like dateFmt then dataDate
end
end
+ ' '+
case when IsNull(dataTime,'')<> '' and (dataTime like timeFmt1 or dataTime like timeFmt2 or dataTime like timeFmt3) then dataTime end
end
as result) b where result is not null
)
GO
and to test
select * from
(
select '01/01/2013' as data
union all select '01-02-2012'
union all select '01-03-2010'
union all select '1-4-2012'
union all select '123'
union all select '-123'
union all select '10:30'
union all select '01:28:21'
union all select '02:12:14.345'
union all select '1.123'
union all select '-1.23'
union all select '1-.23'
union all select '1.23.1' ) a cross apply tvf_ScrubData(a.data,'decimal')
May 21, 2013 at 5:46 pm
Which is to say, what CAN'T you do in an inline tvf if you really really want to?
May 22, 2013 at 8:17 am
lnardozi 61862 (5/21/2013)
Which is to say, what CAN'T you do in an inline tvf if you really really want to?
'datetime' probably needs a bit of work!
May 22, 2013 at 10:31 am
'datetime' probably needs a bit of work!
datetime in v1 was definitely crap. v2 should be better though. That wasn't really my point though - my point is that all the functions used to scrub data could be expressed as an inline TVF. Really it would be better to have a single TVF for each type so the data would be returned in the type you were converting to so you wouldn't have to CAST it. I guess my point is really that LIKE is similar to regex, if you want it to be, while still being set based. It is WAY simpler to use a CLR tvf, but it doesn't inline.
May 23, 2013 at 6:49 pm
lnardozi 61862 (5/22/2013)
'datetime' probably needs a bit of work!
datetime in v1 was definitely crap. v2 should be better though. That wasn't really my point though - my point is that all the functions used to scrub data could be expressed as an inline TVF. Really it would be better to have a single TVF for each type so the data would be returned in the type you were converting to so you wouldn't have to CAST it. I guess my point is really that LIKE is similar to regex, if you want it to be, while still being set based. It is WAY simpler to use a CLR tvf, but it doesn't inline.
I like the idea. How would you apply it in validating an entire multicolumn row instead of a single source column at a time?
May 24, 2013 at 4:16 pm
lnardozi 61862 (5/21/2013)
Which is to say, what CAN'T you do in an inline tvf if you really really want to?
Effective and high performance running totals in a pre-2012 environment, for one. Update a table for another. Use NEWID() directly. The list goes on. 😀
--Jeff Moden
Change is inevitable... Change for the better is not.
May 24, 2013 at 10:36 pm
Effective and high performance running totals in a pre-2012 environment, for one. Update a table for another. Use NEWID() directly. The list goes on. 😀
The gauntlet, she has been thrown :-D. If only one of them were even REMOTELY possible... but what the hey, they said you couldn't build an (non-text) index on a varchar(max) column either. Hey, this is where the FUN comes in!
Viewing 12 posts - 1 through 11 (of 11 total)
You must be logged in to reply to this topic. Login to reply