August 24, 2012 at 1:23 pm
tyson.price (8/24/2012)
I get your point, but that is why we have comments. I will take performance over easy-to-read any day of the week. Comments are easy to type and can make some incredibly complicated code easy to understand.
I can't argue that. Do you think this:
SELECT SUBSTRING(@Source,PATINDEX('%[^'+@Char+']%',@Source),1000)
outperforms this:
select REPLACE(LTRIM(REPLACE('000temp001', '0', ' ')), ' ', '0')
I'm also not arguing that learning regular expressions helps on the resume. My point is very general and not really specific to anyone in particular here. I don't think anyone really disagrees with what I'm saying, which is: "when all other things are equal, keep it simple".
I'm coming from an IBM mainframe background, long ago, and when you come in in the middle of the night to fix an issue that someone made complicated just because they could, it makes for a longer night. You also don't have time to stop and learn when the on-lines need to be up in a couple of hours.
I see the same thing in day to day support now that I support Windows applications.
Comments are great and really help. In most shops I've been in, they are as common as documentation :w00t:
The same type of programmers that generate complicated, esoteric code usually make statements like "you shouldn't be programming if you can't read code". Usually they are excellent programmers who have forgotten that not everyone is at the same skill level.
In a business environment with frequent turnover it is very important to use structured, easy-to-understand code. I'm not saying inefficient code.
1. Performance of both will be similar, though I think SUBSTRING with PATINDEX will outperform the double REPLACE in most if not all cases (REPLACE speed will depend on the number of leading and trailing zeros). It would need to be tested over at least 1,000,000 rows.
2. Me too. I also came from the mainframe (Assembler and FORTRAN) long ago... Code which uses regular expressions is way easier to read than the noodle-like code of a recursive CTE, and at least on a par with the double REPLACE. I cannot see how a simple pattern can be more difficult to understand than the logic behind REPLACE/LTRIM/REPLACE. '%' is used in LIKE very often by most T-SQL devs; it's a wildcard. '^' is a logical NOT (exclude), etc. Very simple, and clearly explained in BOL.
And in the end, using SUBSTRING and PATINDEX is perfectly structured and easy-to-understand code if you know T-SQL (especially in comparison to using a Tally table or, even worse, a recursive CTE).
If you don't know T-SQL, you shouldn't be trying to resolve any of these issues, which will sometimes happen "in a business environment with frequent turnover"...
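To make the pattern concrete, here is a minimal, self-contained sketch (the values are illustrative, not from the thread) of what the '%' wildcard and the '[^...]' negated class do:

```sql
-- Illustrative sketch of the PATINDEX approach discussed above.
DECLARE @Source VARCHAR(100) = '000temp001',
        @Char   CHAR(1)      = '0';

-- '%[^0]%' matches the first character that is NOT '0';
-- PATINDEX returns its 1-based position (4 here, the 't').
SELECT PATINDEX('%[^'+@Char+']%', @Source)                            AS FirstNonCharPos,
       SUBSTRING(@Source, PATINDEX('%[^'+@Char+']%', @Source), 1000) AS Trimmed;
-- FirstNonCharPos = 4, Trimmed = 'temp001' (only the leading zeros are removed)
```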
August 24, 2012 at 1:26 pm
tyson.price (8/24/2012)
You also don't have time to stop and learn when the on-lines need to be up in a couple of hours.
Again, I wholeheartedly agree that's probably a bad time to learn, but if someone is troubleshooting SQL code, then they really should know these simple and very common methods beforehand, as well as some of the performance ramifications.
--Jeff Moden
Change is inevitable... Change for the better is not.
August 24, 2012 at 1:36 pm
tyson.price (8/24/2012)
I get your point, but that is why we have comments. I will take performance over easy-to-read any day of the week. Comments are easy to type and can make some incredibly complicated code easy to understand.
I can't argue that. Do you think this:
SELECT SUBSTRING(@Source,PATINDEX('%[^'+@Char+']%',@Source),1000)
outperforms this:
select REPLACE(LTRIM(REPLACE('000temp001', '0', ' ')), ' ', '0')
I'm also not arguing that learning regular expressions helps on the resume. My point is very general and not really specific to anyone in particular here. I don't think anyone really disagrees with what I'm saying, which is: "when all other things are equal, keep it simple".
I'm coming from an IBM mainframe background, long ago, and when you come in in the middle of the night to fix an issue that someone made complicated just because they could, it makes for a longer night. You also don't have time to stop and learn when the on-lines need to be up in a couple of hours.
I see the same thing in day to day support now that I support Windows applications.
Comments are great and really help. In most shops I've been in, they are as common as documentation :w00t:
The same type of programmers that generate complicated, esoteric code usually make statements like "you shouldn't be programming if you can't read code". Usually they are excellent programmers who have forgotten that not everyone is at the same skill level.
In a business environment with frequent turnover it is very important to use structured, easy-to-understand code. I'm not saying inefficient code.
I think that what is simple is not easy to define.
It seems that what you find easier to read is different from what I do. I find the version with PATINDEX easier to read and understand, while it seems you find the other one easier.
I think you are correct that few people around here would choose complicated code when a simpler version will work. If the shops you work in don't comment, fight the system and do it yourself. Others who look at your code will appreciate it and may even start doing it themselves.
To demonstrate how simple it can be:
--Get the substring starting with first occurrence of any character other than @Char
SELECT SUBSTRING(@Source,PATINDEX('%[^'+@Char+']%',@Source),1000)
The same type of programmers that generate complicated, esoteric code usually make statements like "you shouldn't be programming if you can't read code". Usually they are excellent programmers who have forgotten that not everyone is at the same skill level.
I disagree with this statement entirely. In my experience, those who write really complicated and clever code do so to stroke their own ego, or because they can't figure out a different way. They tend to be arrogant and do little to help anybody else unravel their bowl of broken spaghetti. One of the best things any developer can do for others is to comment their code. IMHO anything else is lazy and shows disrespect for the industry and the company they work for, in addition to the people who have to follow them.
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Moden's splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
August 24, 2012 at 1:39 pm
tyson.price (8/24/2012)
Do you think this:
SELECT SUBSTRING(@Source,PATINDEX('%[^'+@Char+']%',@Source),1000)
outperforms this:
select REPLACE(LTRIM(REPLACE('000temp001', '0', ' ')), ' ', '0')
"A Developer must not guess... a Developer must KNOW!" 😉 Test it. Here's the code to build a million row test table for the problem at hand.
SELECT TOP 1000000
SomeString = REPLICATE('0',ABS(CHECKSUM(NEWID()))%5) -- 0 to 4 leading zeros
+ 'temp'+RIGHT('0000000'+CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS VARCHAR(10)),7)
INTO #TestTable
FROM sys.all_columns ac1
CROSS JOIN sys.all_columns ac2
The code above takes just a second or two to run on a decent machine.... something less than 7 seconds to run on a 10 year old single CPU desktop box.
--Jeff Moden
August 24, 2012 at 1:43 pm
Sean Lange (8/24/2012)
The same type of programmers that generate complicated, esoteric code usually make statements like "you shouldn't be programming if you can't read code". Usually they are excellent programmers who have forgotten that not everyone is at the same skill level.
I disagree with this statement entirely. In my experience those who write really complicated and clever code do so to stroke their own ego, or they can't figure out a different way. They tend to be arrogant and do little to help anybody else to unravel their bowl of broken spaghetti. One of the best ways any developer can offer to others is to comment their code. IMHO anything else is lazy and shows a disrespect for the industry and the company they work for in addition to the people who have to follow them.
+100000000000000000000000000000
--Jeff Moden
August 24, 2012 at 1:57 pm
Jeff Moden (8/24/2012)
tyson.price (8/24/2012)
Do you think this:
SELECT SUBSTRING(@Source,PATINDEX('%[^'+@Char+']%',@Source),1000)
outperforms this:
select REPLACE(LTRIM(REPLACE('000temp001', '0', ' ')), ' ', '0')
"A Developer must not guess... a Developer must KNOW!" 😉 Test it. Here's the code to build a million row test table for the problem at hand.
SELECT TOP 1000000
SomeString = REPLICATE('0',ABS(CHECKSUM(NEWID()))%5) -- 0 to 4 leading zeros
+ 'temp'+RIGHT('0000000'+CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS VARCHAR(10)),7)
INTO #TestTable
FROM sys.all_columns ac1
CROSS JOIN sys.all_columns ac2
The code above takes just a second or two to run on a decent machine.... something less than 7 seconds to run on a 10 year old single CPU desktop box.
Based on your quote from Sergiy, I ran the following:
SELECT TOP 1000000
SomeString = REPLICATE('0',ABS(CHECKSUM(NEWID()))%5) -- 0 to 4 leading zeros
+ 'temp'+RIGHT('0000000'+CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS VARCHAR(10)),7)
INTO #TestTable
FROM sys.all_columns ac1
CROSS JOIN sys.all_columns ac2;
go
declare @StartDate datetime2(7) = sysdatetime(),
@HoldString varchar(128),
@Char char(1) = '0';
select
@HoldString = SUBSTRING(SomeString,PATINDEX('%[^'+@Char+']%',SomeString),1000)
from
#TestTable;
declare @EndDate datetime2(7) = sysdatetime();
select datediff(ms,@StartDate, @EndDate);
go
declare @StartDate datetime2(7) = sysdatetime(),
@HoldString varchar(128),
@Char char(1) = '0';
select
@HoldString = REPLACE(LTRIM(REPLACE(SomeString, '0', ' ')), ' ', '0')
from
#TestTable;
declare @EndDate datetime2(7) = sysdatetime();
select datediff(ms,@StartDate, @EndDate);
go
declare @StartDate datetime2(7) = sysdatetime(),
@HoldString varchar(128),
@Char char(1) = '0';
select
@HoldString = right(SomeString,datalength(SomeString)-PATINDEX('%[^'+@Char+']%',SomeString)+1)
from
#TestTable;
declare @EndDate datetime2(7) = sysdatetime();
select datediff(ms,@StartDate, @EndDate);
go
drop table #TestTable;
go
The double replace is approximately 2 times slower than the substring/patindex and the right/datalength/patindex versions (I added another version to my testing).
The VM I used has 2 x64 processors and 20 GB RAM.
EDIT: Run using SQL Server 2008 R2 as well.
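As an aside (my own observation, not part of Lynn's timed runs): the three expressions are not strictly equivalent on boundary inputs, so a test harness should probably check correctness as well as speed. A small sketch:

```sql
-- Boundary cases where the methods disagree (illustrative values).
DECLARE @Char CHAR(1) = '0';

-- All-@Char input: PATINDEX finds no non-'0' character and returns 0,
-- so the SUBSTRING variant returns the string unchanged,
-- while the double REPLACE collapses it to an empty string.
SELECT SUBSTRING('0000', PATINDEX('%[^'+@Char+']%','0000'), 1000) AS SubstringResult, -- '0000'
       REPLACE(LTRIM(REPLACE('0000', @Char, ' ')), ' ', @Char)    AS ReplaceResult;   -- ''

-- Embedded spaces: the double REPLACE converts them to @Char.
SELECT REPLACE(LTRIM(REPLACE('00a b', @Char, ' ')), ' ', @Char) AS SpacesCorrupted;   -- 'a0b'
```

Neither case occurs in the 'temp' test data above, which is why the timings are still comparable.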
August 24, 2012 at 1:58 pm
For anybody else still following along I put together a test harness to check this out. Thanks to Jeff for the sample data.
CREATE FUNCTION RemoveLeftCharPatIndex
(
@Source VARCHAR(1000)
,@Char CHAR(1)
)
RETURNS VARCHAR(1000)
AS
BEGIN
DECLARE @NewValue VARCHAR(1000)
SELECT @NewValue = SUBSTRING(@Source, PATINDEX('%[^' + @Char + ']%', @Source), 1000)
return @NewValue
end
go
create FUNCTION RemoveLeftCharTrimReplace
(
@Source VARCHAR(1000)
,@Char CHAR(1)
)
RETURNS VARCHAR(1000)
AS
BEGIN
DECLARE @NewValue VARCHAR(1000)
SELECT @NewValue = REPLACE(LTRIM(REPLACE(@Source, @Char, ' ')), ' ', @Char)
return @NewValue
end
go
--generate 1,000,000 rows of test data
SELECT TOP 1000000
SomeString = REPLICATE('0',ABS(CHECKSUM(NEWID()))%5) -- 0 to 4 leading zeros
+ 'temp'+RIGHT('0000000'+CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS VARCHAR(10)),7)
INTO #TestTable
FROM sys.all_columns ac1
CROSS JOIN sys.all_columns ac2
SET STATISTICS TIME ON
print 'PatIndex Version'
select SomeString, dbo.RemoveLeftCharPatIndex(SomeString, '0')
from #TestTable
--print '----------------------------------------'
--print ''
print 'TrimReplace Version'
select SomeString, dbo.RemoveLeftCharTrimReplace(SomeString, '0')
from #TestTable
SET STATISTICS TIME OFF
drop table #TestTable
drop function RemoveLeftCharPatIndex
drop function RemoveLeftCharTrimReplace
I ran this several times on my desktop running Win7 Ent with SQL 2008 R2 (not a super machine, but not horrible either). It seems that PATINDEX has a slight edge in CPU time.
Here are my results:
PatIndex Version
SQL Server Execution Times:
CPU time = 5039 ms, elapsed time = 12791 ms.
-----------------------------------
TrimReplace Version
SQL Server Execution Times:
CPU time = 5881 ms, elapsed time = 12651 ms.
August 24, 2012 at 2:03 pm
Jeff Moden (8/24/2012)
tyson.price (8/24/2012)
Do you think this:
SELECT SUBSTRING(@Source,PATINDEX('%[^'+@Char+']%',@Source),1000)
outperforms this:
select REPLACE(LTRIM(REPLACE('000temp001', '0', ' ')), ' ', '0')
"A Developer must not guess... a Developer must KNOW!" 😉 Test it. Here's the code to build a million row test table for the problem at hand.
SELECT TOP 1000000
SomeString = REPLICATE('0',ABS(CHECKSUM(NEWID()))%5) -- 0 to 4 leading zeros
+ 'temp'+RIGHT('0000000'+CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS VARCHAR(10)),7)
INTO #TestTable
FROM sys.all_columns ac1
CROSS JOIN sys.all_columns ac2
The code above takes just a second or two to run on a decent machine.... something less than 7 seconds to run on a 10 year old single CPU desktop box.
Sorry Jeff, right now I'm on a PC without SQL Server installed, so I can only guess.
If you can run a quick test it would be great.
Again, having said that, SUBSTRING and PATINDEX performance should be fairly constant, and the number of leading characters to remove will be irrelevant. REPLACE will depend on how many leading and trailing zeros are in the string.
Not to mention that the SUBSTRING & PATINDEX method is much more flexible in what kind of leading things it can find and remove (again, the example of removing any leading digits: '%[^0-9]%')...
Oops, looks like the tests were run! Thanks!
As I rightly guessed... 🙂
And I'm sure that the more zeros are added at the front and the end of the string, the greater the difference will be.
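A quick illustration of that flexibility (my example, not from the thread): the character class generalizes beyond a single character, which the double-REPLACE trick cannot do:

```sql
-- '%[^0-9]%' finds the first non-digit, so this strips ALL leading digits,
-- not just one particular character.
DECLARE @s VARCHAR(100) = '20120824temp';
SELECT SUBSTRING(@s, PATINDEX('%[^0-9]%', @s), 8000) AS NoLeadingDigits;
-- NoLeadingDigits = 'temp'
```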
August 24, 2012 at 2:12 pm
The following code was run on my laptop (SQL Server 2005, 8 GB RAM, 8 cores, x64 processor):
SELECT TOP 1000000
SomeString = REPLICATE('0',ABS(CHECKSUM(NEWID()))%5) -- 0 to 4 leading zeros
+ 'temp'+RIGHT('0000000'+CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS VARCHAR(10)),7)
INTO #TestTable
FROM sys.all_columns ac1
CROSS JOIN sys.all_columns ac2;
go
declare @StartDate datetime,
@HoldString varchar(128),
@Char char(1);
SET @StartDate = GETDATE();
SET @Char = '0';
select
@HoldString = SUBSTRING(SomeString,PATINDEX('%[^'+@Char+']%',SomeString),1000)
from
#TestTable;
declare @EndDate datetime;
SET @EndDate = GETDATE();
select datediff(ms,@StartDate, @EndDate);
go
declare @StartDate datetime,
@HoldString varchar(128),
@Char char(1);
SET @StartDate = GETDATE();
SET @Char = '0';
select
@HoldString = REPLACE(LTRIM(REPLACE(SomeString, '0', ' ')), ' ', '0')
from
#TestTable;
declare @EndDate datetime;
SET @EndDate = GETDATE();
select datediff(ms,@StartDate, @EndDate);
go
declare @StartDate datetime,
@HoldString varchar(128),
@Char char(1);
SET @StartDate = GETDATE();
SET @Char = '0';
select
@HoldString = right(SomeString,datalength(SomeString)-PATINDEX('%[^'+@Char+']%',SomeString)+1)
from
#TestTable;
declare @EndDate datetime;
SET @EndDate = GETDATE();
select datediff(ms,@StartDate, @EndDate);
go
drop table #TestTable;
go
The substring method was still about 2 times faster than the multiple replace, and about 1.5 times faster than my right() solution.
It really comes down to "it depends": which is better depends on the system you are running on. The best thing to do here is test, test, and test again.
August 24, 2012 at 2:30 pm
Don't just take my word for it, but using Lynn's code to test performance with more leading zeros (100 more), the results are consistent.
All the methods took longer to run, and the double REPLACE took twice as long as the other two. SUBSTRING and RIGHT seem to have similar elapsed times.
August 24, 2012 at 4:15 pm
Sean Lange (8/24/2012)
For anybody else still following along I put together a test harness to check this out. Thanks to Jeff for the sample data.
My recommendation is to never use SET STATISTICS to measure code when scalar functions are involved. Please see the following article for why...
http://www.sqlservercentral.com/articles/T-SQL/91724/
That same article also shows how to convert scalar UDFs to higher performance "iSFs" (Inline Scalar Functions).
Also, returning data to the display is the "great equalizer". Instead, dump the results to a throw away variable. When I get home, I'll show you what I mean.
--Jeff Moden
August 24, 2012 at 4:20 pm
Jeff Moden (8/24/2012)
My recommendation is to never use SET STATISTICS to measure code when scalar functions are involved. Please see the following article for why...
http://www.sqlservercentral.com/articles/T-SQL/91724/
That same article also shows how to convert scalar UDFs to higher performance "iSFs" (Inline Scalar Functions).
Also, returning data to the display is the "great equalizer". Instead, dump the results to a throw away variable. When I get home, I'll show you what I mean.
Like I did in my code?
August 24, 2012 at 5:32 pm
Lynn Pettis (8/24/2012)
Like I did in my code?
Exactly. I was going to comment on that but ran out of time. Nice going, Lynn.
--Jeff Moden
August 25, 2012 at 4:26 pm
Sorry for the delay with the coded response.
Here's what I was talking about. Currently there's no such thing as an "Inline Scalar Function" (iSF for short), but you can make one that comes really close by using an iTVF, or "Inline Table Valued Function", to return a scalar value.
Here's the self-contained code that includes the two functions tested so far plus a function to emulate an iSF. The difference in performance is pretty large, considering that the first function and the iSF do identical things.
RAISERROR('Setting up the test environment...',0,1) WITH NOWAIT;
--=======================================================================================
-- Conditionally drop the functions and use a nice safe place that everyone has.
--=======================================================================================
--===== Identify the database to use.
USE tempdb;
--===== Conditionally drop all of the functions to make reruns in SSMS easier.
IF OBJECT_ID('tempdb.dbo.RemoveLeftCharPatIndex') IS NOT NULL
DROP FUNCTION dbo.RemoveLeftCharPatIndex
;
IF OBJECT_ID('tempdb.dbo.RemoveLeftCharTrimReplace') IS NOT NULL
DROP FUNCTION dbo.RemoveLeftCharTrimReplace
;
IF OBJECT_ID('tempdb.dbo.isfRemoveLeadingCharacter') IS NOT NULL
DROP FUNCTION dbo.isfRemoveLeadingCharacter
;
GO
--=======================================================================================
--         Create the functions to be tested.
--=======================================================================================
CREATE FUNCTION dbo.RemoveLeftCharPatIndex
(
@Source VARCHAR(1000),
@Char CHAR(1)
)
RETURNS VARCHAR(1000)
AS
BEGIN
DECLARE @NewValue VARCHAR(1000)
SELECT @NewValue = SUBSTRING(@Source, PATINDEX('%[^' + @Char + ']%', @Source), 1000)
RETURN @NewValue
END
;
GO
CREATE FUNCTION dbo.RemoveLeftCharTrimReplace
(
@Source VARCHAR(1000),
@Char CHAR(1)
)
RETURNS VARCHAR(1000)
AS
BEGIN
DECLARE @NewValue VARCHAR(1000)
SELECT @NewValue = REPLACE(LTRIM(REPLACE(@Source, @Char, ' ')), ' ', @Char)
RETURN @NewValue
END
;
GO
--===== This is like the RemoveLeftCharPatIndex function except it's an "Inline Scalar Function".
CREATE FUNCTION dbo.isfRemoveLeadingCharacter
(
@SomeString VARCHAR(8000),
@RemoveChar CHAR(1)
)
RETURNS TABLE WITH SCHEMABINDING AS
RETURN SELECT CleanString = SUBSTRING(@SomeString,PATINDEX('%[^'+@RemoveChar+']%',@SomeString),8000)
;
GO
--=======================================================================================
-- Conditionally drop and rebuild the test data.
--=======================================================================================
--===== Conditionally drop the test table to make reruns in SSMS easier.
IF OBJECT_ID('tempdb..#TestTable') IS NOT NULL
DROP TABLE #TestTable
;
--===== Create and populate the test with 1,000,000 rows of test data
SELECT TOP 1000000
SomeString = REPLICATE('0',ABS(CHECKSUM(NEWID()))%5) -- 0 to 4 leading zeros
+ 'temp'+RIGHT('0000000'+CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS VARCHAR(10)),7)
INTO #TestTable
FROM sys.all_columns ac1
CROSS JOIN sys.all_columns ac2
;
--=======================================================================================
-- Run the tests using a duration timer because SET STATISTICS TIME ON induces
-- large delay times because of the RBAR nature of Scalar UDFs.
--=======================================================================================
--===== Declare some obviously named variables.
DECLARE @Bitbucket VARCHAR(8000),
@StartTime DATETIME,
@DurationMS INT
;
--===== Run the tests on each function
RAISERROR('=======================================',0,1) WITH NOWAIT;
RAISERROR('============ Running Tests ============',0,1) WITH NOWAIT;
RAISERROR('=======================================',0,1) WITH NOWAIT;
-- Note that @Bitbucket allows us to take display and disk time out of the equation.
-----------------------------------------------------------------------------------------
SELECT @StartTime = GETDATE();
RAISERROR('============ PatIndex Version',0,1) WITH NOWAIT;
SELECT @Bitbucket = dbo.RemoveLeftCharPatIndex(SomeString, '0')
FROM #TestTable;
SELECT @DurationMS = DATEDIFF(ms,@StartTime,GETDATE())
RAISERROR('Duration ms: %u',0,1,@DurationMS) WITH NOWAIT;
-----------------------------------------------------------------------------------------
SELECT @StartTime = GETDATE();
RAISERROR('============ TrimReplace Version',0,1) WITH NOWAIT;
SELECT @Bitbucket = dbo.RemoveLeftCharTrimReplace(SomeString, '0')
FROM #TestTable;
SELECT @DurationMS = DATEDIFF(ms,@StartTime,GETDATE())
RAISERROR('Duration ms: %u',0,1,@DurationMS) WITH NOWAIT;
-----------------------------------------------------------------------------------------
SELECT @StartTime = GETDATE();
RAISERROR('============ isf Substring/PatIndex',0,1) WITH NOWAIT;
SELECT @Bitbucket = ca.CleanString
FROM #TestTable tt
CROSS APPLY dbo.isfRemoveLeadingCharacter(tt.SomeString,'0') ca;
SELECT @DurationMS = DATEDIFF(ms,@StartTime,GETDATE())
RAISERROR('Duration ms: %u',0,1,@DurationMS) WITH NOWAIT;
-----------------------------------------------------------------------------------------
RAISERROR('======================================',0,1) WITH NOWAIT;
RAISERROR('============ RUN COMPLETE ============',0,1) WITH NOWAIT;
RAISERROR('======================================',0,1) WITH NOWAIT;
Here are the run results from my ten year old single 32 bit 1.8GHz CPU desktop box...
Setting up the test environment...
(1000000 row(s) affected)
=======================================
============ Running Tests ============
=======================================
============ PatIndex Version
Duration ms: 10793
============ TrimReplace Version
Duration ms: 12843
============ isf Substring/PatIndex
Duration ms: 1466
======================================
============ RUN COMPLETE ============
======================================
--Jeff Moden
Viewing 14 posts - 16 through 28 (of 28 total)