October 18, 2007 at 1:08 am
Max Yasnytskyy (10/18/2007)[hrAre you trying to say that to solve this sort of problem any additional table is required?
Heck no... there are dozens of slower ways to do it 😀 But, like Serqiy said, a nice little static utility table like this sure does make solving many "impossible" or "gonna need a cursor" problems simply melt away.
Build a permanent Tally table... you'll love it once you learn to use it.
--Jeff Moden
Change is inevitable... Change for the better is not.
October 18, 2007 at 6:52 am
Max Yasnytskyy (10/17/2007)
I have found an interesintg function in msdb, seems like it is working much betterSET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE FUNCTION [dbo].[udfTRANSELATE] (@mixedString AS varchar(max))
RETURNS varchar(max)
as
BEGIN
DECLARE @resultString varchar(max), @tmpString char(1), @i int, @strLen int
SELECT @resultString='', @i=0, @strLen=len(@mixedString)
WHILE @strLen>=@i
BEGIN
SELECT @tmpString=substring (@mixedString,@i/*character index*/,1/*one character*/)
if msdb.dbo.ConvertToInt (@tmpString,11,11)=11
SELECT @resultString=@resultString+@tmpString
SELECT @i=@i+1
END
RETURN @resultString
END
this is what i needed. thanx.
the first solution didnt work. because of the Tally i didnt implement.
i'm using 2005. sorry for the cunfusion, didnt notice wrong forum.
ps.
meanwhile the client "revised" some specs, so i do need these numbers now. but i still need to pull them from the original one.
so it will be somethimg like that:
if msdb.dbo.ConvertToInt (@tmpString,11,11)=11
SELECT @resultString=@resultString+@tmpString
else
SELECT @numberString=@numberString+@tmpString
SELECT @i=@i+1
i dont need it as a function because i need it only in one place in the whole code.
October 18, 2007 at 7:53 am
the first solution didnt work. because of the Tally i didnt implement...
ps.
meanwhile the client "revised" some specs, so i do need these numbers now. but i still need to pull them from the original one.
so it will be somethimg like that:
if msdb.dbo.ConvertToInt (@tmpString,11,11)=11
SELECT @resultString=@resultString+@tmpString
else
SELECT @numberString=@numberString+@tmpString
SELECT @i=@i+1
i dont need it as a function because i need it only in one place in the whole code.
You'll still need it as a function... and, if you had invoked the Tally table, it would be easy. 😉 The Tally table makes a great deal of such tasks as simple as falling off your chair when you realize what you can do with it 😀
--Jeff Moden
Change is inevitable... Change for the better is not.
October 18, 2007 at 7:58 am
WEll - since we're now throwing in 2005 - this would be the perfect time to create your dbo.RegexReplace CLR function, as so:
Imports System
Imports System.Data
Imports System.Data.SqlClient
Imports System.Data.SqlTypes
Imports Microsoft.SqlServer.Server
Imports System.Text.RegularExpressions
Partial Public Class UserDefinedFunctions
<SqlFunction> _
Public Shared Function RegexReplace(ByVal input As SqlChars, ByVal pattern As SqlString, ByVal rep As SqlString) As SqlString
' Add your code here
Dim rex As System.Text.RegularExpressions.Regex = New System.Text.RegularExpressions.Regex(pattern.Value, optionS)
Return New SqlString(rex.Replace(New String(input.Value), rep.Value))
End Function
End Class
And then the whole "cleaning process" becomes:
declare @dirtystring varchar(500)
declare @cleanstring varchar(500)
declare @rex varchar(100)
select @dirtystring='123k4biodf-k123l4h123098d-09dfknqwerklhsdfo0er98tqwer/;/;leriuqwer-t0'
select @rex='[^0-9]'
select @cleanstring=dbo.regexreplace(@dirtystring,@rex,'')
select @cleanstring
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
October 18, 2007 at 8:11 am
Jeff Moden (10/18/2007)
the first solution didnt work. because of the Tally i didnt implement...
ps.
meanwhile the client "revised" some specs, so i do need these numbers now. but i still need to pull them from the original one.
so it will be somethimg like that:
if msdb.dbo.ConvertToInt (@tmpString,11,11)=11
SELECT @resultString=@resultString+@tmpString
else
SELECT @numberString=@numberString+@tmpString
SELECT @i=@i+1
i dont need it as a function because i need it only in one place in the whole code.
You'll still need it as a function... and, if you had invoked the Tally table, it would be easy. 😉 The Tally table makes a great deal of such tasks as simple as falling off your chair when you realize what you can do with it 😀
for now i'm happy if it just works. and it doesnt slow the whole stored procedure considerably.
maybe later i'll learn about those tally tables. 🙂
October 18, 2007 at 7:57 pm
Now, that looks like a good use for a CLR... how's the performance? (I can't test it... don't have 2k5 available)...
--Jeff Moden
Change is inevitable... Change for the better is not.
October 18, 2007 at 8:19 pm
I don't have a HUGE database on hand with relevant data, but on 50K records...maybe a second.
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
October 18, 2007 at 8:55 pm
Matt Miller (10/18/2007)
I don't have a HUGE database on hand with relevant data, but on 50K records...maybe a second.
Thanks, Matt. Like I've said so many times, I don't have 2k5 to test on (gotta break down and install the Dev version)... Would you be willing to try it on my "performance test table"? This script builds a million row test table... run the CLR on "SomeCSV" column, please... I didn't include any non-breaking spaces in the code this time... 😉
--===== Create and populate a 1,000,000 row test table.
-- Column RowNum has a range of 1 to 1,000,000 unique numbers
-- Column "SomeInt" has a range of 1 to 50,000 non-unique numbers
-- Column "SomeString" has a range of "AA" to "ZZ" non-unique 2 character strings
-- Column "SomeNumber has a range of 0.0000 to 99.9999 non-unique numbers
-- Column "SomeDate" has a range of >=01/01/2000 and <01/01/2010 non-unique date/times
-- Takes about 80 seconds to execute.
SELECT TOP 1000000
RowNum = IDENTITY(INT,1,1),
SomeInt = CAST(RAND(CAST(NEWID() AS VARBINARY))*50000+1 AS INT),
SomeString = CHAR(STR(RAND(CAST(NEWID() AS VARBINARY))*25+65))
+ CHAR(STR(RAND(CAST(NEWID() AS VARBINARY))*25+65)),
SomeCSV = CAST('Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10' AS VARCHAR(80)),
SomeNumber = CAST(RAND(CAST(NEWID() AS VARBINARY))*100 AS MONEY),
SomeDate = CAST(RAND(CAST(NEWID() AS VARBINARY))*3653.0+36524.0 AS DATETIME)
INTO dbo.JBMTest
FROM Master.dbo.SysColumns sc1,
Master.dbo.SysColumns sc2 --Lack of join criteria makes this a CROSS-JOIN
--===== A table is not properly formed unless a Primary Key has been assigned
ALTER TABLE dbo.JBMTest
ADD PRIMARY KEY CLUSTERED (RowNum)
Sure do appreciate it... these types of tests help me to know which direction to steer especially since your CLR is one of the few uses I've seen that even come close to justifying the use of a CLR... 🙂
--Jeff Moden
Change is inevitable... Change for the better is not.
October 18, 2007 at 10:23 pm
ACtually - I'm way ahead of ya. That last response felt a bit like a cop-out, so I went and built my own "big table. For lack of a better way to do it - I decided to go out and "clean" 1M NEWID()'s of all of their numeric characters.
For anyone who looked at my earlier post - one of the CLR instructions got "whacked" from the first paste. Grab it again. Testing on a XP machine running Dev.Ed. (1GB Total RAM on the machine), so there might be some amount of RAM-bound problems with the result, but...I think they speak for themselves anyway.RAM doesn't seem to be a HUGE issue, no spikes for it at any phase.
Testing script
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[tbltest](
[gid] [uniqueidentifier] NOT NULL CONSTRAINT [DF_tbltest_gid] DEFAULT (newid()),
[num] [bigint] NOT NULL
) ON [PRIMARY]
CREATE UNIQUE CLUSTERED INDEX tbltest_Clust on tbltest(gid)
--fill with data
insert tbltest (num)
select t2.num from master.dbo.spt_values cross join spt_values t2 --5.5M records, now with a GUID
--now - try the test
use test
go
declare @t varchar(100)
declare @g datetime
--just get a count of our "victims"
select @g=getdate()
select count(*) from t2 where cast(gid as varchar(100)) between 'a' and 'd'
select 'regular count',datediff(ms,@g,getdate())
--Regex method - don't DISPLAY the update - just cycle through them
select @g=getdate()
select @t=dbo.regexreplace(gid,'[0-9]','') from t2
where cast(gid as varchar(100)) between 'a' and 'd'
select 'regex replace',datediff(ms,@g,getdate())
--Derek's method - don't DISPLAY the update - just cycle through them
select @g=getdate()
select @t=dbo.udfTRANSELATE(gid) from t2
where cast(gid as varchar(100)) between 'a' and 'd'
select 'dereks solution', datediff(ms,@g,getdate())
results:
Count: 1,033,762
Regular count 6,483 (6s - as you'd expect)
Regex replace 91,486 (1 min. 31.486 s)
dereks solution 2,512,687 (41 min.52.687 s)
Improvement 27 times faster
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
October 18, 2007 at 10:41 pm
Matt, just for fun - compare the memory usage when you run CLR udf and udfTRANSELATE
October 18, 2007 at 10:53 pm
sqlServer had 256MB in use at all times. Didn't ask for more (100MB stayed "free" the whole time).
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
October 18, 2007 at 11:02 pm
As a general remark - i had a problem with CLR function which was executed about 200000 times a day and was building XML strings up to 300 characters and then using them as commands when remoting to the custom message queue server. SQL Server was running on 16 CPU x64 server with 16 GB or RAM.
Because of the way .net garbage collector works and intensive processing of strings, which are as we all know immutable and every modification of the string essentially creates the new string, .net bit of SQL server was running out of memory every so often, freezing the whole SQL instance for up to 2 minutes.
October 19, 2007 at 1:24 am
Ok everyone... pay very close attention... I'm going to say it again until you get it...
[font="Arial Black"]CLR's are for people who don't know T-SQL[/font]
First, Matt... thank you, Sir, for providing the CLR test. Absolutely none of my current piss off about people insisting that CLR's are actually useful is directed at you. You've been most helpful in testing that which I could not. Thank you again.
On to war... 😉
As a review... Matt's stats (I'm a poet and don't know it :Whistling: ) from his fine CLR experiment are as follows; notice the duration for the "Regex Replace" which is the result of the CLR used...
results:
Count: 1,033,762
Regular count 6,483 (6s - as you'd expect)
Regex replace 91,486 (1 min. 31.486 s)
dereks solution 2,512,687 (41 min.52.687 s)
Improvement 27 times faster
Now, there are some minor differences between the system tables in SQL Server 2k5 (which Matt used) and SQL Sever 2k, but I've replicated his test table and run a test on 5.5 Million rows just like he did. The difference is, I used my Tally Table driven function... as a reminder, here's the code for that function...
CREATE FUNCTION fnDeleteNumbers (@String VARCHAR(8000))
RETURNS VARCHAR(8000)
AS
BEGIN
DECLARE @Return VARCHAR(8000)
SET @Return = ''
SELECT @Return = @Return+SUBSTRING(@String,N,1)
FROM dbo.Tally WITH (NOLOCK)
WHERE N <= LEN(@String)
AND SUBSTRING(@String,N,1) LIKE '%[^0-9]%'
RETURN @Return
END
...and here's the test code I used... builds the same size table with the same columns, the same data types, and the same index...
--===== Setup the environment
SET NOCOUNT ON
IF OBJECT_ID('dbo.tblTest','U') IS NOT NULL
DROP TABLE dbo.tblTest
--===== Declare a couple of local variables like Matt did
DECLARE @T VARCHAR(100)
DECLARE @G DATETIME
--===== Create the test table (using SQL Server 2000)
SET @G = GETDATE()
SELECT TOP 5500000
IDENTITY(BIGINT,1,1) AS Num,
NEWID() AS GID
INTO dbo.TblTest
FROM Master.dbo.SysColumns sc1,
Master.dbo.SysColumns sc2
SELECT CONVERT(CHAR(12),GETDATE()-@G,114) + ' Time to create test table (hh:mi:ss:mmm)'
--===== Create the same index as Matt did
SET @G = GETDATE()
CREATE UNIQUE CLUSTERED INDEX TblTest_Clust ON TblTest(GID)
SELECT CONVERT(CHAR(12),GETDATE()-@G,114) + ' Time to create clustered index (hh:mi:ss:mmm)'
--===== As Matt said, "Get a count of the victims"
SET @G = GETDATE()
SELECT COUNT(*) AS [Count of "Victims"] FROM TblTest
WHERE CAST(GID AS VARCHAR(100)) BETWEEN 'A' AND 'D'
SELECT CONVERT(CHAR(12),GETDATE()-@G,114) + ' Time to create clustered index (hh:mi:ss:mmm)'
--===== Start the duration timer and do the test...
-- As Matt said, "don't DISPLAY the update - just cycle through them"
SET @G = GETDATE()
SELECT @T = dbo.fnDeleteNumbers(GID)
FROM TblTest WHERE CAST(GID AS VARCHAR(100)) BETWEEN 'A' AND 'D'
SELECT CONVERT(CHAR(12),GETDATE()-@G,114) + ' Time to Tally Table replace (hh:mi:ss:mmm)'
Now... this is where you really need to pay attention... here's the results from the code run above... pay really close attention to the RED text and compare it to the duration for Matt's "Regex Replace" time from the beginning of this message...
-----------------------------------------------------
00:00:21:203 Time to create test table (hh:mi:ss:mmm)
----------------------------------------------------------
00:00:54:267 Time to create clustered index (hh:mi:ss:mmm)
Count of "Victims"
------------------
1030347
----------------------------------------------------------
00:00:07:233 Time to create clustered index (hh:mi:ss:mmm)
-------------------------------------------------------
[font="Arial Black"]00:01:29:093 Time to Tally Table replace (hh:mi:ss:mmm)[/font]
In case you can't figure it out... the T-SQL/Tally Table solution beat the CLR solution by almost 2 and a half seconds!!!!
So, let me say it one more time just in case it hasn't sunk in yet...
[font="Arial Black"]CLR's are for people who don't know T-SQL[/font] 😛 😀 😉
And, allow me to introduce my new "tag line":
"Before you think outside the box, consider how good the box you're in is!" :hehe:
--Jeff Moden
Change is inevitable... Change for the better is not.
October 19, 2007 at 4:54 am
so...
Your saying CLR's are for people who know sql, or people who don't know sql...I'm confused. 😉
Lowell
October 19, 2007 at 6:34 am
Max Yasnytskyy (10/18/2007)
Because of the way .net garbage collector works and intensive processing of strings, which are as we all know immutable and every modification of the string essentially creates the new string, .net bit of SQL server was running out of memory every so often, freezing the whole SQL instance for up to 2 minutes.
I figured that was the general direction you were heading in. A lot of that's improved with the newer updates to the framework. Certainly after them and SQL2005 SP2 - seems to not happen a lot.
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
Viewing 15 posts - 31 through 45 (of 172 total)
You must be logged in to reply to this topic. Login to reply