February 5, 2009 at 4:34 pm
Actually, .NET has very robust exception handling capabilities, and if you implement them along with TSQL exception handling, there shouldn't be problems.
Right, a missing "catch" in an assembly is a pain!
Describe a stability problem that you have encountered in a SAFE/EXTERNAL_ACCESS assembly implementation. If you go UNSAFE, all bets are off, but SAFE/EXTERNAL_ACCESS cannot cause stability issues.
You also have to run this kind of code with UNSAFE access, so it really is all about how you control what you put into your SQL Server.
I am not talking about UNSAFE. Not all developers have solid enough programming skills to write something that "scales". Putting bad code on the server is not a good thing. I do understand that "with power comes responsibility", but it is a lot harder to do it right on a busy server.
Personally, I have had to deal with (mis)use cases related to large memory allocations.
SEE: http://blogs.msdn.com/sqlclrperf/archive/2007/06/01/scalable-memory-usage-in-sqlclr.aspx
It is NOT pretty when the GC starts running and you have little or no control over it 🙁
Describe a problem caused by a SAFE/EXTERNAL_ACCESS assembly in SQL Server and how it was harder to deal with.
It is harder, for example, to replace an under-performing assembly across multiple servers than a stored procedure.
* Noel
February 6, 2009 at 8:40 am
noeld (2/5/2009)
I am not talking about UNSAFE. Not all developers have solid enough programming skills to write something that "scales". Putting bad code on the server is not a good thing. I do understand that "with power comes responsibility", but it is a lot harder to do it right on a busy server.
Personally, I have had to deal with (mis)use cases related to large memory allocations.
SEE: http://blogs.msdn.com/sqlclrperf/archive/2007/06/01/scalable-memory-usage-in-sqlclr.aspx
It is NOT pretty when the GC starts running and you have little or no control over it 🙁
You get the same performance problems in SQLCLR as you could in TSQL from a developer who doesn't know how to do things efficiently. You might spot the TSQL problem in code earlier, but only because you have more experience with TSQL.
If your objects implement IDisposable, then you should be coding with "using" blocks, which ensure Dispose is called when they fall out of scope. This is one of the recommended practices for coding in SQLCLR. Also, wherever the same object is consistently declared the same way, it is best to evaluate whether it could be better used as a static readonly object, where it is only created once and can be reused consistently. For example, common Regular Expression objects.
If you use expensive objects like DataSets in SQLCLR, replace them with DataReaders and work iteratively, which is not bad for performance in CLR like it is in TSQL.
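And if you want to see what the SQLCLR host is actually consuming before pointing fingers at the GC, the memory clerk DMVs will tell you. A rough sketch only; the clerk names are real, but the column names shown are the SQL Server 2005/2008 ones and were consolidated in later versions:
-- How much memory has the CLR host taken from SQL Server?
-- (Column names per SQL Server 2005/2008; later versions consolidate them.)
SELECT mc.type,
       SUM(mc.single_pages_kb + mc.multi_pages_kb) AS allocated_kb,
       SUM(mc.virtual_memory_committed_kb) AS virtual_committed_kb
FROM sys.dm_os_memory_clerks mc
WHERE mc.type LIKE '%CLR%' -- e.g. MEMORYCLERK_SQLCLR, MEMORYCLERK_SQLCLRASSEMBLY
GROUP BY mc.type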
It is harder, for example, to replace an under-performing assembly across multiple servers than a stored procedure.
How so? If you rely on Visual Studio's deploy processes to deploy your assemblies, then stop and learn how to use manual scripting, which will allow you to ALTER ASSEMBLY just like you ALTER PROCEDURE. Changing an assembly out is no more difficult than changing a stored procedure out if you know the DDL required to perform the operations.
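As a rough sketch of what that looks like (the assembly name and path here are hypothetical, for illustration only):
-- Swap a new build in place; the functions and procedures that map to the
-- assembly keep working as long as the public method signatures didn't change.
ALTER ASSEMBLY [MyClrUtilities] -- hypothetical assembly name
FROM 'C:\Deploy\MyClrUtilities.dll' -- or FROM 0x... assembly bits, as with CREATE ASSEMBLY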
Jonathan Kehayias | Principal Consultant | MCM: SQL Server 2008
My Blog | Twitter | MVP Profile
Training | Consulting | Become a SQLskills Insider
Troubleshooting SQL Server: A Guide for Accidental DBAs
February 6, 2009 at 1:18 pm
To me, the SQLCLR vs. xp_cmdshell debate seems like the old script vs. small application debate. SQLCLR is very similar to writing a small application: you're writing code, building an assembly, mapping it to the sql server. Seems like a lot of overhead to get a directory listing into a table. Granted, I do like the idea of passing table variables and such, and I can see the need for operations that are impossible within TSQL. But I can see how doing all of this is like hitting a finishing nail with a sledge hammer. Again, to me it comes down to what am I really trying to do, and how much time I have to do that task. Let's face it, there are situations where time is of the essence and you don't have time to create a new assembly and "hook" it to your sql server.
My other question is: how does this differ from creating a dll that is mapped to an extended stored procedure?
Sean
February 6, 2009 at 9:19 pm
Don, here is a VERY good example of why CLR's sometimes ARE the way to go:
SQL Server CLR function to improve performance of validating email addresses
Lordy, here we go again... someone takes a really horrible example of some horribly written SQL and supposedly tests it against a CLR and supposedly comes up with some wonderful execution time and, as near as I can tell, none of you have actually tested it.
I've said it before and I'll say it again, CLR's are for people who don't know how to write T-SQL. Just look at the example code in that link...
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE FUNCTION [dbo].[udf_Txt_IsEmail] (
@EmailAddr varchar(255) -- Email address to check
) RETURNS BIT -- 1 if @EmailAddr is a valid email address
/*
* Checks a text string to be sure it's a valid e-mail address.
* Returns 1 when it is, otherwise 0.
* Example:
SELECT CASE WHEN 1=dbo.udf_Txt_IsEmail('anovick@NovickSoftware.com')
THEN 'Is an e-mail address' ELSE 'Not an e-mail address' END
*
* Test:
print case when 1=dbo.udf_txt_isEmail('anovick@novicksoftware.com')
then 'Passes' else 'Fails' end + ' test for good addr'
print case when 0=dbo.udf_txt_isEmail('@novicksoftware.com')
then 'Passes' else 'Fails' end + ' test for no user'
print case when 0=dbo.udf_txt_isEmail('anovick@n.com')
then 'Passes' else 'Fails' end + ' test for 1 char domain'
print case when 1=dbo.udf_txt_isEmail('anovick@no.com')
then 'Passes' else 'Fails' end + ' test for 2 char domain'
print case when 0=dbo.udf_txt_isEmail('anovick@.com')
then 'Passes' else 'Fails' end + ' test for no domain'
print case when 0=dbo.udf_txt_isEmail('anov ick@novicksoftware.com')
then 'Passes' else 'Fails' end + ' test for space in name'
print case when 0=dbo.udf_txt_isEmail('ano#vick@novicksoftware.com')
then 'Passes' else 'Fails' end + ' test for # in user'
print case when 0=dbo.udf_txt_isEmail('anovick@novick*software.com')
then 'Passes' else 'Fails' end + ' test for * asterisk in domain'
****************************************************************/
AS BEGIN
DECLARE @AlphabetPlus VARCHAR(255)
, @MaxLen INT -- Length of the address
, @Pos INT -- Position in @EmailAddr
, @OK BIT -- Is @EmailAddr OK
-- Check basic conditions
IF @EmailAddr IS NULL
OR NOT @EmailAddr LIKE '_%@__%.__%'
OR CHARINDEX(' ',LTRIM(RTRIM(@EmailAddr))) > 0
RETURN(0)
SELECT @AlphabetPlus = 'abcdefghijklmnopqrstuvwxyz01234567890_-.@'
, @MaxLen = LEN(@EmailAddr)
, @Pos = 0
, @OK = 1
WHILE @Pos < @MaxLen AND @OK = 1 BEGIN
SET @Pos = @Pos + 1
IF NOT @AlphabetPlus LIKE '%'
+ SUBSTRING(@EmailAddr, @Pos, 1)
+ '%'
SET @OK = 0
END -- WHILE
RETURN @OK
END
go
GRANT EXEC on dbo.udf_txt_isEmail to PUBLIC
go
It's not the most efficient of code. That's why I have to update it.
Even the author of the code admits it's not optimal. In fact, it's RBAR on steroids.
But, even at that, on a 10,000 row test to find all bad emails, it certainly does NOT take over 14 seconds to run. I'll prove THAT in a minute on my poor ol' 7-year-old, single 1.8 GHz CPU with only a Gig of RAM.
Second of all, this particular check can be done so easily that it's actually an insult to common sense and the resources of the machine to use a UDF. Yeah, yeah... I know... "Convenient encapsulation for developers". What you really mean is "Convenient encapsulation for developers that don't have a clue" and that's why people use some UDF's and it's certainly why they resort to a bloody CLR.
I can't test the CLR from that horrible article, but I'll make it easy for someone who can to compare the actual code all at once...
First, we need 10,000 rows of test data... that's easy... most of the rows in the following are "good" and there are some broken email addresses... read the comments in the code that follows...
SET NOCOUNT ON
DROP TABLE JBMTest
GO
--===== Create and populate a 10,000 row test table.
-- Column "RowNum" has a range of 1 to 10,000 unique numbers
-- Column "EMailAddr" has good email addresses. We'll break some later.
-- Jeff Moden
SELECT TOP 10000
RowNum = IDENTITY(INT,1,1),
EMailAddr = CAST('simpletest@someemail.com' AS VARCHAR(255)),
Status = CAST('Good' AS VARCHAR(50))
INTO dbo.JBMTest
FROM Master.dbo.SysColumns t1
CROSS JOIN Master.dbo.SysColumns t2
--===== A table is not properly formed unless a Primary Key has been assigned
ALTER TABLE dbo.JBMTest
ADD PRIMARY KEY CLUSTERED (RowNum)
--===== Break some known email addresses
UPDATE dbo.JBMTest SET EMailAddr = 'anovick@novicksoftware.com', Status = 'Good' WHERE RowNum = 1000
UPDATE dbo.JBMTest SET EMailAddr = '@novicksoftware.com', Status = 'Bad No User' WHERE RowNum = 2000
UPDATE dbo.JBMTest SET EMailAddr = 'anovick@n.com', Status = 'Bad 1 Char Domain' WHERE RowNum = 3000
UPDATE dbo.JBMTest SET EMailAddr = 'anovick@no.com', Status = 'Good 2 Char Domain' WHERE RowNum = 4000
UPDATE dbo.JBMTest SET EMailAddr = 'anovick@.com', Status = 'Bad No Domain' WHERE RowNum = 5000
UPDATE dbo.JBMTest SET EMailAddr = 'anov ick@novicksoftware.com',Status = 'Bad Space in Name ' WHERE RowNum = 6000
UPDATE dbo.JBMTest SET EMailAddr = 'ano#vick@novicksoftware.com',Status = 'Bad # in Name' WHERE RowNum = 7000
UPDATE dbo.JBMTest SET EMailAddr = 'anovick@novick*software.com',Status = 'Bad * in Name' WHERE RowNum = 8000
UPDATE dbo.JBMTest SET EMailAddr = 'anovick@novick_software.com',Status = 'Good _ in Name' WHERE RowNum = 9000
GO
Now, let's re-write the function with some common sense... something that understands the power of NULLs and something that, in and of itself, has no explicit RBAR...
CREATE FUNCTION dbo.udf_Txt_IsEmail_Improved
-- Written by Jeff Moden
(
@EmailAddr VARCHAR(255)
)
RETURNS BIT
AS
BEGIN
--===== Declare obviously named variables
DECLARE @OK BIT
SET @OK = 0
--===== Check the address for form and content
SELECT @OK = 1
WHERE @EmailAddr LIKE '_%@__%.__%'
AND @EmailAddr NOT LIKE '%[^-.@abcdefghijklmnopqrstuvwxyz01234567890__]%' ESCAPE '_'
RETURN @OK
END
GO
GRANT EXEC ON dbo.udf_txt_isEmail_Improved to PUBLIC
GO
Let's go one step further, eh? Let's just do it right because this code is SO simple, that you don't even need a UDF for this...
--===== My test without a UDF because THE CODE IS THAT SIMPLE!!!!!!!
DECLARE @StartTime DATETIME
SET @StartTime = GETDATE()
SELECT *
FROM dbo.JBMTest
WHERE EmailAddr NOT LIKE '_%@__%.__%'
OR EmailAddr LIKE '%[^-.@abcdefghijklmnopqrstuvwxyz01234567890__]%' ESCAPE '_'
PRINT STR(DATEDIFF(ms,@StartTime,GETDATE())) + ' ms without a UDF'
GO
Alright... we've got test data... we have the function from the article... we have my function... and we have some nice simple set-based code that blows all of the above away...
--===== The looping UDF test
DECLARE @StartTime DATETIME
SET @StartTime = GETDATE()
SELECT *
FROM dbo.JBMTest
WHERE dbo.udf_txt_isEmail(EMailAddr) = 0
PRINT STR(DATEDIFF(ms,@StartTime,GETDATE())) + ' ms for Looping UDF'
GO
--===== My set-based UDF test
DECLARE @StartTime DATETIME
SET @StartTime = GETDATE()
SELECT *
FROM dbo.JBMTest
WHERE dbo.udf_txt_isEmail_Improved(EMailAddr) = 0
PRINT STR(DATEDIFF(ms,@StartTime,GETDATE())) + ' ms for my UDF'
GO
--===== My test without a UDF because THE CODE IS THAT SIMPLE!!!!!!!
DECLARE @StartTime DATETIME
SET @StartTime = GETDATE()
SELECT *
FROM dbo.JBMTest
WHERE EmailAddr NOT LIKE '_%@__%.__%'
OR EmailAddr LIKE '%[^-.@abcdefghijklmnopqrstuvwxyz01234567890__]%' ESCAPE '_'
PRINT STR(DATEDIFF(ms,@StartTime,GETDATE())) + ' ms without a UDF'
GO
When I run that bit of computational heaven against the test data I built above, here's the results I get...
1263 ms for Looping UDF
356 ms for my UDF
233 ms without a UDF
Like I said... I don't know where the hell the author of that article gets off saying that his looping UDF takes over 14 seconds to run, but he was wrong even in that!!! :crazy:
I'll also plug set-based code here for a minute. Lots of folks keep saying they don't have the time to figure out set-based code and/or that it makes for much longer, more complicated code. I believe that, especially in this case, we just proved that's a huge load of hooie 😛 The set-based code is both easier to read and understand than the While loop UDF and it's also a heck of a lot shorter. In fact, it's so short and easy to do that anyone who puts it in a UDF needs to be shot in the head with a volley of high velocity, bone-in, frozen pork chops! 😀
Ok... now let's just say that the CLR that was included in the article actually comes in at 53 milliseconds like the article claims. First of all, I don't trust anything in that article because the time for the UDF was so very far off that I don't trust the author's methods of testing. Second, look at all the crap that can happen with a CLR. What crap? Look back at this thread! That's what! Look at all the things that can go wrong! Even Jonathan, as much of an expert on the subject as he is, missed something. Add the overhead of CLR's to that, and ya just gotta ask yourself, is it all worth it?
Somebody, please! Test the CLR and the code I just posted all on one machine! Let's see what really gives!
Now, about this directory stuff and using xp_CmdShell and parsing directory listings that could change... again, it all sounds like a good idea, but it's not. Someone already stated that the format of the DOS DIRectory listing could change and that would break the code, so I'm thinkin' that's a pretty bad idea to try and parse it. And, it's for good reason that a lot of DBA's don't want xp_CmdShell to be used. But, what harm is there if the server is an ETL computer with no public facing interfaces?
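For anyone who hasn't seen it, the fragile pattern I'm talking about looks something like this... just a sketch, mind you... the DIR switches and especially the SUBSTRING offset are exactly the kind of assumptions that break when the listing format changes...
--===== The brittle way: capture DIR output as raw text, then parse it by position
CREATE TABLE #DirOutput (Line VARCHAR(500))
INSERT INTO #DirOutput (Line)
 EXEC Master.dbo.xp_CmdShell 'DIR "C:\Temp" /A-D' --Files only, no directories
--===== The offset of 40 is a guess that depends on the OS version and locale
SELECT FileName = LTRIM(SUBSTRING(Line,40,255))
  FROM #DirOutput
 WHERE Line LIKE '[0-9][0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9]%' --Lines starting with a date
DROP TABLE #DirOutput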
But, let's just assume the DBA stands firm (and they have the right to... they've been charged with protecting the data and the environment it's in!). Will the DBA allow the use of OLE automation? Even if not, if you explain that you can move a directory listing into a table with OLE automation instead of a CLR, I'll just bet that (s)he will get all cooperative real sudden like. Something like this should do...
-- CREATE PROCEDURE dbo.GetFileDetails
-- --===== Declare the I/O parameters
-- @piFullPath VARCHAR(128)
-- AS
declare @piFullPath VARCHAR(128)
set @piFullPath = 'c:\temp\'
SET NOCOUNT ON
--=======================================================================================
-- Local variables
--=======================================================================================
--===== These are processing control and reporting variables
DECLARE @Counter INT --General purpose counter
DECLARE @CurrentName VARCHAR(256) --Name of file currently being worked
DECLARE @DirTreeCount INT --Remembers number of rows for xp_DirTree
DECLARE @ErrorMessage VARCHAR(255) --the potential error message
DECLARE @ErrorObject INT --the error object
DECLARE @HR INT --the HRESULT returned from the sp_OA* calls
DECLARE @IsFile BIT --1 if Name is a file, 0 if not
--===== These are object "handle" variables
DECLARE @ObjFile INT --File object
DECLARE @ObjFileSystem INT --File System Object
--===== These variable names match the sp_OAGetProperty options
-- Made names match so they're less confusing
DECLARE @Attributes INT --Read only, Hidden, Archived, etc
DECLARE @DateCreated DATETIME --Date file was created
DECLARE @DateLastAccessed DATETIME --Date file was last read
DECLARE @DateLastModified DATETIME --Date file was last written to
DECLARE @Name VARCHAR(128) --File Name and Extension
DECLARE @Path VARCHAR(128) --Full path including file name
DECLARE @ShortName VARCHAR(12) --8.3 file name
DECLARE @ShortPath VARCHAR(100) --8.3 full path including file name
DECLARE @Size INT --File size in bytes
DECLARE @Type VARCHAR(100) --Long Windows file type (eg.'Text Document',etc)
--=======================================================================================
-- Create temporary working tables
--=======================================================================================
--===== Create a place to store all file names derived from xp_DirTree
IF OBJECT_ID('TempDB..#DirTree','U') IS NOT NULL
DROP TABLE #DirTree
CREATE TABLE #DirTree
(
RowNum INT IDENTITY(1,1),
Name VARCHAR(256) PRIMARY KEY CLUSTERED,
Depth BIT,
IsFile BIT
)
--===== Create a place to store the file details
IF OBJECT_ID('TempDB..#FileDetails','U') IS NOT NULL
DROP TABLE #FileDetails
CREATE TABLE #FileDetails
(
RowNum INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,
Name VARCHAR(128), --File Name and Extension
Path VARCHAR(128), --Full path including file name
ShortName VARCHAR(12), --8.3 file name
ShortPath VARCHAR(100), --8.3 full path including file name
DateCreated DATETIME, --Date file was created
DateLastAccessed DATETIME, --Date file was last read
DateLastModified DATETIME, --Date file was last written to
Attributes INT, --Read only, Hidden, Archived, etc
Size INT, --File size in bytes
Type VARCHAR(100) --Long Windows file type (eg.'Text Document',etc)
)
--=======================================================================================
-- Get all the file names for the directory (includes directory names as IsFile = 0)
--=======================================================================================
--===== Get the file names for the desired path
INSERT INTO #DirTree (Name, Depth, IsFile)
EXEC Master.dbo.xp_DirTree @piFullPath,1,1 -- Current directory only, list file names
-- Remember the row count
SET @DirTreeCount = @@ROWCOUNT
--===== Update the file names with the path for ease of processing later on
UPDATE #DirTree
SET Name = @piFullPath + Name
--===== (Debug) Show the names found before getting the details
SELECT * FROM #DirTree
--=======================================================================================
--
--=======================================================================================
--===== Create a file system object and remember the "handle"
EXEC @HR = dbo.sp_OACreate 'Scripting.FileSystemObject', @ObjFileSystem OUT
SET @Counter = 1
WHILE @Counter <= @DirTreeCount
BEGIN
--===== Get the current name and see if it's a file
SELECT @CurrentName = Name,
@IsFile = IsFile
FROM #DirTree
WHERE RowNum = @Counter
--===== If it's a file, get the details for it
IF @IsFile = 1 AND @CurrentName LIKE '%%' --LIKE '%%' is a placeholder filter; it matches any name
BEGIN
--===== Create an object for the path/file and remember the "handle"
EXEC dbo.sp_OAMethod @ObjFileSystem,'GetFile', @ObjFile OUT, @CurrentName
--===== Get the all the required attributes for the file itself
EXEC dbo.sp_OAGetProperty @ObjFile, 'Path', @Path OUT
EXEC dbo.sp_OAGetProperty @ObjFile, 'ShortPath', @ShortPath OUT
EXEC dbo.sp_OAGetProperty @ObjFile, 'Name', @Name OUT
EXEC dbo.sp_OAGetProperty @ObjFile, 'ShortName', @ShortName OUT
EXEC dbo.sp_OAGetProperty @ObjFile, 'DateCreated', @DateCreated OUT
EXEC dbo.sp_OAGetProperty @ObjFile, 'DateLastAccessed', @DateLastAccessed OUT
EXEC dbo.sp_OAGetProperty @ObjFile, 'DateLastModified', @DateLastModified OUT
EXEC dbo.sp_OAGetProperty @ObjFile, 'Attributes', @Attributes OUT
EXEC dbo.sp_OAGetProperty @ObjFile, 'Size', @Size OUT
EXEC dbo.sp_OAGetProperty @ObjFile, 'Type', @Type OUT
INSERT INTO #FileDetails
(Path, ShortPath, Name, ShortName, DateCreated, DateLastAccessed, DateLastModified, Attributes, Size, Type)
SELECT @Path,@ShortPath,@Name,@ShortName,@DateCreated,@DateLastAccessed,@DateLastModified,@Attributes,@Size,@Type
--===== Destroy the file object each pass so the OA handles don't leak
EXEC dbo.sp_OADestroy @ObjFile
END
SELECT @Counter = @Counter + 1
END
EXEC dbo.sp_OADestroy @ObjFileSystem
SELECT * FROM #FileDetails
Major kudos to Robyn and Phil for posting the basis of that code on Simple-Talk.
I used an undocumented extended stored procedure because it was easier than setting up the loop to find the file names in a directory (Oh my! I have sinned! Twice! ;)) Lemme ask you... who amongst you hasn't used an undocumented stored procedure? Who amongst you has a DBA that hasn't? Who amongst you thinks that undocumented stored procedures are more subject to change than fully documented ones? The required privs on the very heavily documented sp_MakeWebTask changed from "dbo" in 2k sp3a to "SA" in 2k sp4 with no warning of the change until the change actually came out. Sure, there was some warning... but that doesn't change the fact that a lot of code suddenly broke and that simple proxy tricks needed to be incorporated to fix the now-broken code. And, to top it all off, xp_DirTree is available and working fine in 2k, 2k5, and 2k8. If you really don't like it, take the time to get the file names using OLE automation, instead.
Just a little more and I'll get off the ol' soap box... the posts no longer seem to be available after the new forum move that SSC made a while back, but prior to that, a couple of folks (Matt Miller and Sergiy in particular) took on all comers. We even had contests between ourselves. The ONLY case where CLR's even came close to beating T-SQL was with RegEx... the idea of checking email addresses certainly qualifies as a good need for RegEx... but, ya gotta ask yourself... if you can check 10,000 email addresses in 356 milliseconds in T-SQL, is it worth going the CLR route? How about checking a million in "21813 ms without a UDF" without a CLR?
Don't get taken in by all the false glitter surrounding CLR's and their usage. If you're going to write code against databases, spend some time learning how to do it right. With the exception of RegEx, damned near all CLR's are simply not the right way to do it. And, consider this... isn't it a bit of an oxymoron to let someone who can't write database code use another tool to try to do it? Get a real database developer, instead. :)
By the way, in SQL Server 2005, my production code for the DIRectory listing doesn't have a While loop in it... send me a case of beer and I'll show you how. 😉
--Jeff Moden
Change is inevitable... Change for the better is not.
February 6, 2009 at 9:33 pm
Ok, what did I miss? I don't quite know what I missed. If it was the exception from providing a bad path... I didn't miss that; like I said to Phil, I wouldn't provide a bad path in code that uses this, so it wasn't so much missed as it wasn't important to me.
Now what I find really funny is you preach about RBAR and non-looping methods, and your alternative to a CLR solution here is to loop with an OLE Automation call? You might compare memory consumption between your solution and the CLR in this article before you posit that it is a better solution. Either way, you have to enable a disabled-by-default feature on the SQL Server. You can leak memory with OLE automation much faster than with SQLCLR, and to the point that it can cause server instability. Don't believe me? Just forget an sp_OADestroy call.
The choice for CLR doesn't always have to be about performance. I am sure that you would argue that a 200ms difference over 100,000 rows between a CLR function and your TSQL function is noteworthy and proves CLR is not as efficient. But if you leave the company and another developer has to touch that function, and it takes them hours or days to figure it out, edit it, and test it, did you save something? If you are talking ms of difference and it simplifies code sufficiently to save cost overall in application management, then CLR definitely provides enough added benefit to cover the difference. Precious few would even notice the difference when you are in the ms ranges over large datasets.
Jonathan Kehayias | Principal Consultant | MCM: SQL Server 2008
My Blog | Twitter | MVP Profile
Training | Consulting | Become a SQLskills Insider
Troubleshooting SQL Server: A Guide for Accidental DBAs
February 6, 2009 at 9:43 pm
Phil Factor (2/5/2009)
Oooh. This is getting very useful. I'd missed out on the advantages of passing string parameters as SQLStrings and the trick of passing back a NULL. It makes perfect sense. Now it has the behaviour that one would want.
Phil,
We can actually go a bit further with this if we want and use a custom struct to build the list from, and still get the "Directory not Found" result row in the set. Try the attached install script. To see the code for the struct and the modified output, use the select from sys.assembly_files listed previously in this thread.
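For anyone who doesn't want to page back through the thread, that query looks roughly like this (it only returns readable text if the source files were added to the assembly, as the install script does):
-- View the source files stored alongside an assembly in the database.
SELECT a.name AS assembly_name,
       af.name AS file_name,
       CAST(af.content AS VARCHAR(MAX)) AS source_text
FROM sys.assemblies a
JOIN sys.assembly_files af ON af.assembly_id = a.assembly_id
WHERE a.name = N'SQLCLRNet_DirectoryBrowser'
  AND af.file_id > 1 -- file_id 1 is the compiled binary itself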
Jonathan Kehayias | Principal Consultant | MCM: SQL Server 2008
My Blog | Twitter | MVP Profile
Training | Consulting | Become a SQLskills Insider
Troubleshooting SQL Server: A Guide for Accidental DBAs
February 6, 2009 at 9:54 pm
Jonathan Kehayias (2/6/2009)
Ok, what did I miss? I don't quite know what I missed. If it was the exception from providing a bad path... I didn't miss that; like I said to Phil, I wouldn't provide a bad path in code that uses this, so it wasn't so much missed as it wasn't important to me.
That was it and I agree... someone like you wouldn't pass a bad path.
Now what I find really funny is you preach about RBAR and non-looping methods, and your alternative to a CLR solution here is to loop with an OLE Automation call? You might compare memory consumption between your solution and the CLR in this article before you posit that it is a better solution. Either way, you have to enable a disabled-by-default feature on the SQL Server. You can leak memory with OLE automation much faster than with SQLCLR, and to the point that it can cause server instability. Don't believe me? Just forget an sp_OADestroy call.
Heh... listen to you... Anyone capable of writing either the CLR or the T-SQL isn't going to miss the sp_OADestroy. Anyone who doesn't understand enough about T-SQL will also likely not know enough about how to prevent memory leaks using a CLR.
The choice for CLR doesn't always have to be about performance. I am sure that you would argue that a 200ms difference over 100,000 rows between a CLR function and your TSQL function is noteworthy and proves CLR is not as efficient. But if you leave the company and another developer has to touch that function, and it takes them hours or days to figure it out, edit it, and test it, did you save something?
Heh... you mean yet another Developer who can't read simple T-SQL? Look at the code again... even a complete newbie can figure it out in a couple of minutes if they use Books Online. And, if they can't, they really shouldn't be messing with a database by any means.
If you are talking ms of difference and it simplifies code sufficiently to save cost overall in application management, then CLR definitely provides enough added benefit to cover the difference. Precious few would even notice the difference when you are in the ms ranges over large datasets.
So does putting my directory code in a stored proc. And it doesn't require the monster code you included in your article just to install the CLR... here's a reminder...
/*
USE [master]
GO
DROP FUNCTION [dbo].[os_directory_info]
DROP ASSEMBLY SQLCLRNet_DirectoryBrowser
DROP USER SQLCLRNet_ExampleLogin
DROP LOGIN SQLCLRNet_ExampleLogin
DROP ASYMMETRIC KEY SQLCLRNet_ExampleKey
GO
*/
USE [master]
GO
/****** Object: SqlAssembly [SQLCLRNet_DirectoryBrowser] Script Date: 01/23/2009 22:19:49 ******/
IF EXISTS (SELECT * FROM sys.assemblies asms WHERE asms.name = N'SQLCLRNet_DirectoryBrowser')
DROP ASSEMBLY [SQLCLRNet_DirectoryBrowser]
GO
/****** Object: SqlAssembly [SQLCLRNet_DirectoryBrowser] Script Date: 01/23/2009 22:19:49 ******/
CREATE ASSEMBLY [SQLCLRNet_DirectoryBrowser]
AUTHORIZATION [dbo]
FROM 0x ... really big binary number here...
WITH PERMISSION_SET = SAFE
GO
-- Create the Asymmetric Key from the Assembly
CREATE ASYMMETRIC KEY SQLCLRNet_ExampleKey
FROM ASSEMBLY [SQLCLRNet_DirectoryBrowser]
GO
-- Create the Login from the Asymmetric Key
CREATE LOGIN SQLCLRNet_ExampleLogin
FROM ASYMMETRIC KEY SQLCLRNet_ExampleKey
GO
-- Grant the External Access Privilege to the Login
GRANT EXTERNAL ACCESS ASSEMBLY TO SQLCLRNet_ExampleLogin
GO
-- Create the database user for Authorization on the Assembly
CREATE USER SQLCLRNet_ExampleLogin FOR LOGIN SQLCLRNet_ExampleLogin
GO
-- Set Authorization to the Database User
ALTER AUTHORIZATION ON ASSEMBLY::[SQLCLRNet_DirectoryBrowser] TO SQLCLRNet_ExampleLogin
GO
-- Set the Assembly for External Access
ALTER ASSEMBLY [SQLCLRNet_DirectoryBrowser] WITH PERMISSION_SET = EXTERNAL_ACCESS
GO
-- Create the TSQL Function that maps to the Assembly
CREATE FUNCTION [dbo].[os_directory_info](@path [nvarchar](max), @filter [nvarchar](100) = null)
RETURNS TABLE (
[name] [nvarchar](max) NULL,
[is_directory] [bit] NULL,
[size_in_bytes] [bigint] NULL,
[create_date] [datetime] NULL,
[last_written_to] [datetime] NULL,
[last_accessed] [datetime] NULL,
[attributes] [nvarchar](max) NULL
) WITH EXECUTE AS CALLER
AS
EXTERNAL NAME [SQLCLRNet_DirectoryBrowser].[UserDefinedFunctions].[os_directory_info]
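And after all of that ceremony, the actual call is the only simple part... here's what using it would look like (the path is just an example)...
--===== List a directory through the CLR TVF (example path only)
SELECT name, is_directory, size_in_bytes, create_date
  FROM dbo.os_directory_info(N'C:\Temp\', NULL)
 ORDER BY is_directory DESC, name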
--Jeff Moden
Change is inevitable... Change for the better is not.
February 6, 2009 at 10:01 pm
Jeff Moden (2/6/2009)
When I run that bit of computational heaven against the test data I built above, here's the results I get...
1263 ms for Looping UDF
356 ms for my UDF
233 ms without a UDF
Like I said... I don't know where the hell the author of that article gets off saying that his looping UDF takes over 14 seconds to run, but he was wrong even in that!!! :crazy:
Add the overhead of CLR's to that, and ya just gotta ask yourself, is it all worth it?
Somebody, please! Test the CLR and the code I just posted all on one machine! Let's see what really gives!
I didn't use his CLR method directly because it has an inherent flaw in its usage of Regex that I wouldn't make in my own code. Instead I used a functionally identical function:
using System;
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
using System.Text.RegularExpressions;
public partial class UserDefinedFunctions
{
private static readonly Regex emailRegex = new Regex(@"^([\w-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([\w-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$");
[Microsoft.SqlServer.Server.SqlFunction]
public static SqlBoolean EmailRegex(SqlString email)
{
// Test the supplied address against the precompiled pattern
return new SqlBoolean(emailRegex.IsMatch(email.Value));
}
};
Using the static readonly member means the Regex object is built once and reused, which is a bit more efficient memory-wise.
The results over 5 test runs using Jeff's code versus the above on my Dell dual-core 2.0 GHz laptop with 4GB RAM:
533 ms for Looping UDF
186 ms for my UDF
123 ms without a UDF
63 ms for CLR UDF
I must have a better computer than Jeff does; all the numbers are roughly half of his times, so the comparison should be fair.
Jonathan Kehayias | Principal Consultant | MCM: SQL Server 2008
My Blog | Twitter | MVP Profile
Training | Consulting | Become a SQLskills Insider
Troubleshooting SQL Server: A Guide for Accidental DBAs
February 6, 2009 at 10:15 pm
Jeff Moden (2/6/2009)
Heh... listen to you... Anyone capable of writing either the CLR or the T-SQL isn't going to miss the sp_OADestroy. Anyone who doesn't understand enough about T-SQL will also likely not know enough about how to prevent memory leaks using a CLR.
Not true. I've seen this happen in consulting twice for OLE automation and a few times for XML documents that get created but not removed. It's not that difficult a mistake to make. Leaking in CLR is only possible with UNSAFE assemblies, as a part of the design of the hosting environment. The ability to leak memory requires Host Protection Attributes that are prevented from loading in anything but UNSAFE.
So does putting my directory code in a stored proc. And it doesn't require the monster code you included in your article just to install the CLR.
You could do everyone a favor and just point them to the text file rather than posting the varbinary output string of a compiled assembly in here and making this page of the forums unusable, because it doesn't know how to handle a string that long. You aren't comparing similar code by looking at the binary compiled assembly object. The code is the C#, and should the following Connect item get added to SQL Server:
http://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=265266
the ability to write code similar to the following will exist in SQL:
CREATE PROCEDURE my_proc
    @par1 int,
    @par2 int
WITH EXTERNAL LANGUAGE C#, PERMISSION_SET = EXTERNAL_ACCESS
AS
using System;
using System.Data;
using System.Data.SqlClient;
...
Jonathan Kehayias | Principal Consultant | MCM: SQL Server 2008
My Blog | Twitter | MVP Profile
Training | Consulting | Become a SQLskills Insider
Troubleshooting SQL Server: A Guide for Accidental DBAs
February 6, 2009 at 10:26 pm
sean hawkes (2/6/2009)
To me, the SQLCLR vs. xp_cmdshell debate seems like the old script vs. small application debate. SQLCLR is very similar to writing a small application: you're writing code, building an assembly, mapping it to the sql server. Seems like a lot of overhead to get a directory listing into a table. Granted, I do like the idea of passing table variables and such, and I can see the need for operations that are impossible within TSQL. But I can see how doing all of this is like hitting a finishing nail with a sledge hammer. Again, to me it comes down to what am I really trying to do, and how much time I have to do that task. Let's face it, there are situations where time is of the essence and you don't have time to create a new assembly and "hook" it to your sql server. My other question is: how does this differ from creating a dll that is mapped to an extended stored procedure?
Sean
This solution is based on the premise that security is important enough to take the time to implement. If time is more important than preventing unrestricted access to the OS as the SQL Service account with xp_cmdshell, then this solution probably isn't for you.
In the words of Mr. Miyagi, "Different, but same". This is different from an extended stored procedure because it is in managed code, which is more commonly known, easier to develop, and doesn't require the level of knowledge that C++ unmanaged extended procedures did. It also doesn't have the risks of memory leaks that the unmanaged code did, given the CAS level that it is created with. It is the same because you are using an external language to write the code, but that is about the only similarity that exists between the two.
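To make the contrast concrete, here is roughly what each registration looks like. The extended procedure and DLL names below are made up for illustration, and the CLR half is abridged from the install script earlier in the thread (the real function name is changed here to avoid a clash):
-- Old style: register an unmanaged C++ DLL as an extended stored procedure
-- (hypothetical names; must be run in master).
EXEC sp_addextendedproc 'xp_dir_info', 'DirInfoXp.dll'
GO
-- SQLCLR style: bind a T-SQL object to a method in a managed assembly.
CREATE FUNCTION dbo.os_directory_info_demo (@path NVARCHAR(MAX), @filter NVARCHAR(100) = NULL)
RETURNS TABLE (
    name NVARCHAR(MAX) NULL, is_directory BIT NULL, size_in_bytes BIGINT NULL,
    create_date DATETIME NULL, last_written_to DATETIME NULL,
    last_accessed DATETIME NULL, attributes NVARCHAR(MAX) NULL
) AS EXTERNAL NAME [SQLCLRNet_DirectoryBrowser].[UserDefinedFunctions].[os_directory_info]
GO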
Jonathan Kehayias | Principal Consultant | MCM: SQL Server 2008
My Blog | Twitter | MVP Profile
Training | Consulting | Become a SQLskills Insider
Troubleshooting SQL Server: A Guide for Accidental DBAs
February 6, 2009 at 10:36 pm
Jonathan Kehayias (2/6/2009)
I didn't use his CLR method directly because it has an inherent flaw in its usage of Regex that I wouldn't make in my own code. Instead I used a functionally identical function:
Outstanding! Thanks for running the tests! Like I said, the one place that Matt Miller, Sergiy, and I proved that CLR's were definitely better for performance was in the area of RegEx, and I was absolutely sure that you could make a major improvement there, as well. Well done! And I'm not being a bit ironic there... I mean it.
Because of the crud you have to go through with OLE automation and the slowness of the associated sp_OA* routines, I'd be willing to bet that a CLR would also beat the code I posted for performance. We didn't test that in any of our deep dives, but I'm sure it's true and a good CLR for that would be great.
But, like you said, performance isn't the only thing to consider when making the decision to use a CLR. Look at the installation code you had to write. Can a "newbie" actually do that as you suggested? I submit, probably not. And then, there are those hardcore DBA's that say "Not on my watch" when it comes to CLR's. 😉 Better to know how to do it several different ways... something a "newbie" wouldn't actually know, anyway.
As you've just shown, there's a 50% difference in performance between that given CLR and the fastest code I wrote in T-SQL to do the same thing, but consider for a moment that it's only 60 milliseconds difference... extrapolating your run times on 10k rows to a million rows, that's only 6 seconds difference... and I have two lines in a WHERE clause. With the CLR, there's source code to check in, a DLL to test and install, and an installation procedure to make the code useful as a CLR function. Lord help us if we migrate servers.
Just to be clear, I'm not saying that all CLR's should be avoided. You said it quite correctly when you said that some folks abuse them, and you picked a very good example of what a CLR should probably be used for. In fact, I have no problem with your good article at all. One of my major points was that someone posted a link to some pretty bad T-SQL, a race between that and a CLR, and a test that probably wasn't even done correctly. CLR's get glamourized because of such mistakes, much like set-based code gets a bad name because someone writes some really bad set-based code to compare to. I want to make sure that, because of that miserable article that someone referred us all to, people don't go CLR crazy like they did when cursors were introduced in early versions (6.5, I believe) of SQL Server.
--Jeff Moden
Change is inevitable... Change for the better is not.
February 6, 2009 at 10:51 pm
Jonathan Kehayias (2/6/2009)
Jeff Moden (2/6/2009)
Heh... listen to you... Anyone capable of writing either the CLR or the T-SQL isn't going to miss the sp_OADestroy. Anyone who doesn't understand enough about T-SQL will also likely not know enough about how to prevent memory leaks using a CLR.
Not true. I've seen this happen in consulting twice for OLE automation and a few times for XML documents that get created but not removed. It's not that difficult a mistake to make. Leaking in CLR is only possible with UNSAFE assemblies, as a part of the design of the hosting environment. The ability to leak memory requires Host Protection Attributes that are prevented from loading in anything but UNSAFE.
So does putting my directory code in a stored proc. And it doesn't require the monster code you included in your article just to install the CLR.
You could do everyone a favor and just point them to the text file rather than posting the varbinary output string of a compiled assembly in here and making this page of the forums unusable, because it doesn't know how to handle a string that long. You aren't comparing similar code by looking at the binary compiled assembly object. The code is the C#, and should the following Connect item get added to SQL Server:
http://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=265266
the ability to write code similar to the following will exist in SQL:
CREATE PROCEDURE my_proc
    @par1 int,
    @par2 int
WITH EXTERNAL LANGUAGE C#, PERMISSION_SET = EXTERNAL_ACCESS
AS
using System;
using System.Data;
using System.Data.SqlClient;
...
That's what I mean... what's the difference between someone forgetting to do an sp_OADestroy and someone forgetting to mark a CLR as something other than UNSAFE, especially if they're a newbie? Even if it gives you a warning, is a newbie gonna know what to do about it?
But, point well taken, hopefully on both parts... you do have a caveat for both methods.
Ya beat me to it on the other thing... I didn't realize the binary was quite so wide in your code and started fixing it as you were typing. It's fixed now. Thanks for the heads up, though.
--Jeff Moden
Change is inevitable... Change for the better is not.
February 6, 2009 at 10:53 pm
It's taken me close to a year to create a series of articles that have sufficient utility to be worthwhile, and are a logically good use of SQLCLR, so as to not be chewed alive by the community. Replacing less secure methodologies is my favorite place to make use of SQLCLR, though generally speaking, I wouldn't be reading directories from my TSQL code either, so I don't really have these examples in use in my own environment.
The EmailRegex I do have in use in a data warehouse that I built that loads data from all different kinds of sources, so it is very useful in the data cleansing process post-load. I didn't do that in CLR for performance so much as that I already had the code from answering a forums post, and it was easily reused. The difference in performance between the TSQL and the CLR isn't enough to make it matter to me. Like I said, if you are talking ms, what did you really save, and do you really notice it?
As for the installation script I provided, could anybody do that? Sure, if they were willing to research the "best practice" implementation for an EXTERNAL_ACCESS assembly. Most aren't willing to go that route and instead just flip TRUSTWORTHY ON and move on with deploying under dbo. In fact, I think I have one of the few articles on the web that actually uses keys over TRUSTWORTHY for non-SAFE assemblies. Since my purpose was to demonstrate how to be more "secure" with SQLCLR, setting TRUSTWORTHY wouldn't have really been good to do, especially in master, so I did it the right way, rather than the easy way. The installer script here does provide a simple framework that can be used for any assembly. Just a little copy/paste/rename kind of stuff to do.
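For comparison, the "easy way" I'm talking about is literally one statement, which is exactly why it gets abused (hypothetical database name here):
-- The TRUSTWORTHY shortcut: lets dbo-owned assemblies bypass the signing
-- requirements entirely. Quick, but it widens the attack surface of the
-- whole database, which is why the key-based approach is preferable.
ALTER DATABASE [MyDatabase] SET TRUSTWORTHY ON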
Jonathan Kehayias | Principal Consultant | MCM: SQL Server 2008
My Blog | Twitter | MVP Profile
Training | Consulting | Become a SQLskills Insider
Troubleshooting SQL Server: A Guide for Accidental DBAs
February 6, 2009 at 10:58 pm
Jeff Moden (2/6/2009)
Ya beat me to it on the other thing... I didn't realize the binary was quite so wide in your code and started fixing it as you were typing. It's fixed now. Thanks for the heads up, though.
I wish there were a better way to provide that binary string in forums. At least here you can attach files to forum posts as text. Other places it is completely impossible to send someone a sample compiled CLR assembly for even a simple Hello World example. It is just as long as the assembly I posted for reading the directory information.
Jonathan Kehayias | Principal Consultant | MCM: SQL Server 2008
My Blog | Twitter | MVP Profile
Training | Consulting | Become a SQLskills Insider
Troubleshooting SQL Server: A Guide for Accidental DBAs
February 6, 2009 at 11:15 pm
Jonathan Kehayias (2/6/2009)
so I did it the right way, rather than the easy way
BWAA-HAA!!! Man, I'm right there with you on that one! Now, all we have to do is convince the rest of the world, especially the managers that want it real bad, to give people just a little more time so they don't get it that way... real bad.
Good talking with you again, Jonathan. Kudos on the article.
--Jeff Moden
Change is inevitable... Change for the better is not.