Back in 1999, or possibly early 2000, I was told a rumor regarding the upcoming version of SQL Server. The rumor was that there was some experimenting going on at Microsoft, regarding the possibilities of using VBScript to write stored procedures in SQL Server. Supposedly there was a chance that this feature would be available in Shiloh (the codename for SQL Server 2000). A couple of months later the beta of SQL Server 2000 came, and then the final product, and as we all know there were no possibility to use VBScript to write stored procedures. Even though the person who told me this was definitely in a position to know about these things, I do not know exactly what truth there was to this rumor. And if it was correct I still have no idea how far this feature came.
Since then we have been waiting for a long time for a new version of SQL Server. And in November it will finally be available, the final version of SQL Server 2005 (also known as Yukon). The first official beta of Yukon was available two years ago, and since then there has been another beta and several Community Technical Preview (CTP) versions. For each version more and more people have previewed the product, and the hype and interest keeps on building. And in the middle of all this interest there is one feature --called CLR Integration or SQLCLR for short-- that is more discussed than any other. In SQL Server 2005 we will no longer be restricted to writing procedures in T-SQL; we can now use any .Net language to create database programmability objects! This is nothing short of a revolution in the SQL Server world, and many say that the complexity of this feature is the main reason why Yukon has taken so long to complete.
Heaven and hell
For an application developer who writes code using C# (or any other .Net language) this might sound like heaven. But for a DBA who have never even seen a C# program it is more likely to feel like hell. Does every SQL Server DBA need to take a crash course in C# now? Is T-SQL dead and buried with SQL Server 2005? Does my company need to buy Visual Studio licenses for all DBAs? These are common questions. Fortunately the answer to them is NO! T-SQL is not dead; it is more alive than ever with lots of new features being added to the language in SQL Server 2005. And every DBA will not need to learn C#, they will still be able to handle all their responsibilities in T-SQL and the new management tools included in SQL Server 2005. In fact, to use .Net code in SQL Server a sysadmin will first need to enable this server-wide feature. Many companies will not need to activate it at all; they can upgrade their databases to SQL Server 2005 and take advantage of all the other benefits it brings.
However, Microsoft did not add CLR Integration to SQL Server just because they had some extra time on their hands and thought it might be cool. The reality is that there are scenarios where a managed .Net procedure will perform better than one written in T-SQL. And it is definitely easier to write better code with a modern programming language such as C# than what we write with T-SQL (even though T-SQL is getting some nice new features in this area in SQL Server 2005). Many application developers who are also database developers will be interested in taking advantage of this. Visual Studio 2005 will have built-in integration with SQL Server 2005 for creating SQLCLR programmability objects. The developer can do all his work in his preferred development environment, so there is another reason for him to implement functionality with SQLCLR instead of T-SQL. But as you had probably guessed it is not as simple as just choosing between SQLCLR and T-SQL based on your preferences and skills. While I did say above that there are times when SQLCLR will perform better, there are definitely lots of scenarios where T-SQL is better. The integration of CLR brings nothing new to how data in a relational database is best managed. Declarative set-based DML always (or close to it) outperforms procedural processing here. One of the biggest concerns DBAs have regarding CLR Integration is that it will open up the doors to SQL Server for a hoard of programmers who know nothing about data management. I do not think that will happen though, why should everyone start programming database functionality just because they recognize the syntax better? But still, there is a quite possible risk that we will see a lot more procedural code, processing data row-by-row. We will also start seeing more or less exotic uses of SQL Server, such as procedures calling out to web services or working with the file system. The richness of the .Net Framework brings a lot of useful functionality, but it also means it will be much easier to do wrong --or at least questionable-- things in database code.
My personal opinion is that CLR Integration is an interesting new tool that can be helpful in some situations, but should definitely be used with care. I do not see it as the most important new feature in SQL Server 2005 (far from it), but it probably is the one that needs the most coverage to make sure that anyone who will be using it knows as much as possible about it. Developers will need to know what effects their .Net implementations will have in SQL Server, and DBAs will probably want to be able to review code that is being executed in their databases. This article is therefore the start of a series that I will publish on this matter. In the upcoming parts I will describe how CLR Integration works, what you can do with it (and of course how to do those things) and when you should or should not use it. To end this part, and start looking at CLR Integration, I will show an example of a useful user-defined scalar function implemented in C# and how to use it in SQL Server 2005.
Using CLR Integration to validate email addresses
One thing that many have asked for in SQL Server is the possibility to use regular expressions for validation purposes. Using a specially crafted regular expression it is possible to verify that a string is written according to some formatting rules that have been defined, for instance that an email address seems correct. Although SQL Server 2005 does not have this feature, the .Net Framework does. We can therefore implement a user-defined function in C# that takes a string and validates it against a regular expression. This is an example of a good use of the CLR Integration in SQL Server 2005. As most probably already know, implementing a function in C# does not mean that we write C# code inside a T-SQL DDL statement such as CREATE FUNCTION. Instead we need to create a .Net assembly containing a class with a method that will act as the user-defined function. We could do this using only a text editor such as Notepad, the C# compiler and T-SQL statements to define the function inside SQL Server. I will save that for a later article though. Here I will instead simply outline the steps necessary to build and deploy the function using Visual Studio 2005 and the integration with SQL Server.
- First of all we need to make sure that CLR Integration is activated in SQL Server 2005. This can be done using sp_configure or we can use the new configuration tool called SQL Server Surface Area Configuration. Check that the feature called CLR Integration has a checkmark in 'Enable CLR Integration'.
- Now, in Visual Studio 2005, create a new project. Choose the folder Visual C# Projects and the project type called SQL Server Project. Note that this project type is only available in Visual Studio 2005 Professional Edition and above. Name the project as you like, for instance StringHelper.
- When the project is being created you will be asked for a database reference. Specify a SQL Server 2005 server and a database. This is the database where you want the functionality to be created.
- Add a new item by right-clicking the project and selecting Add New Item, User-defined Function. Name the function RegExValidate.
- Now there is a new file called RegExValidate.cs in the project. Edit it with the following code (just clear everything in it and copy-paste this code):
using System.Data.SqlTypes; using System.Text.RegularExpressions; using Microsoft.SqlServer.Server; public partial class UserDefinedFunctions { [Microsoft.SqlServer.Server.SqlFunction] public static SqlBoolean RegExValidate( SqlString expressionToValidate, SqlString regularExpression) { Regex regex = new Regex(regularExpression.Value); return regex.IsMatch(expressionToValidate.Value); } }
RegExValidate.cs now defines a class which has a static method called RegExValidate. This method takes two parameters of the type SqlString (the SqlTypes are types that map to the data types in SQL Server and also handle null values), an expression to validate and a regular expression to validate it against. It then creates an instance of the RegEx class that exists in the .Net Framework, and uses the method IsMatch of that instance to validate whether or not the expression matches the format defined in the regular expression. IsMatch returns true or false and we return that value to the caller of the function using a SqlBoolean (which of course corresponds to the bit data type in SQL Server).
Please note that this code is not perfect. For instance we should be checking the parameters for nulls etc, but I want the example to be as simple as possible.
- Now compile the project into an assembly. In the menu choose Build, Build Solution.
- Finally we must deploy our new assembly to the SQL Server database specified with the database reference we created at the start. Choose Deploy, Deploy Solution in the menu. If all goes well you should see a message that says that one file was deployed to SQL Server.
- Now start SQL Server Management Studio (the replacement for Enterprise Manager and Query Analyzer). Connect Object Explorer to the server you chose earlier and drill down the specified database. Expand the nodes Programmability, Functions, Scalar-valued Functions and there you should see a function called dbo.RegExValidate.
- The function can now be used to validate strings with regular expressions, like this:
DECLARE @regex NVARCHAR(100)
SET @regex = N'^([\w-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([\w-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$'
SELECT name, emailaddress, dbo.RegExValidate( emailaddress, @regex ) AS validemail FROM dbo.PEOPLE
However, to really show the strength of this validation we can create a constraint that use it.
CREATE TABLE dbo.PEOPLE (
name VARCHAR(25) NOT NULL
, emailaddress VARCHAR(255) NOT NULL
, CONSTRAINT PEOPLE_ck_validemailaddress CHECK ( dbo.RegExValidate( emailaddress, N'^([\w-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([\w-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$' ) = 1 )
)
Now we have a CHECK constraint that makes sure that it is not possible to enter email addresses that do not match the format defined in the regular expression.
Summary
The CLR Integration in SQL Server 2005 is an interesting new tool for the toolbox of every database developer and DBA. But it can also be a double-edged sword if used incorrectly, and therefore it is very important that anyone who will be using it really understands what it is, how it works and when to use it (or not use it). Look for further articles in this series.