Introduction
Manipulating Binary Large Objects (BLOBs) and Character Large Objects (CLOBs) has always been difficult in SQL Server. The new SQL Server 2005 provides new data types (NVARCHAR(MAX), VARCHAR(MAX) and VARBINARY(MAX)) for large object storage (up to 2 GB) allowing better manipulation as well as the ability to process these data types using CLR procedures and functions.
This article shows how to create CLR functions to seamlessly compress and decompress large data objects with minimum performance impact using compression algorithms included in.NET Framework 2.0. Code samples included in this article can be used in any database implementation but do not cover all possible scenarios. For large implementations or mission critical applications consider using third party products like SQLCompres.NET (it is free).
The new data types
SQL Server 2005 provides three new data types to store and manipulate large objects. VARCHAR(MAX) can store CLOBs, NVARCHAR(MAX) does the same but allows Unicode characters and VARBINARY(MAX) can store BLOBs. Books Online states that max indicates that the maximum storage size is 2^31-1 bytes. These new data types can be used with a wide range of T-SQL commands and behave much like traditional VARBINARY(n), VARCHAR(n) and NVARCHAR(N).
The new data types should replace the TEXT and IMAGE types from previous versions. As per SQL Server Books Online TEXT and IMAGE columns should not be used in new development and legacy applications should be changed to use the new types.
These new types, as opposed to TEXT and IMAGE, can be used as variables and function parameters and can be returned by CLR (or T-SQL) scalar-value functions. These new characteristics make them great candidates for compression. Previous attempts to add CLOB and BLOB compression to SQL Server involved using extended procedures, a difficult and risky business. Using the CLR integration capabilities introduced with SQL Server 2005 makes such implementation more secure and stable (and sometimes faster than their extended procedure counterparts).
CLRs procedures and functions parameters can receive and process these data types as SQLChars and SQLBytes
. SQLChars
can be used to pass VARCHAR(MAX) and NVARCHAR(MAX) parameter while SQLBytes
is used to pass VARBINARY(MAX) types. CLR uses only Unicode characters and passing VARCHAR(MAX) as a parameter implies converting it to Unicode before parameters are passed.
Get the code here:
Compressing a BLOB
SQL Server 2005 allows CLR code to be registered as functions and stored procedure. Developers can now extend SQL Servers functionality using a broad array of programming languages from managed C++ to Visual Basic. How the CLR is hosted inside SQL Server goes beyond the scope of this article. For those who worry about enabling CLR integration, suffice to say that Microsoft has made a conscious effort to keep this integration as safe and secure as possible.
Lets use VARBINARY(MAX) for simplicity sake, since data can be converted between BLOB and CLOB types this articles code can be extended using T-SQL functions. Listing 1 contains a function to compress BLOBs in SQL server, the function receives a VARBINARY(MAX) or IMAGE as a SQLBytes
and compresses it using the DeflateStream
class provided in .NET Framework 2.0. SQLBytes
represents a mutable type that wraps either an array or a stream. We are going to assume it wraps an array to avoid complicated code, and get the data from the Buffer
property. Using this property might fail with larger BLOBs causing the CLR to throw an out of memory exception (but dont worry, unlike extended procedures errors, CLR exceptions should not crash your SQL Server).
Listing 1: Compression function
using System; using System.Data; using System.Data.SqlClient; using System.Data.SqlTypes; using Microsoft.SqlServer.Server; using System.IO; using System.IO.Compression; public partial class UserDefinedFunctions { // Setting function characteristics [Microsoft.SqlServer.Server.SqlFunction(IsDeterministic=true,DataAccess=DataAccessKind.None)] public static SqlBytes fn_compress(SqlBytes blob) { if (blob.IsNull) return blob; // Retrieving BLOB data byte[] blobData = blob.Buffer; // Preparing for compression MemoryStream compressedData = new MemoryStream(); DeflateStream compressor = new DeflateStream(compressedData, CompressionMode.Compress, true); // Writting uncompressed data using a DeflateStream compressor compressor.Write(blobData, 0, blobData.Length); // Clossing compressor to allow ALL compressed bytes to be written compressor.Flush(); compressor.Close(); compressor = null; // Returning compressed blob return new SqlBytes(compressedData); } };
Compressing a BLOB in SQL Server 2005 is as easy as passing a SQLBytes
parameter, reading its content and writing it to a compression stream. The compression stream writes to a MemoryStream
that is later used to create a new SQLBytes
object that can be returned to SQL Server applications or directly to the client. There is only one caveat: Microsofts implementation of DeflateStream requires the stream to be closed before it writes the last compressed bytes, flushing is not enough.
Listing 2 loads an assembly in SQL Server (a process called cataloging where assemblies are verified for security and reliability) and creates the compression function. Listing 3 shows how to use the function in a T-SQL update. This usage makes compressing columns a seamless process that would require only server side adjustments.
Listing 2: Loading assembly and creating CLR function
CREATE ASSEMBLY [BlobCompression] FROM 'D:\Development\BlobCompression.Dll' WITH PERMISSION_SET = SAFE CREATE FUNCTION [fn_decompress] ( @compressedBlob varbinary(MAX)) RETURNS varbinary(MAX) AS EXTERNAL NAME [BlobCompression].[UserDefinedFunctions].[fn_decompress];
Listing 3: Compressing data
create table #temp ( blob_col varbinary(max)); insert into #temp values(convert(varbinary(max), 'To run your project, please edit the Test.sql file in your project. This file is located in the Test Scripts folder in the Solution Explorer.')); drop table #temp;
Decompressing a BLOB
Listing 4 contains a function to decompress a BLOB. This function follows the same principles used in compression but now reads from a stream returning a decompressed block that can be used to create and return a decompressed SQLBytes
object.
Listing 4: Decompression function
using System; using System.Data; using System.Data.SqlClient; using System.Data.SqlTypes; using Microsoft.SqlServer.Server; using System.IO; using System.IO.Compression; public partial class UserDefinedFunctions { [Microsoft.SqlServer.Server.SqlFunction(IsDeterministic = true, DataAccess = DataAccessKind.None)] public static SqlBytes fn_decompress(SqlBytes compressedBlob) { if (compressedBlob.IsNull) return compressedBlob; // Preparing to read data from compressed stream DeflateStream decompressor = new DeflateStream(compressedBlob.Stream, CompressionMode.Decompress, true); // Initializing variables int bytesRead = 1; int chunkSize = 10000; byte[] chunk = new byte[chunkSize]; // Preparing destination stream to hold decompressed data MemoryStream decompressedData = new MemoryStream(); try { // Reading from compressed stream while ((bytesRead = decompressor.Read(chunk, 0, chunkSize)) > 0) { // Writting decompressed data decompressedData.Write(chunk, 0, bytesRead); } } catch (Exception) { throw; // Nothing to do... } finally { // Cleaning up decompressor.Close(); decompressor = null; } // Returning a decompressed BLOB return new SqlBytes(decompressedData); } };
Listing 5 loads an assembly and creates a compression and decompression CLR function. Compression and decompression can be tested using listing 6, it creates a table and add some values to it, a compression update is run followed by a select statement that returns uncompressed data.
Listing 5: Loading assembly and creating CLR functions
CREATE ASSEMBLY [BlobCompression] FROM 'D:\Development\BlobCompression.Dll' WITH PERMISSION_SET = SAFE CREATE FUNCTION [fn_decompress] ( @compressedBlob varbinary(MAX)) RETURNS varbinary(MAX) AS EXTERNAL NAME [BlobCompression].[UserDefinedFunctions].[fn_decompress]; CREATE FUNCTION [fn_compress]( @blob varbinary(MAX)) RETURNS varbinary(MAX) AS EXTERNAL NAME [BlobCompression].[UserDefinedFunctions].[fn_compress];
Listing 6: Testing functionality
create table #temp ( blob_col varbinary(max)); insert into #temp values(convert(varbinary(max), 'To run your project, please edit the Test.sql file in your project. This file is located in the Test Scripts folder in the Solution Explorer.')); update #temp set blob_col = master.dbo.fn_compress(blob_col); select convert(varchar(1000), master.dbo.fn_decompress(blob_col)) from #temp; drop table #temp;
Limitations
The code included in this article allows column level compression in SQL Server 2005 but it lacks functions for consistency check and will not work very well with large objects (5 MB or more depending on configuration). It is intended to show how to use CLR integration in SQL Server to extend the engines functionality and provides an overview on what can be done with the new BLOB/CLOB data types.