Overview
Bitwise operators can be challenging to manage at first. However, with practice and patience, and under the right conditions, these operators can provide remarkable performance improvements in production environments. This article will compare two methods of accomplishing the same output, one with a normalized model and the other with bitwise operators. In essence, bitwise operators allow you to denormalize a portion of your database (whether you actually denormalize or not is not important).
Generally speaking, bitwise operators can provide great benefits in OLTP databases that either:
- Suffer from too much normalization (yes, it’s possible…)
- Run a lot of lengthy reports during business hours
- Suffer from lots of database locks on a highly used table (with few records)
- Need to have access to external data (in relatively small amounts)
- Need to speed up triggers
The usual issue with bitwise operators is that they only operate with the type int (actually, they work with the bigint type as well), limiting their usefulness for large referenced tables (usually, bitwise operators can be used when the referenced table has 63 records or less). When dealing with a greater number of records, different options may be available, but potentially at an additional performance cost. These techniques include:
- Using multiple fields to stores the referenced values
- Using prime numbers stored as real numbers instead of binary values (this gives up to 130 records to play with instead of 63). This method requires using a formula rather than the bitwise operators
- Using custom-developed algorithms (parsers) as either functions or extended stored procedures and store the values as string and not int
For the remainder of this article, we will use an integer field to store our binary values. The usefulness of this method is still high considering the many applications that can leverage this technique and the performance improvements that can result.
Introduction
The example we will use is that of a Users and Errors table as such (the scripts to create the tables and data can be found at the end of this article):
Users Table
| ||
Field
| Type
| Constraints
|
UserId | Int | Primary Key, Clustered, Not Null |
UserName | Varchar(50) | Not Null |
UserErrorFlag | Int | Not Null, Default (0) |
Errors Table
| ||
Field
| Type
| Constraints
|
ErrorId | Int | Primary Key, Clustered, Not Null |
ErrorDescr | Varchar(50) | Not Null |
ErrorFlag | Int | Not Null |
The Users table links to an Errors table that contains the list of possible errors that can occur during on a registration page. The many-to-many relationship between the Users table and the Errors table is resolved through the UserErrors table (shown below). The purpose of adding the UserErrorFlag field in the Users table is to store a mask that can help us identify which Errors a user has without reading the UserErrors table. The Errors table contains a field (ErrorFlag) that represents a power of two (1, 2, 4, 8, 16…).
The Errors table contains the following records:
Errors Table
| ||||
ErrorId
| ErrorDescr
| ErrorFlag
| Comments | Binary Value |
1 | Invalid Email | 1 | This is 2^0 | 0001 |
2 | Invalid Name | 2 | 2^1 | 0010 |
3 | Invalid Phone Number | 4 | 2^2 | 0100 |
4 | Unknown Error | 8 | 2^3 | 1000 |
To represent a user that has both ErrorId 1 and 3, the UserErrorFlag field should contain the value 5 (sum of the error flags 1 and 4 in the Errors table). We find 5 by using the following table:
Errors Table
| |||
ErrorId
| ErrorFlag
| Binary Value
| Selected Binary Value |
1 | 1 | 0001 | 0001 |
2 | 2 | 0010 | 0000 |
3 | 4 | 0100 | 0100 |
4 | 8 | 1000 | 0000 |
TOTAL | 0101
|
By selecting ErrorId 1 and 3, we need to add the binary values for the corresponding flags, which is 0101 (or 5).
Normalized Example
The Users table contains 10,000 users and our Errors table contains only four records. The purpose of our example is to return the list of users that contain a specific error number (for instance ErrorId=4). We are creating a stored procedure with an input parameter for the ErrorId we need to retrieve. The procedure (called SP_Example1) should return the list of user names (unique names) that have at least one error (any error) and should indicate by a ‘Yes’ if one of the errors is equal to 4. This field would then be used by a report to display a special icon.
EXEC SP_Example1 4
The core of this stored procedure is shown below.
SELECT
Users.UserId,
UserName,
ShowIcon = ISNULL((SELECT 'Yes' FROM UserErrors B WHERE B.UserId = Users.UserId AND B.ErrorId = @iError), 'No')
FROM Users LEFT JOIN UserErrors ON Users.UserId = UserErrors.UserId
This statement returns 5,000 rows. The following is a sample of the output of SP_Example1:
UserId
| UserName
| ShowIcon
|
1 | User 1 | Yes |
3 | User 3 | No |
5 | User 5 | Yes |
7 | User 7 | Yes |
9 | User 9 | No |
11 | User 11 | Yes |
… |
|
|
What is interesting to see is the query plan that SQL Server uses. As you can see, it requires a join, a Table Scan and an Index Seek on the UserErrors table. The key is to note the complexity of the query plan compared to the one using a bitwise operator (more on that later).
Our first test will be to run this procedure 100 times to enhance our visibility on performance differences. To do this, we call the following procedure:
EXEC Test1
Bitwise Example
Similarly, we create a second stored procedure (SP_Example2) that returns the same records, this time using the mask as a parameter. So instead of passing the value 4, we send 8 and execute the stored procedure as such:
EXEC SP_Example2 8
Here is the core of the stored procedure. This time, there is no need to join.
SELECT UserId, UserName, ShowIcon = CASE WHEN UserErrorFlag & @iFlag > 0 THEN 'Yes' ELSE 'No' End FROM Users
As you can see, the bitwise operator used is & (AND). For example, to know if 5 contains 1 (0101 contains 0001), we simply execute:
SELECT CASE WHEN 5 & 1 > 0 THEN 'Yes' ELSE 'No' END -- Result: Yes, since 0101 AND 0001 = 0001 = 1 > 0
However, 2 is not contained in 5 (0010 is not contained in 0101):
SELECT CASE WHEN 5 & 2 > 0 THEN 'Yes' ELSE 'No' END -- Result: No, since 0101 AND 0010 = 0000 = 0
To use this stored procedure, we need to know the value of the mask to use rather than the ErrorId. This is usually not too hard since key values are stored on front-end applications. Instead of (or in addition to) storing the primary key, the application stores the mask. Since we no longer need to join on the UserErrors table to retrieve the ErrorId value, we now have a very simple T-SQL. The query plan reflects this:
Similarly, we now run this stored procedure 100 times by running the following command:
EXEC Test2
Results
The following table shows CPU cycles consumed by each test (Test1 and Test2). As you can see, the bitwise calculation clearly reduces the time it takes to obtain the records (CPU in green). The first plateau is obtained as a result of running Test1 and is about 30 seconds in length. In contrast, Test2 takes about 17 seconds to complete under the same conditions: a 40% improvement. In addition, the number of locks requested by SQL Server is also diminished drastically: from 180 down to 5 (Locks Requested by Second in Red). Using the (NOLOCK) hint would of course eliminate all locks requested by the stored procedures.
There is one more advantage to using bitwise operators: the capacity to return multiple conditions at once. For instance, if we wanted to return the same information but this time where the ErrorId is 1 and 4 we would execute the following query:
EXEC SP_Example2 9 -- Flags: 8 + 1
It turns out that this feature is extremely convenient when creating reports, since many reports require combining information. This is achieved at virtually no extra CPU cost.
The results shown above can vary greatly depending on the type of statement you are executing and the number of records in each table. The query selected for these examples reflects what a simple report would require. Also, imagine what this technique can do if you are using bitwise operators for more than one table at the same time (examples for such tables include Errors, State, Age Bracket, Access Rights and many more). You would probably need a binary field for each table, but the performance gains can be drastic.
Conclusion
Although a little complex to master, bitwise operators can serve a great purpose (and even fix some performance problems) when dealing with specific queries that can cause database slowdowns. Reports may be able to benefit greatly from this technique, especially if the reports run on an OLTP database. Using bitwise operators can give you a serious performance boost and can potentially reduce your database locks dramatically. However, implementing bitwise operators in your production environments may require great planning since front-end applications may also have to be changed to accommodate the additional mask values.
Scripts
/*Create an empty database first, and then run the following script. If you need to store more than 31 records in the reference table (Errors), use a BigInt type everywhere you store the binary values, and use the CAST function where appropriate.*/CREATE TABLE [Errors] ( [ErrorId] [int] NOT NULL , [ErrorDescr] [varchar] (50) NOT NULL , [ErrorFlag] [int] NOT NULL , CONSTRAINT [PK_Errors] PRIMARY KEY CLUSTERED ([ErrorId] ) ON [PRIMARY]) ON [PRIMARY] GO CREATE TABLE [Users] ( [UserId] [int] NOT NULL , [UserName] [varchar] (50) NOT NULL , [UserErrorFlag] [int] NOT NULL CONSTRAINT [DF_Users_UserErrorFlag] DEFAULT (0), CONSTRAINT [PK_Users] PRIMARY KEY NONCLUSTERED ([UserId]) ON [PRIMARY]) ON [PRIMARY] GO CREATE TABLE [UserErrors] ( [UserId] [int] NOT NULL FOREIGN KEY REFERENCES Users(UserId) , [ErrorId] [int] NOT NULL FOREIGN KEY REFERENCES Errors(ErrorId) , CONSTRAINT [PK_UserErrors] PRIMARY KEY CLUSTERED ([UserId],[ErrorId] ) ON [PRIMARY] ) ON [PRIMARY] GO -- Create some Errors INSERT INTO Errors VALUES (1, 'Invalid Email',1) INSERT INTO Errors VALUES (2, 'Invalid Name',2) INSERT INTO Errors VALUES (3, 'Invalid Phone Number',4) INSERT INTO Errors VALUES (4, 'Internal Error',8) GO -- Create user names and additional records Declare @i int Declare @Name varchar(10) Declare @iErr int Set @i = 10000 WHILE (@i>0) BEGIN SET @Name = 'User ' + CAST(@i AS varchar(5)) Set @iErr = 15 -- This is the flag for all errors If @i / 3. = @i / 3 Set @iErr = 5 -- This sets the flag for errorId 4 and 1 -- If @i is a multiple of 2, do not add an error If @i / 2. = @i / 2 Set @iErr = 0 INSERT INTO Users VALUES (@i, @Name, @iErr) If @iErr > 0 INSERT INTO UserErrors SELECT @i, ErrorId FROM Errors WHERE @iErr & ErrorFlag > 0 SET @i = @i - 1 END GO SET QUOTED_IDENTIFIER ON GO SET ANSI_NULLS ON GO -- SELECT UserId, 'Yes' FROM UserErrors B WHERE B.ErrorId = 4 CREATE PROCEDURE SP_Example1(@iError int) AS SELECTUsers.UserId, UserName, ShowIcon = ISNULL((SELECT 'Yes' FROM UserErrors B WHERE B.UserId = Users.UserId AND B.ErrorId = @iError), 'No') FROMUsers LEFT JOIN UserErrors ON Users.UserId = UserErrors.UserId WHERE UserErrors.UserId IS NOT NULL GROUP BYUsers.UserId, Users.UserName ORDER BY Users.UserId GO SET QUOTED_IDENTIFIER OFF GO SET ANSI_NULLS ON GO SET QUOTED_IDENTIFIER ON GO SET ANSI_NULLS ON GO CREATE PROCEDURE SP_Example2(@iFlag int) AS SELECTUserId, UserName, ShowIcon = CASE WHEN UserErrorFlag & @iFlag > 0 THEN 'Yes' ELSE 'No' End FROMUsersWHERE UserErrorFlag > 0 GO SET QUOTED_IDENTIFIER OFF GO SET ANSI_NULLS ON GO CREATE PROCEDURE Test1 AS Declare @i int Set @i = 100 WHILE @i>0 BEGIN EXEC SP_Example1 4 Set @i = @i-1 END GO CREATE PROCEDURE Test2 AS Declare @i int Set @i = 100 WHILE @i>0 BEGIN EXEC SP_Example2 8 Set @i = @i-1 END GO