In this post we’re going to create some encrypted columns in a table in a test database and look at some of the practicalities and limitations of working with Always Encrypted (AE).
There are actually a fair few limitations, but not because there anything wrong with the technology, rather they naturally fall out of logic of working with encrypted data.
Let’s just bash on with it and you’ll see what I mean.
Setting up a table with encrypted columns
I’ve got a database called AlwaysEncryptedTest, and it’s sitting on a remote instance – i.e. not my local machine. I’ve also got the Keys I created in the last post (Understanding Keys and Certificates with Always Encrypted). I’m now going to create a table with a couple of encrypted columns:
CREATE TABLE [dbo].[EncryptedTable](
Id INT IDENTITY(1,1) CONSTRAINT PK_EncryptedTable PRIMARY KEY CLUSTERED,
LastName [nvarchar](32) COLLATE Latin1_General_BIN2 ENCRYPTED WITH (
COLUMN_ENCRYPTION_KEY = MyColumnKey,
ENCRYPTION_TYPE = Deterministic,
ALGORITHM = 'AEAD_AES_256_CBC_HMAC_SHA_256'
) NULL,
FirstName [nvarchar](32) COLLATE Latin1_General_BIN2 ENCRYPTED WITH (
COLUMN_ENCRYPTION_KEY = MyColumnKey,
ENCRYPTION_TYPE = Randomized,
ALGORITHM = 'AEAD_AES_256_CBC_HMAC_SHA_256') NULL
);
There’s plenty to discuss there already. You might first of all ask why am I specifying a particular collation for those columns (Latin1_General_BIN2) and overriding my database collation settings. I hate people who do that! The answer is I don’t have a choice, text columns encrypted with AE must have a BIN2 collation. BIN2 collations are (apart from anything else) case-sensitive.
If you think about it, it makes sense that you can’t have a case-insensitive comparisons on an encrypted column – and therefore those collations cannot be supported. When you perform a comparison based on encrypted data, what the engine is doing is comparing one encrypted value with another. To enable a case-insensitive comparison there would have to be some deterministic pattern so that you can tell that two different encrypted values differ only by case, that would be more complicated to implement, would weaken the encryption, and isn’t supported by the algorithm used by AE. The requirement to use a BIN2 collation is driven by what will happen in practice when you compare two encrypted values looking for an exact match.
So there’s something to think about. Any searches you do against encrypted text data are going to be case sensitive. So say your application allows you to search for people by name, and you’re looking for “McGiffen” (my name) then you’re not going to find it if you type “Mcgiffen” or “mcgiffen”. For someone like myself who often see mis-captitalization of their name, as little as I care , it may be a struggle to find me if my name (as it’s Personal Identifiable Information) has been encrypted.
So how do you deal with this? I don’t know – you’ll have to think about that – maybe maintain an upper case copy for searching against and transform search terms into upper case BEFORE sending them to the database.
There are also a whole bunch of other data types that you can’t encrypt which I would imagine are to do with similar practicalities of the encryption implementation. Rather than list those here you can find that set and other limitations in this MSDN article:
https://msdn.microsoft.com/en-GB/library/mt163865.aspx?f=255&MSPPError=-2147217396
You can see in the SQL above I’ve specified that the COLUMN ENCRYPTION KEY “MyColumnKey” which I created in the previous blog post. But then for one column I’ve specified ENCRYPTION TYPE of “Deterministic” and for one it is “Random”. What’s the difference?
Well, Deterministic means that every time you encrypt a given value with a given key, the encrypted result will be the same. Randomized means it will be different. With deterministic it might be possible for someone to perform statistical analysis based on the distribution of your data to infer what some common values are – with Randomized that’s not going to happen.
However, if your encryption type is Randomized then you can’t search on that column and you certainly can’t join on it (or group or index on it either).
This makes perfect sense if you realise that SQL only knows the encrypted value of your data, so if it is to return results by comparing one encrypted value with another then the values must match for anything to be returned, for that to happen they must both have been encrypted using deterministic encryption.
Finally, in the column definition I’ve specified the encryption algorithm – the one you see is the only algorithm currently supported by AE, so just use the same one.
Inserting Encrypted Data
Now we’ve got the table let’s try inserting some data.
It should be no big surprise that if I want to run the following query from SSMS it doesn’t work:
INSERT INTO dbo.EncryptedTable (LastName, FirstName)
VALUES ('McGiffen','Matthew ');
You get a longish error message that basically tells you you’re trying to stick unencrypted data in an encrypted column:
Msg 206, Level 16, State 2, Line 24
Operand type clash: varchar is incompatible with varchar(8000) encrypted with (encryption_type = ‘DETERMINISTIC’, encryption_algorithm_name = ‘AEAD_AES_256_CBC_HMAC_SHA_256’, column_encryption_key_name = ‘MyColumnKey’, column_encryption_key_database_name = ‘AlwaysEncryptedTest’) collation_name = ‘Latin1_General_CI_AS’
To be able to insert data, there’s a few things you need to do.
First of all, this isn’t actually supported in SSMS 2016, so you need to download and install the vNext version which has the new feature “Parameterization for Always Encrypted”. Prior to this version if you want to test/play (beyond a few basics) with always encrypted data you had to write code in C# or another language.
Once you’ve got SSMS vNext, you can connect to your database. You have to enable Column Encryption for your connection – you’ll need to do this in your connection string whatever client you are querying your database from if your query might touch encrypted data – be that your application, SQLCMD or SSMS.
In SSMS you do that when you’re connecting to the instance in the “Connect to Server” dialog. Select “Options”, and go to “Additional Connection Parameters” and enter into the box “Column Encryption Setting = Enabled” – as per the image below:
(One thing to note, is that SSMS will then try to use this setting for all subsequent connections you make until you go in and remove it. As the setting is only supported on SQL 2016 onward you get an error connecting to older instances, which is little bit of a pain…)
Now we’ll open a new query window against our database, and then we have to actually enable “Parameterization for Always Encrypted”. Right-click over the query and select “Query Options”, then “Advanced” and enable the option as shown below:
There’s an MSDN article about this feature here:
In simple terms, it allows SSMS to perform additional parameterization on your query and encrypt any literal values that need encrypting before sending them to the database. In the previous versions of SSMS this would have just been sent plain text – and so the engine barks at you as in my error above.
The final step to get this to work is that you have to rewrite your original query to use variables rather than literal values in the insert. This helps SSMS to be able to manage the rest of the parameterization process. I’m going to show you a screen-shot of the updated code as it gets displayed in SSMS as there’s something else to notice:
See that my variable declarations and assignments are underlined in squiggly blue. If I hover over one of them, then I get this message:
This shows me that the Parameterization is in action and working correctly.
If I run the insert now it works fine. Woo-hoo.
There are some implications of this if we think it through. You are only going to be able to do inserts based on values held in a variable or parameter. So you can only do one row at a time – and no, you can’t use a table variable for this. That means you can’t use value lists in an insert or merge statement, but logically, you probably don’t need to. Remember this is likely to be only sensitive personal information we’re encrypting – not reference data for which we may have scripts with lots of values. As such we would generally be inserting this one row at a time, and ideally through a stored procedure using parameters. So everything should be peachy.
It’s also logical to realise you can’t insert from one table to another unless the values in both tables are encrypted (hopefully using the same keys!). This goes back to the fact that the engine can’t encrypt or decrypt values – that has to happen in the client. The above insert only works because SSMS (our client) is doing all the work.
Let’s just run a select and check that we can see the data:
SELECT * FROM dbo.EncryptedTable
Cool, this works. Note this will also work in SSMS 2016 as the query is simple enough to not require parameterization.
Let’s just have another look at the same data, but this time I’m going to run it direct from the server where the certificate for the Column Master Key doesn’t exist:
So here you can only see the encrypted version of the data.
Out of interest let’s look at the full value for the LastName field:
0x0180B908F995B86C64511991CDE834DD4888A058D1B482E5E14297FFDA9D3C8E0828E3B80F3C58F98197AC73F5867E2837DD2E6C266FFAA95E38A7B08C111AF7EBEEC476EE6BBDB9704F1AC007CE1F613B
That’s just a little bit longer than “McGiffen” – so you can see there is going to be some storage overhead on your encrypted columns.
I’m just going to chuck a few extra rows into the table (including a duplicate record for me) so I can demonstrate a couple of points mentioned earlier. Here’s what we have when I’m done:
Now let’s look again at the encrypted version of that:
Remember in my table definition, I set LastName to be Deterministic Encryption, but FirstName to be randomized. If you look at the encrypted values above we can see some repeated values in the LastName column – even if we didn’t have the Ids to compare with the first recordset it’s pretty straightforward to work out which is “McGiffen” and which is “Rubble”.
Then look at FirstName. We know both records 1 and 2 though have a first name of “Matthew”. If we check the encrypted values in the FirstName column we can see they are both different – this is Randomized Encryption in practice.
Querying Encrypted Data With a Predicate
I’m going to go back to my encrypted (and parameterized) connection and try running some queries against those tables:
DECLARE @LastName NVARCHAR(32) = 'McGiffen';
SELECT * FROM dbo.EncryptedTable
WHERE LastName = @LastName
Results:
So that worked fine, let’s just try and isolate those first two duplicates at the beginning though:
DECLARE @LastName NVARCHAR(32) = 'McGiffen';
DECLARE @FirstName NVARCHAR(32) = 'Matthew';
SELECT * FROM dbo.EncryptedTable
WHERE LastName = @LastName
AND FirstName = @FirstName;
This time I have no luck. I get the following error message that is telling me in a long-winded manner that I can’t compare against a column with Randomized encryption:
Msg 33299, Level 16, State 2, Line 35
Encryption scheme mismatch for columns/variables ‘@FirstName’, ‘FirstName’. The encryption scheme for the columns/variables is (encryption_type = ‘RANDOMIZED’, encryption_algorithm_name = ‘AEAD_AES_256_CBC_HMAC_SHA_256’, column_encryption_key_name = ‘MyColumnKey’, column_encryption_key_database_name = ‘AlwaysEncryptedTest’) and the expression near line ‘8’ expects it to be (encryption_type = ‘DETERMINISTIC’) (or weaker).
Msg 8180, Level 16, State 1, Procedure sp_describe_parameter_encryption, Line 1 [Batch Start Line 27]
Statement(s) could not be prepared.
An error occurred while executing batch. Error message is: Internal error. The format of the resultset returned by sp_describe_parameter_encryption is invalid. One of the resultsets is missing.
Of course this is exactly what we expected to happen.
What I want to get across is that this all makes perfect sense. If you understand what encryption is doing, then the limitations you face naturally arise from that. Key is understanding that the encryption occurs at the client side, so if you want to do anything in a query that requires the SQL engine to interact with an encrypted column then you are going to be limited. Mostly all it can do is compare two encrypted values and verify they are the same – and only if they are using the Deterministic setting. So, if you’re implementing AE, just think carefully about what columns you need to encrypt and particularly about how you need to interact with those columns.
What’s occurring in the background?
In the last post about certificates I mentioned the extra steps that AE has to go through when querying your data. Now we’ve got a table to play with, let’s look at a Profiler trace so we can see exactly what that involves in terms of interactions with SQL. The actions that occur within the client side libraries are a black box to us, but there is still some interesting stuff to be observed at the SQL end.
I’ll just set Profiler going using a default trace and insert another record into my table.
What we see when we do this is that there is an extra call to the database before the query is executed:
EXEC sp_describe_parameter_encryption
N'
DECLARE @LastName AS NVARCHAR (32) = @pce03080ba9844c4182a38c82216e4ad1;
DECLARE @FirstName AS NVARCHAR (32) = @pba60ad612001418aac8496e355e03892;
INSERT INTO dbo.EncryptedTable (LastName, FirstName)
VALUES (@LastName, @FirstName);
‘,N’@pce03080ba9844c4182a38c82216e4ad1 nvarchar(32),@pba60ad612001418aac8496e355e03892 nvarchar(32)’
This call is the client (SSMS) asking the server for details about any encryption that might be involved in the query. You’ll notice that SSMS has parameterized the query, replacing the literal values with dynamically named parameters.
The stored procedure returns two result-sets. The first contains a row for each Encryption Key involved in the query. In this case that’s just one, thus one record, so I’ve transposed it to make it easier to read:
You can see this is sending back to client the encrypted value of the Column Encryption Key – and also telling it where to find the certificate of the local machine to decrypt that key.
The second set describes each parameter, which of the identified Keys (from the first set) it is using, and also tells us whether it is using Deterministic or Randomized encryption (this is the “column_encryption_type” value).
Armed with that information, the client can then perform any encryption required and execute the actual query:
EXEC sp_executesql N'DECLARE @LastName AS NVARCHAR (32) = @pce03080ba9844c4182a38c82216e4ad1;
DECLARE @FirstName AS NVARCHAR (32) = @pba60ad612001418aac8496e355e03892;
INSERT INTO dbo.EncryptedTable (LastName, FirstName)
VALUES (@LastName, @FirstName);
',N'@pce03080ba9844c4182a38c82216e4ad1 nvarchar(32),@pba60ad612001418aac8496e355e03892 nvarchar(32)'
,@pce03080ba9844c4182a38c82216e4ad1=0x01F2CD73FDD15216E2D4DC89EB6DC046EBE3FFC70A967BB6A3F822B57EA840F9D60410825455391AAFE7DF161CE78F4C3D4DE65ED8FC2435115C4E0F81E49AD820
,@pba60ad612001418aac8496e355e03892=0x01C4FBE035F600CBF61B2589028863DF732973166D9752B29CBBF1C7814DF6E8BDAD0D95897D79E28884E103F350506B5465A0914AD22EE8C4BB6630FF02F39DB4
You can see that the parameterized query has now been executed via a call to sp_executesql. You can also see that those parameters have been set to the long binary values which represent the encrypted versions of the data I specified – which in this case was “Smith” and “John”.
I covered this briefly in the last post, but it’s good to understand that the extra step before executing the actual query has to happen even when there is no encryption involved as there is no way for the client to know which columns are encrypted and which are not. As such, once you are working with AE and have the Column Encryption Setting enabled for your connections, there will be some overhead on pretty much all your queries. Important to note though – if you are doing this via .NET code then all this extra process just happens automatically for you in background. From your point of view, you just code as normal.
Encrypting Existing Data
This post wouldn’t be complete unless I touched on this subject at least briefly. So that’s what I’ll do. Touch on it briefly.
So, how do you change an existing column containing data, so that it is encrypted?
If you’ve got this far then you can probably figure out for yourself that it’s not going to be as simple as an ALTER TABLE statement. Encryption is a client side function, so if you want to encrypt some data then that data’s got to go via a client.
There are various ways you can do that. You could hand crank something and do it that way. Generally your best bet is going to be a process that involves creating a new table with the encryption set up as you want it, copying over data from the old table into the new, than dropping the old table and renaming the new one. So no, this is not going to be a particularly online activity.
It can be a fairly straightforward one though as Microsoft has provided the tool to do that for you via SSIS using the SQL Server Import Export Wizard.
Here’s a nice MSDN post that takes you through that process