Index on Encrypted column

  • Hi all,

    We have a table with encrypted varbinary (80) column which stores cardnumbers. we have created a view for application which decrypts the cardnumber by DECRYPTBYKEYAUTOCERT function. We do have lots of queries using cardnumbers.

    Is it possible to add index on varbinary column? How do i make the search faster?

    Thanks in advance

  • Sure you can index on a varbinary column. I would guess, I have not tested, that it might be fastest to run the encryption on a parameter and do the filter against the binary data, then only decrypt the matching values.

  • You can index it, but it's completely useless to do so. It's not possible to encrypt a parameter and compare to the stored encrypted value as the encryption routines are non-deterministic. (encrypt and compare will result in no matches) Hence you need to decrypt and compare, which, since it's a function on the column, will prevent index seeks.

    What I've seen people do is to create a second column which contains a salted hash of the decrypted value (using hash_bytes). Since hashes are deterministic, index that then hash and compare (remembering the salt and remember to keep the salt value very, very safe) to get thje matching row (and potentially others as there is a chance of hash collisions), then decrypt the value on the matched rows and do a second comparison.

    Hit google, i'm sjure someone, somewhere has blogged or written an article on this technique.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Thanks for correcting me Gail.

  • 1)

    You can index it, but it's completely useless to do so.

    That is incorrect. While you cannot do an index SEEK, a query can still use an index SCAN to filter the data, which can still be an order of magnitude or two faster than a full table scan. Obviously you can't get good statistics on how many rows will satisfy the filter either, and that needs to be taken into account in query design.

    2) Hashes are indeed the way to go to get the optimal performance, although they are NOT true encryption and you can also get hash collision (very, very low probability here though).

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • TheSQLGuru (11/10/2009)


    1)

    You can index it, but it's completely useless to do so.

    That is incorrect. While you cannot do an index SEEK, a query can still use an index SCAN to filter the data, which can still be an order of magnitude or two faster than a full table scan. Obviously you can't get good statistics on how many rows will satisfy the filter either, and that needs to be taken into account in query design.

    Ok, so it's almost completely useless. 😉

    I would guess (but haven't tested) that it would have to be a covering index for SQL to take the risk of a scan, filter and lookup when it doesn't know the number of rows affected.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (11/10/2009)


    TheSQLGuru (11/10/2009)


    1)

    You can index it, but it's completely useless to do so.

    That is incorrect. While you cannot do an index SEEK, a query can still use an index SCAN to filter the data, which can still be an order of magnitude or two faster than a full table scan. Obviously you can't get good statistics on how many rows will satisfy the filter either, and that needs to be taken into account in query design.

    Ok, so it's almost completely useless. 😉

    I would guess (but haven't tested) that it would have to be a covering index for SQL to take the risk of a scan, filter and lookup when it doesn't know the number of rows affected.

    I had a client that had this and it did pick an index scan and lookup. Can't recall much details to see if there was ancillary reasons for that.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • GilaMonster (11/10/2009)


    TheSQLGuru (11/10/2009)


    1)

    You can index it, but it's completely useless to do so.

    That is incorrect. While you cannot do an index SEEK, a query can still use an index SCAN to filter the data, which can still be an order of magnitude or two faster than a full table scan. Obviously you can't get good statistics on how many rows will satisfy the filter either, and that needs to be taken into account in query design.

    Ok, so it's almost completely useless. 😉

    I would guess (but haven't tested) that it would have to be a covering index for SQL to take the risk of a scan, filter and lookup when it doesn't know the number of rows affected.

    I had a client that had this and it did pick an index scan and lookup. Can't recall much details to see if there was ancillary reasons for that.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Just for anyone who's really into this: http://blogs.msdn.com/raulga/archive/2006/03/11/549754.aspx

    Otherwise, yes, indexing the encrypted value can be useful (and better than a full scan of the table!), so long as you don't mind doing all that decryption. That is practical on smaller tables, but not as efficient as indexing a hash (salted or otherwise) of the unencrypted value. One might even persist the first few characters of the unencrypted value and index that. It all depends on the details really, as always.

    Paul

  • Thank you all for the suggestion and link.

  • Paul White (11/10/2009)


    Just for anyone who's really into this: http://blogs.msdn.com/raulga/archive/2006/03/11/549754.aspx

    One might even persist the first few characters of the unencrypted value and index that. It all depends on the details really, as always.

    Paul

    As I understand it the first few (4 or 6 I think) characters in a card are the issuer, so it'd not be terribly selective if you're looking for specific individuals' details. Last 4 might be better.

  • Andrew Gothard-467944 (11/11/2009)


    Paul White (11/10/2009)


    Just for anyone who's really into this: http://blogs.msdn.com/raulga/archive/2006/03/11/549754.aspx

    One might even persist the first few characters of the unencrypted value and index that. It all depends on the details really, as always.

    Paul

    As I understand it the first few (4 or 6 I think) characters in a card are the issuer, so it'd not be terribly selective if you're looking for specific individuals' details. Last 4 might be better.

    Yes, I believe that last 4 is the common thing to add in as a 'helper' column for CC numbers.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply