Introduction
In SQL Server 2016, Microsoft introduced the concept of row-level security. The idea has some appeal; it allows you to get fine-grained control over who gets access to what data, potentially down to the level of individual rows. Normally, SQL Server security grants coarse access to a whole table or view (SQL Server can also do it for columns), and anything more granular than that requires the use of views or stored procedures.
There is another way to do row-level security without affecting database clients, and that's with Gallium Data - a free database proxy that can change the traffic between your database clients and your database servers.
Each solution has its strong points and weaknesses: let's compare them and find out which one fits which scenarios. The bottom line for the impatient is:
- SQL Server row-level security has the main advantage of being built-in
- Gallium Data has the main advantage of being much more flexible
SQL Server's Row-Level Security
You add row-level security to SQL Server by registering SQL functions, called predicates, which are invoked automatically when accessing a given table. There are two kinds of predicates:
- filter predicates, which are used for select, update and delete operations
- block predicates, which are used for insert, update and delete
When SQL Server receives a SQL command, it executes these predicates along with the operation, and joins the results of the predicates with the relevant tables.
Sample data
We'll use the following trivial database for our examples:
create table demo.customers ( id int not null, name nvarchar(100) not null, country char(2) not null, primary key (id) ) insert into demo.customers values (1, 'Andrea', 'AR'), (2, 'Bruno', 'BE'), (3, 'Charles', 'CA'), (4, 'Daniella', 'DK'), (5, 'Eric', 'ES')
Filter predicates
If we assume that our database users' names start with a region prefix (e.g. EU_jdoe), we can restrict access to this table with:
create function demo.RestrictCustomers(@country as sysname) returns table with schemabinding as return select 1 as res where (@country in ('DK', 'BE', 'ES') and user_name() like 'EU_%') or (@country in ('US', 'CA') and user_name() like 'NA_%') or (suser_name() = 'sa')
This specifies that users whose name starts with "EU_" only get access to customers in Denmark, Belgium or Spain, and users whose name starts with "NA_" only get access to customers in the US or Canada. We also make sure that the user, sa, can always see all customers.
This type of function can be as complex as needed, and can include joins and calls to other functions. Keep in mind, however, that every access to that table will incur the cost of that function, and of the resulting join.
We register our function as a filter predicate with:
create security policy CustomersPolicy add filter predicate demo.RestrictCustomers(country) on demo.customers with (state = on)
We'll need a user to test this:
create user EU_jdoe without login grant select on demo.customers to EU_jdoe
Then we run a simple query like this:
execute as user = 'EU_jdoe' select * from demo.customers revert
We get back only the three customer rows that are in Europe. It works!
Note that access is granted at the row level: there is no way to specify that a specific column in a specific row is off-limits using this mechanism.
Block predicates
Block predicates are similar, but they apply to inserts, updates and/or deletes. There are four kinds: after insert, before update, after update, and before delete. For instance, we can make sure that our users can only insert data in their region with:
alter security policy CustomersPolicy add block predicate demo.RestrictCustomers(country) on demo.customers after insert
This handles inserts, but you'd want to do the same thing for updates and deletes). This will check that the newly-inserted row is visible to the user who inserted it and, if not, it will reject the insert. With this in place, a user can insert a row if it obeys our predicate:
execute as user = 'EU_jdoe' insert into demo.customers values (100, 'Didi', 'DK') revert
but an invalid insert gets rejected:
execute as user = 'EU_jdoe' insert into demo.customers values (101, 'Aaron', 'AR') revert
with the error message:
The attempted operation failed because the target object 'demo.customers' has a block predicate that conflicts with this operation etc...
How Secure is SQL Server's Row-Level Security?
It's not bullet-proof. For instance, user EU_jdoe does not have visibility to customer 3 (who is in Canada), but can detect that the row exists with a cleverly crafted SQL:
execute as user = 'EU_jdoe' select * from demo.customers where 100/(id - 3) = 1 revert
This will fail (intentionally) with the error message: Divide by zero error encountered, which tells EU_jdoe that a row with id=3 exists. You can take this much further if you can run a lot of queries, for instance:
execute as user = 'EU_jdoe' select * from demo.customers where 100/(patindex('Ch%', name) - 1) = 1 revert
This will fail with the same error message, allowing user EU_jdoe to determine that there is at least one customer with name starting with 'Ch', even though that user is not supposed to have any visibility to that customer. It should be obvious that a clever attacker can sniff out data pretty quickly.
Overall, I'd say that row-level security is not likely to stop a malicious actor, and should be used more to complement an application's logic, for instance to externalize data access rules.
Gallium Data's row-level security
Gallium Data takes a very different approach. It acts as a smart proxy between database clients and database servers, and executes user-defined logic (expressed as filters and JavaScript code) that can modify that traffic when appropriate. For row-level security, Gallium Data has two options: either modify the SQL command on its way to SQL Server, or filter the result set coming back from SQL Server.
Request filter - change the SQL command
For instance, a simple request filter can look for the statement:
select * from demo.customers where name like '?'
and rewrite it to:
select * from demo.customers where name like '?' and country in ('DK', 'BE', 'ES')
depending on the current user. This works well if you know in advance what SQL commands to expect. This approach is effective for broad-stroke filtering, and can include more complex rewriting to mask or modify column values, e.g.:
select id, case when country in ('BE', 'DK', 'ES') and user_name() not like 'EU_%' then '<hidden>' when country in ('CA', 'US') and user_name() not like 'NA_%' then '<hidden>' else name end as name, country from demo.customers where name like '?'
Response filter - change the result set
The other (and potentially complementary) approach is to filter the result sets. Since it potentially involves execution of code for each row in a result set, this is obviously a more expensive, but very flexible solution. The JavaScript code for the filter could look something like:
let country = context.packet.country; if (country === 'DK' || country === 'BE' || country === 'ES') { context.packet.remove(); }
This is less efficient than changing the query, but it's much more flexible: we can modify the row, insert new rows, etc. For instance:
let country = context.packet.country; if (country === 'DK' || country === 'BE' || country === 'ES') { context.packet.name = '<n/a>'; context.packet.country ='??'; }
If you have to filter large numbers of rows (i.e. millions), it can be expensive (though computers are really fast these days) to do this one row at a time, which is why result set filters allow you to narrow their definition to be invoked only for certain rows. The filter above would therefore be much more efficient if it was defined with the following parameter: Column patterns: country=DK, country=BE, country=ES
You can also specify that the row should be hidden when the column patterns are satisfied. This allows for filtering without any code -- similar to SQL Server's row-level security. But for more complex logic, JavaScript is the way to go, as it gives you total power.
Summary
The advantages of SQL Server row-level security are:
- Built in: no need for an external system
- Affects all queries and commands, regardless of projections, derivations etc...
- Good performance, depending on the complexity of predicates
When we look at this versus Gallium Data, we see:
- Additional complexity: requires another system that must be kept up
- Works best with known SQL commands
- May increase response time, depending on amount and complexity of logic
The advantages of Gallium Data are:
- SQL commands can be changed in any way, including parameter values
- Rows can be hidden, modified, or added in result sets
- Does not require any special privileges in the database
- Can be targeted to only some SQL commands
- Does not execute in the database, therefore puts no extra load on it
- Can apply to calls on tables, views and stored procedures, and react to any value, whether persisted or computed
- Not database-specific: works also with PostgreSQL, MySQL and MongoDB.
- Can be much more generic: Gallium Data sees all the requests and responses on the wire, so it can apply more general rules (i.e. "mask all columns whose name contains 'price' from users 'EU_*'")
Compared with SQL Server row-level security
- SQL commands cannot be modified
- Can only show or hide entire rows
- Requires high-level database privileges to manage
- Affects all access to the table
- Adds load to database server(s)
- Cannot apply to computed values
- Block predicates not available in AzureSQL
Conclusion: who does it better?
These two approaches have different merits. They are not exclusive: it's possible to use both to implement different aspects of the requirements. SQL Server's row-level security is a good approach if all you need is hiding and showing whole rows based on predicate logic. Access is controlled regardless of the SQL commands, which makes it appropriate in dynamic environments where you don't necessarily know what commands to expect. But if that's the case, you should also be aware that row-level security can be bypassed with relatively little effort, so don't bet the farm on it.
Gallium Data gives you unlimited flexibility -- you can hide rows, but you can also change them, mask values, compute values, even insert new rows into a result set. It works best for environments where you do know what commands to expect, so you can catch them on the wire and modify them or their results.
P.S.
On a related note, you may be interested to see how Gallium Data can work with SQL Server's data classification feature to restrict data access, and how Gallium Data can easily restrict queries based on prior behavior.