SSMS 17.5 includes new classification options for columns that contain sensitive data. This is a valuable tool for better protecting data, and I'm glad that this was added to the platform. There are additional features being added in Azure SQL Database and SQL Server 2019+, but the addition of classification in SSMS works with all previous versions of SQL Server since it uses extended properties.
Note that this metadata doesn't include sensitive information, but it does provide information to anyone developing code in this database about where certain data is stored that has been classified. A few clients have asked that the classification properties be removed in development enviroments, as they consider the metadata itself a security risk. I don't agree, but they feel this is the case, so their concern is how they can remove the classifications before allowing a copy of the database into non production environments.
This article will look at how we can find the extended properties related to classifications and remove them from the database. This does not address the sensitive data itself, or the pseudonymization of this data, but that certainly should be something that is addressed before providing copies of databases to developers.
Finding Classifications
One of the easy ways for me to detect the properties being used is with SQL Source Control. This add-in for SSMS detects changes to my database that haven't been committed to my Version Control System (VCS) and displays them. I'll use that to find out the property values.
If you don't have a copy of SQL Source Control, I'm sorry, though I'll do the hard work you can use in the scripts below. If you want more information on SQL Source Control, you can see it here.
When I first check my Commit tab in SQL Source Control, I find nothing different from my VCS.
Let's now use the SSMS Data Classification tool. I'll right click the database and select Tasks and then Classify Data. If you've never used this, you can read a short blog I wrote on this tool.
When the tool opens, I see the Data Classification tab for my database. This takes a moment to refresh and scan your database for potential data classification recommendations. As you can see below, I have 17 columns that recommendations.
Clicking the bar shows me the various schemas, tables, and columns with data that I might want to classify, along with some suggested types and labels.
I can change the types and labels, but I just want data about the properties, so I'll click the checkbox at the top to select all items, and then click the "Accept" button.
Once I do this, the properties are built, but not applied. I need to click Save at the top to write these properties for the columns.
Once this is done, I'll return to my SQL Source Control tab. When this refreshes, I'll see a number of changes to objects. If I choose one and look at the differences, I can see the calls to the stored procedure, sp_addextendedproperty, that will attach these properties to the columns.
If I look closer, I can see there are 4 different names being used for these properties. The names are:
- sys_information_type_id
- sys_information_type_name
- sys_sensitivity_label_id
- sys_sensitivity_label_name
I don't need to save these changes, but I'll leave this tab alone. Now I have the data I need to start building a removal process.
Removing Extended Properties
Extended Properties are written to system tables using stored procedures. We don't get DDL, but instead need to build a series of calls to sp_dropextendedproperty that will remove these properties.
The structure of this stored procedure needs the name of the property, which we have from the section above. The procedure also needs schema, object, and column names. We can get those by joining together a number of system tables. I'll start this by querying the extended properties DMV.
SELECT e.[name] AS 'exprop', * FROM sys.extended_properties e WHERE e.[name] IN ('sys_information_type_id' , 'sys_information_type_name' , 'sys_sensitivity_label_id' , 'sys_sensitivity_label_name')
When I run this, I see a bunch of information.
I see my property names and values on the right side of this image. On the left, I see a major_id and a minor_id. These are the object_id and column_id of the object, so let's use these to get more data.
We can join to sys.objects first to get the name of the table. From there, we add in a join to sys.schemas to get the schema name. Remember, our stored procedure uses the human readable names, not the IDs.
SELECT e.[name] AS 'exprop', s.name AS 'Schema', o.name AS 'table' FROM sys.extended_properties e INNER JOIN sys.objects o ON e.major_id = o.object_id INNER JOIN sys.schemas s ON o.schema_id = s.schema_id WHERE e.[name] IN ( 'sys_information_type_id' , 'sys_information_type_name' , 'sys_sensitivity_label_id' , 'sys_sensitivity_label_name' );
We have most of our information and now need to get the column. The sys.columns DMV includes the column_id, which seems to match the minor_id. Let's join on object_id and column_id. This gives me a query that will get the data we need. When we run this, we see what looks like the information we need. First the code, then the results.
SELECT e.[name] AS 'exprop', s.name AS 'Schema', o.name AS 'table', c.name AS 'column' FROM sys.extended_properties e INNER JOIN sys.objects o ON e.major_id = o.object_id INNER JOIN sys.schemas s ON o.schema_id = s.schema_id INNER JOIN sys.columns c ON o.object_id = c.object_id AND e.minor_id = c.column_id WHERE e.[name] IN ( 'sys_information_type_id', 'sys_information_type_name', 'sys_sensitivity_label_id', 'sys_sensitivity_label_name' );
With this information, we can now construct the calls to sp_dropextendedproperty. We can't do this in one batch, so let's build a process.
Building a Process
This is a process that will run rarely, mostly to prepare a database for use by others. Even in a large database, the likelihood there are more than 10s of thousands of rows to remove is low. That means my choice of using a cursor here is not likely to become a problem.
I'll construct a cursor and then use the data from the cursor to make the calls to my stored procedure to remove the classifications. The code is simple, with my query above being the basis for the cursor. Here is the code:
DECLARE @schemaname sysname, @tablename sysname, @columnname sysname, @propertyname sysname; DECLARE xpropxurs CURSOR for SELECT e.[name] AS 'exprop', s.name AS 'Schema', o.name AS 'table', c.name AS 'column' FROM sys.extended_properties e INNER JOIN sys.objects o ON e.major_id = o.object_id INNER JOIN sys.schemas s ON o.schema_id = s.schema_id INNER JOIN sys.columns c ON o.object_id = c.object_id AND e.minor_id = c.column_id WHERE e.[name] IN ( 'sys_information_type_id', 'sys_information_type_name', 'sys_sensitivity_label_id', 'sys_sensitivity_label_name' ); OPEN xpropxurs FETCH NEXT FROM xpropxurs INTO @propertyname, @schemaname, @tablename, @columnname WHILE @@FETCH_STATUS = 0 BEGIN EXEC sys.sp_dropextendedproperty @name = @propertyname, -- sysname @level0type = 'schema', -- varchar(128) @level0name = @schemaname, -- sysname @level1type = 'table', -- varchar(128) @level1name = @tablename, -- sysname @level2type = 'column', -- varchar(128) @level2name = @columnname; -- sysname FETCH NEXT FROM xpropxurs INTO @propertyname, @schemaname, @tablename, @columnname END DEALLOCATE xpropxurs GO
When I run this, I get no output, though I could easily change that by adding a SELECT or PRINT inside the WHILE loop. I just see this as the "commands completed successfully".
The true test is to see if the properties still exist in the database. I'll start by checking SQL Source Control. When I click that tab, it refreshes and I see this:
All of my changes were removed from the database. If I close and re-open the Data Classification tab, I also see there are no columns classified.
Conclusion
This article has shown a process that will remove all Microsoft data classifications from a database. This only removes those classifications, and only those that are stored in Extended Properties using the SSMS Data Classification tool. Using this process, you can clean your databases quickly if you no longer want to keep this data in your database.
Note: In SQL Server 2019+, there are additional ways to classify data, which this process does not address.
While this is built to remove specific extended properties, this could be adapted to remove other properties as well. The structure of the calls to sp_dropextendedproperty change depending on where the property is applied. For the database or table level properties, less data is needed from the DMVs and less is used in the stored procedure call.