SQL Data Generator–Masking Production Data

Steve Jones, 2015-10-13 (first published: 2015-09-30)

I learned a new trick with SQL Data Generator that I wasn’t aware of previously. I think this is a good idea for masking some of that production data that you might not want developers to have.

Let’s start with a production table. In my case, I’ve created a Sandbox_Prod database with a table in it for employees. I’ve added a few records that contain some sensitive information.

CREATE TABLE Employees
(
empid INT IDENTITY(1,1)
, EmployeeName VARCHAR(250)
, EmpployeeEmail VARCHAR(250)
, active TINYINT
, salary money
, pwd VARBINARY(max)    
)
;
GO
INSERT INTO Employees
VALUES  ( 'sjones', 'sjones@sqlservercentral.com', 1, 10000, ENCRYPTBYPASSPHRASE('The User Sample', 'MyS%83ongPa44#word')) 
     ,  ( 'awarren', 'awarren@sqlservercentral.com', 1, 20000, ENCRYPTBYPASSPHRASE('The User Sample', 'Ano$2therS%83ongPa44#word') )
     ,  ( 'rsmith', 'rsmith@sqlservercentral.com', 1, 5000, ENCRYPTBYPASSPHRASE('The User Sample', 'Kedj93m@@83ongPa44#word'));
GO

I’ve got a second database, called Sandbox, that simulates my development environment. I’ve got the same table, but without any data in it.

What I want to do is move some of the production data to my development area, but not all of it. Some of the production data needs to be masked.

SQL Data Generator Sources

I can use SQL Data Generator from Redgate to do this, by using a data source that actually exists. In this case, I’ll create a new project and point it at my Sandbox database. I’ve deselected all of the tables except my Employees table.

When I pick my Employees table, I see the familiar generation screen on the right. Most of you are like me and notice the number of rows and the option to delete data.

However there’s another option. I can select the “Use existing data source” radio button instead. When I do this, I have a few choices for data. I can use an existing table or a CSV file. Both of those can be good choices, especially if I have sets of data I want to load into the table. Either one can help me to build known, specific data sets for development (or testing).

In my case I will choose and existing table. When I do this, I click the “Browse” button and I get a connection dialog for SQL Server. I pick my instance and the production database.

I click “Next” and then get the chance to select the table to use. In this case, I’ll pick the Employees table.

When I return to the main SDG screen, I see the table listed as the source, but my preview shows the actual production data. This is because I’ve mapped the production table as a source, and it will be used as it currently exists.

That’s not what I want. I want to mask the email address and the salary. However, now I can change things like I might do for any random data generation.

Let’s first click in the EmployeeEmail column. When I do that, I see the following, the column with its source set as the existing column in the production table.

However the drop down gives me lots of choices, including an Internet email generator.

If I select, then my preview changes. Now the image below shows production data for all columns other than the email.

I can repeat this for the salary (and password to be safe). When I do that, I’ll see random data for those columns and production data for others.

I can repeat this for all tables in my project, mapping through data that isn’t sensitive, and masking data that is. It’s a tedious process, but it’s a one time process for specific data. Once this is done, every restore can have the project run and the data masked. If production DBAs do this refresh, then developers never see sensitive information

Filed under: Blog Tagged: Data Generator, Redgate, syndicated

Book Review: Big Red - Voyage of a Trident Submarine

by Andy Warren

SQLServerCentral.com

Blogs

I've grown up reading Tom Clancy and probably most of you have at least seen Red October, so this book caught my eye when browsing used books for a recent trip. It's a fairly human look at what's involved in sailing on a Trident missile submarine...

★ ★ ★ ★ ★ ★ ★ ★ ★ ★

You rated this post out of 5. Change rating

2009-03-10

1,439 reads

Database Mirroring FAQ: Can a 2008 SQL instance be used as the witness for a 2005 database mirroring setup?

by Robert Davis

SQLServerCentral.com

Blogs

Question: Can a 2008 SQL instance be used as the witness for a 2005 database mirroring setup? This question was sent to me via email. My reply follows. Can a 2008 SQL instance be used as the witness for a 2005 database mirroring setup? Databases to be mirrored are currently running on 2005 SQL instances but will be upgraded to 2008 SQL in the near future.

★ ★ ★ ★ ★ ★ ★ ★ ★ ★

You rated this post out of 5. Change rating

2009-02-23

1,567 reads

Inserting Markup into a String with SQL

by Phil Factor

SQLServerCentral.com

T-SQL

In which Phil illustrates an old trick using STUFF to intert a number of substrings from a table into a string, and explains why the technique might speed up your code...

★ ★ ★ ★ ★ ★ ★ ★ ★ ★

You rated this post out of 5. Change rating

2009-02-18

1,631 reads

Networking - Part 4

by Andy Warren

SQLServerCentral.com

Blogs

You may want to read Part 1 , Part 2 , and Part 3 before continuing. This time around I'd like to talk about social networking. We'll start with social networking. Facebook, MySpace, and Twitter are all good examples of using technology to let...

★ ★ ★ ★ ★ ★ ★ ★ ★ ★

You rated this post out of 5. Change rating

2009-02-17

1,530 reads

Speaking at Community Events - More Thoughts

by Andy Warren

SQLServerCentral.com

Blogs

Last week I posted Speaking at Community Events - Time to Raise the Bar?, a first cut at talking about to what degree we should require experience for speakers at events like SQLSaturday as well as when it might be appropriate to add additional focus/limitations on the presentations that are accepted. I've got a few more thoughts on the topic this week, and I look forward to your comments.

★ ★ ★ ★ ★ ★ ★ ★ ★ ★

You rated this post out of 5. Change rating

2009-02-13

360 reads

SQL Data Generator–Masking Production Data

Rate

Share

Share

Rate

SQL Data Generator–Masking Production Data

Rate

Share

Share

Rate

Related content

Book Review: Big Red - Voyage of a Trident Submarine

Database Mirroring FAQ: Can a 2008 SQL instance be used as the witness for a 2005 database mirroring setup?

Inserting Markup into a String with SQL

Networking - Part 4

Speaking at Community Events - More Thoughts