Bulk database refactoring: how do you do it?

Question

Bulk database refactoring: how do you do it?

Phil Parkin

SSC Guru

Points: 246986
More actions
November 14, 2014 at 8:46 am

#312554

OK, a very broad title. Let me provide some background.
I've been working on a task to consolidate three databases down to two for the last couple of weeks and have developed a methodology for doing it which relies on a combination of source control, branching, using regular expressions to find and replace and schema comparisons between source files and target database.
You can imagine the complexity: tables, views, functions and procs are being moved from the 'old' database to one of two databases and many are being renamed in the process.
Once I have had a few responses, I will provide more details on how I have done it, as long as
a) People are interested and
b) My method is as good as, or better than, any other suggestions which people come up with. (If not, I won't embarrass myself :-))
Thanks & have a great weekend.
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply

Orlando Colamatteo SSC Guru Points: 182279 More actions · Answer 1

I've had SSDT Solutions with multiple Projects with cross-Project (cross-Database) references and they work fine but I've never tried refactoring across Projects. It may be worth a look.

There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato

Jeff Moden SSC Guru Points: 1004410 More actions · Answer 2

Phil Parkin (11/14/2014)
I've been working on a task to consolidate three databases down to two...

That sounds like a lot of work. My question would be WHY? What is the gain here?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Phil Parkin SSC Guru Points: 246986 More actions · Answer 3

Jeff Moden (11/15/2014)
Phil Parkin (11/14/2014)
I've been working on a task to consolidate three databases down to two...
That sounds like a lot of work. My question would be WHY? What is the gain here?

It's a good question. It's really to finish off a partially completed migration from an 'old' database. I think that they got part way through and then realised how big the job was and decided to finish it 'when there is a bit more time'. It's very untidy and the moment, with too many cross-database references, in both directions, along with a lot of objects which are no longer used but cause a lot of clutter.

The absence of evidence is not evidence of absence.
Martin Rees

You can lead a horse to water, but a pencil must be lead.
Stan Laurel

uravindarreddy Ten Centuries Points: 1390 More actions · Answer 4

Does all this involve data migration too?

Thats too complex.

Really looking forward for your methodology.

Phil Parkin SSC Guru Points: 246986 More actions · Answer 5

uravindarreddy (11/17/2014)
Does all this involve data migration too?
Thats too complex.
Really looking forward for your methodology.

It does involve data migration and the databases power our websites, so we need to minimise the time taken to apply the changes and move data around, when it comes to running this for real.

When I have a bit of time spare, I will summarise the way in which we have approached it.

The absence of evidence is not evidence of absence.
Martin Rees

You can lead a horse to water, but a pencil must be lead.
Stan Laurel

Phil Parkin SSC Guru Points: 246986 More actions · Answer 6

OK, the desire to find out how to do this has been only tepid, so I'm not going to spend hours writing things out in great detail. Let's consider one of the simpler things I had to do, to illustrate the overall method.

Requirement: in database A, remove all superfluous database references.

So, scan all code looking for

[DatabaseA].<someschema>.<someobjectname>

or

DatabaseA.<someschema>.<someobjectname>

or

DatabaseA..<someobjectname>

or

[DatabaseA]..<someobjectname>

or

...

And replace it with either

or

dbo.<someobjectname> (if the lazy-coder .. notation has been used)

You get the picture. This simple requirement is not so simple after all.

My methodology works only if you have the database under source control. I've used it with both Redgate Source Control and VB database projects.

The first thing I did was write a simple C# console application which would do a (case-insensitive) regex find and replace, for all files of a certain type, in a nominated folder. It is called Replacer.exe and is executed like this:

Replacer.exe <TargetFolder> <FileType> <Regex search expression> <ReplacementText> <mode>

<mode> is either preview or 'do it'. I won't go into the complexities of Regular Expressions here.

I have a 'code helper' Excel spreadsheet which generates the command lines across multiple folders and for as many replacements are required. I copy the results into a .bat file and run it in preview mode.

If all looks well, I run it in actual mode.

Now, the source files are out of step with the database. A schema compare from source to database will highlight the differences and allow you to preview and validate them easily.

If you are happy with the changes, deploy them to the DB and commit them to source control.

A major benefit of using this method is that there is no need to modify the database while you are developing all of your replacement logic. A second major benefit is being able to use the power of Regular Expressions to help with the work.

Should anyone be interested in some of the finer details, feel free to post a response here.

The absence of evidence is not evidence of absence.
Martin Rees

You can lead a horse to water, but a pencil must be lead.
Stan Laurel

uravindarreddy Ten Centuries Points: 1390 More actions · Answer 7

Oh wow! Thats a nice use of Regular Expressions! Never tried before.

I am curious to know about how data is migrated?

The code can be changed but how the existing data is migrated by maintaining all relationships with consistency???

Phil Parkin SSC Guru Points: 246986 More actions · Answer 8

uravindarreddy (12/1/2014)
Oh wow! Thats a nice use of Regular Expressions! Never tried before.
I am curious to know about how data is migrated?
The code can be changed but how the existing data is migrated by maintaining all relationships with consistency???

To minimise the amount of data migration required, we did the following before 'go-live':

Set up replication from 'existing' databases to 'new' databases.

Then, at go-live time:

1) Shut down web site.

2) Ensure replication has 'caught up'.

3) Remove replication.

(At this point, the 'old' databases are effectively your backups in case anything goes wrong from here on.)

4) Run schema compare (new source code to new databases) and apply updates to apply the new schemata to the new databases.

(Being very careful with existing tables: tables or columns which have been renamed need to be handled specifically, as a generic schema compare will drop the old one and then create the new one).

5) Run SSIS packages to migrate data, for tables which have moved from one database to another.

(Obviously, these would have been developed and tested before go-live.)

6) Make old databases read-only, or take them off-line.

(We don't want anything which we've forgotten to update (hard-coded SQL in SQL Agent jobs, in-line queries in web pages ...) to carry on working and updating the wrong database. Better to get an error and fix it quickly.)

7) Update web configs to point to new databases

8) Bring web site back up.

9) Monitor everything. In particular keep an eye on query performance, focusing on the top-10 worst queries and trying to work out whether the migration is to blame.

The absence of evidence is not evidence of absence.
Martin Rees

You can lead a horse to water, but a pencil must be lead.
Stan Laurel

uravindarreddy Ten Centuries Points: 1390 More actions · Answer 9

Could you please elaborate on points 4, 5 & 6.

How the source database columns and destination database columns mapped? How the data is transferred using SSIS? Which technique ( eg bulkcopy or bcp or etc) is used? How much data is involved?

Phil Parkin SSC Guru Points: 246986 More actions · Answer 10

Both Redgate SQL Compare and VS Schema Compare provide the ability to preserve data in the case where object renaming has taken place.

In the case of SQL Compare, this is done through the use of the 'Table Mapping' tab in your schema comparison.

In the case of VS, through the use of F2 / rename and refactor logs.

Where table have been moved, standard SSIS dataflows are used, with fast load set to true.

Some of the tables held millions of rows, and for these all indexes were removed before performing the inserts and then recreated afterwards – this approach was generally faster.

The smaller tables were loaded with direct dataflows. Remember, as the tables were empty in the new database (as they've just been created), these were inserts only. Simple and fast.

Make a database read-only something like this:

use master;

go

alter database [DBname]

set single_user

with rollback immediate;

go

alter database [DBname]

set read_only

go

alter database [DBname]

set multi_user

go

And to take a database off-line:

alter database [DBname] set offline with

rollback immediate

The absence of evidence is not evidence of absence.
Martin Rees

You can lead a horse to water, but a pencil must be lead.
Stan Laurel