SQLServer 2005 SSIS Replication Database with customized Tables & Columns

Question

Post reply

SQLServer 2005 SSIS Replication Database with customized Tables & Columns

princa

SSCommitted

Points: 1819
More actions
May 3, 2010 at 4:11 pm

#142807

Hi folks,
I am a SSIS beginner and I am exploring SSIS right now for some projects.
The logic is very simple:
Copy the whole DB1's to DB2, except some tables' some columns.
So, I choose Transfer SQL Server Objects Task to replicate the whole DB with all the indexes, sp, views, which is awesome.
However, I couldn't specify which columns in the tables I could like to include and which not. So if I copy the table, then I need to copy all the columns of that table.
Then I choose dataflow to transfer data, then I could specify the columns, but however, I couldn't replicate the indexes, which is annoying.
Does any body know better solution for such problem? Thanks very much! 😀

Viewing 15 posts - 1 through 15 (of 21 total)

You must be logged in to reply to this topic. Login to reply

Elliott Whitlow SSC Guru Points: 102402 More actions · Answer 1

I'm curious what you are trying to accomplish.

Transfer objects and data-flow components have very different purposes.

If you tell us more about the problem you are trying to address we can probably answer your question better..

CEWII

princa SSCommitted Points: 1819 More actions · Answer 2

Sure.

My purpose is:

1. Replicate whole DB1 to DB2 with all the tables, views, sps, indexes, user role settings, BUT EXCEPT table1's 2nd, 3rd columns.

After that:

2. Update only DB2's table3 to DB1.

After that:

3. Setup time interval as 30mins after second action, then update whole DB1 to DB2 again.

It's kind of my regular operation process.

Please help me out with the best SSIS solutions. Thanks very much. 😀

Phil Parkin SSC Guru Points: 247224 More actions · Answer 3

If all of DB2's data has been copied from DB1, what is the reason for Step 2?

How complex is the db? Have you considered BACKUP / RESTORE / ALTER TABLE as a possible method?

princa SSCommitted Points: 1819 More actions · Answer 4

I might clarify again, sorry for that.

1. DB1 is maintained by other group, they might update DB1 regularly, so we need to replicate their updates regularly to DB1.

2. DB2 is ordinarily using database by us, and only DB2's table3 could be updated, and push back to DB1.

3. DB1 will receive DB2's table3 updates and then run the logic to update other DB1's tables.

4. Then like I said, after 30 mins push back to DB1, we need to replicate DB1 to DB2 to get whole new DB.

5. Database is 100+ tables, and data in each table are 100k+.

6. Need to schedule a SSIS package and setup to Windows Jobs for automatic operation.

Thanks.

Elliott Whitlow SSC Guru Points: 102402 More actions · Answer 5

For the initial copy I would probably just do a backup/restore. For the others I would probably build SSIS packages to copy the data back and forth. The issue I see is how do you know what has changed?

CEWII

princa SSCommitted Points: 1819 More actions · Answer 6

Elliott W (5/4/2010)
For the initial copy I would probably just do a backup/restore. For the others I would probably build SSIS packages to copy the data back and forth. The issue I see is how do you know what has changed?
CEWII

Good Point dude!

1. I just don't want to separate tasks into to two places, therefore, I would like to combine all the operations in SSIS package, though still might need to split into two or three dtsx files.

2. There are two kinds of updates for DB1,

one: is "HIDEN" for me, which means maintained by other groups, and I don't know when they will do the update, so I need to schedule a job for every night or every weekend's replication. (Here I am visioning that I should backup DB2, then delete current DB2, and then do the replication from DB1.)

Two: is maintained by me, but the regular operations are all SELECT, only table3 could be updated, and push back to DB1.

3. SO, any updates from DB2 in table3, say I will push back to DB1 everynight or every weekend before I do the replication from DB1.

4. Since I push DB2's table3 back to DB1, then other groups will update DB1's whole tables according to my DB2's table3 update, which will take 30 mins after my push back. So that's why I need to wait 30 mins after my push back to DB1, then do the replication from DB1 to my DB2.

My question is:

1. Since transfer SQL server objects task couldn't specify which table's columns I could copy or not, the level only on table level, not on column level. Because I don't want to replicate whole DB1, and I shouldn't do that according to the requirement, thus, which is the best way to filter table's column when I do the replication from DB1 to DB2, but at the same time keep all the indexes at least.

Thanks. 😀

Elliott Whitlow SSC Guru Points: 102402 More actions · Answer 7

First I want to stop using the term replication, the reason is that it implies some things that aren't in play. In SQL Server there are several types of replication, like transactional, merge, and snapshot, we aren't talking about any of those.

What it sounds like is that you have a database that is largely maintained by another group but that you periodically need to get updated data from and push data back to.

How big is this database?

I not really sure what your biggest concern is here, keeping the structure the same or maintaining the data. You've mentioned indexes a couple times, why do you care what indexes are used on the main database, shouldn't you be more worried about what works for you and periodically just refreshing the indexes so you have a semblence of the same structure. Also why don't you just update the main server instead of what seems like a convoluted process?

As an aside how much of the main database do you need? If you only need a few tables then just copy those whole. The transfer task will not do a partial table, it does all or none. How often do the structures change? And are you making too big a deal out of that?

Sorry, I have more questions than answers.

CEWII

princa SSCommitted Points: 1819 More actions · Answer 8

First Thanks for your fast response and more questions could help us to clear all the confusions, which is good. 😀

You are right that REPLICATION is not using here at all, we are talking about more or less related with backup/restore.

1. The DB1 is around 300M+, contains 100+ tables, and each table contains 100k+ data, and is increasing everyday. So it's pretty big and I need all of them.

2. Since I need all of them to work so I need to backup/restore DB1 to my DB2 entirely. However, when I am doing COPY from DB1 to DB2, some tables' columns I shouldn't copy to DB2, one simple reason is data is sensitive, so I shouldn't COPY them to DB2, let's say table2's 1st column and table3's 2nd column. That's why I am concerning partial restore not the full. And I am looking for a better way to handle it.

3. DB1 is not much that often be changed structure, but by requirement, I need to be able to restore DB1 to DB2 everytime, so that we won't concern when and how it will change, because every time I COPY from DB1 to DB2, I will get the newest DB1.

4. Since the most operations in DB2 are SELECT, so index is important for me to increase the performance, and since there are 100+ tables, and in DB1, the indexes are settled down already, and maintained by other groups, so I'd like to COPY the indexes into DB2 and keep using them without rebuild after COPY from DB1.

Therefore, in this case, could you give me some advices?! Thank you very much.:D

Phil Parkin SSC Guru Points: 247224 More actions · Answer 9

Phil Parkin

SSC Guru

Points: 247224

May 4, 2010 at 10:28 am

#1161934

Are both these databases on the same server?

Elliott Whitlow SSC Guru Points: 102402 More actions · Answer 10

Ok, 300M is nothing, even 5-10GB is really not all that big in my opinion.

Your concern over sensitive data is good. I see a couple options, both of which have costs. After your restore you could immediately delete those columns from the schema, this would make them unavailable to SQL, there may be some data still in the pages on disk, but short of looking at it page by page that should be ok. The other option is to do a mass update on the sensitive columns where you effective blank them out. I work in an environment with a fair amount of sensitive data and in that case the DBA does a restore and then the data is scrubbed and then it is available to users.

My issue about indexes isn't that you don't need them but tied to the fact that the structure doesn't change often any indexes you get the first time will probably be the same for a while so picking them out of any other structure is probably not a necessary focus.

By your questions it appears that there might be some confusion on what comes with a backup.. When you do a backup and restore you get EVERYTHING that was in the databases, structure, code, indexes, statistics, EVERYTHING. So when you when you do your next restore you will still have then indexes because the exist in DB1, what you would lose is any indexes you might have created on DB2.

Does this clarify things for you?

CEWII

Elliott Whitlow SSC Guru Points: 102402 More actions · Answer 11

Elliott Whitlow

SSC Guru

Points: 102402

May 4, 2010 at 10:32 am

#1161938

Good question Phil.

CEWII

princa SSCommitted Points: 1819 More actions · Answer 12

Phil Parkin (5/4/2010)
Are both these databases on the same server?

No Phil, they are far away from each other. 😀

princa SSCommitted Points: 1819 More actions · Answer 13

Thanks very much Elliott.

As I understand your advices combining with my thought, please correct me if I am wrong in the following:

Assume I don't have DB2 at all now, and restore from DB1 as the first step.

1. Use SSIS package - Transfer SQL Server objects Task to restore DB1 to DB2 entirely.

2. After it, delete columns which I shouldn't have in DB2 in SSIS package (though I do not look into it yet, and don't know how to setup that.)

3. Do the regular operations on my DB2.

4. Update the only UPDATEDALBE table (Table3) in DB2 and push back to DB1 by using SSIS package - data flow task, just append all data from DB2 to DB1. (Reason: I will never delete data in DB2 in table3, and each time I get the latest update from DB1, any insert will get the greatest ID of current table, and since no one will touch DB1's table3, thus by appending data to DB1 won't cuz duplicate key issue.)

5. After push back table3 from DB2 to DB1, wait 30 mins, which could be done in SSIS with a empty loop with 30 mins setup.

6. Backup DB2 on local server in SSIS package by using - backup database task.

7. Restore DB2 from DB1 by using - Transfer SQL server object task.

(One question: initially, I don't have DB2, so restore will create tables first. Now Do I need to delete DB2 and then restore DB2 from DB1 by creating new in order to get the latest structure? or Transfer SQL server object task could do that automatically?)

8. After restore, do the delete sensitive columns again.

I think it's pretty much the process which I need to finish the logic. Is that correct?

Thanks 😀

Phil Parkin SSC Guru Points: 247224 More actions · Answer 14

Do your backups and restores using Execute SQL tasks.

After backing up on server 1 you'll need to transfer the file to server 2 before it is restored. I would probably use a script task to do the file transfer.

You can restore over the top of another database - no need to delete it first.