August 25, 2009 at 9:32 am
Rowan (8/23/2009)
Please note that you don't necessarily need to use staging tables because of complex queries. Moving a package that uses a staging table to not needing one should not change the complexity of your source queries, and any comparison that you would do in T-SQL using staging tables can be done inside SSIS memory without using the database files (much).
Hi John
I don't really understand how I would use SSIS memory instead of staging tables (database files).
Is there a certain way of doing this?
Regards
Elliot has answered correctly. That is exactly what I meant. I see many folks using staging tables in the database as part of their SSIS ETL processes when, most of the time, they are not needed. SSIS is a full ETL (Extract-Transform-Load) tool, where you have the ability to perform all of the data transformations before loading the data into the database.
August 25, 2009 at 10:18 am
Thanks.
CEWII
August 25, 2009 at 4:18 pm
Thanks Guys for all the replies.
I am currently changing one of my more complex existing packages that uses staging tables over to mainly using transformations. There are times when staging tables are needed, but these are few and far between, and by using the transformations I can now reduce the use of staging tables substantially.
Thanks again.
Regards
September 4, 2009 at 7:34 am
Hi
I have just one more question, I hope
I have now implemented a complex package without using staging tables and it works perfectly.
I am now busy with another package which is less complex but uses a large volume of data.
I draw data from a DB2 database (24 million records).
Is it best to pull this large amount of data into a staging table first, or is it best to use the DataReader Source in the pipeline from the get-go?
Thanks
September 4, 2009 at 7:42 am
Probably depends on what you are doing with the data after you suck it out of DB2. If you are just shoving it into a final resting spot, perhaps with some simple transforms, I wouldn't use a stage. I might also look for options that allow for effective parallelization of the stream.
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
September 4, 2009 at 7:45 am
I have a couple of questions myself..
1. How often do you run this process (or will you)?
2. Do you need to pull all 24M records EVERY time?
3. If you need all the records, can you build a delta process to only get the new or updated ones?
I just can't see pulling 24M rows each time this runs unless it is 24M new or updated records. I have to question the underlying plan.
CEWII
September 4, 2009 at 8:04 am
The problem with using delta operations is that I am drawing from a DB2 database, and SSIS only allows a Lookup transformation against a SQL Server source to do an incremental load.
I will only need to draw the data once a month.
September 4, 2009 at 8:11 am
Is there some field in the data that reliably shows an update date? If so, then keep track of the last date and use it in the query. You don't have to use a lookup in this case, at least not at the beginning; if you use one there, then you have to suck all 24M rows down, and at that point you have already lost the battle.
We can also take things we know into account: say there are 5-15K records added each day, and they have a process date that shows the day they were processed. Also, we know that records in this table are NEVER changed. That gives us a basis to look at the data we already have, get the max process date, and then ONLY get the stuff after that. We have just limited our pipeline to 5-15K records using nothing but what we know. Clear?
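As a sketch of what I mean (the table and column names here are made up, so adjust to your schema):

-- High-water mark from the data we already have on the SQL Server side.
DECLARE @LastProcessDate datetime;

SELECT @LastProcessDate = MAX(ProcessDate)
FROM dbo.MyLocalCopy;

-- The DB2 source query then only asks for rows past that mark,
-- so only the 5-15K new rows ever cross the wire:
-- SELECT * FROM MYSCHEMA.MYTABLE WHERE PROCESS_DATE > <last process date>

In SSIS you can feed that date into the source by building the query string in a package variable with an expression and pointing the source component at the variable.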
I really think your goal should be to cut down the amount of data over the wire. I've dealt with about as much data as you are talking about, in DB2 and pulling it down to SQL Server. I had to find a way to tell what had changed to get any kind of reasonable performance.
CEWII
September 9, 2009 at 1:48 pm
Is your database in simple recovery mode?
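You can check with something like this:

SELECT name, recovery_model_desc
FROM sys.databases
WHERE name = 'YourDatabaseName';  -- substitute your database name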
__________________________________________________________________________________
SQL Server 2016 Columnstore Index Enhancements - System Views for Disk-Based Tables
Persisting SQL Server Index-Usage Statistics with MERGE
Turbocharge Your Database Maintenance With Service Broker: Part 2
September 9, 2009 at 4:15 pm
No, it's in full recovery mode.
September 9, 2009 at 5:15 pm
Rowan (9/9/2009)
No, it's in full recovery mode.
Best if you switch it to simple recovery then; it doesn't look like you are taking transaction-log backups (are you?), and switching to SIMPLE recovery will help keep your log-file size in check, while also alleviating some of the performance issues you are having.
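For example (substitute your actual database name):

ALTER DATABASE YourDatabaseName SET RECOVERY SIMPLE;

After the switch you may want to shrink the log file once to reclaim the space it has already grabbed.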
__________________________________________________________________________________
SQL Server 2016 Columnstore Index Enhancements - System Views for Disk-Based Tables
Persisting SQL Server Index-Usage Statistics with MERGE
Turbocharge Your Database Maintenance With Service Broker: Part 2