August 17, 2009 at 9:47 am
Hi,
I do exception reporting for a large life insurance company's risk management department.
I currently have over 600 packages running.
This is the basic concept of how things work
...
I pull data from an AS400 and then run queries on that data, from which I create an exception report.
Some of the reports I create do use the same data that I pull from the AS400.
Because a lot of these reports are critical to the business, I can't really store the data and re-use it for other similar reports; the data changes frequently, which is why I need to pull it so often.
This obviously creates a lot of I/O, and my server is starting to take strain, because at any given moment I have at least 10 packages/jobs running.
I am thinking about going over to a SAN infrastructure.
Currently I have a quad-core Windows 2003 server with 7 GB of RAM.
Is there anything I can do in SQL Server that will help with handling the amount of I/O I'm generating?
Kind Regards
August 17, 2009 at 10:05 am
Are you re-pulling the data every time? If so, is there any way to perform more of a delta operation? That would significantly reduce I/O and latency and increase re-use. You could also batch reports that use the same base data together to achieve better re-use.
CEWII
August 17, 2009 at 11:08 am
Hi Elliot
Delta operation?
August 17, 2009 at 11:46 am
Only pull over what has changed, as opposed to a full copy each time. If you can easily tell that a record has changed, then all you do is pull those records.
CEWII
August 19, 2009 at 2:14 am
Thanks, I will look into this option, but because I'm drawing data from a DB2 database I will have to see how it can be done.
Just to give you an idea, my log file grows from 2 MB to 60 GB in about 24 hours.
August 19, 2009 at 9:15 am
The methodology is the same regardless of the source. Try to find a column that reflects the date the record was last changed in any way. If you can find a field like that, you can query your own database for the last change date you already hold, and look for records on the source that are later. Or you can use business information, such as "no record changes after three months", so that any record older than three months and a day never needs to be updated or pulled again. The goal is to limit the amount of data pulled over the wire.
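As a rough sketch, something like this (all of the server, table, and column names here are made up; your DB2 source will differ, and whether the date filter gets pushed to the remote side depends on the linked-server provider):

    -- Highest change date we already have locally; fall back to a very
    -- old date if the local table is empty.
    DECLARE @LastPull DATETIME;
    SELECT @LastPull = ISNULL(MAX(LastChanged), '19000101')
    FROM dbo.PolicyTxn_Local;

    -- Pull only rows changed since then over the linked server.
    INSERT INTO dbo.PolicyTxn_Local (PolicyID, Amount, LastChanged)
    SELECT POLICY_ID, AMOUNT, LAST_CHANGED
    FROM DB2LINK.MYDB.MYLIB.POLICY_TXN
    WHERE LAST_CHANGED > @LastPull;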
CEWII
August 20, 2009 at 11:33 am
Rowan (8/19/2009)
Thanks, I will look into this option, but because I'm drawing data from a DB2 database I will have to see how it can be done. Just to give you an idea, my log file grows from 2 MB to 60 GB in about 24 hours.
1) DO NOT SHRINK YOUR LOG FILE! Just leave it where it is, because it will grow back. Also, what are your growth increments for database and tlog? Have you checked for OS-level file fragmentation? I sure hope you don't shrink your database too!
2) As someone else suggested, it would sure be nice if you could just load changed information as needed (or on some interval). Wonder if you can do replication from DB2 to SQL Server. Or perhaps a linked server mechanism?
3) how many spindles are underlying your existing IO subsystem? Do you have separate spindle sets for data and logs?
4) how much total data? 7 GB isn't very much RAM for a database server.
5) likewise a single CPU isn't much either, especially with so much concurrent activity.
6) have you done any IO stall analysis or waitstats analysis to determine where the 'slowness' truly lies?
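As a starting point, the built-in DMVs (SQL Server 2005 and up) will show you both. Just a sketch; interpret with care, since some wait types are benign:

    -- Top waits since the last service restart.
    SELECT TOP 10 wait_type, wait_time_ms, waiting_tasks_count
    FROM sys.dm_os_wait_stats
    ORDER BY wait_time_ms DESC;

    -- IO stalls per database file, worst first.
    SELECT DB_NAME(vfs.database_id) AS database_name,
           mf.name AS logical_file_name,
           vfs.io_stall_read_ms,
           vfs.io_stall_write_ms
    FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
    JOIN sys.master_files AS mf
      ON mf.database_id = vfs.database_id
     AND mf.file_id = vfs.file_id
    ORDER BY vfs.io_stall_read_ms + vfs.io_stall_write_ms DESC;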
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
August 20, 2009 at 11:57 am
Rowan,
A couple of things. First, I like your name! Second, all of the advice given here is spot on. One thing I'd like to mention, in addition to what everyone else has said, is that you may want to consider what the source of the I/Os may be.
An ETL operation via SSIS should only need to consume I/O resources for lookups (much of which may be in the buffer) and to write the final data set into your destination. If you are creating staging tables as part of your ETL operation, you are consuming additional I/Os and most likely unnecessarily burdening the database engine and its underlying disk subsystem. Most of the work that SSIS needs to do should be kept in the SSIS memory space; using staging tables will at least double the I/O overhead.
Also, as SQL Guru stated, 7 GB is not much for a database server, especially if you are running SSIS on the same box. Ideally, you'll want to give SSIS its own slice of memory so it's not competing with the database engine.
August 21, 2009 at 2:26 am
TheSQLGuru (8/20/2009)
Hi,
1) Yes, I notice that sometimes the log file grows very quickly and sometimes it doesn't. When it reaches about 100 GB I have to shrink it to free up HD space. The growth increment is 10%. I have not checked OS-level file fragmentation; I'm not actually the DBA on the server, but I will request this if it has not been done yet.
No, I do not shrink my DB as well. I have heard this is not always the right thing to do; I take it this causes fragmentation?
2) Could I maybe, using a linked server, use the
3) I'm not sure... I will find out
4) Total amount of data on the server is about 90 GB, including log files
5) Point noted
6) I'm in the process of doing this; is there a tool that will do it for me?
Thank you for your reply; all of the above points have been noted.
Your input is appreciated
Regards
August 21, 2009 at 2:44 am
Sorry, I did not complete point 2) of my reply. Here it is:
2) Could I maybe, using a linked server, use the 'Lookup Transformation' and 'Conditional Split' components in SSIS to do an incremental data load?
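In T-SQL terms, this is roughly the effect I'm trying to achieve (a sketch only, with made-up table and column names): update the rows that changed, insert the rows that are new, and leave the rest alone.

    -- Update existing rows that have a newer version in the pulled data.
    UPDATE dst
    SET    dst.Amount      = src.Amount,
           dst.LastChanged = src.LastChanged
    FROM   dbo.PolicyTxn_Local AS dst
    JOIN   dbo.PolicyTxn_Pulled AS src
      ON   src.PolicyID = dst.PolicyID
    WHERE  src.LastChanged > dst.LastChanged;

    -- Insert rows we have never seen before.
    INSERT INTO dbo.PolicyTxn_Local (PolicyID, Amount, LastChanged)
    SELECT src.PolicyID, src.Amount, src.LastChanged
    FROM   dbo.PolicyTxn_Pulled AS src
    WHERE  NOT EXISTS (SELECT 1
                       FROM dbo.PolicyTxn_Local AS dst
                       WHERE dst.PolicyID = src.PolicyID);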
August 21, 2009 at 3:01 am
John Rowan (8/20/2009)
John,
I like your name too! 🙂
Thanks for your reply
I do use staging tables often, and the reason is that a lot of our queries are complex, and using one huge query instead of staging the data is not always possible. A lot of the time we need to do comparisons between yesterday's and today's data, and I need staging tables for those kinds of comparisons.
I will try to use as few staging tables as possible from now on, as I do realise the I/O implications of using them.
In terms of giving SSIS its own slice of memory, I have not looked at this option yet, and I will see what I can do about it. I am not the DBA of the server, so all of this is new to me.
Thanks again for your reply it is appreciated
August 21, 2009 at 7:47 am
A) Yes, shrinking a database causes severe fragmentation.
B) I'm not an SSIS guru; others can advise on the best mechanism for doing incremental loads.
C) Since you have issues with tlog growth and disk space, two options come to mind:
1) do the 'scratch' work in another database that is in SIMPLE recovery mode; as long as you don't have one huge overarching transaction, committed work will be flushed out regularly, preventing unchecked growth.
2) regularly back up the log in your current database (which I presume is in FULL recovery mode) while doing your reporting work, to flush out tlog bloat.
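Rough T-SQL for both, with placeholder database names and paths:

    -- Option 1: put scratch work in a database that uses SIMPLE recovery,
    -- so committed work is cleared from its log automatically.
    ALTER DATABASE ReportScratch SET RECOVERY SIMPLE;

    -- Option 2: back up the log regularly so the inactive portion can be
    -- reused instead of growing the file.
    BACKUP LOG MyReportingDB
    TO DISK = N'D:\Backups\MyReportingDB_log.trn';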
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
August 21, 2009 at 10:07 am
Rowan (8/21/2009)
Please note that you don't necessarily need to use staging tables because of complex queries. Moving a package that uses a staging table to not need one should not change the complexity of your source queries, and any comparison that you would do in T-SQL using staging tables can be done inside SSIS memory without using the database files (much).
As far as SSIS memory goes, SSIS is just another process running on the box and needs CPU and memory to operate optimally. If you have 8 GB of memory in your DB server and you've configured your database instance to grab 7 GB, you've left only 1 GB for the OS and SSIS to share. That's not much in terms of memory for a database server.
Let's say you had 16 GB in that server. You could leave 2 GB for the OS, 4 GB for SSIS, and configure SQL Server's Max Memory setting to 10 GB for the database instance.
Anyhow, configuring the Max Memory setting is what I was referring to when I mentioned SSIS having its own memory. It always has its own, but just like a DB instance, it needs to have enough to do the job. You can check Windows Performance Monitor to see how much page-file usage SSIS is causing.
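For reference, capping the instance at 10 GB from the 16 GB example would look something like this:

    -- Cap the database engine at 10 GB (10240 MB), leaving the remainder
    -- for the OS and SSIS. Values are from the 16 GB example above.
    EXEC sp_configure 'show advanced options', 1;
    RECONFIGURE;
    EXEC sp_configure 'max server memory (MB)', 10240;
    RECONFIGURE;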
August 23, 2009 at 1:49 pm
John Rowan (8/21/2009)
...any comparison that you would do in T-SQL using staging tables can be done inside SSIS memory without using the database files (much).
Hi John
I don't really understand how I would use SSIS memory instead of staging tables (database files).
Is there a certain way of doing this?
Regards
August 23, 2009 at 8:49 pm
By doing the transformations in the data flow between the source and the destination, the data doesn't get written to disk before it ends up at the destination. Since there are very few things that can't be readily accomplished in a data flow task, this usually works.
CEWII