SSIS I/O

  • Hi,

    I do exception reporting for a large life insurance company's risk management department.

    I currently have over 600 packages running.

    This is the basic concept of how things work:

    ...

    I pull data from an AS400, run queries on the extracted data, and then create an exception report from the results.

    Some of the reports I create do use the same data that I pull from the AS400.

    Now, because a lot of the reports I create are critical to the business, I can't really store the data and re-use it for other, similar reports; the data changes frequently, and this is why I need to pull data so often.

    This obviously creates a lot of I/O, and my server is starting to take strain, because at any given moment I have at least 10 or more packages/jobs running.

    I am thinking about moving to a SAN infrastructure.

    Currently I have a quad-core Windows 2003 server with 7 GB of RAM.

    Is there anything I can do in SQL Server that will help with handling the amount of I/O I'm generating?

    Kind Regards

  • Are you re-pulling the data every time? If so, is there any way to perform more of a delta operation? That would significantly reduce I/O and latency and increase re-use. You could also batch reports that use the same base data together to achieve better re-use.

    CEWII

  • Hi Elliot

    Delta operation?

  • Only pull over what has changed, as opposed to a full copy each time. If you can easily tell that a record has changed, then all you pull is those records.

    CEWII

  • Thanks, I will look into this option, but because I'm pulling data from a DB2 database I will have to see how this can be done.

    Just to give you an idea, my log file grows from 2 MB to 60 GB in about 24 hours.

  • The methodology is the same regardless of the source. Try to find a column that reflects the date the record was last changed in any way. If you can find a field like that, you can query your database for the last change date you already have and look for records on the source that are later. Or you can use business rules, such as "no record changes after three months", so that any record older than three months and a day never needs to be pulled again. The goal is to limit the amount of data pulled over the wire.
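
    As a minimal sketch of that pattern in T-SQL (the object names here -- dbo.Policy, stg.PolicyDelta, POLICY_NO, LAST_CHG_TS -- are hypothetical placeholders, not the actual schema):

        -- Find the highest change date already loaded locally.
        DECLARE @LastPull datetime;
        SELECT @LastPull = MAX(LAST_CHG_TS) FROM dbo.Policy;

        -- The package's source query then pulls only newer rows, so just
        -- the delta crosses the wire, e.g.:
        --   SELECT POLICY_NO, STATUS, LAST_CHG_TS
        --   FROM POLICY
        --   WHERE LAST_CHG_TS > ?   -- @LastPull supplied as a parameter

        -- Once the package lands the delta in a small staging table
        -- (stg.PolicyDelta), fold it into the local copy:
        UPDATE p
        SET    p.STATUS      = d.STATUS,
               p.LAST_CHG_TS = d.LAST_CHG_TS
        FROM   dbo.Policy AS p
        JOIN   stg.PolicyDelta AS d
               ON d.POLICY_NO = p.POLICY_NO;

        INSERT INTO dbo.Policy (POLICY_NO, STATUS, LAST_CHG_TS)
        SELECT d.POLICY_NO, d.STATUS, d.LAST_CHG_TS
        FROM   stg.PolicyDelta AS d
        WHERE  NOT EXISTS (SELECT 1 FROM dbo.Policy AS p
                           WHERE p.POLICY_NO = d.POLICY_NO);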

    CEWII

  • Rowan (8/19/2009)


    Thanks, I will look into this option, but because I'm pulling data from a DB2 database I will have to see how this can be done.

    Just to give you an idea, my log file grows from 2 MB to 60 GB in about 24 hours.

    1) DO NOT SHRINK YOUR LOG FILE! Just leave it where it is, because it will grow back. Also, what are your growth increments for database and tlog? Have you checked for OS-level file fragmentation? I sure hope you don't shrink your database too!

    2) As someone else suggested, it would sure be nice if you could just load changed information as needed (or on some interval). Wonder if you can do replication from DB2 to SQL Server. Or perhaps a linked server mechanism?

    3) How many spindles are underlying your existing I/O subsystem? Do you have separate spindle sets for data and logs?

    4) How much total data? 7 GB isn't very much RAM for a database server.

    5) Likewise, a single CPU isn't much either, especially with so much concurrent activity.

    6) Have you done any I/O stall analysis or waitstats analysis to determine where the 'slowness' truly lies?
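
    For point 6, a minimal sketch of the DMV queries commonly used for this (assuming SQL Server 2005 or later; no third-party tool is needed):

        -- Top waits since the instance last restarted.
        SELECT TOP 10 wait_type, wait_time_ms, waiting_tasks_count
        FROM   sys.dm_os_wait_stats
        ORDER  BY wait_time_ms DESC;

        -- I/O stalls per database file; long stalls point at the files
        -- (and spindles) under the most pressure.
        SELECT DB_NAME(vfs.database_id) AS database_name,
               mf.name                  AS file_name,
               vfs.io_stall_read_ms,
               vfs.io_stall_write_ms,
               vfs.num_of_reads,
               vfs.num_of_writes
        FROM   sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
        JOIN   sys.master_files AS mf
               ON  mf.database_id = vfs.database_id
               AND mf.file_id     = vfs.file_id;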

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Rowan,

    A couple of things. First, I like your name! Secondly, all of the advice given here is spot on. One thing I'd like to mention, in addition to what everyone else has said, is that you may want to consider what the source of the I/O may be.

    An ETL operation via SSIS should only need to consume I/O resources for lookups (much of which may be in the buffer) and to write the final data set into your destination. If you are creating staging tables as part of your ETL operation, you are consuming additional I/O and most likely unnecessarily burdening the database engine and its underlying disk subsystem. Most of the work that SSIS needs to do should be kept in the SSIS memory space, and using staging tables will at least double the I/O overhead.

    Also, as SQL Guru stated, 7 GB is not much for a database server, especially if you are running SSIS on the same box. Ideally, you'll want to give SSIS its own slice of memory so it's not competing with the database engine.

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden

  • TheSQLGuru (8/20/2009)


    Rowan (8/19/2009)


    Thanks, I will look into this option, but because I'm pulling data from a DB2 database I will have to see how this can be done.

    Just to give you an idea, my log file grows from 2 MB to 60 GB in about 24 hours.

    1) DO NOT SHRINK YOUR LOG FILE! Just leave it where it is, because it will grow back. Also, what are your growth increments for database and tlog? Have you checked for OS-level file fragmentation? I sure hope you don't shrink your database too!

    2) As someone else suggested, it would sure be nice if you could just load changed information as needed (or on some interval). Wonder if you can do replication from DB2 to SQL Server. Or perhaps a linked server mechanism?

    3) How many spindles are underlying your existing I/O subsystem? Do you have separate spindle sets for data and logs?

    4) How much total data? 7 GB isn't very much RAM for a database server.

    5) Likewise, a single CPU isn't much either, especially with so much concurrent activity.

    6) Have you done any I/O stall analysis or waitstats analysis to determine where the 'slowness' truly lies?

    Hi,

    1) Yes, I've noticed that sometimes the log file grows very quickly and sometimes it doesn't. When it reaches about 100 GB I have to shrink it to free up disk space. The growth increment is 10%. I have not checked OS-level file fragmentation; I'm not actually the DBA on the server, but I will request this if it has not been done yet.

    No, I do not shrink my DB either. I have heard this is not always the right thing to do; I take it this causes fragmentation?

    2) Could I maybe, using a linked server, use the

    3) I'm not sure... I will find out.

    4) Total amount of data on the server is about 90 GB, including log files.

    5) Point noted.

    6) I'm in the process of doing this; I'm just looking for a tool to do it for me.

    Thank you for your reply; all of the above points have been noted.

    Your input is appreciated

    Regards

  • Sorry, I did not complete point 2) of my reply. Here it is:

    2) Could I maybe, using a linked server, use the 'Lookup Transformation' and 'Conditional Split' tasks in SSIS to do an incremental data load?
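
    For what it's worth, a minimal sketch of the source side of such a load (the linked server name DB2LINK, the library/table MYLIB.POLICY, and the LAST_CHG_DT column are all hypothetical; the Lookup and Conditional Split work would then happen in the SSIS data flow):

        -- OPENQUERY sends the pass-through query to DB2, so the date
        -- filter runs on the source and only the delta crosses the wire.
        SELECT POLICY_NO, STATUS, LAST_CHG_DT
        FROM   OPENQUERY(DB2LINK,
               'SELECT POLICY_NO, STATUS, LAST_CHG_DT
                FROM   MYLIB.POLICY
                WHERE  LAST_CHG_DT > ''2009-08-20''');

    Note that OPENQUERY only accepts a literal string, so in practice the cutoff date would be built dynamically (for example, with an SSIS expression on the source query).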

  • John Rowan (8/20/2009)


    Rowan,

    A couple of things. First, I like your name! Secondly, all of the advice given here is spot on. One thing I'd like to mention, in addition to what everyone else has said, is that you may want to consider what the source of the I/O may be.

    An ETL operation via SSIS should only need to consume I/O resources for lookups (much of which may be in the buffer) and to write the final data set into your destination. If you are creating staging tables as part of your ETL operation, you are consuming additional I/O and most likely unnecessarily burdening the database engine and its underlying disk subsystem. Most of the work that SSIS needs to do should be kept in the SSIS memory space, and using staging tables will at least double the I/O overhead.

    Also, as SQL Guru stated, 7 GB is not much for a database server, especially if you are running SSIS on the same box. Ideally, you'll want to give SSIS its own slice of memory so it's not competing with the database engine.

    John,

    I like your name too! 🙂

    Thanks for your reply

    I do use staging tables often, and the reason is that a lot of our queries are complex, and using one huge query instead of staging is not always possible. A lot of the time we need to do comparisons between yesterday's and today's data, and I need staging tables for those types of comparisons.

    I will try to use as few staging tables as possible from now on, as I do realise the I/O implications of using them.

    In terms of giving SSIS its own slice of memory, I have not visited this option yet and I will see what I can do about it. I am not the DBA of the server, so all of this is new to me.

    Thanks again for your reply; it is appreciated.

  • A) Yes, shrinking a database causes severe fragmentation.

    B) I'm not an SSIS guru; others can advise on the best mechanism for doing incremental loads.

    C) Since you have issues with tlog growth and disk space, two options come to mind:

    1) Do the 'scratch' work in another database that is in SIMPLE recovery mode; as long as you don't have one huge overarching transaction, committed work will be flushed out regularly, preventing unchecked growth.

    2) Regularly back up the log in your current database (which I presume is in FULL recovery mode) while doing your reporting work, to flush out tlog bloat.
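
    A minimal sketch of both options, plus a fixed growth increment in place of the 10% growth mentioned earlier (the database, file, and path names are hypothetical):

        -- Option 1: a scratch database in SIMPLE recovery for staging work.
        CREATE DATABASE ETL_Scratch;
        ALTER DATABASE ETL_Scratch SET RECOVERY SIMPLE;

        -- Option 2: back up the log regularly (e.g. from an Agent job
        -- running during the load window) so log space is reused
        -- instead of the file growing.
        BACKUP LOG ReportingDB TO DISK = N'E:\Backups\ReportingDB_log.trn';

        -- A fixed growth increment keeps each growth predictable; 10%
        -- of a large file means huge, slow growth events.
        ALTER DATABASE ReportingDB
        MODIFY FILE (NAME = ReportingDB_log, FILEGROWTH = 512MB);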

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Rowan (8/21/2009)


    John,

    I like your name too! 🙂

    Thanks for your reply

    I do use staging tables often, and the reason is that a lot of our queries are complex, and using one huge query instead of staging is not always possible. A lot of the time we need to do comparisons between yesterday's and today's data, and I need staging tables for those types of comparisons.

    I will try to use as few staging tables as possible from now on, as I do realise the I/O implications of using them.

    In terms of giving SSIS its own slice of memory, I have not visited this option yet and I will see what I can do about it. I am not the DBA of the server, so all of this is new to me.

    Thanks again for your reply; it is appreciated.

    Please note that you don't necessarily need to use staging tables because of complex queries. Moving a package that uses a staging table to not needing one should not change the complexity of your source queries, and any comparison that you would do in T-SQL using staging tables can be done inside SSIS memory without using the database files (much).

    As far as SSIS memory goes, SSIS is just another process running on the box and needs CPU and memory to operate optimally. If you have 8 GB of memory in your DB server and you've configured your database instance to grab 7 GB, you've left only 1 GB for the OS and SSIS to share. That's not much in terms of memory for a database server.

    Let's say you had 16 GB in that server. You could leave 2 GB for the OS, 4 GB for SSIS, and configure SQL Server's Max Memory setting to 10 GB for the database instance.

    Anyhow, configuring the Max Memory setting is what I was referring to when I mentioned SSIS having its own memory. It always has its own, but just like a DB instance, it needs to have enough to do the job. You might check Windows Performance Monitor to see how much page-file usage SSIS is causing.
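
    A minimal sketch of that Max Memory change (10240 MB matches the hypothetical 16 GB example above; the setting is specified in MB):

        -- Cap the database engine so the OS and SSIS keep their share.
        EXEC sp_configure 'show advanced options', 1;
        RECONFIGURE;
        EXEC sp_configure 'max server memory (MB)', 10240;
        RECONFIGURE;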

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden

  • Please note that you don't necessarily need to use staging tables because of complex queries. Moving a package that uses a staging table to not needing one should not change the complexity of your source queries, and any comparison that you would do in T-SQL using staging tables can be done inside SSIS memory without using the database files (much).

    Hi John

    I don't really understand how I would use SSIS memory instead of staging tables (database files).

    Is there a certain way of doing this?

    Regards

  • By doing transformations in the data flow between the source and the destination, the data doesn't get written to disk until it reaches the destination. Since there are very few things that can't readily be accomplished in a data flow task, this usually works.

    CEWII
