advice on archiving data

  • I have the following problem...

    We have a manufacturing system database that continues to grow. Each day, new manufacturing orders and purchase orders are inserted and new transactions such as inventory transactions, labor transactions, etc. continue to occur. We have customers that are exceeding one million rows in some tables.

    The product manager wants options on how to archive the data. I know that's a loose definition. At this time, it would probably mean that an administrator would determine that all data older than 2 years (for example) should be archived. That would mean selecting the manufacturing order and all related data (item masters, quality testing results, etc.) be saved somewhere and physically deleted from the OLTP tables.

    Does that mean data warehousing?

    Or, moving the data to another database (or table within the same database) with the same table structure? Or, write C#.NET code?

    Any SQL services or tools to aid in this?

    Users can have multiple instances of our software in the same instance of SQL Server and the name of the databases are user-defined. So whatever approach I come up with has to allow the database name(s) to be easily changeable.

    Also, the development manager told me he read something about SQL 2012 that might help with this. He cannot remember where he saw it or what it was called. I installed the SQL 2012 preview but could not find what he might of been referring to.

    Any thoughts would be appreciated.

  • Joe B-478020 (11/21/2011)


    I have the following problem...

    We have a manufacturing system database that continues to grow. Each day, new manufacturing orders and purchase orders are inserted and new transactions such as inventory transactions, labor transactions, etc. continue to occur. We have customers that are exceeding one million rows in some tables.

    That's really kind of tiny. I use a million rows as baseline for optimization tests so I have enough data to get out from under 100 milliseconds in most cases.

    The product manager wants options on how to archive the data. I know that's a loose definition. At this time, it would probably mean that an administrator would determine that all data older than 2 years (for example) should be archived. That would mean selecting the manufacturing order and all related data (item masters, quality testing results, etc.) be saved somewhere and physically deleted from the OLTP tables.

    Does that mean data warehousing?

    No. Maybe once upon a decade, when the 'warehouse' was where the old businesses shoved ancient paperwork to rot. These days Data Warehouses are active, current, and relevant. What the difference is to a Warehouse (OLAP), versus a Transactional system (OLTP) is that the warehouse is optimized for large data reporting while the OLTP system is optimized for single row manipulation. There are other differences but that's the most basic.

    Or, moving the data to another database (or table within the same database) with the same table structure? Or, write C#.NET code?

    Both are an option. For truly archived and 'dead' data, I prefer the second database. This lowers restore times, index manipulations, and other possible size issues for data that is irrelevant to the activities the database is created for.

    Any SQL services or tools to aid in this?

    Not really.

    Users can have multiple instances of our software in the same instance of SQL Server and the name of the databases are user-defined. So whatever approach I come up with has to allow the database name(s) to be easily changeable.

    There are ways to deal with that once you know how you want to approach it. Most will have at least some dynamic SQL in it if you're going to make it generic.

    Also, the development manager told me he read something about SQL 2012 that might help with this. He cannot remember where he saw it or what it was called. I installed the SQL 2012 preview but could not find what he might of been referring to.

    News to me. Archiving is a design decision. If there's a tool wrapped around the concept in SQL Server that would surprise me. You could choose to archive on just about any set of rules you like. There might be a few things available that will HELP you design the archiving mechanics, but not a wrapper that will do it for you.

    Can you give us a few ideas as to why this archiving is necessary? Back to my original statement, a million rows is a blip. That's not enough data that it should drive mandatory archiving, at least not in today's day and age. My desktop with a single IDE drive can handle a million rows reasonably while I run a few other memory intensive programs.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Evil Kraig F (11/21/2011)


    Any SQL services or tools to aid in this?

    Not really.

    What about table partitioning or even partitioned views?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Joe B-478020 (11/21/2011)


    I have the following problem...

    We have a manufacturing system database that continues to grow. Each day, new manufacturing orders and purchase orders are inserted and new transactions such as inventory transactions, labor transactions, etc. continue to occur. We have customers that are exceeding one million rows in some tables.

    The product manager wants options on how to archive the data. I know that's a loose definition. At this time, it would probably mean that an administrator would determine that all data older than 2 years (for example) should be archived. That would mean selecting the manufacturing order and all related data (item masters, quality testing results, etc.) be saved somewhere and physically deleted from the OLTP tables.

    Does that mean data warehousing?

    This is a two parts answer.

    1- OLTP system should store only the data needed to process whatever the OLTP system is supposed to process - any extra data is just unnecesary overhead.

    2- All reporting should happen in a database other than OLTP either a "Reporting" database or a properly structured "Data Warehouse"

    Having said that, you should start thinking on a Data Warehouse - in the mean time and until the data warehouse modeling is complete you can certainly "archive" OLTP generated data into ODS a.k.a. "Operational Data Storage" tables - which are nothing but tables that still have the same structure they have in the OLTP system, just holding data until you figure out what to do with it.

    Joe B-478020 (11/21/2011)


    Any SQL services or tools to aid in this?

    This is a really open question and all depends on the definition of "tools".

    If you mean something that does it for you... nope, there is none.

    If you mean SSIS, Informatica or other ETLish pieces of software... yes but, you have to do the leg work and make the "tool" do whatever you want it to do.

    I've seen in a follow up question you have asked for table partitioning - table partitioning may help to more efficiently administer/maintain/"archive" your data but this solution is intrussive on you OLTP system meaning that you will have to make modifications at the current structure of OLTP.

    Hope this helps.

    _____________________________________
    Pablo (Paul) Berzukov

    Author of Understanding Database Administration available at Amazon and other bookstores.

    Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.
  • PaulB-TheOneAndOnly (11/22/2011)


    All reporting should happen in a database other than OLTP either a "Reporting" database or a properly structured "Data Warehouse"

    Although that's the general concensus, I've found many cases where there's simply no need to double up the data like that. "It Depends" a lot on how the data is structured and how the code is written.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (11/24/2011)


    PaulB-TheOneAndOnly (11/22/2011)


    All reporting should happen in a database other than OLTP either a "Reporting" database or a properly structured "Data Warehouse"

    Although that's the general concensus, I've found many cases where there's simply no need to double up the data like that. "It Depends" a lot on how the data is structured and how the code is written.

    @jeff & Paul: Adding another thought here. What about Transactional Reports e.g. Items processed today EOD? Any database including OLTP systems can't escape from reporting requirements. I would say OLAP / Data Warehouse are most suited for Analytical Reports. If your OLTP system can accept the load of reporting (general, not analytic) there is less need of data warehouse.

    Note: I am assuming there is latency in refreshing to data warehouse. I have worked on DW where refresh rate is very high, are almost real-time. In such cases the above queries / reports can be pulled from data warehouses.

  • Jeff Moden (11/21/2011)


    Evil Kraig F (11/21/2011)


    Any SQL services or tools to aid in this?

    Not really.

    What about table partitioning or even partitioned views?

    You're correct, Jeff, I'm just concerned about using partitioning in a vendor-driven environment where each client is going to have different levels of concerns and data management. I've become loathe to recommend partitioning to anyone I'm not sure has a very strong level of expertise in the rest of the SQL Server Engine.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Dev (11/28/2011)


    Jeff Moden (11/24/2011)


    PaulB-TheOneAndOnly (11/22/2011)


    All reporting should happen in a database other than OLTP either a "Reporting" database or a properly structured "Data Warehouse"

    Although that's the general concensus, I've found many cases where there's simply no need to double up the data like that. "It Depends" a lot on how the data is structured and how the code is written.

    @jeff & Paul: Adding another thought here. What about Transactional Reports e.g. Items processed today EOD? Any database including OLTP systems can't escape from reporting requirements. I would say OLAP / Data Warehouse are most suited for Analytical Reports. If your OLTP system can accept the load of reporting (general, not analytic) there is less need of data warehouse.

    Note: I am assuming there is latency in refreshing to data warehouse. I have worked on DW where refresh rate is very high, are almost real-time. In such cases the above queries / reports can be pulled from data warehouses.

    Jeff would agree with you there, as would I. I rarely implement data warehouses. I might do mirror/snapshots to get the load off the main OLTP system if there's some heavy duty reports that run, but rarely full on warehousing. Under most circumstances I'd agree with you here Dev. OLAP/Full on Warehouses are really more useful when you're doing long term trending and you're looking to have dedicated access to the data for cube building and the like.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Evil Kraig F (11/29/2011)


    Jeff Moden (11/21/2011)


    Evil Kraig F (11/21/2011)


    Any SQL services or tools to aid in this?

    Not really.

    What about table partitioning or even partitioned views?

    You're correct, Jeff, I'm just concerned about using partitioning in a vendor-driven environment where each client is going to have different levels of concerns and data management. I've become loathe to recommend partitioning to anyone I'm not sure has a very strong level of expertise in the rest of the SQL Server Engine.

    Heh... understood and appreciated. I just think it's time for some people to start pressing on the sides of the database envelope a bit especially when they say they have million row tables. For some folks, that seems to be to borderline where they need to start thinking about some additional scalability methods and techniques while it's still small enough to convert to without (no pun intended) huge problems. 😛

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (11/29/2011)


    Evil Kraig F (11/29/2011)


    Jeff Moden (11/21/2011)


    Evil Kraig F (11/21/2011)


    Any SQL services or tools to aid in this?

    Not really.

    What about table partitioning or even partitioned views?

    You're correct, Jeff, I'm just concerned about using partitioning in a vendor-driven environment where each client is going to have different levels of concerns and data management. I've become loathe to recommend partitioning to anyone I'm not sure has a very strong level of expertise in the rest of the SQL Server Engine.

    Heh... understood and appreciated. I just think it's time for some people to start pressing on the sides of the database envelope a bit especially when they say they have million row tables. For some folks, that seems to be to borderline where they need to start thinking about some additional scalability methods and techniques while it's still small enough to convert to without (no pun intended) huge problems. 😛

    *opens and shuts mouth a few times*

    Point taken. I don't think that'll remove my loathe-iness from trying to explain it to newbies to midlins though...


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply