OLTP/OLAP Data Warehouse

  • We have a logging application where we are primarily inserting/updating only two tables, a Sessions table and an Session Detail table.  Each Session record will have anywhere from 1 to 20 or so Detail records.  Both inserts and updates are performed on the Sessions table, and inserts only are performed on the Detail table.  We have several customer sites where we deploy this application, and the number of records varies greatly between customers.  Anywhere from 500 to 10000 Sessions records per day - anywhere from 500 to 200000 or so Detail records per day. 

    The other part of the application is reporting.  We have implemented a web-based front end to provide a few canned reports, as well as some ad-hoc query capability.  No updates or inserts.  We expect a maximum of 3 concurrent users (and that is stretching it), and they may only use the system once a day or so just to see what is happening on their system.

    We have seen some reporting performance issues on the larger systems, potentially a result of poor design on the front end - the page with the worst performance executes about 30 queries for high level statistics and also writes detail records to the screen, all based on the user selected search criteria. 

    We are now designing relase 2.0.  My question is basically this:  Do we have the need to create two instances of the database, one for OLTP, the other for OLAP?  It seems to me because of the low number of front end users plus the small number of tables and data, this would be unnecessary, and that our focus should be on query optimization and front end design.  However, most of the resources I've found on the web say that you should always use separate databases for OLTP and OLAP.  One other thing - depending on the customer's IT resources, we may or may not have dedicated web servers and/or database servers.  It may be possible that the database server and web server actually reside on the same box.  This is directly related to transaction volume.  Typically, customers with high transaction volume do have dedicated web/database servers.  Any advice?

  • A seperate OLAP instance would be advantageous. Given that your base tables may have high transaction volume, you don't want to be delaying transaction processing while a report is being run. Especially as some adhoc querying can be very, very resource intensive. Your indexing plan will also vary greatly between the OLTP and OLAP instances.

     

    --------------------
    Colt 45 - the original point and click interface

  • What are some methods of getting data from the OLTP instance to the OLAP instance?  Triggers or scheduled stored procs?  Is there anything within SQL Server (some type of replication) that will do this, or will we need to write all the code?  Again, this seems that it would create quit a bit of unnecessary overhead.

  • TANSTAAFL.

    The mindset between OLAP and OLTP is somewhat different. Usually you have a denormalized schema in OLAP db's. At least this transformation from OLTP to OLAP you have to code yourself.

    We do have job here scheduled that do this whole ETL processing.

    Here's something worth reading:

    http://www.microsoft.com/technet/prodtechnol/sql/2000/maintain/incbulkload.mspx

    http://www.microsoft.com/technet/prodtechnol/sql/2000/maintain/anservog.mspx

    http://www.microsoft.com/technet/prodtechnol/sql/2000/maintain/ansvcspg.mspx

     

    --
    Frank Kalis
    Microsoft SQL Server MVP
    Webmaster: http://www.insidesql.org/blogs
    My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

  • This depends on many factors.  How "Real-time" do the reports need to be?  What is the typical usage pattern for your transaction processing?  200000 records a day doesn't really tell the whole story, it could be a fairly light load, or it could be an extremely heavy load (not all transactions are created equal!)  Is there anything that can be done to optimize the transactions? 

    Generally, I assume that the transaction processing is the highest priority and that reporting is secondary.  If so, then you could schedule a DTS package to move data to your reporting tables during "quiet" times. 

    If the load on the server isn't that heavy, then you could use a NOLOCK hint in your reporting queries to avoid any interference with tran processing.

    Another wrinkle to consider is whether your reports generally do a lot of aggregation.  If so, then you should certainly consider reporting tables (or seperate database on another server).  Analysis Services can also be used to great effect in many situations.

    I know, I've thrown out a bunch of options and very few answers, but that's pretty unavoidable given the nature of the question.  There are tons of options, and which one is best for a given situation can only be determined by someone who is familiar with both the options and the situation.

    /*****************

    If most people are not willing to see the difficulty, this is mainly because, consciously or unconsciously, they assume that it will be they who will settle these questions for the others, and because they are convinced of their own capacity to do this. -Friedrich August von Hayek

    *****************/

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply