Need some advice on a potential data warehouse

  • Hello all,

    We have the opportunity to get involved with a project involving reporting and analysis of a subset of data from an OLTP system we develop and maintain.  The client currently receives some paper/pdf reports generated out of our system but would like the raw data so they can do their own slicing and dicing, do some previous YTD comparisons of summary and aggregate data, etc, nothing too complicated or far out (that we've seen so far).  Depending on their final requirements, we were thinking of providing an Excel front end to a SQL back end with the requisite structure.

    The data involved currently resides in a SS2K database across a dozen or so tables, with the largest of these tables holding maybe in the order of tens of thousands of rows.  Most of the tables hold thousands of rows.  They are updated rather infrequently, maybe several times per week for 6 months out of the year.  We'd like to dump this data to a separate database, allowing us to customize it for their needs.  Not a problem, and we don't want them messing around in the live system anyhow.

    So:

    1. Given the volume of data involved and the relatively low frequency of updates, does it make sense to utilize this project within the framework of a data warehouse, or should we stick to a roll-our-own solution?  I should mention that we have zero to little experience with building a "true" data warehouse.

    2.  If the answer to #1 is to go the data warehouse route, can anyone suggest a good starting point, maybe some books, case studies, tutorials, etc to help us get going?  If it makes a difference, the calculations involved are mainly averages, sums, and standard deviations.  Nothing too fancy or complicated. 

    Hopefully by now none of you are shaking your heads in disgust thinking that I'm already out of my depth with zero to go on.

    Thanks in advance for any input,

    Vik

  • Vik,

     

    A data warehouse is appropriate for all but the most rudimentary reporting systems. Hitting the transactional system directly is always risky -- especially considering the price of hardware today. Also, you'll probably find that you will need to add external data to report on the business the same way the client talks about it. For example, salesreps often group customers in ways not present in the system.

    Pick up "The Data Warehouse Toolkit" and "The Data Warehouse Lifecycle Toolkit" by Ralph Kimball. These books provide a very good start.

    Don

    Donhttp://www.biadvantage.com

  • Having been involved with database developments for over 20 years, I don't know where to start with your proposal.  On the other, with all the grey hair, I should have some wisdom to pass on.  !!

    I can give you specifics - too much to cover.  I'll tell you this:

    The reporting is the system.  To the users, that's what they see, and that's 100% of the value to them.  (put yourself in their shoes)  You have to worry about the infrastructure and integrity, but they don't give a hoot about it.

    Thus, focus on the output.  Interview the users; make sure you get the detail on what they want in the reports, and what they can do.  (do they have the skill to use a pivot table, or do they need hard copy flat report)

    The closer you are to that; then the solution to build it becomes almost obvious. 

    For a few tables and thousands of records, even Access may do you for a reporting database.   Doesn't have to be SS.

    Mike

  • Vik -

    In my experience the success of a data warehouse project for reporting purposes is greatly improved if the client can provide access to what I think of as a "super user", that being someone who  1) understands the clients business issues & rules;  2)  fiddles with databases (even Access);  and  3) is suspicious enough of data that they will beat on it (test their results) to ensure that what they are getting makes sense.

    Given the size of the database you should be able to bring the data over during low-usage periods (typically daily or weekly at night) via DTS.  With some testing you can see whether it's quicker to only bring in new data vs. truncating the datawarehouse tables, dropping indexes, repopulating the tables and re-creating the indexes - I've had a lot of success with the latter method.

    Once you've got the raw data updated there are lots of things you can automate - rerunning the data into new tables using queries/SPs which provide for better reporting (adding client-specific business rules, denormalizing the data for faster reporting, summarizing by time period, whatever).  Once again you can simply tack these processes on to your other DTS processes.

    One very real benefit to the client is that as the data becomes more visible it helps them to clean up their input processes.  In addition to business reporting, well-designed reports can provide wonderful feedback to managers who are responsible for data entry processes.

    I could go on and on.  These things are fun.

    Good luck

    K2

  • Thanks for the input, guys, much appreciated.

    To Don -- I have heard of that Kimball fellow, I've always heard he's The Man in data warehouseland.  Sounds like our likely option is to go the data warehouse route from what you say.

    To K2 -- fortunately/unfortunately the 4 or 5 end users of this application are very hands-off, database-wise, so no Access for them.  I'm hoping that giving them final base data to work with in an Excel format so they can play around with it will be sufficient as the front end.  Great advice on the DTS front, BTW.  Our updates to the warehouse come from an OLTP-oriented database that's totally under our control, so we're assured of it's consistency.

    Thanks again.

    Vik

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply