Data Warehousing, from where to Start

  • Hi,

    I have to implement data warehousing on SQL Server database (6 GB). I never worked on it before. I have no idea from where to start. I am searching for some tips and tools to use. Please guide me.

    We are using SQL Server 2005. We have web based application, with daily user hits of 50K.

    we are keeping site tracking and user activity in database and providing educational material on web. Our reporting is usually users site tracking activity and their credit based on users activity on specific page visitings.

    Please help me how to start and develop warehousing. we want to keep our production data of 1 week and move all data in ware hourse so that users who like to see site tracking activity would result in fast and based on their tracking admin users will give credit to employees.

    we have almost 450k unique users right now.

    Shamshad Ali.

  • Described scenario looks more like a reporting database than a data warehouse.

    Here is where you can start.

    1- Read and understand Kimball's approach.

    2- Read and understand Inmon's approach.

    3- Choose

    4- Assuming you go with Kimball identify Datamarts

    5- For each datamart

    ... a) Design datamart

    ... b) Design ETL

    ... c) Design Reporting strategy

    ... d) Develop

    ... e) Deploy

    ... f) Tune and maintain

    _____________________________________
    Pablo (Paul) Berzukov

    Author of Understanding Database Administration available at Amazon and other bookstores.

    Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.
  • I haev to agree with the advice about learning the Kimball method for data warehousing, grab a copy of this book..

    http://www.amazon.com/Data-Warehouse-Lifecycle-Toolkit-Developing/dp/0471255475

  • [font="Verdana"]Start with the Kimball methodologies (as recommended above.) He has an excellent web site and another book that are well worth reading.

    When you come to tool sets, if you are just using SQL Server, you will need to learn:

    SQL Server Integration Services -- you may be able to do all of your data integration with just T-SQL, but I would think it unlikely.

    SQL Server Reporting Services -- to write and support reports delivering information from the database.

    SQL Server Analysis Services -- this packages sets of data up into "cubes" that are designed for supporting interactive queries from tools like Excel.

    Make sure your organisation is using Office 2007 and SQL Server 2008. Yes, you can do this stuff on earlier versions, but they both have a lot of additional value for working with data warehouses.

    You can also look at tools like Wherescape RED (which is the data warehouse prototyping tool we use) to help you design and build the actual warehouse.

    It's a big job to do well. You may be better off taking an interim route and just building a reporting database for now.

    Feel free to throw up any additional questions. Good luck!

    [/font]

  • Thank you all for your reply. You are right that we only need data warehousing only for the purpose of reporting, nothing else. We have slightly changed our plan, that is data archiving. as we have number of users online. and the admin of every region like to view stats and give credit to individuals and send reports/statistics to upper management.

    When these reports ran the users have to face bad performance experience both admins and clients.

    So rather we keep all data of past 5 years in online database we are willing to move all data except 1 month (past 30) days online and rest on archived database. Is this a good strategy? if yes, what steps are involved implementing this technique?

    Please guide me once again with data archiving technique to be used. we will query data from archive database if the date range is greater than 30 days. Sometimes we have to run union queries on online and archive data both. the reporting option with date criteria has many options like, ToDate, Today, Yesterday, This week, Last week, This month, Last month, quater, fiscal year last year and so on.

    if someone has already implemented this architecture, please guide me with links articles and key points to make it succcess.

    Shamshad Ali.

  • Scenario looks like a text book case of archive & purging where OLTP users access a small database having just NN days worth of data while reporting is conducted against the archive database -is that correct?

    Archive and purge process is pretty straight forward but it all depends of volume and business specs.

    One way or the other process will copy data from OLTP to Archive database then get rid of the same data set on OLTP database.

    Depending on volume you may want to consider table partitioning but I'll suggest to address the issue one table at a time and only go for the partitioning strategy if partitioning will help both Archive & Purge AND Reporting queries.

    _____________________________________
    Pablo (Paul) Berzukov

    Author of Understanding Database Administration available at Amazon and other bookstores.

    Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.
  • Actually we are doing this technique (archiving) only becaues of Performance issue. i would like to know which option will be best? either archiving or table partitioning?

    here a guy has implemented what i am looking for.

    http://sql-server-performance.com/Community/forums/t/29365.aspx

    Moving data through SSIS, keeping logs etc. He was facing issue only because of deleting from operational data once that data is archived. My case is only to keep 30 days data with operational - rest will be moved to archive db. coz users are able to view their individual reports for any given period.

    Please let me know if this is a correct way or not. also we are going to implement Peer to Peer replication on Operational/OLTP database with load balancing on web servers, atleast 3 web servers with 1 SQL Server per web server each. the database will be replicating data via Peer to Peer repliction setup.

    Would it be a good architecture which we are planning to implement? Please guide me.

    Shamshad Ali.

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply