Datawarehousing for beginners

  • Hello All,

    I am confused about Data Warehousing.

    I 've read couple of articles about it, but still out of the loop.

    Might be a good time to get into it.

    BOL gives rather vague (at least to me) explanation on that.

    SQL 2000 Programming book says " a data warehouse is a data store that holds the data collected during the company's conduction of business over a long period of time. The data warehouse uses OLTP system that collects the data from everyday activities and transactions....

    A data warehouse is usually built to support decision-making and OLAP...."

    How is Data Warehousing is different from simply moving an ASCII file(?) from legacy system thru FTP/DTS to SQL Server table?

    Would it be moving data from Mainframe to SQL Database(s) (and storing it there) consider a Data Warehouse?

    Should it always support decision-making/OLAP or it can be used as a read-only in OLTP environment?

    Thank again.

    ad

  • The SQL Server db would be the warehouse. The movement of the ASCII file would be the ETL (extraction, transformation, and loading) phases.

    A warehouse can be used for querying, bilding cubes, etc.

    Steve Jones

    sjones@sqlservercentral.com

    http://www.sqlservercentral.com/columnists/sjones

    The Best of SQL Server Central.com 2002 - http://www.sqlservercentral.com/bestof/

    http://www.dkranch.net

  • Thanks Steve,

    How is the design for Data Warehouse database would be different from a normal database design?

    Do you happen to know any links (not very advanced) on the web on that matter?

    Thanks again.

    ad

  • A data warehouse design would be different than a transactional design.

    For me, the difference is how the data will be used. I think of a transactional database as one that will be updated a row (or several rows) at a time. A data warehouse will be a bulk update or maybe not updated at all.

    In a transactional database you maintain the balance between indexing and data. I always ask my developers to let me know if they have added more than five indexes to a table because any updates or inserts will also update the indexes. In a data warehouse, since the data is not updated as often (if at all) indexing is not an issue.

    Data warehouses are used for reports.

    I hope this helps.

    Dr. Peter Venkman: Generally you don't see that kind of behavior in a major appliance.

    Patrick

    Quand on parle du loup, on en voit la queue

  • VERY basic description:

    'Normal database' - used frequently. All types of transactions are done on it. Usually one database for each specific function/application (HR, Security, Sales, etc...).

    Datawarehouse - used infrequently. Normally just reads (once in a while inserts/updates will be done to make the data 'current'). Combines all types of function/application databases. Usually an ARCHIVE database.

    -SQLBill

  • You should also be aware of the industry terminology. A "data mart" is generally considered to be data extracted from one OLTP application. Once you have several data marts, then you start to build a data warehouse to consolidate that data.

    The data warehouse contains data that has been scrubbed or validated and this "good data" is saved for seven to ten years, verses the couple years or less retention in the OLTP systems. I hope this highlights the differences between a read-only copy of data from an OLTP system and a true data warehouse. The foremost source on data warehousing is Ralph Kimball, so any book from him is sure to have a better explanation.

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply