Data Warehouse Documnetation

  • Hi SQL Peeps,

    I have recently come into a new role as an application DBA for a dotcom. They have a fairly typical set up with a couple of heavily hit OLTP databases and quite few SSIS packages that feed a Data Warehouse overnight.

    The ETL packages (which I never wrote) are very complicated for what they actually do. Basically I want to document and map the end to end processing for these packages and have them in a single doc next to infomation regarding the schema's of the databases.

    i.e OLTP Database1 , Table1 , Column1 is transfered via Package1 to OLAP Database1 , Table1 , Column1

    Would probably want to put a bit meat on the bones than that but was wondering if there is any recommened documentation tips or tools for such a task. At present I'm thinking of just doing it in a spreadsheet but somebody might out there might of had to do this before.

    Many thanks

    Alan

  • I have done documentation in both Word and Excel.

    I am fairly sure that PragmaticWorks sells a BI documentation tool, which probably means there are other tools out there as well.

  • You appear to be concentrating on what I would term the "mechanical" aspects. What I would deem that should be included, is a word description of what the desired results are. from a business perspective,

    and what data is required not in the mechanical sense, but say items like "monthly sales of widgets". Your Excel spread sheet would be an attachment to the word doc for the mechanical aspect of how to obtain the "monthly sale of widgets".

    Equally important and often over looked, is what should NOT be done, that might be something along the lines of "monthly data shall not be overwritten for any data collected in prior years by this years data"

    Hope this gives you a new train of thought, that eventually results in a superior document.

    Another item / method to consider which I over looked in my original comment, is the use of "Extemded Properties". The advantage of this method as an adjunct to the written word / Excel sheet is that the "Extended Property" is in the database, and will remain available so long as the DB exists, and of course no one deliberately deletes the property (not so common as many familiar with T-SQL / table definitions etc. are even aware of Extended properties). The drawback of using Extended properties, a lot of typing, time consuming.

    If everything seems to be going well, you have obviously overlooked something.

    Ron

    Please help us, help you -before posting a question please read[/url]
    Before posting a performance problem please read[/url]

  • Alan,

    What you're describing is what I've always known as a data flow diagram. They are very powerful but you can't keep them up to date forever, the documentation on them takes as long as writing the code in the first place. You typically use them to analyze an existing system to locate patterns that got modularized poorly. I personally use Visio for that, being the most versatile tool for image inserts, showing flows, and adding notes.

    I would add that unless you're doing a deep system analysis try to keep it high level. If you start getting down to column by column transformations and explicit temporary structures, you're in for a lot of work. Sometimes they're necessary but they're incredibly painful to do.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Many thanks for the feedback 😉

    This has given me something to think about for sure. We have be deliver an end to end SQL OLTP and OLAP that is not ideal , not well documented and has very little if any validation on the data that is being populated to the DW.

    I like the idea of looking at each of the business aspects rather than the mechanical aspects of the data flow but I think becuase of the lack of understanding of the actual data flows then I might have to bite the bullet and start documenting each package using a separate spreadsheet for each package. This might well be a pain but it could well be that we are going to have to redesign from scratch and without a full understanding of what is in place at the moment (which is just about fulfilling the business requirements) I'm not going to have a good handle on where to start on a redesign...... Joy.......

  • I like the idea of looking at each of the business aspects rather than the mechanical aspects of the data flow but I think becuase of the lack of understanding of the actual data flows then I might have to bite the bullet and start documenting each package

    Trying not to be long winded in my original reply, let me point out another advantage of starting with the business aspects. It is fairly common that a company earns 80-90 percent of its profits from 20 percent of its products and/or services. Although not without its own problems in identifying which products are in that 20th percentile, if the sale of "widgets" and / or other items falls in that group, what you have learned is a priority of which items needs to be addressed first, thus gaining the greatest yield for your endeavors, in the shortest time period.

    If everything seems to be going well, you have obviously overlooked something.

    Ron

    Please help us, help you -before posting a question please read[/url]
    Before posting a performance problem please read[/url]

  • I'm in agreement with Ron in the sense that Business Requirements documentation is needed but I think poster is right when trying to get a proper picture of ETL mappings - after all marbles are counted, at 2AM in the morning, when you get a page because ETL process failed what you need is to address the "mechanical" part of the issue and you can't care less about general, high level specs 🙂

    In short, these are two different projects and in the long term they will require a lot of effort and resources to keep documentation up to date; the only thing worse that having no documentation is having documentation that is not up-to-date.

    Let me address the original question posted here. To easy the pain I'll rely on a third party "documenter" - plenty of them in the net like http://ssisdocumenter.com/ssisdocumenter.aspx

    Hope this helps.

    _____________________________________
    Pablo (Paul) Berzukov

    Author of Understanding Database Administration available at Amazon and other bookstores.

    Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply