How Do You Handle Database Documentation?

  • I've been digging into database documentation at work lately. It's the same old song. Very little documentation exists. The documentation that does exist is so out of date that you can't rely on it.

    Plus, databases tend to be tightly coupled to the applications that use them. So, the design considerations are very complex. (For example: "Record Status 0 means Current, Record Status 1 means historical and Record Status 2 means deleted. If a particular Record ID has a status of 2, then that same Record ID can't have a status of 0." Maybe we should have modeled it differently, but we can't change that now.)

    No one person knows how everything works. The people who do know how some things work don't have time to tell you about it. The people who have time to talk to you don't know anything.

    Does any of this sound familiar? I'll bet it does. So, here's my question: How do you handle documentation of a database in an environment like this? What solutions do you use at work.? What solutions would you like to use? What would you recommend to someone starting out with a complex database environment and no existing documentation?

    I'd like to get a wide variety of opinions. Enquiring Minds Want to Know! 🙂

  • I do some digging around and when needed I'll run the profiler. When all that fails, I send my client to ask the questions to those who designed the damn thing and if none of them are around, it's time to tell them it is an undocumented project and the people responsible for managing those who left without leaving documentation should be fired.

    Works most of the time.

    Really, in the end, you can only do so much to reverse engineer a bad design. It is 200% true if you have no access to the source code that interacts with the database.

    Next time, get a job in a place where they mean it when they say that "you will be responsible for approving (or not) what gets done in anything relating to SQL." 😀

  • If it's a home grown application generally I'll ask the developers or BSAs. Generally they can help you understand the application and if you can understand that then you can document the database.

    If it's a purchased or vendor application it can be a bit more difficult. Again look at the design of the application and use that to determine how the database works.

    In either case, profiler can help however on high transaction production systems, this can be difficult to sort through and potentially cause performance issues. You can also use database diagrams using the built in tool.

    And then there is my favorite tool, Redgate SQL Dependancy Tracker: http://www.red-gate.com/products/sql-development/sql-dependency-tracker/.... This tool can create a full diagram of your database, show dependancies and lay them out quickly and neatly. While it can't tell you what everything does, it call tell you how things are interrelated and THAT is 1/2 the battle IMHO.

  • Well, I should add that I am not the primary database adminstrator. I just take care of the Business Intelligence data warehouses. The company itself is huge, and so is the IT department. There are entire divisions of developers. So, tracking down just who did what is not exactly easy, and then you have to persuade them to take out time for you.

    I might also add that the platform itself is quite successful and critical to the business. It has been developed within the company over a period of years. In the beginning, no doubt, a single person could understand everything about it. Also, there is some documentation. It's just out of date. (This is very common in my experience.)

    I suppose we could adopt the Netscape approach: junk everything and start over. We'd just have to figure out how to survive for two or three years with no revenue and how to regain all of our market share when we were ready to start up again.

    I ask the question because I think this is a very, very common situation. Surely someone out there has come up with an approach that works.

  • Based on your additional information I would strongly recommend checking out that red gate tool. It's truly amazing. It's a good place to start. If I'm not mistaking there is a free 14 day trial.

  • Red Gate does have some awesome tools. Currently I am using SQL Doc to reverse engineer the database. Best of all, I can publish the documentation in HTML format and post on the team Sharepoint site. Its main shortcoming, in my opinion, is that it does not create an ERD diagram. For seeing the big picture there really is nothing like ... uh, well, a picture. 🙂

    I am also playing around with Dependency Tracker and SQL Compare to see what I can do with those tools. My main problem with Dependency Tracker is that I can't publish my output in HTML format. (If you know of a way to do this, please let me know!)

    I guess what I am really trying to get a handle on is the process of database documentation. How do I create good, useful documentation which is easy to maintain? Maintenance is the most important part, in my opinion. I think most documentation is obsolete as soon as it is published - because things keep changing. So, how do you keep up without making it a full-time job?

    Here are the basic things that I have figured out so far. (Please feel free to jump in with your suggestions!)

    1. As much as possible, I want the database documentation to be stored within the database itself.

    2. For example, rather than create a spreadsheet or Word document itemizing the SQL Agent jobs, I have written an SSRS report to pull that information out of the msdb database. Then I make sure that all of my comments about the job are included in the Description field in the job (or job step) itself. Likewise, I try to enter all of the information I can about tables and columns

  • Red Gate does have some awesome tools. Currently I am using SQL Doc to reverse engineer the database. Best of all, I can publish the documentation in HTML format and post on the team Sharepoint site. Its main shortcoming, in my opinion, is that it does not create an ERD diagram. For seeing the big picture there really is nothing like ... uh, well, a picture. 🙂

    I am also playing around with Dependency Tracker and SQL Compare to see what I can do with those tools. My main problem with Dependency Tracker is that I can't publish my output in HTML format. (If you know of a way to do this, please let me know!)

    I guess what I am really trying to get a handle on is the process of database documentation. How do I create good, useful documentation which is easy to maintain? Maintenance is the most important part, in my opinion. I think most documentation is obsolete as soon as it is published - because things keep changing. So, how do you keep up without making it a full-time job?

    Here are the basic things that I have figured out so far. (Please feel free to jump in with your suggestions!)

    1. As much as possible, I want the database documentation to be stored within the database itself.

    2. For example, rather than create a spreadsheet or Word document itemizing the SQL Agent jobs, I have written an SSRS report to pull that information out of the msdb database. Then I make sure that all of my comments about the job are included in the Description field in the job (or job step) itself.

    3. Likewise, I try to enter all of the information I can about tables and columns in the extended properties for each given object (table, property, etc.) Many tools (like SQL Doc) will display information from the Extended Properties in a very accessible way.

    That is really as far as I have gotten. I would very much like to know what tips and tricks other DBA's have stumbled onto! 🙂

  • This is a tough one. It's hard to know what you need to know in that situation. The main thing I'd focus on finding out is, which databases can the business not live without. Then worry about the backup processes. After that, assuming we're talking 2008, I'd use extended events to capture rpc and sql statements in order to understand what's going on with the system.

    And yes, Red Gate's SQL Doc is pretty cool. Plus they just came out with a new upgrade. It's a whole lot faster than it used to be.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Grant Fritchey (10/7/2011)


    This is a tough one. It's hard to know what you need to know in that situation. The main thing I'd focus on finding out is, which databases can the business not live without. Then worry about the backup processes. After that, assuming we're talking 2008, I'd use extended events to capture rpc and sql statements in order to understand what's going on with the system.

    And yes, Red Gate's SQL Doc is pretty cool. Plus they just came out with a new upgrade. It's a whole lot faster than it used to be.

    Right now the databases are SQL 2005, but we are preparing to migrate to 2008 R2. So, I will definitely look into using extended events.

    I also agree that disaster recovery is of prime importance. If (God forbid) something awful does happen, I sure would like for it to be an opportunity for me to shine, rather than a personal disaster (i.e. unemployment). 😀

    I'm after something a little bit different here, though. I currently support a Business Intelligence development team. Even though the team comprises more than a dozen people, there are many items that only one person can do. If something goes wrong with a particular process and that particular person happens to be out that day, it is very, very hard to fix. So, I am trying to shine a little light into the dark places in our database environment. My idea is that as developers work through problems, they can annotate the database itself with their insights via the extended property fields for the various columns and tables. SQL Doc then makes these comments easily accessible to everyone else.

    In addition to Red Gate's SQL Doc, I have been playing around with an open source solution called SchemaSpy. Quite frankly, it isn't as good as SQL Doc, but it has a couple of GREAT features that I would love to see in SQL Doc.

    1. It generates a graphical ERD diagram based on the foreign key relationships AND the implied relationships in a database. (Implied relationships are defined as fields with the same name and data type where a foreign key relationship does not exist.) It does this on a couple of different levels. First, it creates a diagram for the entire database. (This can be quite large.) Second (and this is the clever part) it creates a separate diagram for each table showing the other tables (and views) related to it. Optionally, you can extend this to second degree relationships. This is a very useful feature. (Red Gate, are you listening?)

    2. For each column in a table, it creates hyperlinks to the other columns in the database that are related to it. (This is also very useful.) It also has a page with a sortable list of every column in the database. When you sort this list by column name, you can see at a glance all of the other columns in the database with the same (or similar) name. This allows you to tease out relationships that the application itself does not detect.

    I don't like the idea of maintaining two separate documentation systems, but it looks as though I might need to. :unsure:

    I'd love to hear what tools everyone else is using! 🙂

  • And yes, Red Gate's SQL Doc is pretty cool. Plus they just came out with a new upgrade. It's a whole lot faster than it used to be.

    Hey! You know what else I'd like to see? I'd like to see the system tables and views listed in the documentation, too. At least there should be an option to display them. 🙂

  • All of my deployments are generally documented as a combination of Visio diagrams and Word docs detailing the complete setup from host names and IP addresses to storage.

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉

  • dmoutray (10/5/2011)


    I've been digging into database documentation at work lately. It's the same old song. Very little documentation exists. The documentation that does exist is so out of date that you can't rely on it.

    Plus, databases tend to be tightly coupled to the applications that use them. So, the design considerations are very complex. (For example: "Record Status 0 means Current, Record Status 1 means historical and Record Status 2 means deleted. If a particular Record ID has a status of 2, then that same Record ID can't have a status of 0." Maybe we should have modeled it differently, but we can't change that now.)

    No one person knows how everything works. The people who do know how some things work don't have time to tell you about it. The people who have time to talk to you don't know anything.

    Does any of this sound familiar? I'll bet it does. So, here's my question: How do you handle documentation of a database in an environment like this? What solutions do you use at work.? What solutions would you like to use? What would you recommend to someone starting out with a complex database environment and no existing documentation?

    I'd like to get a wide variety of opinions. Enquiring Minds Want to Know! 🙂

    It's time to get management involved since they were the ones that didn't require the documentation to begin with. They need to 1) make proper documentation a required part of the job that you're evaluated on, 2) give people the extra bit of time to do the documentation (and it really doesn't take that long when done correctly) and 3) require code reviews for every piece of code that goes into the system.

    For things like the "record status" that you're talking about... there should be an FK table with those values in it and the table and status column rules should be documented in the extended properties because no amount of automated reverse engineering in the world will tell you things like "a particular Record ID has a status of 2, then that same Record ID can't have a status of 0."

    We went through the same thing at one of the companies I worked at. It took (for example) 2 days of some pretty intense research to make even the simplest of changes to stored procedures and then most people still got the change wrong. What should have taken an hour or so to change and retest would frequently take a week because of going back and forth with test until it was right.

    It took more than a year to document within our stored procedures and tables as we reworked different procs but, after that year, the total time to rework a small problem went down from an average of 32 employee hours to less than 1 and the bug rate plummeted from nearly 90% (some bug fixes actually introduced additional bugs) to very nearly 0. "Change controls" to promote code to production dropped from more than 6 hours including sanity checks to being able to promote dozens of stored procedures and do the sanity checks in as little as 20 minutes.

    Every SQL Statement in every stored procedure should be documented not as to what it is doing (ie: Update the Customer table), but the reason WHY it's doing it (ie: Give high-volume customers a discount according to the schedule in the Discount table). Every column should have extended properties descriptions justifying the purpose of the column and any special business rules for it. Every table should have extended properties descriptions to describe the content of the table. Every developer making a change to any of those should be tasked with updating such documentation.

    Until management gets on board with the idea of proper embedded documentation, your problems will continue to be nothing but a death spiral.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Outstanding! Now, this is the kind of insight that I was after! 🙂

  • dmoutray (10/8/2011)


    Outstanding! Now, this is the kind of insight that I was after! 🙂

    There's more. There's a saying that's actually a paraphrase of a longer version written by a fellow by the name of Edmund Burke...

    "All that is necessary for the triumph of evil is that good men do nothing."

    In order to be successful with the recommendations I gave, 4 things must happen...

    1. A written set of specifications must be written and agreed upon by management. Remember that anything too long will simply not be read. Once approved, management needs to have a meeting with all of the developers, managers, etc, and read through the entire document so that everyone is aware of the content. Left to their own devices, developers and managers simply won't take the time to read it.

    2. Rules without enforcement are useless. EVERY piece of code must be reviewed by designated reviewers who have the power to turn code back even if it means a schedule won't be met because no one ever goes back to document code in production. The problem has to be solved before it reaches production. If SQL embedded in application code is allowed (not allowed in my shop for more reasons than will fit on this page), then the embedded SQL must be reviewed to meet all of the specs including comments.

    3. Management must "buy in" 100% and must agree with all reviewer decisions on "turn backs" (obviously, the reviewers must be seriously qualified to review). They must agree that "turn backs" for code and objects not meeting the specs will be allowed no matter what the bloody schedule says. They should also be provided a record of turnbacks and the reasons. Managment MUST also buy into some short term schedule relief to do the documentation as items are reworked for other reasons. The long term payoff will be huge. The documentation effort also has some huge marketing potential because very few companies can brag about either their bug rate or their rework turnaround rate like your company will if they elect to do things right. They have to be convinced that "If you want it real bad, that's the way you'll usually get it." 😉

    4. While you're writing the specs and if you haven't already done so... find the average bug rate, rework rate, research hours, etc, for how things are right now. You'll need this to 1) justify this major paradigm shift to get it started and 2) justify it's continuation over time.

    "Taking control" for things like this will always be difficult and I've been through this paradigm shift a couple of times now. I have to tell you that EVERYBODY will bitch about it for the first 6 months. Don't give in to the naysayers. Make no exceptions just because of schedule... it doesn't take but a bit of extra time to do things right. People just need to be made aware that it's now a part of the job they will be graded on and in 6 months to a year, they'll wonder how they did without it.

    Have management give its rewards for folks doing their best. A $20 gift card (restaurant, store, coffee, whatever) for those with only 1 or 2 "turn backs" per month is a huge incentive. For quarterly awards, those people with only 1 or 2 turnbacks and a near zero bug rate for the quarter should be given an extra day of vacation or something else nearly as substantial.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • This is really good stuff! In my particular situation, I am the DBA for the Business Intelligence team at my company. The Application Development group is huge (and completely outside my "chain of command").

    However, I am working on the principle that we should "set our own house in order" before pointing out any problems that other groups may have. If nothing else, we can "lead by example". I also know that my managers are very committed to documentation, good development practices, etc. Maybe if we can triple our throughput, other teams will start asking us, "How did you do it?"

    I'll definitely take this up with my managers. Thank you, very much! I really appreciate your insight. 🙂

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply