Introduction
Most data warehouse and data mart OLAP training and examples center on sales and manufacturing scenarios that provide business intelligence for financial decisions. Occasionally, other business problems arise that can make use of this technology for quick and accurate analysis of other types of data. The challenge can be taking something you know and finding a new application for it. This solution integrates SQL Server 2005, ASP.NET, Reporting Services, and Analysis Services to rapidly develop and deploy an application for data entry and analysis.
Problem 1: Create a user interface
It was a typical Friday afternoon and I was trying to wrap up some projects before the weekend when one of the department's Directors came to me with a problem: create an interface for survey results input with real-time results reporting. In addition, it needed to be completed in three days. The 8,000 paper surveys had already been mailed and the questions and format were established. The survey results needed to be tabulated quickly due to critical time constraints. Optical recognition was out of the question, and it was too late to use a third-party or COTS (commercial off the shelf) application. Manual tabulation or a spreadsheet would be too time-consuming. Enter the Database Administrator.
The Basics - Day 1
Some quick relational database design based on the survey requirements should not be too difficult. A transactional database with three or four tables to store the survey questions and results should work. Fortunately, each survey was marked with a unique number to ensure that copies were not made and that an individual could respond only once. The survey id is the primary key to work with. Although this is supposed to be a one-time event, I plan for the future and create a structure that can hold future surveys, including a "survey" table with information about a specific survey: the "from" and "to" dates, who initiated the survey, and what system is being surveyed. I add another table for the survey results, with a column corresponding to each question number plus a comments column. I also decide to add a table for the questions that were asked on the survey; if the original paper survey were lost, the data would be meaningless. The last table I add holds the possible answers for each question. A few of the survey questions are open-ended, but most are either "yes" or "no".
Figure 1: Database Diagram
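A minimal sketch of that structure, using illustrative table and column names (not the production DDL), might look like this:

-- Hypothetical schema sketch; names are illustrative.
CREATE TABLE Survey (
    SurveyId    int IDENTITY(1,1) PRIMARY KEY,
    Title       varchar(100) NOT NULL,
    FromDate    datetime NOT NULL,
    ToDate      datetime NOT NULL,
    InitiatedBy varchar(50) NOT NULL,
    SystemName  varchar(50) NOT NULL   -- the system being surveyed
);

CREATE TABLE Question (
    QuestionId   int IDENTITY(1,1) PRIMARY KEY,
    SurveyId     int NOT NULL REFERENCES Survey(SurveyId),
    QuestionNo   int NOT NULL,
    QuestionText varchar(500) NOT NULL
);

-- Possible answers for each question (most are yes/no, a few open-ended)
CREATE TABLE Answer (
    AnswerId   int IDENTITY(1,1) PRIMARY KEY,
    QuestionId int NOT NULL REFERENCES Question(QuestionId),
    AnswerText varchar(100) NOT NULL
);

-- One row per returned paper survey; the pre-printed number is the key
CREATE TABLE SurveyResult (
    ResultId int NOT NULL PRIMARY KEY,  -- unique number printed on the form
    SurveyId int NOT NULL REFERENCES Survey(SurveyId),
    Q1 int NULL, Q2 int NULL, Q3 int NULL,  -- ...one column per question
    Comments varchar(1000) NULL
);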
The next step is to build an interface. I ask myself, "Who will need access, and what application should I use?" Rapid development and testing would be straightforward using ASP.NET, so I begin by building a simple form for data input, including some field validation and a method for identifying the user entering the data. That means modifying the results table to add a user id field that stores the identity of the data entry operator. Two of the questions have multi-part answers, which are easy to accommodate.
Figure 2: User Interface
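That change is a small one; a hypothetical version against the sketch schema above:

-- Hypothetical: record which data entry operator keyed in each survey
ALTER TABLE SurveyResult ADD EntryUserId varchar(50) NULL;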
As I begin testing the input form I realize that the yes/no questions are not really "bit" data types. Someone could choose not to answer a question, so there are actually three choices: yes, no, or not answered. I decide to use the integer data type for these questions and change the structure of the answers table to hold an entry for each question's possible answers and the integer value that corresponds to "yes", "no", or "n/a". The survey also contains demographic questions: age ranges, salary ranges, years-of-service ranges, gender, and so on. What started as a simple 13-question survey is beginning to look more complex, so I need to rethink the database diagram and relationships.
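Against the sketch schema above, that change might look like the following (the value-to-answer mapping is illustrative):

-- Hypothetical: store the integer that each answer maps to in the results table
ALTER TABLE Answer ADD AnswerValue int NULL;

-- One row per possible answer for each question, e.g. for question 1:
INSERT INTO Answer (QuestionId, AnswerValue, AnswerText) VALUES (1, 1, 'Yes');
INSERT INTO Answer (QuestionId, AnswerValue, AnswerText) VALUES (1, 2, 'No');
INSERT INTO Answer (QuestionId, AnswerValue, AnswerText) VALUES (1, 3, 'Not answered');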
Day 2
Once the kinks are worked out and some additional data validation is added to the input form, data entry works. Using Reporting Services I can quickly create a report that shows the results in real time as they are entered, and with a few additions I add some pie charts. The question numbers, the text of the questions, the possible answers, and the number of responses are all presented in the report, pulled straight from the database tables. Things are looking good, or so I think. The data entry operators and the Director like the input form, and things are going smoothly. Duplicate survey numbers are caught, and the integrity of a single response per person is enforced. The survey results start pouring in: 2,500 responses, a return rate of about 30 percent. Not bad for a mailed survey.
Figure 3: Reporting Services Survey Summary Report
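A dataset query along these lines could drive such a report; this sketch unpivots the one-column-per-question results table with UNION ALL and counts responses per answer (table and column names are the illustrative ones from earlier):

-- Hypothetical summary query: count the responses for each possible answer
SELECT q.QuestionNo,
       q.QuestionText,
       a.AnswerText,
       COUNT(r.ResultId) AS Responses
FROM (
        SELECT ResultId, 1 AS QuestionNo, Q1 AS AnswerValue FROM SurveyResult
        UNION ALL
        SELECT ResultId, 2, Q2 FROM SurveyResult
        UNION ALL
        SELECT ResultId, 3, Q3 FROM SurveyResult
        -- ...one SELECT per question column
     ) r
JOIN Question q ON q.QuestionNo = r.QuestionNo
JOIN Answer   a ON a.QuestionId = q.QuestionId
              AND a.AnswerValue = r.AnswerValue
GROUP BY q.QuestionNo, q.QuestionText, a.AnswerText
ORDER BY q.QuestionNo;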
Problem 2: Time to Re-think
The data is now being viewed as information. However, one of the "higher-ups" wants to analyze the information and compare the summarized results to the "audience" demographics. I could export the results to a spreadsheet and then start sorting and filtering, but that would be cumbersome, and the survey results deadline is fast approaching. How can I make this analysis easy and available to the multiple people in the office who need it? It is hard to know what questions to ask about how they want to analyze the data, because those questions depend on being able to group the data by multiple demographics. This sounds like a perfect application for an OLAP database. I will have to rethink how the OLTP database structure and data will translate and transform into an OLAP database.
Figure 3a: OLTP to OLAP Diagram
Now the rethinking of Analysis Services comes in. I had developed data marts and cubes at my last job for transactional sales and manufacturing data, so I have experience extracting transactional history records and converting the data into a format that can be used to build facts, dimensions, and measures. The question is, "How do I create measures and facts from transactional data that has three possible results, counts, and none of the financial information or time data I have traditionally worked with in data warehouse projects?" These kinds of questions keep me up at night.
Day 3
The survey results have been entered into a transactional database and the easy part, tabulation reporting, is complete. I must figure out how to translate the data into a fact table and dimension tables. I wake up at 3:00 am with the solution and write it down. The next morning I can't remember where I put the paper I wrote the solution on, but I remember what I need to do. That morning I create six views of dimension data based on the demographic information stored in the survey answers table. The facts are extracted for the cube using a view based on the results table. By using a CASE statement in the view's SELECT, I can assign a one or a zero to each of the possible survey answers: yes = 1 or 0, no = 1 or 0, n/a = 1 or 0. That way the cube can simply sum each column to produce a total for each possible answer.
Figure 4: OLAP Database Diagram
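The heart of that fact view is the CASE logic. A sketch, again using the illustrative column names (Q11 through Q13 standing in for the demographic question columns):

-- Hypothetical fact view: flag each possible answer as 1 or 0 so the cube
-- can simply SUM the flags to get a count per answer.
CREATE VIEW vFactSurvey AS
SELECT ResultId,
       CASE WHEN Q1 = 1 THEN 1 ELSE 0 END AS Q1_Yes,
       CASE WHEN Q1 = 2 THEN 1 ELSE 0 END AS Q1_No,
       CASE WHEN Q1 = 3 THEN 1 ELSE 0 END AS Q1_NA,
       CASE WHEN Q2 = 1 THEN 1 ELSE 0 END AS Q2_Yes,
       CASE WHEN Q2 = 2 THEN 1 ELSE 0 END AS Q2_No,
       CASE WHEN Q2 = 3 THEN 1 ELSE 0 END AS Q2_NA,
       -- ...repeated for each question column
       Q11 AS AgeRangeKey,      -- demographic answers double as
       Q12 AS PayGradeKey,      -- the keys that link to the
       Q13 AS GenderKey         -- dimension views
FROM SurveyResult;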
The views for the dimensions use keys to link to the fact table. One other consideration is translating a pay grade into a more meaningful salary scale. Noticeably, a time dimension is missing: since this survey is not tracking any historic transactions, there is no need for one. The cube will only need to be processed once, since all of the survey results have already been entered into the relational tables.
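A dimension view can do that translation inline. For example, with invented pay grades and salary bands:

-- Hypothetical dimension view: translate pay grades into readable salary bands
CREATE VIEW vDimSalary AS
SELECT DISTINCT
       Q12 AS PayGradeKey,               -- demographic answer = dimension key
       CASE Q12
            WHEN 1 THEN 'Under $30,000'
            WHEN 2 THEN '$30,000 - $49,999'
            WHEN 3 THEN '$50,000 - $74,999'
            ELSE '$75,000 and over'
       END AS SalaryBand
FROM SurveyResult
WHERE Q12 IS NOT NULL;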
Figure 5: OLAP Cube Explorer
I use Reporting Services to create reports based on the cube that show results for each dimension. However, without a third-party application available for slicing and dicing the OLAP cube, I publish an HTML version of an Excel pivot table, with interactive cube functionality, to our intranet. Now the front office can really analyze the data and slice and dice across results and demographics. This provides far more insight into the responses than a summarized tally of the overall results.
Figure 6: Reporting Services Analysis Services Report
Figure 7: HTML Excel Pivot Table Reports
What's next? And other considerations
Although the solution was quick and provided the analysis the office needed, there is room for improvement. The transactional database could be redesigned for flexibility in adding new surveys and questions, and an interface could be developed to let the office create a new survey and populate the fields in the database. It would also help to identify the demographics up front. The view used for the fact table could be rebuilt by a stored procedure that pulls the possible answers for each corresponding question rather than hard coding the CASE statements in the view; a sketch of that idea follows. The other consideration is whether there should be one cube for each survey.
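As a rough illustration of that last redesign, a stored procedure could generate the CASE expressions from the answers table with dynamic SQL (a sketch against the illustrative schema, assuming the fact view already exists):

-- Hypothetical: build the fact view's CASE expressions from the Answer table
-- instead of hard coding them
CREATE PROCEDURE uspRebuildFactView AS
BEGIN
    DECLARE @cols varchar(max), @sql varchar(max);

    -- Concatenate one CASE expression per question/answer pair
    SELECT @cols = COALESCE(@cols + ',' + CHAR(10), '')
         + 'CASE WHEN Q' + CAST(q.QuestionNo AS varchar(10))
         + ' = ' + CAST(a.AnswerValue AS varchar(10))
         + ' THEN 1 ELSE 0 END AS Q' + CAST(q.QuestionNo AS varchar(10))
         + '_' + REPLACE(a.AnswerText, ' ', '')
    FROM Question q
    JOIN Answer a ON a.QuestionId = q.QuestionId;

    SET @sql = 'ALTER VIEW vFactSurvey AS SELECT ResultId, '
             + @cols + ' FROM SurveyResult';
    EXEC (@sql);
END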
Complete survey results analysis available at: http://www.vermonttreasurer.gov
The code is included in the resources section below for download.
AUTHOR BIO: David Pruden (David.Pruden@state.vt.us) is currently a Systems Developer III for the State of Vermont Treasurer's Office. He is responsible for web application development, database administration, and design. David has a Master's in Applied Information Technology from Towson University, and his article "Self-Service Subscriptions" was published in the December 2006 issue of SQL Server Magazine.