Are you ready for Data Warehouses?
I have just been reading a fascinating thread in SQLServerCentrol.com triggered by an editorial 'Two Weeks' by Steve Jones.
He got invited to the first annual 'Microsoft Business Intelligence Conference' in May 2007. Among other things, he wonders whether DBA's are ready for the complexities of data warehouse technology and whether BI is ready for prime time or even a candidate.
Well, from my vantage point at the bottom of the trenches I can tell you that if you want to stay in IT, you better get ready because BI is already upon us. It doesn't matter whether you have all the tools or formal training, you have got to get ready and do the best you can with what you have. You might be surprised with what you can do if you try.
It has never been much different. In the summer of 1997 I did my first data warehouse project for a software company developed software for higher education. Some of us did not even know it was a data warehouse when we started. Our project was simply an application to collect student data from the transaction system so that the college could use it to create statistical reports. At that time, Bill Inmon had just introduced the OLAP concept and data warehouses were new to most people; there were not many BI tools to help us do the job. Some of us on our development team never heard of data warehouses before. We just buckled down and did a lot of research and studying to learn what we had to do and how to do it.
The Blind Led the Blind
There were twelve people on the team including a project leader, a
system architect, a business analyst, an Oracle DBA and database
developers. I don't think anyone could be considered a data warehouse
expert. The company had two transaction systems - one was written in
COBOL and used VSAM files, the other one used Oracle Forms and an
Oracle 7 database. The new product had to work with both transaction
systems. We decided to build our data warehouse using Oracle 8i and
learned what a star schema was.
The business analyst and the transactional developers worked together to determine which dimensions were needed. The main dimensions were time, student, geographic, demographic, college department, course, and faculty. Some of the dimensions had hierarchy built in so users could drill down for more detail. For example the geographic dimension had country, region, county, state, city and zip code members. The measures were students count, and tuition or financial aid money.
Almost every step was a challenge. We had to learn any way we could. We built Meta Data tables that described each field in the data warehouse tables. These tables contained valid values for certain fields in the data warehouse as well as valid values in the transaction systems so we could transform them into one unique value in the data warehouse. Since Oracle had Web Application Server, the developers were allowed to access Oracle database by using PL/SQL with embedded HTML tags to create web pages. This allowed the user to input and update the Meta data tables very easily through the web.
The ELT (Extract, Load and Transform) system was built using two different languages. It was a particular challenge and we really had to wing it. The developers used PL/SQL to extract data from the Oracle transaction system and put it in staging tables. For the main frame transaction system, the developers wrote COBOL programs to extract its data into text files. Then they used SQL loader to load the text files into the Oracle staging tables. All the staging tables were cleaned and transformed using values from the Meta Data tables. We used PL/SQL to read the staging tables that contained transaction values, and used the Meta data table to do the cleansing and transformation. We had an error report to show which records had problems. The clean records were then loaded into dimension and fact tables by PL/SQL.
The web GUI interface was another special challenge. Not only were data warehouses pretty new back then, but web interfaces were not exactly standard features. One of the requirements was that the users could build their own query by selecting either two dimensions and one measure or one dimension and two measures. The GUI interface was written using PL/SQL and JAVA. The user selected the dimension and measure from the web and the JAVA program passed them into the PL/SQL program using http, just like opening a web page but with the input value in the link e.g., http://www.abc.com?name='abc'. The web page would actually be the PL/SQL program because we used Oracle Web application server. The PL/SQL program got the dimension and measure, did a dynamic query, then sent the information to a static web site with data only - we had symbols to distinguish rows and columns. Then the JAVA program would read that web page and generate the result on the user web page. The users could choose to display the result in tabular or graphical chart format. Also the users could save their favorite query and put it on the menu so they did not have to choose the dimension and measure over and over again. I always thought this module was very intuitive and was proud of it.
The DBA certainly did his part. Oracle had just released 8i at that time so the DBA was able to partition the data warehouse using the time dimension. The DBA also took care of creating the user security to check which user had the right to sign on to the system. One of the developers had developed a role based security using PL/SQL so different levels of the university personnel could have different security levels, for example the university president could see everything while the dean of a department could only see the data in his own department.
If you were to ask any of us if we were ready for data warehouses in 1997, I think you would have got a universal "no way" answer. But we did it, and we enjoyed it, even though we had to work like crazy and did not know what we were doing much of the time. The point is that we learned what we had to learn, and data warehouse was on the platter. BI is now on the platter, and I am sure if you want to stay in IT you will learn it.
Today it's easier than yesterday - so get ready!
There are a lot of easier ways to build data warehouses today. There are even books that clearly explain what you have to do, step by step. You can argue long and hard about what tools are best and what database is easiest to use for your data warehouse project.
But whatever database you use or tools you are given, the concept of building the data warehouse is the same. You design and build dimension tables, fact tables and Meta data tables. You mostly use the star schema. There must always be an ETL process and it will always be difficult. Cleansing and transformation procedures will always be required to guarantee the data quality. Nowadays there are so many ETL tools for cleansing and building the data warehouse it is difficult to keep track of them. There are so many software products to choose to build all kind of BI reports - web analysis reports, planning reports, forecast reports, interactive reports and even dashboard reports. But so what! I do not need those tools to build a very dynamic data warehouse and report today any more than I did twelve years ago. And neither do you! Start with what you have.
Twelve of us spent two years working and learning on our first data warehouse project. It will take you some time too. We all learned a lot about data warehouses and so must you. You will be given your first data warehouse opportunity sometime soon. Learn all you can about it and all the BI technology associated with it now. Don't worry about the tools you don't have. Before you get that opportunity, get ready!