Introduction
These days there are so many companies trying to build data warehouses and there are a lot of companies selling Business intelligence products. The new version of SQL Server 2005 has the Business Intelligence Development Studio and this is one of its main upgrade from SQL Server 2000. It enhanced the DTS environment and changed its name to Integration Services (SSIS), which is similar to an ETL tool. It integrated with .NET so it can easily build reports or online applications. Microsoft is definitely putting its energy in the data warehouse business.
Also SQL Server is gaining on the market share of other RDBMSs and most experts said it was because of the BI integration. However, does your company need a data warehouse? And how do you choose the right BI product for your company?
First it must be the business direction that produces the driving force to build the data warehouse and not the IT department. The company must identify the goals and objectives of 'why install a data warehouse'.
For example, my old company was a bank; it decided to build a data warehouse so it could keep track of a loan from the application process until repayment. One objective was the business wanted to find out how long it took the loan from application status to approval status, and if the loan got denied, they wanted to find out what the reason was. However when the company decided to build the data warehouse, the IT department helped the organization to understand the scope of the project. The business has to identify what the enterprise needs. It has to decide how much they want to spend on the project, what kind of data they want to capture, etc. This helps to determine what kind of hardware, operating system, communication links, RDBMS and application program IT needs. The business also has to understand that it is a continuous project, both business and IT has to decide who is responsible to fix problem, who is responsible for new requests and who is responsible for tuning. It needs continuous support from the business.
The IT department has to know what they need in order to build a successful data warehouse. First they need to understand the business data. It needs an implementation plan, a realistic timeline, a team of people with knowledge to build the data warehouse - a data architect, a data modeler, business analyst, data quality administrator, meta data administrator, security administrator and good database developers. The scope of the project has to be reasonable; sometimes the business tends to ask too much at once, the IT manager has to understand one thing - never takes on too much too fast. It tends to fail.
Data Quality
Data is an important asset to the business; it helps the business to make daily decision and direction of the organization. Data quality is the most important element of the data warehouse. Data quality and integrity can directly affect many decisions. The data warehouse is not just capturing data. It can also minimize data redundancy, increase customer satisfaction, improve data quality, and provide good customer reports. So the business needs to set up a metrics to measure if the data warehouse is successful. The response time, the quality of the data, the satisfaction of the customers are some of the measurements the business needs to look at. If the data is incorrect and no one trusts the data warehouse, it becomes a whole bunch of tables sitting in a database.
In a lot of companies, the data can come from ERP systems, legacy systems, outside sources and even may be Excel sheets on someone's PC. The data warehouse team has to take responsibility to validate the data, to work with the business to set up business rules and to work with data owner to set up security for the data. This is part of the ETL process.
For example, the ERP system had a customer name 'John Doe'; the legacy system had a customer name 'Johnny Doe'. Are they the same person? Did they have the same address? It they did, that meant they were the same person. This is part of the cleansing process. It is important because the customer count will be off if the data is not validated. It may mess up the reports for marketing, sales and the finance departments.
The database has to be well designed to eliminate redundant data. The data has to be standardized and consistent. It also needs to have business rules to process certain data. For example if the customer has an invalid zip code, should it be put in the data warehouse or should it be put in an error report until the data gets corrected. Meta data is one way to standardize the data; it also documents the source of the data, where the data stores in the database, the definition of the data and the business rule of the data. In this way, the IT department can easily identify if the data is in the wrong database or comes from the wrong source; it can also explain the definition of data to the customer.
Business Intelligence Products
So now the company decides to build the data warehouse, how do you find the right tool? First the company has to evaluate what RDBMS to use for the data warehouse. You have to know how much data does the company have, how often does the company need to load into the data warehouse and how fast it needs to load the data. There are SQL Server, Oracle, Teradata, DB2, and others. The hardware, the communication links and the support is also an important factor to consider choosing the right database. After serious consideration, my company decided to use SQL Server 2000 for our data warehouse because of the size of our data warehouse and the cost. We also look ahead the new features of SQL Server 2005 and its integration with .NET for future development.
Then we have to choose the ETL tool. There are a lot of ETL tools in the market, for example Ab Initio, Informatica Power Center, IBM WebSphere as well as good old DTS packages in SQL Server. One of the data warehouse experts had said that the important part of ETL is that the people must understand the data. The best tool is the tool that you can retrieve and load the data efficiently and correctly and is the most cost effective for the company. Do not just look at all the fancy and expensive tools, evaluate every aspect. Is the tool easy to set up? Is the tool easy to learn to use? If it takes a developer six months to learn to use the tool, is it worth the time and effort?
I used Ab Initio to move the same amount of data from Oracle 9i to SQL Server 2000 and at the same time I used DTS package to do the same thing. The response time was almost the same. But Ab Initio costs a lot of money and DTS package is free. I am not saying Ab Initio is not a good product, if the company has to retrieve zillions of data everyday, it is definitely a good product to consider. Just liked I go from place A to place B (approximately 10 miles) and I go there driving a Honda Civic. It is cheap and reliable. At the same time I can also go there driving a Hummer. Do I prefer a Honda Civic than a Hummer? Of course I would like to drive a Hummer if I am as rich as Bill Gates, but my bank account tells me a Honda Civic is good enough! But if I have to drive up to Mount Rushmore in South Dakota in December, I may consider another vehicle. In my case we use good old DTS packages and stored procedures to extract, cleanse and load our data warehouse and it is doing a very fine job.
A business intelligence tool is used to help the customers to access and analyze the data in the data warehouse. After you build the data warehouse, then you have to choose BI tool. It must be the one provide the most benefits to the business and match the business expectations. The tool has to provide security and privacy of the data. The users can access it and learn to use it fast. If it does, then it is a good business intelligence tool. As I said before SQL server 2005 has Business Intelligence Development Studio to do the job. There is also a lot of other companies selling BI products; for example Cognos, Hyperion, and MicroStrategy. The business and IT have to evaluate which product works the best for the customers. Right now we use Cognos for reporting, but with the rising cost of Cognos and with the new features in SQL Server 2005, who knows, anything can change in the future.
Well, does your company need BI and DW? You have to ask your CIO and CTO, and it is up to you to explain to them the importance of a data warehouse to your company, also you have explain all the pros and cons of choosing the right product.