December 23, 2011 at 7:27 am
there are a lot of popular data warehousing techniques that are plain wrong.
This is a very dangerous statement. There are very few things in our area that are "just plain wrong." What exactly makes a technique wrong versus inadvisable. It is completely dependent on the business needs. Following a text book about how a database/warehouse should be designed will get you fired if it does not meed the requirements of the business.
Denormalizing is very rarely a good practice.
I think you forgot to finish the sentence and yet again, a sweeping statement that completely depends on the use case and business needs.
Jared
CE - Microsoft
December 23, 2011 at 1:31 pm
Dev (12/22/2011)
Lamprey13 (12/22/2011)
Koen Verbeeck (12/22/2011)
<snip>It is perfectly acceptable to have a denormalized database design in a relational database, for example in a data warehouse. SQL still works, and it is still pretty performant (even more than in a normalized database, as less joins are used to retrieve data)
I would have to disagree with you on this. Perhaps, that is just a bad example?
I'm not trying to say anything about you or your knowledge about data warehousing, but there are a lot of popular data warehousing techniques that are plain wrong. Denormalizing is very rarely a good practice.
I have to disagree on your disagreement. Denormalization is serving BI domain for many years and many successful implementations been done based on it. It was Denormalization that gave birth to ETL else why somebody need to pull Transactional Data and populate it in another database, 10 times bulky than original. Business was already satisfied with Transactional Reporting (I didn’t say Analytical Reporting).
I assume (please provide the feedback) you work (or have worked) on Teradata database. Teradata (and few more) is one of database solutions provider that prefers their data warehouse in 3rd Normal Form. And their proprietary hardware design supports it well. Databases like SQL Server & Oracle that provide hardware & platform independent database solutions can’t follow the same methodology.
Dev, This is off-topic but I think worth pursuing. Denormalization is not the origin or the purpose of ETL (meaning the population of a data warehouse from source systems). As you probably know, a data warehouse is an Integrated, Subject-Oriented, Non-volatile and Time-variant database used for decision support. Denormalization is not and never has been a requirement for that. The usual reason for creating a data warehouse is because the source systems frequently don't have those four essential properties. It has nothing to do with denormalizing data.
The best data warehouses I have worked on and used have typically been designed to at least 5th Normal Form (or something close to it). A denormalized warehouse is generally harder to refactor and scale as the data model evolves and the ETL is almost inevitably much more complex and often difficult to maintain - which of course are all good reasons why people do create and use normalized data warehouses. Normalization is in my opinion indispensable if you need to manage efficiently 100s of entities for a scalable, evolving enterprise data warehouse.
I've never worked on a Teradata project. The data warehouses I've worked on were Oracle, DB2 or SQL Server based. I'm happy to confirm from experience that Oracle and SQL Server are perfectly suitable platforms for building normalized data warehouses.
When Lamprey criticises "popular data warehousing techniques" he possibly has in mind some of the once all too popular stuff that goes under Ralph Kimball's name - the relentless over-promotion and misguided application of which has caused many data warehouse projects to come to grief (and incidentally generated much $$$ revenue for consultants who have to fix those projects). The Kimball approach is just one subset of common data warehouse and BI techniques. Tellingly, it is as far as I'm aware the only methodology which embraces denormalization as a central characteristic of a data warehouse. Lamprey is therefore far from alone if he disagrees with that approach.
December 25, 2011 at 4:40 am
David thanks for sharing your views here.
I agree that you can find Normalized Data Warehouse in Industry. I also agree that Oracle & SQL Server database can be used for Normalized data warehouse. There is no conflict in view that De-normalization is NOT a requirement for data warehouse.
You can have a Normalized data warehouse but your data warehouse won’t be more than a Data Archival Server. You will kill your BI applications if you start mining your data. For OLAP analysis if I need to join 20 tables (5th Normal Form) that design is worthless.
I guess you are influenced by Bill Inmon and his CIF theory. I appreciate it too. But I read Ralph Kimball as well and I am not biased for Inmon or Kimball. I understand the difference in point-of-view (same is applicable in our case :-)).
My definition of data warehouse is pretty simple. I pull historical data (ETL) in another system for ad-hoc analysis and this analysis won’t be smooth with normalized structure.
December 25, 2011 at 7:36 am
David Portas (12/23/2011)
Dev (12/22/2011)
Lamprey13 (12/22/2011)
Koen Verbeeck (12/22/2011)
<snip>It is perfectly acceptable to have a denormalized database design in a relational database, for example in a data warehouse. SQL still works, and it is still pretty performant (even more than in a normalized database, as less joins are used to retrieve data)
I would have to disagree with you on this. Perhaps, that is just a bad example?
I'm not trying to say anything about you or your knowledge about data warehousing, but there are a lot of popular data warehousing techniques that are plain wrong. Denormalizing is very rarely a good practice.
I have to disagree on your disagreement. Denormalization is serving BI domain for many years and many successful implementations been done based on it. It was Denormalization that gave birth to ETL else why somebody need to pull Transactional Data and populate it in another database, 10 times bulky than original. Business was already satisfied with Transactional Reporting (I didn’t say Analytical Reporting).
I assume (please provide the feedback) you work (or have worked) on Teradata database. Teradata (and few more) is one of database solutions provider that prefers their data warehouse in 3rd Normal Form. And their proprietary hardware design supports it well. Databases like SQL Server & Oracle that provide hardware & platform independent database solutions can’t follow the same methodology.
Dev, This is off-topic but I think worth pursuing. Denormalization is not the origin or the purpose of ETL (meaning the population of a data warehouse from source systems). As you probably know, a data warehouse is an Integrated, Subject-Oriented, Non-volatile and Time-variant database used for decision support. Denormalization is not and never has been a requirement for that. The usual reason for creating a data warehouse is because the source systems frequently don't have those four essential properties. It has nothing to do with denormalizing data.
The best data warehouses I have worked on and used have typically been designed to at least 5th Normal Form (or something close to it). A denormalized warehouse is generally harder to refactor and scale as the data model evolves and the ETL is almost inevitably much more complex and often difficult to maintain - which of course are all good reasons why people do create and use normalized data warehouses. Normalization is in my opinion indispensable if you need to manage efficiently 100s of entities for a scalable, evolving enterprise data warehouse.
I've never worked on a Teradata project. The data warehouses I've worked on were Oracle, DB2 or SQL Server based. I'm happy to confirm from experience that Oracle and SQL Server are perfectly suitable platforms for building normalized data warehouses.
When Lamprey criticises "popular data warehousing techniques" he possibly has in mind some of the once all too popular stuff that goes under Ralph Kimball's name - the relentless over-promotion and misguided application of which has caused many data warehouse projects to come to grief (and incidentally generated much $$$ revenue for consultants who have to fix those projects). The Kimball approach is just one subset of common data warehouse and BI techniques. Tellingly, it is as far as I'm aware the only methodology which embraces denormalization as a central characteristic of a data warehouse. Lamprey is therefore far from alone if he disagrees with that approach.
All valid points, but I still believe that most business users will appreciate a simple star-schema with a fact table and some dimensions more then a 5th Normal form design to gather their data from for an ad-hoc analysis.
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
December 26, 2011 at 7:06 am
Hi ,
Looking at what you are up to, I would plumb for a single product table with additional attributes supported by additional tables.
The reason why I would do this is when you get to ordering you will only have a product ID to reference from one table not 5 or 6.
In the purest sense each are distinct "entities" in their own right and so probably need their own table, but for practical purposes of managing the downstream processes I would follow the style noted above.
My cents worth.
Cheers
Peter
I work for 1st Consulting, a small ISV based in the UK. We build web and desktop applications using the MSFT stack. Find us at www.1stconsulting.biz
December 28, 2011 at 4:25 am
Dev (12/25/2011)
You can have a Normalized data warehouse but your data warehouse won’t be more than a Data Archival Server. You will kill your BI applications if you start mining your data. For OLAP analysis if I need to join 20 tables (5th Normal Form) that design is worthless.
My definition of data warehouse is pretty simple. I pull historical data (ETL) in another system for ad-hoc analysis and this analysis won’t be smooth with normalized structure.
It's worth bearing in mind however that there are many thousands of people successfully using normalized data warehouses for OLAP and ad-hoc analysis purposes. They don't all consider them "worthless" or mere "archives"!
5th Normal Form is typically the best structure for truly ad-hoc analysis because it minimises design bias - the tendency of denormalized designs to simplify some queries at the expense of making other queries more complex, ambiguous or inefficient. Denormalization is usually only applied when the database designer has some advanced knowledge (or assumption) about the type of analysis to be done and he constructs the database design around that knowledge. When the type of analysis changes then a schema designed in that way may prove unsuitable. A normalized design is the natural way to go if you want to support the most general set of future needs, maintain data integrity and minimise support costs and incremental development costs. That is the rationale for creating normalized data warehouses and confining denormalization to subject-specific or purpose-specific data marts.
Hope this helps.
December 28, 2011 at 7:11 am
Amazing! How this thread that started as a question about the design of a OLTP Travel application derived into a discussion about Data Warehousing is something that puzzles me 😀
Having said that - the discussion around normalized vs. de-normalized data warehouses goes back in time to Inmon vs. Kimball just to set a timeframe - trust me, we are not going to settle it today 😀
My position about this matter revolves around Deng Xiaoping's comment when he said "I don’t care if it’s a white cat or a black cat. It’s a good cat as long as it catches mice" and in my experience both approaches deliver.
Last but not least, remember that in the real world you have to deal with real world constraints and since for some reason the new generation of developers appears to be exclusively trained on the star-schema approach then; at least you have budget and time to retrain dozens of developers over time better to go with the flow and embrace the star-schema approach.
Let me make here a parallel... remember IBM's OS/2 Operating System?... much, much better and advanced that Windows was, isn't it? -- now tell me, why are you browsing this forum on a Windows or Windows alike based computer and not on an OS/2 one? Answer is simple, the market creates de-facto standards and there is nothing we can do about it.
Just my point of view.
_____________________________________
Pablo (Paul) Berzukov
Author of Understanding Database Administration available at Amazon and other bookstores.
Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.December 28, 2011 at 7:26 am
PaulB-TheOneAndOnly (12/28/2011)
Amazing! How this thread that started as a question about the design of a OLTP Travel application derived into a discussion about Data Warehousing is something that puzzles me 😀
Me too 😛
Having said that - the discussion around normalized vs. de-normalized data warehouses goes back in time to Inmon vs. Kimball just to set a timeframe - trust me, we are not going to settle it today 😀
Take your own time buddy 😉 However I am out of discussion 😛
December 28, 2011 at 10:51 am
Back to the original question:
Let's assume the travel ageny will offer cruises as well. How would you include it in the design you have in mind?
What would happen if a train ride with a sleeper instead of a hotel will be added to the list? All of a sudden you're faced with a "star-rating" requirement for a train...
I think the db design depends should also consider the business concept: if it's the concept to have "every month one new way to spend your vacation" the fixed table design might not be the best solution. But if the focus is to provide rather "standard solutions", the fixed table design may be a valid solution....
December 29, 2011 at 2:42 am
Thanks to all
I am following the generalization concept, it will be easy to implement and may be in future there will be new product, which can be easily implemented in this design(Supertype/Subtype)
December 29, 2011 at 4:56 am
Hi Dilip,
The design basically depends upon your requirement and keeping that in mind you need to decide.
In your current scenario - I would suggest to go for the second approach where you can have a dimension with "Travel Options" as an attribute of that Dimension. Link this dimension to all your Facts.
Advantage-
If in future a new travel option comes then you need not to create any new table or Dim.
It is more flexible and scalable.
Easily maintainable by the administrator.
In one of my project for a BPO organization, I was having the similar situation where I had different countries to be added in future(as the BPO business expands). So I had created a Dimension called DimCountry and had linked this with all the required facts.
Note:- Please make sure that all your transactional data(Fact Data) has the "Travel Option" as one of the attribute or with any other indirect relation with this new Dim.
I would suggest you can check with your customer as well.
Regards,
Chandrashekhar
BI Consultant
Viewing 11 posts - 31 through 40 (of 40 total)
You must be logged in to reply to this topic. Login to reply