January 19, 2012 at 2:14 am
Hi All,
I have a large dimension, with approximately 60, 000 records and a growth rate of 1600 a day (put another way, there will be around 1600 changes everyday). Would it be so big for anlaysis if I make it type 2?
Any help is appreciated!
January 19, 2012 at 8:20 am
mulukenn (1/19/2012)
Hi All,I have a large dimension, with approximately 60, 000 records and a growth rate of 1600 a day (put another way, there will be around 1600 changes everyday). Would it be so big for anlaysis if I make it type 2?
Any help is appreciated!
Depending on the type of application, 60k isn't really that much. I'm not 100% sure whether your question is "would it be too big for analysis" or "would it be that big"...so I'll address both.
Would it be that big? Yes, and bigger if you start tracking changes...which is by definition what a type-2 dimension is all about.
It won't be too big for analysis. Assuming the 1600 changes you mention is the amount of type-2 changes you anticipate, you will add roughly 500k records a year. There are many customer or product dimensions with a lot more data than that, and through proper indexing and design (choosing which fields to track changes on wisely) it shouldn't be a problem.
Hope this helps...
January 19, 2012 at 8:30 am
Hi Martin,
Thanks. Yes that was my question and you answered it correctly. It helps a lot.
In addition, this dimension will be joined to many fact tables. There are other dimension tables which are related to the other fact tables via the dimension I mentioned. Now, I decided to make them snow flaked. I think that is right? and I hope it will not be a problem performancewise? Now, there are some dimensions which have a many to many relationship with this dimesnion table, and with some of my fact tables for that matter. How do you think is the best to handle this situation?
Thanks once again.
January 19, 2012 at 8:44 am
mulukenn (1/19/2012)
I have a large dimension, with approximately 60,000 records and a growth rate of 1600 a day (put another way, there will be around 1600 changes everyday). Would it be so big for anlaysis if I make it type 2?
Do not worry about that volume - that's a really really tiny table.
On the other hand, the decision of doing Type 2 SCD on it shouldn't be driven by storage capacity but by business needs.
_____________________________________
Pablo (Paul) Berzukov
Author of Understanding Database Administration available at Amazon and other bookstores.
Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.January 19, 2012 at 8:47 am
mulukenn (1/19/2012)
Hi Martin,Thanks. Yes that was my question and you answered it correctly. It helps a lot.
In addition, this dimension will be joined to many fact tables. There are other dimension tables which are related to the other fact tables via the dimension I mentioned. Now, I decided to make them snow flaked. I think that is right? and I hope it will not be a problem performancewise? Now, there are some dimensions which have a many to many relationship with this dimesnion table, and with some of my fact tables for that matter. How do you think is the best to handle this situation?
Thanks once again.
Snow-flaking is always bad for performance. Personally, I only go the snowflake route if I need some of the "snow-flaked dimensions" available for other things.
A good example of this would be if I have a summarized fact table which needs to be populated by product type for instance. But even in such a case, I ensure that I have all the product type attributes in the product dimension as well...this eliminates the need for the extra join (or hop) and obvious overhead that goes along with that.
Bidge tables are the best way to deal with many-to-many relationships, and it isn't uncommon to have bridge tables between 2 dimensions or between a dimension and fact table. Presenting the information from many-to-many relationships to end-users will be the most challenging part, and you have to ensure that all your aggregations are multiplied by some factor to remain accurate.
Do some searching on many-to-many relationships in dimensional models, there's a lot of good material out there to help.
January 19, 2012 at 9:00 am
Oh yes, that is great. I hate anything that affects performance ;-). But again, what do you think if I denormalize those tables into the table which is used to link them, actually the 'large dimension table I mentioned'? It will have many columns and there will be many hierarchies created in that case. Do you think this is a good idea?
January 19, 2012 at 9:05 am
mulukenn (1/19/2012)
Oh yes, that is great. I hate anything that affects performance ;-). But again, what do you think if I denormalize those tables into the table which is used to link them, actually the 'large dimension table I mentioned'? It will have many columns and there will be many hierarchies created in that case. Do you think this is a good idea?
Heheh...your "large dimension" isn't that large 😛
In my opinion, denormalizing it into the "large" dimension is the right way to go. That is what I usually do...maybe I didn't articulate that well enough in my previous post.
Remember that it is always easy to create a snow-flake later...when you need it. Don't just snow-flake for the sake of it.
January 19, 2012 at 9:10 am
Great, that really helps. May i invite you for a drink 🙂
January 19, 2012 at 9:13 am
mulukenn (1/19/2012)
Great, that really helps. May i invite you for a drink 🙂
Hehe...sure...anytime 😉
Glad I could help a little...that's what this forum is all about.
Viewing 9 posts - 1 through 8 (of 8 total)
You must be logged in to reply to this topic. Login to reply