Use schema to distinguish between dimension and fact tables

  • Hi to all,

    Long time reader of various SQL Server Central topics but first time contributor.

    I've been designing and developing data marts and data warehouses for many years now and lately came into a questionable (in my opinion) design forced by one of my customer. To make a long story short, their base design is the star schema, which I'm entirely fine with, but with dimension and fact tables placed into distinct schemata (apparently, that's the plural form of 'schema'): dimension tables are in the 'Dimension' schema and fact tables are in, you guessed it, the 'Fact' schema.

    In short, they are using schemata as prefixing devices, nothing more. To achieve the same result, I usually name dimension tables Dim... and fact tables Fact..., which I feel is pretty typical, but, quite frankly, this is merely a convenient mnemonic device and I could easily get away without using any prefixing scheme whatsoever. On the other hand, I've never ever considered using schemata to do so. In my mind, schemata are logical containers mainly (dare I say solely?) used for ownership/security considerations: since I cannot think of any useful scenario in which fact tables would be accessed without being somehow joined to dimension tables (in a MDM context, I could think of viable scenarios the other way around but that's beyond the point), I feel using a 'Dimension' schema for dimension tables and a 'Fact' schema for fact tables (with that logic, why not a 'Bridge' schema for bridge or outrigger tables?) is, I don't know, cheap and "gimmickish".

    And what about performance? Would constantly joining dimension and fact tables in distinct schemata somehow impact the query execution plan? Initially, I don't believe so but I might be wrong (and in all honesty, I'm a bit too lazy to test it at this point).

    Anybody wants to share his/her thoughts on this matter?

    Regards,

    AL

  • Can't speak to the performance, but I tend to agree though - I could understand a 'datamart.' schema, where all of these related objects could live, but to have fact and dim schema(ta) seems to be illogical, because you're now separating, via schema, objects that IMHO _must_ always be used together. Even using a 'datamart' schema would lose all value to me if the only objects in that DB _were_ datamart objects (ie not trying to fit a DM inside an existing OLTP DB).

    Steve.

  • As you do I prefix my factual tables as FACT_ and my dimensional ones as DIM_, I also use the STG_ prefix for the staging ones; God knoww there are other prefixes out there for bridge tables, snowflake dimensions, operational data storage, etc.

    I'm having problems to understand what would be the value of creating separate schemas for each set of tables. I do not feel either there would be any performance issues by deploying that way but, I'll stick with my current methodology.

    What I have seen is having separate schemas for different datamarts; problem is, some tables like DIM_DATE are by design expected to be shared so, what to do with them? put them in a shared_resources schema?

    My take is, naming convention should suffice to tell both the type and content of each table in a consistent way - if you do that you will be just fine.

    Just my two cents.

    _____________________________________
    Pablo (Paul) Berzukov

    Author of Understanding Database Administration available at Amazon and other bookstores.

    Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.
  • Just came here to say that I too use the prefixes of fact_ , dim_ and stg_ for my tables. I've played around with using schema's as a form of defining the source of the data in the tables, and regretted it at the point of no return.

    fact_, dim_ and stg_... can we make this an unofficial standard and pass it along as accepted practice when teaching others? 🙂 I'm in.

  • I don't see a real (technical) problem with using schema's to seperate facts and dimensions. It's just a question of taste I guess.

    I use schema's to seperate data marts and yes, I have a seperate schema for conformed dimensions 😀

    Within those schema's, I use prefixes to distinguish between facts, dimensions, bridges et cetera.

    But I think schema's don't really have that security aspect anymore, that was more in the earlier versions of SQL Server. So I just use it to custom order my tables 🙂

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • [jeff122877]: You seem to mention regretting using a schema based approach to a point of no return: care to explain that a little bit further?

    Anyway, thanks for your reply.

    AL

  • Well, thanks to everybody who shared thoughts on this matter.

    I guess the bottom line is using a schema based approach as opposed to a prefix based one is more a matter of taste than anything else. I still feel a schema based approach is a cumbersome mechanic, while prefixes are so simple to use and, quite frankly, pretty standard. I could recognize some value in using a schema based approach in a data warehouse composed of multiple data marts, say, one schema per subject area, but then you would have the problem of conformed dimensions shared across subject areas: I guess one could solve this by placing shared dimensions in a 'Shared' schema but since it is likely that many conformed dimensions would be shared, might as well named that 'Shared' schema, 'Dimension', which leads us back to the start of this discussion...

    Maybe it's just me but I still feel a schema for dimension tables distinct from one for fact tables is somehow wrong: but my "killer" argument here might be nothing more than a hunch. That being said, I can think of some introspection tools and SQL generator wizards (granted, old) that would have a hard time in automatically determining relationships in tables in separated schemata: that alone might actually be a good argument in favor of prefixes.

    Again, thank you all for your time. Appreciated.

    AL

  • fact_, dim_ and stg_... can we make this an unofficial standard and pass it along as accepted practice when teaching others? I'm in.

    I use ftblName, dtblName, and my staging tables don't have a prefix at all and are normally in a temp schema, and in a staging database to boot.

    I've used fact and dimension in tables in different schemas to represent different customers when that customer for whatever reason had their own cube, but normally I have them all under dbo.

  • alexandre.lepage 4666 (5/17/2011)


    Maybe it's just me but I still feel a schema for dimension tables distinct from one for fact tables is somehow wrong: but my "killer" argument here might be nothing more than a hunch.

    When in doubt - trust your instinct. That's what I do and it works fine most of the time.

    _____________________________________
    Pablo (Paul) Berzukov

    Author of Understanding Database Administration available at Amazon and other bookstores.

    Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.
  • I also use dim_ and fact_ prefixes/suffixes on tables as a practice, whichever my client prefers based on their naming standards.

    I have seen schemas used only once in a data warehouse where it made sense. This particular DW had multiple data marts (stars) where the marts had no conforming dimensions or other relationships between them. The schemas made it clear which subject area you were working with, and they helped because some table names were slightly similar, so if you worked in the right schema you wouldn't have any confusion as to what data you were querying.

    LinkedIn - http://www.linkedin.com/in/carlosbossy
    Blog - http://www.carlosbossy.com
    Follow me - @carlosbossy

  • Carlos Bossy (5/19/2011)


    I have seen schemas used only once in a data warehouse where it made sense. This particular DW had multiple data marts (stars) where the marts had no conforming dimensions or other relationships between them. The schemas made it clear which subject area you were working with, and they helped because some table names were slightly similar, so if you worked in the right schema you wouldn't have any confusion as to what data you were querying.

    That's a very good example on when schemas make sense - nice post Carlos.

    _____________________________________
    Pablo (Paul) Berzukov

    Author of Understanding Database Administration available at Amazon and other bookstores.

    Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply