Simple Junction table design question

  • I have a disagreement with the programming team on an app that I designed. It's not a major disagreement. It's more of a design question.

    The database that I designed has many tables. Two of the tables are:

    • Communities - Which is a list of communities where people can live
    • Persons - Which is a list of people who live in those communities

    Because a Person can live in multiple communities there is a many to many relationship between these two tables. Therefore I created a table called: PersonCommunities that serves as a junction table.

    The PersonCommunities has the following fields:

    • ID
    • CommunityID
    • PersonID

    I was surprised to find that the programmers removed the ID field from the PersonCommunities table. When I asked them about it, they said that the ID field served no purpose. Since I always create tables with an ID field I'm not sure if they are right or I am right. Are ID fields necessary on junction tables?

    Thanks

     

     

     

     

     

    • This topic was modified 1 month ago by  eichnerm.
  • ID isn't necessary.  The combination of CommunityID and PersonID should be unique, because a person cannot belong to a community multiple times.  Since you already have a unique key, I see no point in adding an additional unique key, especially an artificial one.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Yep, definitely remove ID.  The key to a "relationship" table should be the relationship keys, NOT an ID column.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • They are probably right. There is probably no value in that column (not to mention the ambiguous "ID" name) unless you are using an ORM or other tool/framework that insists on a single-column primary key.

    There is a popular tendency to throw an identity key ID column on every table. That can be great for performance, and even efficiency of writing queries, when there is no efficient natural key (one or few "small" columns -- not big strings, nor even many otherwise "small" integer columns). Joining on a 4-byte integer is much more efficient than joining on a 100 byte string or a dozen 4-byte integers.

    Other than the cases mentioned above (ORM/framework requirement), an identity-key id may be useless in terms of performance for junction tables where the only joins or filters are on the primary keys of the tables being joined or perhaps other related columns, and that identity key column is never referenced. Or on a lookup table where the natural key is a tiny string -- e.g., U.S. state abbreviations (which are only two bytes char, and rarely change over one or more lifetimes).

  • Thank you all for getting back to me. I appreciate the advice. Because of your advice, I won't make a big deal out of removing the ID. Oddly, I spoke to my brother in-law today. He was appalled by the idea that they didn't have an ID field. Unfortunately, I couldn't understand his explanation of why removing the ID field was a problem.

    Thanks again.

  • eichnerm wrote:

    Thank you all for getting back to me. I appreciate the advice. Because of your advice, I won't make a big deal out of removing the ID. Oddly, I spoke to my brother in-law today. He was appalled by the idea that they didn't have an ID field. Unfortunately, I couldn't understand his explanation of why removing the ID field was a problem.

    Thanks again.

    Just to make you feel better about getting rid of the ID column on this "bridging" table...

    People are trained to have a guaranteed "uniquifier" for a table and they have been trained to make it as narrow as possible and only ever increasing as well as a couple of other things.  It has been taught as a "Best Practice" for decades.  The problem with such "Best Practices" is that a lot of people simply stop thinking about alternatives and forget that such "Best Practices" are NOT a panacea.

    Obviously, the two table id columns combined will make a "unique" pair of columns which will also make a a nice clustered primary key.  It might end up being mostly "ever increasing" according to the inserts on one of the other tables depending on the leading column of the 2 column clustered PK. (You may need a non-clustered index with the opposite column order as well and, yea, that will double the footprint could be very worth it depending on your workload.

    People also use an IDENTITY column to control fragmentation at some expense to queries but it's also not likely that this table will suffer massive page splits throughout and, if it happens, logical fragmentation really doesn't matter on this table except if you have "spinning rust" for hard drives and then it will usually only matter if you retrieve thousands of rows at a time and then only the first time for the day (or however long it stays in cache).

    Scott Pletcher and others have been talking about how people waste a good constraint and a good clustered index on identity columns for years.  That's also not a panacea but he's right that having an ID column by rote should not be a panacea.

    There are other places where you'd be totally off your rocker to not have and ID column.

    As with all else, "It Depends".

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I agree with the advice given so far, and have a couple more ideas for you to consider.

    1. Had you named your tables in the singular (Person, Community), your junction table's name would be consistent with those names (PersonCommunity). Also, in my experience, this is the more common way of naming tables.
    2. ID is an abbreviation for 'Identity Document' (such as a passport). When naming columns, you are abbreviating the word 'Identifier', and therefore 'Id' is a more accurate form.

     

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • I believe the more common, and better, naming for tables is plural.  I can't think of a major relational dbms that uses singular system table names.

    For example: sys.objects (SQL Server), DBA_OBJECTS (Oracle), SYSTABLES (DB2), INFORMATION_SCHEMA.TABLES (ISO standard across all relational dbmses).

     

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • Thanks Jeff. You are 100% correct. I always put an ID column on tables without giving it much thought. In this case I clearly don't need it.

  • ScottPletcher wrote:

    I believe the more common, and better, naming for tables is plural.  I can't think of a major relational dbms that uses singular system table names.

    For example: sys.objects (SQL Server), DBA_OBJECTS (Oracle), SYSTABLES (DB2), INFORMATION_SCHEMA.TABLES (ISO standard across all relational dbmses).

    True! But I did say 'in my experience' – for user tables, the convention has been to use singular at all of the places I've worked.

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • Wow, that's exceptional.  I've never been at a place that had, say an "order" table vs. an "orders" table.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • ScottPletcher wrote:

    I believe the more common, and better, naming for tables is plural.  I can't think of a major relational dbms that uses singular system table names.

    For example: sys.objects (SQL Server), DBA_OBJECTS (Oracle), SYSTABLES (DB2), INFORMATION_SCHEMA.TABLES (ISO standard across all relational dbmses).

    And for the record, I believe in the opposite.  I believe that a table should be named after what one row contains.  It also prevents complications when the simple addition of an "s" does not make a proper plural.  For example, Companys is not the proper plural for Company.

    And, just because MS did something that I believe is incorrect, doesn't mean that I'll make the same mistake. 😀

    I will, however, follow the "standard" that a company has adopted provided that it doesn't produce a danger to the data or the system the data resides on because there are bigger battles to be won.  I am, however, fairly well disgusted at "tbl-ing" and having dashes or spaces in SQL object names .

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply