Number of columns in table

  • I've been researching issues on the number of columns in a table vs performance.

    I have a Customer table, 87 columns, 10,000 records.

    Has (boolean) columns for whether customer wants certain newsletters. E.g. AllowSalesNewsletters, AllowGeneralNewsletters, AllowClearanceNewsletters

    I may add another 5-10 newsletter types, so that would require adding more columns to the table. I could create a related "NewsletterPermissions" table (with foreign key CustomerID), but then I would have 10+ rows in that table for each customer, and I would have to use JOINS all the time when retrieving or updating newsletter data for the customer (seems like a hassle).

    Any thoughts? Thanks.

  • I would go with the ancillary table. Yes, you would have to join to the table to know what newsletters a customer authorized, but the benefit is all you need to do is add a row of data when there is a new newsletter rather than having to add a new column to the table to reflect the new newsletter that is available. A DML change versus a DDL change.

    This is also part of normalizing your data.

  • Hmm.. thank you!

  • I've thought about it some more, and to create an ancillary table seems like much more work. If I have 12 different newsletters then each time a Customer record is created I've got to create 12 new records in the ancillary table. Then my ancillary table will have 10,000 * 12 = 12,000 rows. Seems excessive. Why not just have 12 columns in the Customer table. These are easily updated. There is no redundant data.... Add new column if a new newsletter is desired.

  • That sounds logical if you don't want a true relational database. Remember the database engine is designed to work well with joins. When you have to keep adding columns what happens to any view you have? And the stored procedures?

    Also what happens if someone created view and procedures with that bad habit, select * from tablename. The new columns won't appear until you recompile.

    It is also a bit of a nightmare to manage single large tables, hence the development of normalisation for database storage. This is why we use database engines and not just flat files. Remember most of the system lifecycle is in production so you want a system that is easy on the maintenance.

    Setting up this type of related data is the best practice method, divulge the data to a table where repeating values will occur.

  • matt6749 (4/6/2014)


    If I have 12 different newsletters then each time a Customer record is created I've got to create 12 new records in the ancillary table.

    Correct

    Then my ancillary table will have 10,000 * 12 = 12,000 rows.

    12000 rows of a couple of ints and a bit. Trivial in terms of size. 12000 rows is nothing.

    Why not just have 12 columns in the Customer table.

    What happens when 10 new newsletters are added with an extra attribute or two is needed for each one (don't say it'll never happen) 20 or 30 more columns?

    These are easily updated. There is no redundant data....

    So's the clientNewsletter table. It has no redundant data either

    Add new column if a new newsletter is desired.

    And change every single piece of code which refers to the newsletter columns. Adding the column's not the problem (well, it is in an environment with a strict change control process). Doing the impact analysis and changing all the referencing code, now that's hard.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Thank you all... now I'm convinced! I'm going to use an ancillary table.

    Thanks again,

    Matt

  • If you really want to use a single table, you could use a single smallint or int column with a bit assigned to each newsletter. That isn't necessarily a terrible way to do it, it just depends on what you need.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • ScottPletcher (4/7/2014)


    If you really want to use a single table, you could use a single smallint or int column with a bit assigned to each newsletter. That isn't necessarily a terrible way to do it, it just depends on what you need.

    I have to disagree. Having it all in one table is a form of denormalization that should be avoided for a lot of the reasons Gail mentioned and more.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (4/7/2014)


    ScottPletcher (4/7/2014)


    If you really want to use a single table, you could use a single smallint or int column with a bit assigned to each newsletter. That isn't necessarily a terrible way to do it, it just depends on what you need.

    I have to disagree. Having it all in one table is a form of denormalization that should be avoided for a lot of the reasons Gail mentioned and more.

    Indeed, we disagree. I believe there are times when basic rules can be overridden for efficiency or another known purpose. As with all denormalization, it should be done by experts for a specific, known reason. I don't know whether this case truly qualifies or not.

    As things go more mobile to devices with much less relative RAM, there are huge advantages to bit-based approaches vs storing other complete structures.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • ScottPletcher (4/8/2014)


    Jeff Moden (4/7/2014)


    ScottPletcher (4/7/2014)


    If you really want to use a single table, you could use a single smallint or int column with a bit assigned to each newsletter. That isn't necessarily a terrible way to do it, it just depends on what you need.

    I have to disagree. Having it all in one table is a form of denormalization that should be avoided for a lot of the reasons Gail mentioned and more.

    Indeed, we disagree. I believe there are times when basic rules can be overridden for efficiency or another known purpose. As with all denormalization, it should be done by experts for a specific, known reason. I don't know whether this case truly qualifies or not.

    As things go more mobile to devices with much less relative RAM, there are huge advantages to bit-based approaches vs storing other complete structures.

    We aren't talking about storing the entire CRM database on a Smart Phone or other mobile device. How this information is transmitted and provided to a mobile device has no impact on how the database is designed and implemented on a server. That becomes an interface issue between the mobile device and the database/application.

    In this case I have to agree with Gail and Jeff. Plus, depending on how many newsletters you end up tracking even using a small int or int could result in DML changes down the road to handle additional newsletters.

  • Lynn Pettis (4/8/2014)


    ScottPletcher (4/8/2014)


    Jeff Moden (4/7/2014)


    ScottPletcher (4/7/2014)


    If you really want to use a single table, you could use a single smallint or int column with a bit assigned to each newsletter. That isn't necessarily a terrible way to do it, it just depends on what you need.

    I have to disagree. Having it all in one table is a form of denormalization that should be avoided for a lot of the reasons Gail mentioned and more.

    Indeed, we disagree. I believe there are times when basic rules can be overridden for efficiency or another known purpose. As with all denormalization, it should be done by experts for a specific, known reason. I don't know whether this case truly qualifies or not.

    As things go more mobile to devices with much less relative RAM, there are huge advantages to bit-based approaches vs storing other complete structures.

    We aren't talking about storing the entire CRM database on a Smart Phone or other mobile device. How this information is transmitted and provided to a mobile device has no impact on how the database is designed and implemented on a server. That becomes an interface issue between the mobile device and the database/application.

    In this case I have to agree with Gail and Jeff. Plus, depending on how many newsletters you end up tracking even using a small int or int could result in DML changes down the road to handle additional newsletters.

    I want the best design regardless of beginning "rules" or assumptions. You can never encode anything if you prefer, but I don't believe that's optimal. I'm opening to considering it based on specific circumstances.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • Thank you all.

    I'm going to have to create an ancillary table with 7 notification (child) records for each Customer record.

    Can anyone help me with a query that can insert these 7 child records for each parent?

    This query only inserts one child record for each parent:

    INSERT INTO Notifications (CustomerID, NotificationID, GetsNewsletter)

    SELECT Customer.CustomerID, 1 as NotificationID, true)

    FROM Customers

  • Does the NotificationID reference some sort of lookup table for the type of Notification (which I assume are your types of newsletters)?

    If so, your query could be constructed as such:

    INSERT INTO Notifications (CustomerID, NotificationID, GetsNewsletter)

    SELECT Customer.CustomerID, NotificationTypeID, true)

    FROM Customers

    CROSS JOIN NotificationType

    Note that this will result in X number of records for each customer, where X is the number of NotificationType records you have

  • Oh man, that works! You rock!

    Thanks, Matt

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply