column count per table best practice

  • I want to design a database for an OLTP app that mainly involves customers and transactions. I know a table can have up to 1,024 columns, but I haven't yet figured out what the best practice is performance-wise.

    e.g. for the payment table, I thought maybe I should split it into two tables, payment_primary and payment_secondary, where I'd store the columns that are accessed most per transaction (10 columns) in the primary table and put the rest in the secondary table... the same for the "person" table, which has about 40 columns.

    Would this help performance at all, or is it just unnecessary and likely to complicate things in the future?

     

     

  • Hello hamedavodi

    In principle, it is best if all columns fit within one data page. More important, though, is to size columns as small as possible. For example, where you can, use smallint or tinyint in preference to int. Watch the sizing on decimal data types. It gets more complex with data types such as varchar and nvarchar. Blobs are stored separately, so they don't 'count' in the same way.
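    Purely to illustrate that sizing point, something like this (every table and column name below is invented, not taken from your design):

    -- Hypothetical payment table; names are made up purely to illustrate sizing choices.
    CREATE TABLE dbo.Payment
    (
        PaymentID     bigint        NOT NULL,  -- bigint only if int genuinely won't last
        CustomerID    int           NOT NULL,
        PaymentTypeID tinyint       NOT NULL,  -- a handful of lookup values: tinyint, not int
        PaymentStatus tinyint       NOT NULL,
        PaymentDate   datetime2(0)  NOT NULL,  -- second precision is enough; 6 bytes instead of 8
        Amount        decimal(10,2) NOT NULL,  -- precision sized to the business, not the maximum
        CurrencyCode  char(3)       NOT NULL,  -- fixed-length ISO code, no varchar overhead
        Reference     varchar(50)   NULL       -- varchar sized to realistic data, not varchar(max)
    );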

    At the design stage, I would be reluctant to artificially split a table 'for performance reasons'. There are other techniques, for example columnstore indexes, or INCLUDE columns on ordinary (non-columnstore) indexes. I prefer to keep the design clean and only split, denormalise, etc. if the performance improvement can be demonstrated.
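    As a rough sketch of the INCLUDE-columns idea, assuming a payment table like the one above (the index and column names are again invented): rather than splitting off a payment_secondary table, a narrow nonclustered index can cover the handful of hot columns while the wide row stays in one table.

    -- Cover the frequently read columns with a narrow index instead of a second table.
    CREATE NONCLUSTERED INDEX IX_Payment_Customer_Date
        ON dbo.Payment (CustomerID, PaymentDate)
        INCLUDE (Amount, PaymentStatus, CurrencyCode);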

    All the best with your project.

    Bredon

     

  • There is no precise guidance. A table could have as few as one column or as many as the maximum, 1,024. In general, though, you'll probably see no more than 10-15 columns per table in a well-normalized database. There are exceptions, so I wouldn't get too hung up on it. Focus on normalizing appropriately and let the column count fall where it falls. Fewer columns are generally better. Smaller data sizes, as was mentioned, are generally better. Other than that, just build out the structure.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Ben wrote:

    I want to design a database for an OLTP app that mainly involves customers and transactions. I know a table can have up to 1,024 columns, but I haven't yet figured out what the best practice is performance-wise.

    e.g. for the payment table, I thought maybe I should split it into two tables, payment_primary and payment_secondary, where I'd store the columns that are accessed most per transaction (10 columns) in the primary table and put the rest in the secondary table... the same for the "person" table, which has about 40 columns.

    Would this help performance at all, or is it just unnecessary and likely to complicate things in the future?

    Forget about the column count.  What's important is "Normalization" and datatypes.

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thank you all!

    One more question, though not fully related to this topic: for UNIQUE fields, is there a difference between defining UNIQUE in the column definition and defining a unique index? Are both necessary, or will just one do the job?

  • Ben wrote:

    Thank you all!

    One more question, though not fully related to this topic: for UNIQUE fields, is there a difference between defining UNIQUE in the column definition and defining a unique index? Are both necessary, or will just one do the job?

    To be clear, "fields" are not and cannot be defined as "UNIQUE".  Only CONSTRAINTs are.  Yep, I know... seems like a small difference but the difference is important.  Unique constraints are enforced by a unique index that is automatically created when you create the constraint.  You can have a unique index without having a named constraint but you cannot have a named constraint without a unique index to enforce it.
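    In code, the two forms look like this (assuming a hypothetical dbo.Person table with an Email column; the names are mine, not from your design):

    -- Option 1: a UNIQUE constraint; SQL Server automatically builds a unique index to enforce it.
    ALTER TABLE dbo.Person
        ADD CONSTRAINT UQ_Person_Email UNIQUE (Email);

    -- Option 2: a standalone unique index, with no named constraint behind it.
    CREATE UNIQUE NONCLUSTERED INDEX UX_Person_Email
        ON dbo.Person (Email);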

    --Jeff Moden

  • Post withdrawn... gotta check on something...

    --Jeff Moden

  • Ok... I'm pretty sure that I was correct so here it is again.

    You can't actually define a "field" as being unique. Instead, a unique index must be created. That can be done in one of two ways: either create the index in standalone fashion, or create a unique constraint, which will also create a unique index to enforce the uniqueness.

    There are some actions like defining a PK that will automagically create both a constraint and the underlying index whether you explicitly name the constraint or not.
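    A quick way to see the PK case for yourself (the table is just a throwaway example; sys.key_constraints and sys.indexes are the standard catalog views):

    CREATE TABLE dbo.Demo
    (
        DemoID int NOT NULL PRIMARY KEY   -- no constraint name given
    );

    -- Both a key constraint and the unique index that enforces it were created automagically,
    -- each with a system-generated name.
    SELECT name, type_desc
    FROM   sys.key_constraints
    WHERE  parent_object_id = OBJECT_ID('dbo.Demo');

    SELECT name, type_desc, is_unique
    FROM   sys.indexes
    WHERE  object_id = OBJECT_ID('dbo.Demo');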

    --Jeff Moden

  • Thank you for researching it and posting the verified answer 🙂

  • Presumably you are defining the PRIMARY KEYs as such, rather than using UNIQUE constraints? PRIMARY KEYs default to being clustered, and, broadly speaking, clustering improves performance. These days, for permanent tables, I use PRIMARY KEY and, for any alternate keys, UNIQUE. As Jeff says, both of these constraints will create associated unique indexes.

    There are special cases. For temporary tables, I use unique indexes for both. As recently as SQL Server 2016 I have been caught out by odd concurrency problems when using temporary tables with these constraints, and concurrency problems are tricky. For table variables, there is no option but to use constraints. That said, the reasons behind these two special cases may change in the future.
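    Roughly what I mean, with invented names:

    -- Permanent table: PRIMARY KEY (clustered by default) plus UNIQUE for the alternate key.
    CREATE TABLE dbo.Customer
    (
        CustomerID     int         NOT NULL CONSTRAINT PK_Customer PRIMARY KEY,
        CustomerNumber varchar(20) NOT NULL CONSTRAINT UQ_Customer_Number UNIQUE
    );

    -- Temporary table: unique indexes rather than named constraints.
    CREATE TABLE #CustomerStage
    (
        CustomerID     int         NOT NULL,
        CustomerNumber varchar(20) NOT NULL
    );
    CREATE UNIQUE CLUSTERED INDEX UX_CustomerStage_ID ON #CustomerStage (CustomerID);
    CREATE UNIQUE INDEX UX_CustomerStage_Number ON #CustomerStage (CustomerNumber);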

     

  • Bredon wrote:

    Presumably you are defining the PRIMARY KEYs as such, rather than using UNIQUE constraints? PRIMARY KEYs default to being clustered, and, broadly speaking, clustering improves performance. These days, for permanent tables, I use PRIMARY KEY and, for any alternate keys, UNIQUE. As Jeff says, both of these constraints will create associated unique indexes.

    There are special cases. For temporary tables, I use unique indexes for both. As recently as SQL Server 2016 I have been caught out by odd concurrency problems when using temporary tables with these constraints, and concurrency problems are tricky. For table variables, there is no option but to use constraints. That said, the reasons behind these two special cases may change in the future.

    If by concurrency problems you mean the fact that a constraint name must be unique within a database, you can create constraints such as PKs without naming the constraint. You can also create multiple identically named indexes on different tables with no problems.
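    For example (a sketch only, reusing the hypothetical payment and person tables from earlier in the thread; the point is the naming, not the tables):

    -- An unnamed PK on a temp table gets a system-generated constraint name, so concurrent
    -- sessions creating the same temp table don't collide on a constraint name in tempdb.
    CREATE TABLE #Work
    (
        WorkID int NOT NULL PRIMARY KEY CLUSTERED
    );

    -- Index names only have to be unique per table, so the same name on different tables is fine.
    CREATE INDEX IX_Lookup ON dbo.Payment (PaymentDate);
    CREATE INDEX IX_Lookup ON dbo.Person (LastName);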

    --Jeff Moden

  • Ben wrote:

    I want to design a database for an OLTP app that mainly involves customers and transactions.

    Then definitely start the design before tables. That is, do a logical data model first. That is part of the process of normalization, although a lot of people who claim to do "normalization" skip the logical design. That is a terrible mistake (it skips a critical step, without which you haven't really normalized at all).

    To do a logical model, you will need to define all the data attributes ("columns") you will need, as best you can determine at this point, and then determine entities (each of which later becomes a table or tables) (*) and assign those attributes to the entities based on the keys you've chosen for the entities. The steps of normalization will guide you in properly determining this; indeed, that is why those steps, and normalization itself, exist.

    For example, it's pretty clear you'll have a "customer" entity and a "transaction" entity (transaction will likely be a supertype entity, with subtypes of at least "sale" and "return"). Now look at the data attributes you have and create new entities / assign attributes to existing entities as the normalization process guides you.

    As to entity names, you can use either singular or plural but be consistent.  Either all are singular or all are plural.

    When you convert the logical model (entities) to a physical model (tables), table names should all be plural.

    (*) It's technically an "entity type" of customer, etc., not an entity.  An entity is technically one actual, real-world instance/occurrence of an entity type, but you can ignore that, just call it an "entity" like almost everyone else does.
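    If it helps to picture where the supertype/subtype idea ends up once the logical model eventually becomes physical tables, here is a rough sketch with invented names (plural table names, as described above):

    -- Supertype: one row per transaction, whatever its kind.
    CREATE TABLE dbo.Transactions
    (
        TransactionID   bigint       NOT NULL PRIMARY KEY,
        CustomerID      int          NOT NULL,
        TransactionDate datetime2(0) NOT NULL,
        TransactionType char(1)      NOT NULL  -- e.g. 'S' = sale, 'R' = return
    );

    -- Subtypes: keyed by the same TransactionID as the supertype row they extend.
    CREATE TABLE dbo.Sales
    (
        TransactionID bigint        NOT NULL PRIMARY KEY
            REFERENCES dbo.Transactions (TransactionID),
        SaleAmount    decimal(10,2) NOT NULL
    );

    CREATE TABLE dbo.Returns
    (
        TransactionID         bigint NOT NULL PRIMARY KEY
            REFERENCES dbo.Transactions (TransactionID),
        OriginalTransactionID bigint NOT NULL
    );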

    SQL DBA, SQL Server MVP (07, 08, 09). "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • You are absolutely right! I should do this before taking any further steps.

    I can see what you mean from what I've done so far: even though I had a lot of concept notes in advance, after typing up a few tables in a couple of schemas I have already gone over them again and again, forced to revise because something always came up that I hadn't foreseen! And I still have the feeling that something will come up again... 🙁

    Well, it's my first experience building something that will be worth mentioning... so I should probably get used to this!

    I'll probably post a couple of tables once I'm confident in them (for my level), though an MVP developer will no doubt still point out lots of revisions 😉

    Thank you!

  • Great, sounds good.

    Again, though, your initial design should NOT be "tables", it should just be data.  It should be at a business level, not at a techy level.

    If at all possible, meet briefly with business people and ask them what data they expect to need: order dates, customer numbers, p.o. numbers, etc. It's 100% wrong to meet only (or mostly) with tech / development people to develop a logical model; you want to identify business data and business requirements, not tech specs.

    Thus, it's also critical not to consider performance during logical design (and thus no indexes, constraints, etc.). Keys in the logical model are just unique identifiers, or potential identifiers, for an entity, such as a customer number or a product code. Performance is irrelevant: computers are assumed to be as fast as needed, with no adjustments for disk speed, etc. Design the data first, not the app / implementation.

    There are many tutorials online on how to go through the steps of normalization. Some of them are, frankly, rather poor, but they will at least help get you started. Be patient; it will take some time to get used to the process, but once you do, you will end up with a vastly better table design and a better overall system.

    SQL DBA, SQL Server MVP (07, 08, 09)

  • Heh... so THAT's the reason why so many people end up with "slow databases" and IDENTITY columns everywhere.  😉  A lot of the important stuff isn't even considered up front.

    --Jeff Moden
