A Design Question - To Use an Identity column or not

  • meichner (6/8/2010)


    I am using SQL Server 2005. Your idea however was interesting. I didn't know about the sparse setting.

    Ah, sorry. Sadly, it is only available in 2008.

    In older databases I do tend to split such columns off into their own tables so I don't end up with tons of NULLs, however if I'm going to need that column in most of my queries, I'll likely put it in the main table to avoid all the extra joins.

    If you have the time, try it both ways and test to see if there's a performance degradation with it in a separate table or if there's a large size increase in the table with it in the main table. If there are no space or performance issues, it becomes a matter of personal style and taste.

    --------------------------------------
    When you encounter a problem, if the solution isn't readily evident go back to the start and check your assumptions.
    --------------------------------------
    It’s unpleasantly like being drunk.
    What’s so unpleasant about being drunk?
    You ask a glass of water. -- Douglas Adams

  • think widely. I have had cases where we do store data in a table that is relevent to that entity, but is rarely queried. Keeping it in the same table, when 90+% of the time it isn't queried, makes the clustered index bigger, slows reads, increases IO, etc. Especially when you have other data that you can't, or don't want to cover in indexes because there is more of it than necessary.

    Vertical partitioning makes sense if you are not always querying both tables. If you are, then it doesn't work as well, but if you often need a fraction of the data in an entity, it makes sense. It's not "complicated"

  • Steve Jones - Editor (6/8/2010)


    think widely. I have had cases where we do store data in a table that is relevent to that entity, but is rarely queried. Keeping it in the same table, when 90+% of the time it isn't queried, makes the clustered index bigger, slows reads, increases IO, etc. Especially when you have other data that you can't, or don't want to cover in indexes because there is more of it than necessary.

    Vertical partitioning makes sense if you are not always querying both tables. If you are, then it doesn't work as well, but if you often need a fraction of the data in an entity, it makes sense. It's not "complicated"

    Thanks Steve. What you said makes a lot of sense.

  • Steve Jones - Editor (6/8/2010)


    think widely. I have had cases where we do store data in a table that is relevent to that entity, but is rarely queried. Keeping it in the same table, when 90+% of the time it isn't queried, makes the clustered index bigger, slows reads, increases IO, etc. Especially when you have other data that you can't, or don't want to cover in indexes because there is more of it than necessary.

    Vertical partitioning makes sense if you are not always querying both tables. If you are, then it doesn't work as well, but if you often need a fraction of the data in an entity, it makes sense. It's not "complicated"

    'Zactly. 🙂

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Steve Jones - Editor (6/8/2010)


    think widely. I have had cases where we do store data in a table that is relevant to that entity, but is rarely queried. Keeping it in the same table, when 90+% of the time it isn't queried, makes the clustered index bigger, slows reads, increases IO, etc. Especially when you have other data that you can't, or don't want to cover in indexes because there is more of it than necessary.

    Vertical partitioning makes sense if you are not always querying both tables. If you are, then it doesn't work as well, but if you often need a fraction of the data in an entity, it makes sense. It's not "complicated"

    OK, I'm a bit thick here . "increases IO" would you please explain this one a bit more.

    I see this would be the case if the query is simply SELECT * even when some (mostly empty) columns are not even wanted in the first place. But if they are not even queried, then why should I/O increase ?

    And why should clustered indexes become bigger as opposed to maintaining to a second set of clustered indexes in the second table ?

    By " complicated" I meant an extra JOIN in the statement. If you are already combining other tables with JOINs and some of these tables are also " vertically partitioned" then the number of JOINs increases.

    Thanks for your time and Regards.

  • Happy to elaborate:

    I have an object/entity with 20 fields, perhaps some larger, varchar(max) or large char fields, that exits in one table. I build a clustered index, which is the data. So let's guess and say that I have 1k rows to keep things simple.

    I query, need 800 rows (keeping math easy). It doesn't matter if I grabbed these row IDs from a nonclustered index or this is a scan. I need to load 100 pages of data from the IO system, or keep 100 pages in RAM. Possibly some read ahead as well, potentially snans through the table to get there in the IO system.

    Now, if I separate out 8 fields that I use often, leaving out perhaps things like an address, notes, etc. that may be appropriate to the entity at a 1:1 level, but I don't often query. Or I rarely query. I might cut down the size of the rows to 200kb. So instead of getting 8 rows/page, I get 40 rows/page. So every IO request, every read ahead, everything is pulling in 5x the number of rows. To get the same 800 rows, I need 20% of the IO that the larger rows need. Across a busy system, that can severely add up.

    Now I'm not talking this:

    CREATE TABLE Customer

    ( CustomerID int

    , CustomerName varchar(20)

    , Active int

    , Address varchar(200)

    , ...

    )

    CREATE TABLE CustomerDetail

    ( CustomerID int

    ,LastSale datetime

    , LastOrderID int

    , Notes varchar(max)

    , ...

    )

    where I'm dividing the table in half, or balancing fields. What I'm more talking about is splitting off rarely queried items:

    CREATE TABLE Customer

    ( CustomerID int

    , CustomerName varchar(20)

    , LastSale datetime

    , LastOrderID int

    , Active int

    , ...

    )

    CREATE TABLE CustomerDetail

    ( CustomerID int

    , Notes varchar(max)

    , Address varchar(200)

    , City varchar(200)

    , ...

    )

    They key is those items which are rarely queried, and you will find quite a few of these in many entities, are moved off so that they aren't a part of the clustered index. This reduces IO, and causes substantial savings as you get larger tables and larger loads.

  • " I might cut down the size of the rows to 200kb"

    I guess you meant 200 bytes, not kb (just to show off, I did pay attention :-).

    And now the time has come for me to delve into how the I/O system is involved when getting the data from a row.

    Thanks for your patience.

  • Yep, sorry, 200 bytes.

    It's always work, and on smaller systems, small tables, it's not enough with lots of hardware to matter. But it raises the level to which you system can scale.

Viewing 8 posts - 16 through 22 (of 22 total)

You must be logged in to reply to this topic. Login to reply