Performance Tuning Big Table

  • Will converting varchar(8) to char(8) help? What is the average length of the code values?

    What is the exact scenario that exhibits poor performance on that table?

  • stolbovoy (4/2/2010)


    Will converting varchar(8) to char(8) help? What is the average length of the code values?

    I mentioned that at 1117 yesterday. 😎

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Sorry, I missed it.

    Then just, "what is the exact query that performs poorly?"

  • The OP has left the building...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Grant Fritchey (4/2/2010)


    ...

    But seriously, I was thinking, and I'm assuming Lutz was thinking, of adding a number or substituting a number for the char field. Idle chatter really.

    Your assumption is correct. 🙂

    @TheSQLGuru

    It might be possible that the OP's requirement to use a character/number combination will be re-evaluated based on the information provided in this thread.

    So far it's unclear whether this requirement is a "have to have" or "that's how we're currently doing it" type...

    Either way, using a number as a clustered index doesn't have much to do with whether or not (s)he will use the CI directly or add the char field.



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question
    How to post performance related questions
    Links for Tally Table, Cross Tabs and Dynamic Cross Tabs, Delimited Split Function

  • Everyone else seems to have had a go, so here are my thoughts:

    1. This table probably does not benefit from having a clustered index. If the insertions are essentially random, it will be hurting you a lot. If rows are never deleted, consider turning it into a heap. (Updates can never make a row wider so forwarded records will not be a problem).

    2. Use CHAR(8) instead of VARCHAR(8) to save on storage and fit more rows per page.

    3. Give the Code column an explicit binary collation like Latin1_General_BIN, unless case-insensitivity is required, in which case use a SQL collation like SQL_Latin1_General_CP1_CI_AI.

    4. Split the table up. You should be running Enterprise Edition anyway, but if not, you can still use local partitioned views. Partitioning will decrease the index depth and make maintenance easier.

    5. Consider storing State = 0 rows separately from State = 1 rows.

    6. Consider adding a hash index: Add a computed column AS CHECKSUM(Code), do not persist it, but do index it. This will store a hash value in four bytes per row and allow you to check if a code exists faster. Since the column is not persisted, it will take no storage in the table itself. Code your routines carefully to use the hash index.

    7. Ensure your system has sufficient RAM to keep the hash index in memory.

    8. Use the READUNCOMMITTED isolation level when checking for code existence to reduce locking overhead.

    9. Maintain the hash index regularly (ONLINE if necessary) to keep fragmentation at acceptable levels. This will promote large-size I/O read-ahead when necessary. This is another vote for Enterprise Edition, which supports single I/O requests up to 1MB for read-ahead.
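    A minimal sketch of the hash-index idea from point 6, assuming an illustrative table named dbo.Codes with the Code/State columns discussed in this thread (the table, index, and variable names are hypothetical, not from the OP's schema). CHECKSUM is deterministic, so a non-persisted computed column on it can still be indexed; the equality recheck on Code is needed because CHECKSUM values can collide:

    ```sql
    -- Illustrative table: char(8) code plus tinyint state, per the thread
    CREATE TABLE dbo.Codes
    (
        Code     char(8) NOT NULL,
        State    tinyint NOT NULL,
        CodeHash AS CHECKSUM(Code)   -- not persisted: no storage in the base table
    );

    -- Index the computed column; include Code so the collision
    -- recheck can be answered from the index alone
    CREATE NONCLUSTERED INDEX IX_Codes_CodeHash
        ON dbo.Codes (CodeHash)
        INCLUDE (Code);

    -- Existence check: seek on the 4-byte hash, then verify the code
    -- itself to rule out CHECKSUM collisions
    DECLARE @Code char(8) = 'ABC12345';

    SELECT TOP (1) 1
    FROM dbo.Codes
    WHERE CodeHash = CHECKSUM(@Code)
      AND Code = @Code;
    ```

    The seek predicate on CodeHash is what keeps the lookup narrow; without the explicit CHECKSUM(@Code) comparison the optimizer has no reason to use the hash index at all, which is why Paul says to "code your routines carefully."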

    Paul

    edit: removed huge quoted text

  • Paul White NZ (4/3/2010)


    Everyone else seems to have had a go, so here are my thoughts:

    1. This table probably does not benefit from having a clustered index. If the insertions are essentially random, it will be hurting you a lot. If rows are never deleted, consider turning it into a heap. (Updates can never make a row wider so forwarded records will not be a problem).

    ...

    Why would it help? Reads based on a non-clustered index will result in an extra page read. Considering that a RID is bigger than a tinyint, the structures of clustered and non-clustered index pages are similar except at the leaf level, and an insert will require the same amount of I/O plus a write to a heap page. Storing such a small row in a table with only a non-clustered index looks like a total waste of I/O and disk space, and an inefficient use of cache. Or am I missing something?

  • stolbovoy (4/4/2010)


    Why would it help?

    The record size of each row as the table is currently defined is 20 bytes.

    Changing to CHAR(8) reduces the record size to 16 bytes. (20% smaller).

    A non-clustered index on the hash value has a record size of 13 bytes (35% smaller).

    Integer values also tend to be faster to compare than strings, and may compress better too.
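    Record sizes like these can be verified on a live table rather than computed by hand. A sketch using sys.dm_db_index_physical_stats in DETAILED mode (the dbo.Codes table name is illustrative, not from the OP's schema):

    ```sql
    -- Sketch: report min/avg/max record size per index level for one table
    SELECT i.name AS index_name,
           ps.index_level,
           ps.min_record_size_in_bytes,
           ps.avg_record_size_in_bytes,
           ps.max_record_size_in_bytes
    FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID(N'dbo.Codes'),
                                        NULL, NULL, 'DETAILED') AS ps
    JOIN sys.indexes AS i
      ON i.object_id = ps.object_id
     AND i.index_id  = ps.index_id;
    ```

    Level 0 rows are the leaf records, so that is where the 20- vs 16- vs 13-byte comparison above would show up.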

    Anyway, it's probably worth clearing something up here: My previous post represents a list of ideas to explore, which was the reason that just about every point featured the word 'Consider...'. 😉

    There is not enough detail in the thread to make any firm recommendations, so please don't misread my comments as being intended as in any way definitive. I simply intended to promote constructive discussion.
