Performance Issue while retrieving data from Very Large Table

  • Hi Everybody,

    I have a table having Five Columns

    CREATE TABLE TABLE_X (

    Col1 INT NOT NULL,

    Col2 varchar(10) NOT NULL,

    Colc Datetime NOT NULL

    )

    PRIMARY KEY CLUSTERED IS ON COL1 & COL2.

    THERE is a seperate index on Col1 also which is non clustered and non unique.

    THERE are about 2.5 Billion Rows and i have to select on the basis of

    ColC i.e.

    SELECT * FROM TABLE_X WHERE

    Colc between '2007-12-01 00:00:00' and '2008-02-29 23:59:59'

    The total number of rows which will be retrieved using this query will be about 20 million.

    In the Execution plan i see that a clustered index scan is performed.

    The time duration which is being taken by this query is around 9 minutes which is not at all acceptable.

    Please guide me how to do the tuning of this query.

    Thanks

    Amit

  • Amit Tiwari (3/14/2008)


    Hi Everybody,

    I have a table having Five Columns

    CREATE TABLE TABLE_X (

    Col1 INT NOT NULL,

    Col2 varchar(10) NOT NULL,

    Colc Datetime NOT NULL

    )

    PRIMARY KEY CLUSTERED IS ON COL1 & COL2.

    THERE is a seperate index on Col1 also which is non clustered and non unique.

    THERE are about 2.5 Billion Rows and i have to select on the basis of

    ColC i.e.

    SELECT * FROM TABLE_X WHERE

    Colc between '2007-12-01 00:00:00' and '2008-02-29 23:59:59'

    The total number of rows which will be retrieved using this query will be about 20 million.

    In the Execution plan i see that a clustered index scan is performed.

    The time duration which is being taken by this query is around 9 minutes which is not at all acceptable.

    Please guide me how to do the tuning of this query.

    Thanks

    Amit

    Don't take this as absolute gospel and you will need to test this on your data, but I'd do two things. First, drop the second index on Col1. If Col1 is the leading edge of your compound clustered index, having a non-clustered index for Col1 is a waste of time & space. Second, add an index to ColC. I know this is sample data and the real world is different, but if there are only the three columns, then I'd also include the other two columns on that index.

    Test, test, test, test, test. And look at the execution plans.

    --update. Saw a typo. I typed "add a column" I meant "add an index"

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Can you switch your Clustered Index to column c? That particular query would run a lot faster if you did.

  • Looking at just that scenario, and ignoring any possibility of other queries, I'd do the following.

    Drop the NC index on Col1. As Grant said, it's not necessary. Make the PK nonclustered and move the clustered index to ColC.

    Do you require all the columns in that query? (ie, do you need to do select *?)

    If not, then as a second possibility, leave the PK as clustered and add a NC index on ColC with the columns you do need as INCLUDE columns. This may be quicker to implement than moving the clustered index which, on 2 billion rows, is not a trivial operation.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Yes, having the clustered index on ColC would help. I would suggest dropping the second index on Col1 and recreating your PK as a non-clustered PK. You may want to consider putting the clustered index on ColC. Keep in mind that you will want to plan this as creating a new clustered index on a table of that size may take alot of time. Also, you've only shown us one query that hits this table. I suggest you analyze the impact a clustered index change would have accross all of your frequently used queries on this table.

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

  • Oh, and also. 20 million rows is not a trivial amount of data. Index seek or no index seek, it will take time to fetch that data from disk (if it's not in the data buffer) and it will take time to send that data to the client.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply