SUM(COLUMN) with Clustered Index Seek -- Too slow

  • I have a table that has about 70 million records, 130 columns. 36 of the columns are dimensions that the application may group or filter by. The remaining columns are metrics that the application will SUM. SQL is generated dynamically based on user selection. The performance is slow for many of the queries. SQL Server 2005. ALL performance is focussed on pulling FROM this table, we don't care (right now) about load times.

    General setup (right now, we've tried other options but this seems to be the best so far):

    CREATE TABLE [dbo].

    (

    [TABLE_ID] [int] IDENTITY(1,1) NOT NULL,

    [ACCOUNT_ID] [int] NOT NULL,

    ...

    [LOTS OF OTHER COLUMNS]

    ...

    CONSTRAINT [MBR_NORM_PK] PRIMARY KEY NONCLUSTERED ([TABLE_ID] ASC)

    )

    CREATE CLUSTERED INDEX [ixc_table] ON [dbo].

    ([ACCOUNT_ID] ASC)

    SELECT SUM(METRIC_COLUMN)

    FROM dbo.TABLE

    WHERE ACCOUNT_ID = 1

    The issue is that the select statement still takes almost 10 minutes to run. ACCOUNT_ID = 1 makes up 35 million rows. Any other account ID returns relatively fast, with the second largest account holding 8 million rows.

    If I create the index:

    CREATE NONCLUSTERED INDEX [ix_SPECIFIC_INDEX] ON [dbo].

    ([ACCOUNT_ID]) INCLUDE([METRIC_COLUMN])

    then the query time is ~10 seconds. The issue is I can't make an index for each possible combination the application may request. I'm assuming the poor performance is due to the IO caused by running through so many of the rows on such a wide table. Any thoughts/suggestions?

  • You may want to consider creating an indexed view that materializes and pre-aggregates the data.

  • You could also load the data into an OLAP Cube and reprogram the application to query that instead.

  • You could add more columns to the INCLUDE clause. This doesn't increase the size of the root or intermediate index pages, only the leaf level pages. If you don't want to have lots of columns in the INCLUDE, then create further indexes on the Account_ID with a subset of the Metric_Columns.

    Of course the best option is to pre-aggregate with a CUBE.

    Leo

    Is it Friday yet?

    Leo
    Nothing in life is ever so complicated that with a little work it can't be made more complicated.

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply